Wolfram|Alpha

I've spent a fair amount of time using Wolfram|Alpha recently. Here are my impressions.

The weeks leading up to the launch of Wolfram|Alpha generated an incredible amount of hype around the project, and its creator, Stephen Wolfram, has done little to lessen it. Even the project's description sets a rather lofty goal:

Wolfram|Alpha's long-term goal is to make all systematic knowledge immediately computable and accessible to everyone.


The comparisons to Google were immediate and plentiful. After all, Google describes its goal as making all the world's information searchable and catalogued. However, Alpha is not Google. Here, "computable" roughly means that the data is formatted so that a computer can read, evaluate, and manipulate it. A good example is producing a graph of a data set over time (such as population). Google, on the other hand, is best used as a catalogue that ultimately points you to data. It does very little of its own analysis beyond what is required to build its index.

Alpha is a calculator, but one that can understand natural language, access the relevant data, and present it in a form that is easily understood by humans. As such, it's an interesting tool. Alpha's ability to display data on a webpage is particularly impressive. Take a look at what you get when you query Alpha about the International Space Station.

Alpha displaying the current location of the ISS.

The results are impressive when Alpha knows how to handle your input. In the cases where it doesn't (and there are many), the results leave much to be desired. For instance, searching for general terms does not yield much beyond basic information, and sometimes that information is merely related to the core concept. Consider the next image, where I searched for Computer Science. The result contains no information other than some queries that Alpha can display results for.

Alpha's results for the term "Computer Science."

If you are interested in results that are analytic or mathematical in nature, Alpha might be a really good resource. For encyclopedia-like results, Wikipedia still rules the day. Of course, Alpha is not positioning itself to compete with Google or Wikipedia, but I think the comparisons are fair in that all of these services intend to provide information to people seeking it.

 

This brings me to the question, "Exactly how useful is Wolfram|Alpha?" I thought this might be the type of question that Alpha itself could help me answer. I started by searching for information about the United States' current population.




US Population Results.

These results are good and quite useful. Now, I wanted to get some information about education in the US.

education_usa

Not as much as I'd like to see, and the numbers are a bit out of date. Here I'm a little disappointed. I'd assume that raw numeric data such as this could be computed upon to reach more conclusions. In my next search, I try something a little more advanced: let's try to calculate the current portion of the population that has obtained a college degree. This should be a simple calculation, given the data. The term for this value is "education attainment," something that I learned through a Google search that led to the US Census Bureau's page.


edu_attainment

Nothing. Ok, maybe I need to be more verbose and use terms that I know already lead to data. So, I searched for "education usa, population usa."

edu_comparison

What? The calculation isn't even complete. Education enrollment has no result, even though I've already found those numbers before. Even if all the numbers had appeared, no useful calculation or comparison has been done.

The number that I'm actually searching for is 28%. This represents the portion of the population over the age of 25 that has obtained at least a bachelor's degree. This information comes from a Census Bureau page that turned up in a Google search. It would have saved me a lot of time if Alpha had produced this number. Given the data that it has access to (the US Census Bureau is listed as one of Alpha's resources), the calculation is fairly straightforward as well.

The reason I'm searching for this number, 28%, is that I believe that the majority of people interested in the types of results Alpha can provide likely have a background in mathematics or science. Most of the impressive results are related to these fields. This is not hard to understand when you realize that "computable" human knowledge is much more likely to come from these fields than, say, literature. Compare the results when searching for Poisson distribution to those for Gulliver's Travels.

Search: "poisson distribution"

Search: "Gulliver's Travels"

I think you can see the vast difference in the quality of the results returned. Now, back to that 28% number. I'd assume that people with a background in math or science are likely to have a college degree. Of the people with a college degree, only a portion will have studied math and/or science, but let's just keep using the 28% number. What is 28% of the US population? Alpha can answer that.

college_pop

The number: 85.63 million people. I'd conclude that this is a high estimate of the number of people who might ever be interested in the Alpha project. I think the majority of the population might type a query, get a graph, determine this is not what they want and move on.
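The arithmetic behind that figure is easy to check by hand. Here is a minimal sketch in Python; the function names are my own, and the total population is not quoted anywhere above, so it is back-calculated from the 85.63 million result and the 28% share rather than taken from Alpha directly.

```python
def share_of_population(population_millions: float, share: float) -> float:
    """Return the part of a population (in millions) that a share represents."""
    return population_millions * share


def implied_population(part_millions: float, share: float) -> float:
    """Invert the calculation: recover the total from a part and its share."""
    return part_millions / share


college_share = 0.28          # from the Census Bureau page mentioned above
college_pop_millions = 85.63  # the figure Alpha returned

# Assumption: Alpha multiplied its population figure by exactly 0.28,
# so dividing back out gives the population it must have used.
us_pop_millions = implied_population(college_pop_millions, college_share)

print(f"Implied US population: {us_pop_millions:.1f} million")
print(f"28% of that: {share_of_population(us_pop_millions, college_share):.2f} million")
```

Note that the 28% applies to people over the age of 25, while this calculation applies it to the whole population, which is one more reason to treat the result as a high estimate.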

People who are interested in these types of results will likely want to dig in deeper, as I tried to do above. I think at that point they will become disappointed in Alpha's inability to understand what information they are trying to elicit. Furthermore, even if they are able to obtain just the data they need, I wonder of what use it is other than as a cure for their own curiosity. It is difficult, if not impossible, to determine where exactly Alpha got the source information it needed to reach its conclusions. Even if this source information were available, the user would still have no idea exactly what equations, algorithms, or processes were used on the data. Without this information, I find it difficult to believe that anyone would be able to use Alpha's results in any sort of document or report.

I think a fundamental improvement to Alpha would be to return precise source information that ultimately points to the raw data, along with a bit of Mathematica (or other) code that performs the calculation. Then, the user could take this information, verify its validity, modify it as necessary, and incorporate it into their own work. Until this is possible, I'm afraid Alpha will only be used for curiosity's sake.
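To make the idea concrete, here is a rough sketch of what such a result could look like, written in Python rather than Mathematica. Everything in it is hypothetical: the class names, the fields, and the source descriptions are my own invention and not anything Alpha provides, and the population input is a placeholder back-calculated from the 85.63 million figure above.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class SourcedValue:
    """A number bundled with a note on where it came from (hypothetical)."""
    label: str
    value: float
    source: str  # human-readable citation text, not a real link


@dataclass(frozen=True)
class VerifiableResult:
    """A computed answer that carries its inputs and its formula with it."""
    inputs: Tuple[SourcedValue, ...]
    formula: str                    # the calculation as text a reader can rerun
    compute: Callable[..., float]   # the same calculation as executable code

    def value(self) -> float:
        return self.compute(*(item.value for item in self.inputs))

    def report(self) -> str:
        lines = [f"result = {self.value():.2f}", f"formula: {self.formula}"]
        for item in self.inputs:
            lines.append(f"  {item.label} = {item.value} (source: {item.source})")
        return "\n".join(lines)


# Placeholder inputs, in millions and as a fraction; see the caveats above.
population = SourcedValue(
    "population_millions", 305.8,
    "placeholder, back-calculated from Alpha's 85.63 million result")
degree_share = SourcedValue(
    "degree_share", 0.28,
    "US Census Bureau page (exact page not recorded here)")

result = VerifiableResult(
    inputs=(population, degree_share),
    formula="population_millions * degree_share",
    compute=lambda pop, share: pop * share,
)
print(result.report())
```

With something like this in hand, a reader could swap in a newer population figure or a different share, rerun the calculation, and cite the sources directly.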

Despite these criticisms, I find Alpha to be a fascinating project, if only for its data presentation and cataloguing engine. The project is in an early stage, so perhaps the flaws I see are simply due to this. Perhaps the desired features are already in the pipeline. I hope that they are, because I'm a fan of what Wolfram is trying to do here. If they can improve the project to the point where these problems are no longer an issue, I can see Alpha becoming the world-changing product that Wolfram would have you believe it is. Until then, however, you might be better off searching somewhere else.