Machine Learning
Machine learning is a subfield of artificial intelligence. The general idea is to assimilate some information and use it to make predictions about the world, react to it, and so on. We as humans do this all the time. The brain can be thought of as a giant database holding more than 20 years of information, from which we make inferences about what will happen tomorrow and work out how to get around in the world. How do we do this? We build internal models of the world. Mathematically, a model is some function f applied to some number of input values, x1...xn. As an example, consider the stock market. People are constantly trying to predict what prices will be tomorrow, based on today's earnings reports, company news, etc.
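To make the idea of a model as a function f more concrete, here is a minimal sketch in Python. It assumes the simplest possible case: a single input x, and a straight line fitted to past observations by ordinary least squares. The names fit_line and predict and the numbers are made up for illustration. Real prediction problems, such as the stock market, involve many inputs and far noisier data.

```python
def fit_line(xs, ys):
    """Learn a model f(x) = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of the inputs, and how the inputs and outputs vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned function f to a new input x."""
    slope, intercept = model
    return slope * x + intercept


# Illustrative, made-up observations: input on days 1..4 and the observed value.
days = [1, 2, 3, 4]
values = [3.0, 5.0, 7.0, 9.0]

model = fit_line(days, values)
tomorrow = predict(model, 5)  # the model's guess for day 5
```

The "learning" step is fit_line, which turns past information into the two numbers that define f. The "prediction" step is predict, which applies f to an input it has not seen before.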
Information Access
In today's world, there is far too much disorganized information on the Internet. Over time, some systems have evolved to attempt to structure the information. Xerox PARC came up with the original idea, with its Scatter/Gather system. As time went on, hierarchies were developed to provide richer information. Hierarchical file systems became a part of the OS, Yahoo! started cataloging Internet links, and biological/medical texts were classified by professionals. All of these solutions were implemented to make the information you want more easily accessible.
The New Problem
Although hierarchies greatly improved information access, they were inefficient because they had to be constructed and maintained manually. As a result, searching through these hierarchies was extremely time-consuming, and the hierarchies were difficult to use in general.
The Vision
Automating the generation and maintenance of hierarchies would greatly improve the current situation. Using these automatic hierarchies for contextual guidance, a user could easily browse a large collection and dynamically organize query results. Such a system would populate hierarchies automatically, and allow the user to tailor them to suit his/her needs.
One current system that uses automated hierarchies is called SONIA. SONIA queries a number of search engines and, after eliminating dead links, classifies the pages it has found according to subject. The user chooses the number of "clusters" he/she wants to group the results into. For example, a search on "Saturn" returned a number of pages. When the user chose to group them into 3 clusters, SONIA automatically separated pages on the planet, the car, and the video game system. Further refinement is possible (pictures of the planet vs. satellite missions), and new queries can be conducted based on the current hierarchy. Links can be moved from cluster to cluster, in case SONIA made a mistake. The overall result is a system that is relatively easy to use and that gives the user a greatly increased ability to find relevant and useful information amidst the chaos of the Internet.
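The notes do not say which algorithm SONIA uses, so the following is not SONIA's method. It is only a minimal sketch of one common way to group text into a user-chosen number of clusters: a k-means-style loop over bag-of-words vectors compared by cosine similarity. The function names, the choice to seed the clusters with the first k documents, and the page snippets are all assumptions made for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: maps each lowercased word to its count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs, k, rounds=10):
    """Assign each document to one of k clusters; assumes k <= len(docs)."""
    vectors = [vectorize(d) for d in docs]
    # Assumption: seed each cluster with one of the first k documents,
    # which keeps the sketch deterministic.
    centroids = [Counter(v) for v in vectors[:k]]
    labels = [0] * len(docs)
    for _ in range(rounds):
        # Step 1: put every document in the cluster whose centroid is most similar.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        # Step 2: rebuild each centroid from its members. Summing the counts is
        # enough here because cosine similarity ignores vector length.
        new_centroids = []
        for c in range(k):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == c:
                    total.update(v)
            new_centroids.append(total if total else centroids[c])
        centroids = new_centroids
    return labels


# Made-up page snippets standing in for results of a search on "Saturn".
pages = [
    "saturn planet rings orbit sun",
    "saturn car dealer sedan price",
    "saturn video game console",
    "planet rings photos orbit",
    "car sedan price review",
    "game console controller",
]
labels = cluster(pages, 3)  # one cluster number per page
```

Here the user's choice of 3 clusters corresponds to the argument k. Moving a link from one cluster to another, as SONIA allows, would amount to overriding one entry of labels by hand.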
Behind the Scenes
Mehran Sahami, a former lecturer in CS106A and CS201, focused his Ph.D. work on this problem. He currently works at Epiphany, a company he describes as "using machine learning techniques to develop new business tools and help people make intelligent decisions". He suggests getting a Ph.D. in order to have both the time and the knowledge to develop these new "cool" technologies.