The Future of Search
The Web

Improving Search
The same, only more of it
Faster spiders
Bigger indexes (and directories)
More Boolean operators (like Lexis-Nexis)
The same, but improved functionality
Better query processors
Better translation engines
Faster delivery of Web pages, e.g., Inktomi

Improving Search: Inktomi
Started at UC Berkeley in 1996 by an engineering professor and one of his graduate students as a faster spider + query processor
Just another Web site, until they realized they could speed up delivery of pages by caching
Caching is the storing of frequently used information...

Inktomi's secret: Caching
Caching is storing frequently used information:
on your PC
on a Web server
at a search engine
in a whole corporation
in the "server farm" of an ISP (e.g., AOL)
Inktomi's real secret: Turning cache into cash
now owned by Yahoo!
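To make the caching idea concrete, here is a minimal sketch of a least-recently-used page cache. It is an illustration only, not Inktomi's actual design: the class name, the capacity, the eviction policy, and the stand-in fetch function are all assumptions.

```python
from collections import OrderedDict

class PageCache:
    """Tiny least-recently-used cache for fetched pages (illustrative only)."""

    def __init__(self, fetch, capacity=2):
        self.fetch = fetch            # function that does the slow retrieval
        self.capacity = capacity      # maximum number of pages kept
        self.pages = OrderedDict()    # url -> page, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self.pages:
            self.hits += 1
            self.pages.move_to_end(url)      # mark as most recently used
            return self.pages[url]
        self.misses += 1
        page = self.fetch(url)               # slow path: go to the origin
        self.pages[url] = page
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)   # evict the least recently used
        return page

def slow_fetch(url):
    # Hypothetical stand-in for a real network request.
    return "<html>content of " + url + "</html>"

cache = PageCache(slow_fetch, capacity=2)
for url in ["a.example", "b.example", "a.example", "c.example", "b.example"]:
    cache.get(url)
print(cache.hits, cache.misses)
```

Every hit is a page delivered without going back to the origin server, which is where the speed-up comes from.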

Improving Search for Text
Natural-language queries
This is the AskJeeves model on steroids
Applied more narrowly in certain fields, e.g. finance and investing: http://www.iphrase.com
Understand the user: keywords, phrases, sentences, and extended prose including e-mails and IM sessions. But these can be incomplete, obscure, or "dirty" (e.g., misspellings)
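One simple way to handle "dirty" input is to map each query word to the closest entry in a known vocabulary. The sketch below uses Python's standard difflib module; the vocabulary, the cutoff value, and the function name are assumptions for illustration, and nothing here describes how iPhrase or AskJeeves actually works.

```python
import difflib

# Hypothetical domain vocabulary; a real system would use a far larger lexicon.
VOCABULARY = ["mutual", "fund", "retirement", "portfolio", "dividend", "account"]

def clean_query(query, vocabulary=VOCABULARY, cutoff=0.8):
    """Replace each word with its closest vocabulary entry, if one is close enough."""
    cleaned = []
    for word in query.lower().split():
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        # Keep the original word when nothing in the vocabulary is similar.
        cleaned.append(matches[0] if matches else word)
    return " ".join(cleaned)

print(clean_query("Retirment portfolo with dividnd"))
```

Words with no near match in the vocabulary (such as "with") pass through unchanged, so the cleanup does not discard terms it does not recognize.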

Improving Search for Text
Natural-language queries
This is the AskJeeves model on steroids
Applied more narrowly in certain fields, e.g. finance and investing: http://www.iphrase.com
Understand the user
Search on context, not "silo"
Guide the user: dynamically build a page on the fly around the search terms, not just a list of hits

Improving Search for Text
Natural-language queries
This is the AskJeeves model on steroids
Applied more narrowly in certain fields, e.g. finance and investing: http://www.iphrase.com
Blue Cross/Blue Shield
TD Waterhouse
Lexis-Nexis (!!!)
Motorola
Purchase by IBM announced 11/1/05

Improving Search for Text
Text mining, also called "context analysis"
But to understand text mining, we need to look at data mining

Improving Search for Text
Digression on Data mining
Example: American Express special offers
As computing power and databases improved, AE could compare buying patterns and choose cardholders who might buy, say, a leather calendar
What began as a way to identify a few hundred customers soon found a few tens, then a few...
And finally, circa 1995, a target population of one!

Improving Search for Text
Text mining: analogous to data mining
Compare "word patterns" instead of buying patterns
Examine unstructured data (outside DBs) such as email, corporate portal (Web) pages, help files, Word documents, Excel spreadsheets, etc.
Identify patterns, hence information and knowledge, that the company didn't know it knew!
One of the tools of "knowledge management"
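As a rough illustration of comparing "word patterns," the sketch below counts how often pairs of terms appear together across a handful of unstructured documents. The documents, the stopword list, and the function names are all hypothetical, and commercial text mining tools use far richer linguistic analysis than this.

```python
import re
from collections import Counter
from itertools import combinations

STOPWORDS = {"the", "a", "to", "of", "and", "in", "is", "for", "on", "we"}

def terms(text):
    """Lowercase a document and return its set of non-stopword terms."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def cooccurrences(documents):
    """Count how many documents each pair of terms appears in together."""
    counts = Counter()
    for doc in documents:
        for pair in combinations(sorted(terms(doc)), 2):
            counts[pair] += 1
    return counts

# Hypothetical snippets standing in for e-mails, help files, and memos.
documents = [
    "Customer reports the pump overheats in the summer",
    "Help file: pump maintenance schedule",
    "Memo: the pump overheats again during summer shipments",
]

pairs = cooccurrences(documents)
print(pairs[("overheats", "pump")], pairs[("maintenance", "pump")])
```

A pair that recurs across unrelated documents (here, "overheats" with "pump" and "summer") is the kind of pattern no single author wrote down but the collection as a whole contains.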

Improving Search for Text
Text mining the Web ????
The Web is a giant collection of unstructured pages
Could text mining find knowledge that the Web (and directories and indexes) doesn't know it has?
This would be the equivalent of asking, "What can you tell me about this topic that I don't know to ask?"
Would it be done within an index? Or by an outside user with a text mining application program?

Improving Search for Text
Text mining the Web ????
There are no answers at the moment, but there are some prominent companies with software that might be players:
ClearForest ClearTags (www.clearforest.com)
Entrieva's SemioMap (www.entrieva.com)
Inxight's Categorizer (www.inxight.com), which can also work with data...

Improving Search for Data
Structured information: some of the "stuff" that neither a typical search engine nor you, as a typical searcher, is equipped to handle...
Financial reports (public-domain)
Sports statistics
Web-based tools could give the kind of access to these data that is currently possible only for users who know highly specialized query languages.
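A sketch of what such a tool might do behind the scenes: the user fills in two form fields, and the tool writes the query. The table, its columns, the sample rows, and the function name are all made up for illustration; SQL stands in here for whatever specialized query language a real data source would require.

```python
import sqlite3

# In-memory database with hypothetical sports statistics.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batting (player TEXT, season INTEGER, home_runs INTEGER)")
conn.executemany(
    "INSERT INTO batting VALUES (?, ?, ?)",
    [("Player A", 2004, 41), ("Player B", 2004, 17), ("Player C", 2005, 36)],
)

def players_with_at_least(home_runs, season):
    """Run the query that a two-field Web form might issue on the user's behalf."""
    rows = conn.execute(
        "SELECT player FROM batting "
        "WHERE season = ? AND home_runs >= ? ORDER BY player",
        (season, home_runs),
    )
    return [player for (player,) in rows]

print(players_with_at_least(30, 2004))
```

The searcher supplies only a number and a year; the query language stays hidden inside the tool.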

Improving Search for Images
Remember: current search capability, such as is offered by AltaVista and Google, is still largely based on text associated with the images:
Filenames, e.g., mydog.jpg
"alt" attributes within an image tag
The text on a Web page near where the image is displayed.
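The first two text sources above can be sketched with Python's standard HTML parser. This is an illustrative toy, not how AltaVista or Google builds its image index: the class name and the sample page are assumptions, and text near the image is left out to keep the example short.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

class ImageTextIndexer(HTMLParser):
    """Index images by words found in their filename and alt attribute."""

    def __init__(self):
        super().__init__()
        self.index = defaultdict(set)   # word -> set of image URLs

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        alt = attributes.get("alt") or ""
        # Filename without its directory or extension, e.g. "mydog".
        filename = src.rsplit("/", 1)[-1].rsplit(".", 1)[0]
        for word in re.findall(r"[a-z]+", (filename + " " + alt).lower()):
            self.index[word].add(src)

# Hypothetical page fragment.
page = (
    '<p>Our pets</p>'
    '<img src="/photos/mydog.jpg" alt="Golden retriever puppy">'
    '<img src="/photos/img0042.jpg">'
)
indexer = ImageTextIndexer()
indexer.feed(page)
print(sorted(indexer.index["puppy"]))
print("dog" in indexer.index)
```

Note the weakness: a search for "dog" finds nothing, because the filename yields only the single token "mydog", and the second image has no useful text at all.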

Improving Search for Images
Wait a minute! Why not just turn loose all this computing power we have? Let some server somewhere decide what the various images are.
Easier said than done, because unfortunately...

Improving Search for Images
A computer cannot distinguish between this: [image]
And this: [image]
although it is obvious to us.

Improving Search for Images
Technologies to tackle this problem have been around for some time and at least one, from Virage, was available on AltaVista in 1996. It allowed the user to vary and try to match:
Color: general color impression
Composition: spatial arrangement of color
Texture: "patterns," e.g., wood, granite, clouds
Structure: shape of objects in the image
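To give a feel for the first dimension, "Color," here is a minimal sketch that compares images by their overall color distribution (histogram intersection). It is a generic textbook technique, not Virage's algorithm; the tiny synthetic "images," the color values, and the function names are assumptions.

```python
from collections import Counter

def color_histogram(pixels, levels=4):
    """Normalized histogram of coarsely quantized RGB colors."""
    step = 256 // levels
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {color: n / total for color, n in counts.items()}

def color_similarity(pixels_a, pixels_b):
    """Histogram intersection: 1.0 means identical color distributions."""
    ha, hb = color_histogram(pixels_a), color_histogram(pixels_b)
    return sum(min(ha.get(c, 0.0), hb.get(c, 0.0)) for c in set(ha) | set(hb))

ORANGE, GREEN, BLUE = (240, 130, 20), (30, 150, 40), (20, 40, 200)

# Hypothetical 100-pixel "images," flattened to lists of RGB tuples.
orange_on_green = [ORANGE] * 30 + [GREEN] * 70
ball_on_grass = [GREEN] * 70 + [ORANGE] * 30   # same colors, different scene
boat_on_sea = [ORANGE] * 10 + [BLUE] * 90

print(color_similarity(orange_on_green, ball_on_grass))
print(color_similarity(orange_on_green, boat_on_sea))
```

The first two images score as a perfect color match even though they are meant to show different things, which is why color by itself says little about what an image depicts.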

The Virage Image Engine

Improving Search for Images
The problem to be solved is that image recognition is visual, not semantic. Yet the thing we can do best on the Web is word-based searching.
Said differently: We still cannot distinguish an orange on a green tablecloth from a basketball on green grass.
But the problem is under attack...

Improving Search for Images
Three of the current players in the field are:
Pixlogic (www.pixlogic.com) - object-level info
BioImagene (www.bioimagene.com) - content
Visfinity (www.visfinity.com) - image management

Connections and Context
Four alternatives to conventional search:
Alexa (www.alexa.com)
Intelligent agents
Mapuccino (from IBM)
TouchGraph (www.touchgraph.com): visualization of results from Google and Amazon