Searching the Web
Finding information on the Web...
What it is not: idle browsing
What it is: purposeful searching

Searching the Web
Web Directories vs. Web Indexes
Spiders and Crawlers
Finding the needle in the haystack: keywords

Directories vs. Indexes
One possible path down the tree:
animal
cat, dog, gerbil, hamster
Collie, Dachshund, German Shepherd
Toy, Miniature, Full

Directories vs. Indexes
A directory has a hierarchical or tree structure,
which looks like this in the Yahoo! Directory:
http://dir.yahoo.com/

Directories vs. Indexes
A directory has a hierarchical or tree structure, like a table of contents
It is context-based, meaning that "adjacent" information is related
This offers efficient and effective browsing
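A minimal sketch of the tree idea, assuming a directory can be modeled as nested dictionaries. The category names and listings below are made up for illustration and are not taken from any real directory. Browsing is one lookup per level, and "adjacent" categories are simply the siblings under the same parent.

```python
# Hypothetical directory: each category maps to its subcategories;
# a leaf holds a list of editor-chosen listings.
directory = {
    "Animals or pets": {
        "Dogs": {
            "Australian Shepherds": ["breeders", "training", "cost"],
            "Collies": ["breeders"],
        },
        "Cats": {},
    },
    "History": {},
}


def browse(tree, path):
    """Follow a list of category names down the tree, one click per level."""
    node = tree
    for name in path:
        node = node[name]  # raises KeyError if the category does not exist
    return node


def siblings(tree, path):
    """Return the categories adjacent to the last step of the path."""
    parent = browse(tree, path[:-1])
    return sorted(name for name in parent if name != path[-1])


path = ["Animals or pets", "Dogs", "Australian Shepherds"]
listings = browse(directory, path)
neighbors = siblings(directory, path)
print(listings)
print(neighbors)
```

The context comes for free: anything reached through "Dogs" is about dogs, without the word having to appear in the listing itself.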

Directories vs. Indexes
An index has no inherent structure other than words; hence it is like, well, an index
It has granularity, meaning a detailed breakdown of where words are on the Web, without context or a sense of surroundings
This offers efficient and effective searching
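As a rough illustration of granularity, the sketch below builds a tiny index that records, for each word, the page and word position where it occurs. The URLs and page text are hypothetical, and a real search engine's index is far more elaborate.

```python
from collections import defaultdict

# Hypothetical pages: URL -> page text
pages = {
    "http://example.com/a": "Australian Shepherd breeders and training",
    "http://example.com/b": "training your cat",
}


def build_index(pages):
    """Map each lowercase word to a list of (url, word position) pairs."""
    index = defaultdict(list)
    for url, text in pages.items():
        for position, word in enumerate(text.lower().split()):
            index[word].append((url, position))
    return index


index = build_index(pages)
print(index["training"])
```

Note that the entry for "training" lists a dog page and a cat page side by side: the index knows where the word is, not what the page is about.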

Directories: Characteristics
Similar to a library or bookstore, with familiar categories, e.g., pets, history

Directories: Characteristics
Similar to a library or bookstore, with familiar categories
Arranged by subject or topic
And then subtopic and sub-subtopic...
Uses hyperlinks effectively to move "down" the topics: use your mouse, not your feet!
Hence well-suited to purposeful browsing

Directories: Characteristics
Context and hyperlinks work together:
Topic: Animals or pets
Subtopic: Dogs
Sub-subtopic: Australian Shepherds
Target information: Finding a breeder, or training, or cost...

Directories: Issues
Because sites/links are chosen by editors, their scope (breadth and depth) is limited
Editing can introduce bias, personal or corporate
Editing can give unbalanced coverage, over- or underemphasizing topics
Currency requires editorial checking of content, link rot, etc.
Some directories charge for "favorable" listings

Directories: Examples
The cream of the crop: Yahoo! It is a "closed" directory, meaning that its editors are its own employees
Open Directory Project uses unpaid editors and is used by Google (and formerly AltaVista); it is "open"
About.com is a half-open, half-closed hybrid

Indexes: Characteristics
An index is a database, like a dictionary or thesaurus that lists the URLs where words and phrases appear instead of their definitions
It is machine-created, not human-built
Like any database, it is structured for efficient machine use, not human use
Hence, it is ideally suited for searching... and speed!

Indexes: Issues
Because all sites/links are included, their scope (breadth and depth) is, in principle, unlimited
Financial costs can limit scope/content, e.g., frequency of revisiting pages already indexed
Indexing programs offer no quality review
Requires high user proficiency
Text-focused, less useful for images, sound

Indexes: Examples
Google is now the frontrunner
But there may be reasons to use others: selective coverage, ease of use, comfort. All of these are driven by past experience, the same as a preference for a browser
Despite Google's market share, we will also look at AltaVista because of its historical importance and technological innovations

Indexes: Spiders and Allies
Automatic "spiders" (also called robots or crawlers) find Web pages by following hyperlinks
They retrieve some portion of each page (title, first lines, full text)
An indexer adds the results to the database and calculates "relevancy"
Query processor responds to search requests
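The three roles above can be sketched on a toy, in-memory "Web". Everything here is an assumption made for illustration: the page names, the link structure, and especially the relevancy measure (a plain count of keyword occurrences), which stands in for whatever a real engine computes.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "Web": URL -> (page text, outgoing links)
WEB = {
    "a.example": ("dog training tips", ["b.example", "c.example"]),
    "b.example": ("australian shepherd breeders", ["a.example"]),
    "c.example": ("dog breeders and dog shows", []),
    "d.example": ("unlinked page about dogs", []),
}


def crawl(start):
    """Spider: find pages by following hyperlinks, breadth-first."""
    seen, queue, fetched = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        fetched[url] = text  # here the full text is retrieved
        for link in links:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched


def build_index(fetched):
    """Indexer: word -> {url: number of occurrences on that page}."""
    index = defaultdict(dict)
    for url, text in fetched.items():
        for word in text.split():
            index[word][url] = index[word].get(url, 0) + 1
    return index


def query(index, words):
    """Query processor: pages containing every word, most occurrences first."""
    words = [w.lower() for w in words]
    if not words:
        return []
    hits = set(index.get(words[0], {}))
    for word in words[1:]:
        hits &= set(index.get(word, {}))

    def score(url):
        return sum(index[word][url] for word in words)

    return sorted(hits, key=lambda url: (-score(url), url))


fetched = crawl("a.example")
index = build_index(fetched)
print(sorted(fetched))
print(query(index, ["dog"]))
print(query(index, ["dog", "breeders"]))
```

The page "d.example" never enters the index because no hyperlink leads to it, which is one way an index's coverage can fall short in practice.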

Keywords: An Overview
In Minerva, you can search fields (title, author, subject, title keywords, subject keywords), but these are like Yahoo! topics: a librarian has chosen them
In Web search engines such as AltaVista and Google, you can search full page content, as represented in the indexed database
This requires a very different skill set...

Keywords: An Overview
Choosing keywords is equivalent to starting at the "bottom" of a directory:
Topic: Animals or pets
Subtopic: Dogs
Sub-subtopic: Australian Shepherds
Target information: Finding a breeder, or training, or cost...
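One way to picture "starting at the bottom" is to build the search string from the most specific level of the directory path plus the target information, skipping the broad upper levels. The helper name `search_string` is hypothetical, and wrapping multi-word terms in double quotes is an assumption about the search engine's phrase syntax.

```python
def search_string(path, target, levels=1):
    """Build a query from the deepest `levels` entries of a path plus the target.

    `levels` is expected to be at least 1.
    """
    specific = path[-levels:]
    # Assumed convention: quote multi-word terms so they are searched as phrases.
    phrases = [f'"{term}"' if " " in term else term for term in specific]
    return " ".join(phrases + [target])


path = ["Animals or pets", "Dogs", "Australian Shepherds"]
query_text = search_string(path, "breeder")
print(query_text)
```

The broad topics ("Animals or pets", "Dogs") add little to a keyword search, because a page about Australian Shepherd breeders need not mention them at all.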

Context vs. Keywords
Topic: Animals or pets
Subtopic: Dogs
Sub-subtopic: Australian Shepherds
Target information: Finding a breeder, or training, or cost...
Directory tree
Index search string