What is it? | |
Who is to blame? | |
Why/where is it? | |
When will it be visible? |
"Pages" (text, images, other files, other info) accessible over the Internet with a Web browser that search engines do not include in their indexes, because either | ||
they are technically inaccessible, or | ||
they are excluded by choice. |
No one and everyone: | ||
Directories don't care about completeness | ||
Indexes can't keep up with growth of pages | ||
Webmasters may not welcome spiders | ||
We don't want to pay for information |
Pages are ordinary, but inaccessible. | ||
In the context of the Bow Tie Theory they are "disconnected." | ||
Hence spiders/crawlers cannot find them. |
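A minimal sketch of why "disconnected" pages stay unseen, assuming a crawler that only follows links outward from a seed page. The page names and the link graph are hypothetical, invented for illustration.

```python
from collections import deque

# Hypothetical link graph: each key is a page, each value lists the pages it links to.
LINKS = {
    "home": ["news", "products"],
    "news": ["home"],
    "products": ["catalog"],
    "catalog": [],
    "orphan": ["home"],  # links into the rest of the site, but nothing links to it
    "island": [],        # no links in or out
}


def crawl(seeds, links):
    """Breadth-first crawl: return every page reachable from the seed pages."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


found = crawl(["home"], LINKS)
invisible = sorted(set(LINKS) - found)
print(invisible)  # ['island', 'orphan']
```

Both missing pages are ordinary; the crawler simply has no path to them from its starting point.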
Pages are ordinary, but excluded by | ||
Webmaster: Robots Exclusion Protocol, | ||
which is implemented by including a file such as www.mysite.com/robots.txt, containing | ||
User-agent: * | ||
Disallow: / |
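A short sketch of how a well-behaved spider might interpret that file, using Python's standard `urllib.robotparser`. The crawler name "ExampleBot" and the narrower `/archive/` rule are assumptions added for contrast; only the two-line exclude-everything file comes from the slide.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt):
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# The two-line file shown above: applies to every robot and every path.
EXCLUDE_ALL = "User-agent: *\nDisallow: /\n"

# Hypothetical narrower rule, included only for comparison.
EXCLUDE_ARCHIVE = "User-agent: *\nDisallow: /archive/\n"

everything = build_parser(EXCLUDE_ALL)
print(everything.can_fetch("ExampleBot", "http://www.mysite.com/index.html"))  # False

archive_only = build_parser(EXCLUDE_ARCHIVE)
print(archive_only.can_fetch("ExampleBot", "http://www.mysite.com/index.html"))  # True
print(archive_only.can_fetch("ExampleBot", "http://www.mysite.com/archive/1999.html"))  # False
```

The protocol is voluntary: the file only tells a spider what the webmaster wants, and compliance is up to the spider.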
Pages are ordinary, but excluded by | ||
Webmaster: Robots Exclusion Protocol | ||
Webmaster: Robots META tag, which is implemented by putting a line like this in the <head> section of the HTML code: | ||
<meta name="robots" content="noindex, nofollow"> |
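A rough sketch of how a spider could read that tag, assuming it parses each fetched page with Python's standard `html.parser`. The class name, helper function, and sample page are hypothetical; real search engines may handle the tag differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of any <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Directives are comma-separated; normalize case and whitespace.
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page carrying the tag from the slide.
PAGE = """<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Example</title>
</head><body><a href="/next.html">next</a></body></html>"""

directives = robots_directives(PAGE)
may_index = "noindex" not in directives    # may the page go into the index?
may_follow = "nofollow" not in directives  # may the spider follow its links?
print(may_index, may_follow)  # False False
```

Unlike robots.txt, the spider has to fetch the page before it can see this instruction.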
Pages are ordinary, but excluded by | |||
Robots Exclusion Protocol or META tag, because | |||
Content changes frequently | |||
Extra load on server | |||
Older content is archived/pay-only | |||
Search engine: | |||
some content is "unworthy" | |||
some content is too deep |
Pages are ordinary*, but incomprehensible | ||
images (.gif, .jpg files) | ||
audio (.wav files) | ||
video (.mpg, .mov files) | ||
*definition of "ordinary" in this context: | ||
images display in a browser; audio/video files require a widely-available plug-in (e.g., MS Media Player) |
Pages are ordinary, but ephemeral: a faster version of the "newspaper archive" problem | ||
weather data | ||
stock-market data | ||
flight arrival/departure data |
Pages are extraordinary though accessible | ||
PDF (Portable Document Format) files - Adobe Acrobat Reader [has become "ordinary"] | ||
PostScript files (the choice in computer science) | ||
Flash, Shockwave, dynamic graphics | ||
programs (executables, .exe) | ||
compressed files (.zip, .tar) | ||
Technically, these can be indexed; economically, they cannot. |
Pages are extraordinary and not accessible: the Really Invisible Web, at least for now. | |||
Pages that (may) require a sign-in, for example | |||
The New York Times (required by site) | |||
eBay, Amazon.com, Travelocity, TowerRecords (required by visitor) |
Pages are extraordinary and not accessible: the Really Invisible Web, at least for now. | |||
Pages that (may) require a sign-in | |||
Data (really "databases") that must be reached through forms (text boxes, radio buttons, etc.) in a Web page, for example | |||
towerrecords.com | |||
amazon.com: "The Infinite Regress" problem |
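A small sketch of what a link-following spider sees on a form-only search page, assuming it extracts `href` links and ignores form controls. The sample page, its field names, and the scanner class are hypothetical and are not taken from any of the sites named above.

```python
from html.parser import HTMLParser


class LinkAndFormScanner(HTMLParser):
    """Record the links a spider could follow and the form fields it cannot fill in."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.form_fields = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])
        elif tag in ("input", "select", "textarea") and attributes.get("name"):
            self.form_fields.append(attributes["name"])


# Hypothetical search page: the data behind it is reachable only by submitting the form.
SEARCH_PAGE = """<html><body>
<form action="/search" method="get">
  <input type="text" name="title">
  <input type="radio" name="format" value="cd">
  <input type="radio" name="format" value="vinyl">
  <input type="submit" value="Search">
</form>
</body></html>"""

scanner = LinkAndFormScanner()
scanner.feed(SEARCH_PAGE)
print(scanner.links)        # []
print(scanner.form_fields)  # ['title', 'format', 'format']
```

With no links to follow, the spider stops at the form; the result pages exist only as answers to queries that someone has to type in.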
Well, a few years ago, Google didn't exist. | |
AltaVista had no images in its index. | |
Now both offer images as "ordinary." | |
Surely, PDF, PostScript, and the like are just around the corner, at least in Internet years. | |
Can databases be far behind? |