The Invisible Web
What is it?
Who is to blame?
Why/where is it?
When will it be visible?

What is the Invisible Web?
"Pages" (text, images, other files, other info) accessible over the Internet with a Web browser that search engines do not include in their indexes, because either
they are technically inaccessible, or
they are excluded by choice.

Who is to blame?
No one and everyone:
Directories don't care about completeness
Indexes can't keep up with growth of pages
Webmasters may not welcome spiders
We don't want to pay for information

Why/Where is it?
Pages are ordinary, but inaccessible.
In the context of the Bow Tie Theory they are "disconnected."
Hence spiders/crawlers cannot find them.
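A minimal sketch of why a link-following spider misses "disconnected" pages. The link graph, the page names, and the crawl() helper are hypothetical and exist only for illustration; real crawlers are far more elaborate.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "home": ["news", "about"],
    "news": ["home", "archive"],
    "about": [],
    "archive": [],
    "orphan": ["home"],  # links out, but no crawled page links to it
}

def crawl(links, seeds):
    """Breadth-first traversal: follow links outward from the seed pages."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

found = crawl(LINKS, ["home"])
invisible = set(LINKS) - found  # pages the spider never reaches
print(sorted(invisible))
```

Starting from "home", the spider reaches every page except "orphan": the page exists and is ordinary, but no link path leads to it.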

Why/Where is it?
Pages are ordinary, but excluded by
Webmaster: Robots Exclusion Protocol,
which is implemented by including a file such as www.mysite.com/robots.txt, containing
User-agent: *
Disallow: /
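A short sketch of how a well-behaved spider might honor that file, using Python's standard urllib.robotparser module. The crawler name "ExampleSpider" and the may_fetch() helper are assumptions made for illustration; the robots.txt text is parsed from a string here rather than downloaded.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from the slide: all crawlers, entire site off limits.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    # "ExampleSpider" is a hypothetical crawler name.
    print(may_fetch(ROBOTS_TXT, "ExampleSpider", "http://www.mysite.com/index.html"))
```

The protocol is voluntary: the file only asks spiders to stay away, so the pages remain reachable by anyone with a browser.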

Why/Where is it?
Pages are ordinary, but excluded by
Webmaster: Robots Exclusion Protocol
Webmaster: Robots META tag, which is implemented by putting a line like this in the <head> section of the HTML code:
<meta name="robots" content="noindex, nofollow">
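A sketch of how a spider might read the Robots META tag, using Python's standard html.parser module. The RobotsMetaParser class, the robots_directives() helper, and the sample page are hypothetical names introduced for illustration.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)

def robots_directives(html: str) -> set:
    """Return the set of robots directives found in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives

PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
directives = robots_directives(PAGE)
print("index this page:", "noindex" not in directives)
print("follow its links:", "nofollow" not in directives)
```

Unlike robots.txt, this tag works page by page, so a spider has to fetch the page before it learns that it should not index it.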

Why/Where is it?
Pages are ordinary, but excluded by
Robots Exclusion Protocol or META tag, because
Content changes frequently
Extra load on server
Older content is archived/pay-only
Search engine:
some content is "unworthy"
some content is too deep

Why/Where is it?
Pages are ordinary*, but incomprehensible
images (.gif, .jpg files)
audio (.wav files)
video (.mpg, .mov files)
*definition of "ordinary" in this context:
images display in a browser; audio/video files require a widely available plug-in (e.g., MS Media Player)

Why/Where is it?
Pages are ordinary, but ephemeral: a faster version of the "newspaper archive" problem
weather data
stock-market data
flight arrival/departure data

Why/Where is it?
Pages are extraordinary though accessible
PDF (Portable Document Format) files, read with Adobe Acrobat Reader [which has become "ordinary"]
PostScript files (the choice in computer science)
Flash, Shockwave, dynamic graphics
programs (executables, .exe)
compressed files (.zip, .tar)
Technically, these can be indexed; economically, they cannot.

Why/Where is it?
Pages are extraordinary and not accessible: the Really Invisible Web, at least for now.
Pages that (may) require a sign-in, for example
The New York Times (required by site)
eBay, Amazon.com, Travelocity, TowerRecords (required by visitor)

Why/Where is it?
Pages are extraordinary and not accessible: the Really Invisible Web, at least for now.
Pages that (may) require a sign-in
Data (really "databases") that must be reached through forms (text boxes, radio buttons, etc.) in a Web page, for example
towerrecords.com
amazon.com: "The Infinite Regress" problem

When will it be visible?
Well, a few years ago, Google didn't exist.
AltaVista had no images in its index.
Now both offer images as "ordinary."
Surely, PDF, PostScript, and the like are just around the corner, at least in Internet years.
Can databases be far behind?