Automating information discovery within the invisible web

Edwina Sweeney, Kevin Curran, Ermai Xie

Research output: Chapter in Book/Report/Conference proceeding (Chapter, peer-reviewed)

Abstract

A Web crawler, or spider, traverses the Web looking for pages to index, and when it locates a new page it passes that page on to an indexer. The indexer identifies links, keywords, and other content and stores them in its database. Users search this database by entering keywords into an interface, and matching Web pages are returned on a results page as hyperlinks accompanied by short descriptions. The Web, however, is increasingly moving away from being a collection of documents and toward being a multidimensional repository of sounds, images, and other formats. As a result, certain parts of the Web are invisible, or hidden. The term "Deep Web" has emerged to describe the mass of information that is accessible via the Web but cannot be indexed by conventional search engines. This hidden content makes searching considerably more complex for search engines. Google states that the claim that conventional search engines cannot find non-HTML documents such as PDF, Word, PowerPoint, and Excel files is not entirely accurate, and it has taken steps to address the problem by implementing procedures for searching academic publications, news, blogs, videos, books, and real-time information. However, Google still provides access to only a fraction of the Deep Web. This chapter explores the Deep Web and the tools currently available for accessing it.
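The crawl, index, and search pipeline described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' method: the in-memory WEB dictionary, its URLs, and the functions crawl, build_index, and search are hypothetical stand-ins for real HTTP fetching, HTML parsing, and result ranking, and tags are stripped with a regular expression that is adequate only for this toy corpus. The third page is reachable only by submitting a query (no hyperlink points to it), so a link-following crawler never indexes it, which is the essence of the Deep Web problem.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "Web": URL -> HTML. A real crawler would fetch
# pages over HTTP; this toy corpus keeps the sketch self-contained.
WEB = {
    "/index.html": '<a href="/news.html">News</a> deep web search engines',
    "/news.html": '<a href="/index.html">Home</a> crawler indexes pages',
    # Hypothetical form-result page: only reachable by submitting a query,
    # so no static hyperlink points here (a "Deep Web" page).
    "/results?q=pdf": "academic publications pdf archive",
}

LINK_RE = re.compile(r'href="([^"]+)"')
TAG_RE = re.compile(r"<[^>]+>")
WORD_RE = re.compile(r"[a-z]+")


def crawl(seed):
    """Breadth-first crawl that discovers pages only by following hyperlinks."""
    seen, queue, pages = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        html = WEB.get(url)
        if html is None:
            continue  # broken link: nothing to fetch
        pages[url] = html  # hand the page on to the indexer
        for link in LINK_RE.findall(html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def build_index(pages):
    """Indexer: build an inverted index mapping each keyword to its URLs."""
    index = defaultdict(set)
    for url, html in pages.items():
        text = TAG_RE.sub(" ", html)  # drop markup before tokenising
        for word in WORD_RE.findall(text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return, sorted, the URLs that contain every keyword in the query."""
    terms = WORD_RE.findall(query.lower())
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)
```

For example, after pages = crawl("/index.html") and index = build_index(pages), search(index, "deep web") returns ["/index.html"], while search(index, "pdf") returns an empty list because the form-only results page was never crawled, even though it exists on this toy Web.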

Original language: English
Title of host publication: Advanced Information and Knowledge Processing
Publisher: Springer-Verlag London Ltd
Pages: 167-181
Number of pages: 15
Publication status: Published - 2015

Publication series

Name: Advanced Information and Knowledge Processing
Volume: 46
ISSN (Print): 1610-3947
ISSN (Electronic): 2197-8441
