
Remote/Hosted Search Engine Services
by Guest Blogger, 2/17/2002
One of the benefits of the World Wide Web is the ability to navigate among different sources and types of information. Central to this navigability is the search engine: a mechanism into which users enter words or phrases and which returns a list of documents containing those terms. The success of a search engine rests partly on the tool itself and partly on the sources of content.
A search engine consists of a server machine that hosts the actual engine, an indexer, a search form, and a results page. The server sends commands to a small program called a “robot” (also known as a “bot” or “spider”), which periodically follows all the links on a defined set of web pages or websites and collects any documents it finds. Another program, called an “indexer,” then reads every page and stores each word in an “index file.” A search form lets the user type a request for documents matching certain words and phrases.
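To make the division of labor between the robot and the indexer concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory site (the `SITE` dictionary) in place of real HTTP fetching, and it models the “index file” as an in-memory inverted index that maps each word to the pages containing it. The names `crawl` and `build_index` are illustrative only, not any service's actual interface.

```python
import re
from collections import deque

# Hypothetical in-memory "website": URL -> (page text, outgoing links).
# A real robot would fetch pages over HTTP; this stand-in keeps the sketch offline.
SITE = {
    "/index.html": ("Welcome to our nonprofit", ["/about.html", "/news.html"]),
    "/about.html": ("About our nonprofit mission", ["/index.html"]),
    "/news.html": ("News and annual report", ["/about.html", "/missing.html"]),
}

def crawl(start: str, site: dict) -> dict:
    """Robot: follow links breadth-first from `start`, collecting every page found."""
    found, queue, seen = {}, deque([start]), {start}
    while queue:
        url = queue.popleft()
        if url not in site:  # broken link: nothing to collect
            continue
        text, links = site[url]
        found[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return found

def build_index(pages: dict) -> dict:
    """Indexer: record every word of every page in an inverted index (word -> URLs)."""
    index = {}
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(word, set()).add(url)
    return index

pages = crawl("/index.html", SITE)
index = build_index(pages)
print(sorted(index["nonprofit"]))  # ['/about.html', '/index.html']
```

Keeping the robot and the indexer as separate steps mirrors the description above: the robot only gathers documents, and the indexer only turns gathered text into a lookup structure.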
When the request reaches the search engine server, the engine checks the index to see which documents match and generates a page of results for the user to review. Results are usually sorted according to predefined preferences, although some engines let users choose the sort order, refine the results to narrow or widen the pool of content searched, or change how the results are presented.
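The matching-and-sorting step can be sketched the same way. This self-contained version assumes hypothetical documents with last-modified dates. Its “relevance” is simply how often the query words occur in a page, a deliberately naive stand-in for whatever predefined ranking a real engine uses. The `sort_by` parameter illustrates letting the user pick the sort order.

```python
import re
from datetime import date

# Hypothetical documents: URL -> (text, last-modified date).
DOCS = {
    "/report.html": ("annual report report of activities", date(2001, 12, 1)),
    "/news.html": ("news about the annual meeting", date(2002, 2, 1)),
    "/about.html": ("about our organization", date(2000, 6, 15)),
}

def words(text: str) -> list:
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def search(query: str, docs: dict, sort_by: str = "relevance") -> list:
    """Return matching URLs, sorted by a naive relevance score or by date (newest first)."""
    terms = words(query)
    results = []
    for url, (text, modified) in docs.items():
        tokens = words(text)
        # Naive relevance: total occurrences of the query terms in the page.
        score = sum(tokens.count(t) for t in terms)
        if score > 0:
            results.append((url, score, modified))
    if sort_by == "date":
        results.sort(key=lambda r: r[2], reverse=True)
    else:
        results.sort(key=lambda r: r[1], reverse=True)
    return [url for url, _, _ in results]

print(search("annual report", DOCS))                  # ['/report.html', '/news.html']
print(search("annual report", DOCS, sort_by="date"))  # ['/news.html', '/report.html']
```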
Hosting a search engine has traditionally required a significant investment
in time, equipment, administration and maintenance, and programming. In
the past three years, however, a number of remotely hosted search engine
services have appeared that allow small- to mid-sized organizational
websites to incorporate a search engine fairly seamlessly, with little if
any administrative burden.
These remote services are web-based and require some form of
registration. Once you have registered, you are given a range of options
as to which portions of your website to index, how frequently to update
the index, which type of search form to use, and how the results are
returned.
When your site (or the portion you designate) is first indexed, the
service usually provides an initial report of missing or broken links and
lists the file types it cannot recognize. For example, many basic robots
cannot read files that end in “.rtf” (Rich Text Format) or “.ppt”
(Microsoft PowerPoint), nor can they handle various multimedia formats,
Java and JavaScript, DHTML, ActiveX, image maps, or certain metatags.
Establishing the index can take anywhere from a few minutes to a day,
depending on the speed of the service and the size of your site.
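A first-pass site assessment like the one described above might look roughly like this. The set of recognized extensions (`RECOGNIZED`), the `assess` function, and the example.org URLs are assumptions for illustration; each service has its own list of supported formats.

```python
from posixpath import splitext
from urllib.parse import urlparse

# Hypothetical set of extensions a basic robot can read; real services differ.
RECOGNIZED = {".html", ".htm", ".txt", ""}

def assess(links: list, existing: set) -> dict:
    """Sort a site's links into broken links and files the robot cannot index."""
    report = {"broken": [], "unrecognized": []}
    for link in links:
        path = urlparse(link).path
        if path not in existing:  # link points at a page that does not exist
            report["broken"].append(link)
            continue
        ext = splitext(path)[1].lower()
        if ext not in RECOGNIZED:  # page exists but its format is unsupported
            report["unrecognized"].append(link)
    return report

links = [
    "http://example.org/index.html",
    "http://example.org/plan.rtf",
    "http://example.org/talk.ppt",
    "http://example.org/old.html",
]
existing = {"/index.html", "/plan.rtf", "/talk.ppt"}
print(assess(links, existing))
# {'broken': ['http://example.org/old.html'], 'unrecognized': ['http://example.org/plan.rtf', 'http://example.org/talk.ppt']}
```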
Once the index is established, users can select how frequently it is
updated, usually daily, weekly, or monthly. The right frequency depends on
how often the website itself is updated: a frequently updated website
calls for more frequent indexing to keep search results as current as
possible.
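The scheduling choice can be expressed as a small calculation. The sketch below is hypothetical: it approximates “monthly” as 30 days, and `suggest_frequency` encodes only a rough rule of thumb for matching the indexing schedule to how often the site changes.

```python
from datetime import date, timedelta

# Hypothetical mapping of the update options to intervals;
# "monthly" is approximated as 30 days for simplicity.
INTERVALS = {
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=30),
}

def next_reindex(last_indexed: date, frequency: str) -> date:
    """Date on which the index should next be refreshed."""
    return last_indexed + INTERVALS[frequency]

def is_due(last_indexed: date, frequency: str, today: date) -> bool:
    """True once the chosen interval has elapsed since the last indexing run."""
    return today >= next_reindex(last_indexed, frequency)

def suggest_frequency(days_between_site_updates: float) -> str:
    """Rule of thumb (an assumption): sites changing at least daily get daily
    indexing, at least weekly get weekly indexing, otherwise monthly."""
    if days_between_site_updates <= 1:
        return "daily"
    if days_between_site_updates <= 7:
        return "weekly"
    return "monthly"

print(next_reindex(date(2002, 2, 17), "weekly"))               # 2002-02-24
print(is_due(date(2002, 2, 1), "monthly", date(2002, 2, 17)))  # False
print(suggest_frequency(3))                                    # weekly
```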
With the index parameters set, users can then choose the type of search
interface to use. Most services provide one or more of the following:
- search field: a small box in which people can enter words
- simple search: usually a small form that lets people search for “all” or “any” of the words entered
- advanced search: a more involved form allowing phrase searches, sorting of results by date or relevance, or other features
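The difference between “all,” “any,” and phrase searches comes down to a few operations on an inverted index. This sketch assumes three hypothetical pages (`PAGES`): “all” is a set intersection, “any” is a set union, and the phrase search additionally checks that the words appear consecutively and in order.

```python
import re

# Hypothetical pages standing in for an indexed site.
PAGES = {
    "/a.html": "Board meeting minutes for the annual meeting",
    "/b.html": "Annual report and meeting schedule",
    "/c.html": "Volunteer schedule",
}

def tokens(text: str) -> list:
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Inverted index: word -> set of pages containing it.
INDEX = {}
for url, text in PAGES.items():
    for w in tokens(text):
        INDEX.setdefault(w, set()).add(url)

def search_all(query: str) -> set:
    """Simple search, "all" mode: pages containing every word (intersection)."""
    sets = [INDEX.get(w, set()) for w in tokens(query)]
    return set.intersection(*sets) if sets else set()

def search_any(query: str) -> set:
    """Simple search, "any" mode: pages containing at least one word (union)."""
    return set().union(*(INDEX.get(w, set()) for w in tokens(query)))

def search_phrase(phrase: str) -> set:
    """Advanced search: pages where the words appear consecutively, in order."""
    target = tokens(phrase)
    n = len(target)
    hits = set()
    for url in search_all(phrase):  # only pages with every word can contain the phrase
        words = tokens(PAGES[url])
        if any(words[i:i + n] == target for i in range(len(words) - n + 1)):
            hits.add(url)
    return hits

print(sorted(search_all("annual meeting")))     # ['/a.html', '/b.html']
print(sorted(search_any("annual volunteer")))   # ['/a.html', '/b.html', '/c.html']
print(sorted(search_phrase("annual meeting")))  # ['/a.html']
```

Rescanning page text for phrases keeps the sketch short; a larger engine would more likely store each word's positions in the index itself, so phrases can be matched without rereading the pages.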
