Remote/Hosted Search Engine Services

One of the benefits of the World Wide Web is the ability to navigate among different sources and types of information. The central element of this navigability is the search engine, the mechanism through which users enter words or phrases and receive a list of documents containing those terms. The success of a search engine rests in part on the tool itself and in part on the sources of content.

A search engine consists of a server machine that hosts the actual engine, an indexer, a search form, and a results page. The server directs a small program called a “robot” (also known as a “bot” or “spider”) to periodically follow all the links on a defined set of web pages or websites and collect any documents it finds. Another program, called an “indexer,” then reads every page and stores each word in an “index file.” A user types a request for documents matching certain words or phrases into the search form. When the request reaches the search engine server, the indexer program checks which documents match and generates a page of results for the user to review. Results are usually sorted according to pre-defined preferences, although some engines let users choose the sort order, narrow or broaden the pool of content searched, or change how the results are presented.

Until recently, hosting a search engine required a significant investment in time, equipment, administration, maintenance, and programming. In the past three years, however, a number of remotely hosted search engine services have appeared that allow small- to mid-sized organizational websites to incorporate a search engine in a relatively seamless manner, with little if any administrative burden. These remote services are web-based and entail a registration process of some sort.
Once registration is complete, you are given a range of options as to what portions of your website to index, how frequently to update the index, the type of search form to use, and how the results are returned. Usually, when your site (or the portion you designate) is first indexed, the service provides an initial assessment of missing or broken links and explains what types of files it cannot recognize. For example, many of the basic robots cannot recognize files that end in “.rtf” (Rich Text Format) or “.ppt” (Microsoft PowerPoint), nor can they handle various multimedia files, Java and JavaScript, DHTML, ActiveX, image maps, or select metatags. Establishing the index can take from a few minutes to a day, depending upon the speed of the service and the size of your site.

After establishing the index, you can select how frequently it is updated, usually daily, weekly, or monthly. The right frequency depends upon how often the website itself is updated: more frequently updated websites call for more frequent indexing to keep search results as current as possible. After setting the index parameters, you can choose the type of search engine interface to use. Most services provide one or more of the following:
  • search field: a small box in which people can enter words
  • simple search: usually a small form that allows people to search on “all” or “any” of the words entered into it
  • advanced search: a more involved form allowing for phrase searches, sorting of results by date or relevancy, or other features
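To make the indexing and matching steps described earlier more concrete, here is a minimal Python sketch of an index that maps words to pages, plus a lookup supporting the “all” and “any” modes of a simple search. The page contents, URLs, and function names are invented for illustration; none of the services discussed here is known to work exactly this way, and a real engine would add ranking, phrase matching, and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query, mode="all"):
    """Return sorted URLs matching all (or any) of the query words."""
    word_sets = [index.get(word, set()) for word in tokenize(query)]
    if not word_sets:
        return []
    if mode == "all":
        matches = set.intersection(*word_sets)
    else:
        matches = set.union(*word_sets)
    return sorted(matches)


# Hypothetical pages standing in for what a robot would collect.
pages = {
    "/about.html": "About our library and its staff",
    "/hours.html": "Library hours and holiday closings",
    "/contact.html": "Contact the staff by email",
}
index = build_index(pages)
print(search(index, "library staff", mode="all"))
print(search(index, "library staff", mode="any"))
```

With these sample pages, an “all” search for “library staff” matches only the page containing both words, while an “any” search matches all three.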
More advanced services offer the ability to perform searches in multiple languages or using special characters.

Now the best part: you don’t have to do much else to make your search engine work. The search form is simply HTML code that is generated online and then copied and pasted into a home page or other web page. This code can be modified at a later time. All the work of processing search requests is done on the remote server. Moreover, you can customize the look of the search form and position it just about anywhere on any page or pages you choose. You can also determine how the search results page looks. Other than the search form code, you do not need to add any pages to your website.

As an additional bonus, the services provide a report that tracks the most frequently requested search terms and/or documents on your site. This is an extremely useful way to track the currency and popularity of your information, and it can inform how content is organized.

The benefits of these hosting services are twofold: users gain a tool to locate information quickly and efficiently, and organizations can use powerful link checking and reporting services to better understand how to organize and present information on their sites.

The disadvantages? In order to provide a free service, engine sponsors incorporate the logo (but not always a banner ad) of the engine into the search form and results page. Also, websites are dependent upon the fickle nature of an outside server. There is also a limit on how many pages can be indexed; most services can accommodate only 5,000 or fewer pages. Each service does, of course, offer a for-fee version allowing a greater range of features, including privacy, customizable search engine dictionaries, and multi-site indexing.
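The search-term report mentioned above can be pictured as a simple tally of the queries visitors enter. The following Python sketch assumes a hypothetical list of logged queries and an invented function name; the hosted services produce these reports for you, and their actual formats are not described here.

```python
from collections import Counter


def top_search_terms(query_log, limit=3):
    """Count normalized queries and return the most frequent ones."""
    # Normalize case and whitespace so "Hours" and "hours " count together.
    counts = Counter(q.strip().lower() for q in query_log if q.strip())
    return counts.most_common(limit)


# Hypothetical log of queries entered by visitors.
query_log = ["Hours", "hours", "staff directory", "parking", "hours ", "Parking"]
for term, count in top_search_terms(query_log):
    print(f"{term}: {count}")
```

A tally like this shows at a glance which topics visitors look for most often, which is the kind of signal that can guide how a site's content is arranged.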
The more prominent services include:

Atomz
Has a capacity of 500 pages and can perform weekly and on-demand indexing. You only have to include the logo, with no banner ads, in your search engine. This engine can be somewhat overwhelming in how much information it finds and lets users search for, but it supports a range of features, including the ability to pre-determine relevance rankings, a customizable thesaurus, homonym searching, indexing of Acrobat PDF documents, some 20 language search options, and Dreamweaver connectivity. (http://www.atomz.com)

FreeFind
This handles an incredible amount of information for a free service, around 32 MB (2,000 pages). It also allows for daily indexing of a site, and lets users search either the indexed website or the World Wide Web from the engine interface. It automatically generates a site map and a listing of changes or additions to your site each time it is updated, a valuable time-saver for more complex websites and a good way to draw attention to what’s new. It does not generate an indexing report, offers only limited reporting on user searches, is limited in customization of the search result page, can be slow, and generates banner ads on the search result page.

IndexMySite
A free service with no advertising for the first six months of use, after which it switches to a banner-ad supported free service (banners can be removed by paying a one-time $29 fee). It has a 500-page limit. It can generate a site map, as well as reports of unsuccessful searches. It can also let designers copy customizable search options into the body of the search engine itself. It does not, however, generate HTML result pages, which means that users cannot follow any of the links generated in a search result. The service does not automatically schedule an index update, and is a little more cumbersome in its administration. Moreover, if the pages on your site do not contain metatag summaries, blank lines will appear instead of summary text.

intraSearch
Supports up to 1,000 pages, and allows for multiple site indexing, indexing of password-protected pages, and a listing of synonymous search terms. It can also support searches for exact matches only. It does not allow for scheduled index updates, and is not flexible when it comes to search options. It does feature a scripting language to customize search results.

MiniSearch
A banner-ad supported service that is very easy to set up. It does, however, use JavaScript extensively, which might jumble your search results. It cannot support complex searching, does not schedule index updates, and does not make a distinction between matching keywords or text in the search results.

PicoSearch
Supports 5,000 pages with no ads (as long as you only use the default layout for search results). PicoSearch can index password-protected pages, can support a number of languages for search forms and results, can index pages with CGI scripts in the URL (for sites running interactive scripts, for example), and does really well with frames on web pages. It is not as selective as it could be in the search results, the result pages are not that customizable, and index scheduling is not automatic. It has also been reported to search sites that have not been indexed. A paid subscription allows more customization of reporting and layout, automated reindexing, and the ability to index more file types.

SearchButton
Supports up to 1,000 pages with once-a-month index updates. It provides an advanced search form and a search results report form (along with access to the actual report log). Once-a-month indexing, however, may not be frequent enough for most sites, and the level of customization for the search results and result page is low.

SiteMiner
Can support up to 10,000 pages (with a JavaScript banner ad on the search result pages). It can handle complex web URLs (such as those that redirect users to other URLs), provides instant search results and on-demand index updates, and supports searches for exact matches of all search terms. It is a JavaScript-heavy search tool, however, and does not support scheduled updates of the index. It is also limited in search result reporting and customization.

Webinator
Supports up to 5,000 pages with a company logo. It can index image maps and other complex web pages, supports searches for pages matching all search words only, and offers bi-weekly automatic and on-demand index updates. It is the hardest of the lot to administer, and does not allow for customization of search results, scheduled updates, or detailed search result reporting.

For more information, please review Search Tools Consulting's "Review of Remote Search Engine Services."

Links Cited
  • Atomz
  • FreeFind
  • IndexMySite
  • intraSearch
  • MiniSearch
  • PicoSearch
  • SearchButton
  • SiteMiner
  • Webinator