Home   |   Guides and Tutorials   |   What's New?   |   Comments   |   About
 

Installation and Configuration: Internet Search Engine

by Tammy Fox
Last Modified: Wednesday, 19-May-2004 11:54:50 EDT

Introduction
    htdig is a webpage search engine licensed under the GNU Public License. It uses a very simple configuration file to allow it to search only the webpages you specify. For example, you can exclude the cgi-bin or a testing directory from the search engine. In addition to installing it on a webserver, some programs use it as a search engine plugin such as Glade, the GTK+ User Interface Builder. In addition, it will create a searchable database of any website. You just supply to URL.
Installing htdig
  1. Download the latest version from the htdig ftp server.
  2. tar -xvfz htdig-3.1.5.tar.gz
  3. cd htdig-3.1.5
  4. ./configure
  5. make
  6. make install
Configuring htdig
    Once you have htdig installed, you must make a few changes to the configuration file and the HTML templates into which the search results are embedded.

Configuration File

    The configuration file for htdig is located at /opt/www/htdig/conf/htdig.conf. It is pretty self-explanitory. The main attributes you need to configure are as follows. It will work if you leave the defaults for the other options or change them if you wish.
Attribute Value Example
start_url URL of your site http://www.mywebsite.com
exclude_urls Directories you do not want searched separated by white spaces /cgi-bin/ /testing/
maintainer Email address of maintainer maintainer@mywebsite.com
search_results_header HTML file to be used as header of search results. Only use this if you don't want to use the default location for the header file: /opt/www/htdig/common/header.html /home/httpd/search/header.html
search_results_footer HTML file to be used as footer of search results. Only use this if you don't want to use the default location for the header file: /opt/www/htdig/common/footer.html /home/httpd/search/footer.html
nothing_found_file HTML file to be displayed if there is no match to search string entered. Only use this if you don't want to use the default location for the header file: /opt/www/htdig/common/nomatch.html /home/httpd/search/nomatch.html
syntax_error_file HTML file to be displayed if there is a syntax error in the search string entered. Only use this if you don't want to use the default location for the header file: /opt/www/htdig/common/syntax.html /home/httpd/search/syntax.html

HTML Templates

    If you don't want to use the default look-and-feel of htdig, you can edit the following files to use the look-and-feel of your website. The paths may be different if you choose to change the paths of them in your configuration file.
  • /opt/www/htdig/common/header.html
  • /opt/www/htdig/common/footer.html
  • /opt/www/htdig/common/nomatch.html
  • /opt/www/htdig/common/syntax.html

Post-installation and configuration
  1. Next, you must setup the search database by running the script /opt/www/htdig/bin/rundig.
  2. Copy the default search.html and images from /opt/www/htdocs/htdig to a directory named htdig off of your webRoot. If the images are not in this directory, they will not appear unless you configure it otherwise it htdig.conf.
  3. Copy /opt/www/cgi-bin/htsearch to the cgi-bin for your webserver.
  4. Test the search engine by opening search.html in your browser and entering a search string.
  5. Because the search engine uses a database to return results, the database must be rebuilt with the rundig command used in step 1 every time any pages are added to the website.
  6. If you want to configure anything else, refer the the htdig website. Pretty much everything is configurable with htdig.

Where to Download
What's Related


All Rights Reserved Linux Headquarters © 2000-2007
Linux is a registered trademark of Linus Torvalds
All logos are registered trademarks of their respective owners
Last modified: Wednesday, May 19, 2004