Sample from Udmurtia search engine
We would like people to send us their
indexer.conf and search.htm files.
We will publish them here and include them in the distribution.
Running indexer
Run indexer periodically (once a week, a day, an hour ...) to pick up the latest
modifications to your web sites. indexer reindexes expired
documents:
sh$ indexer
If you want to reindex all documents (whether or not they have expired),
use the -a option. indexer also has the -t, -u and -s
options to limit indexing to only part of the database.
To clear the whole database, use indexer -C. You can also clear
only part of the database by using the -t, -u and -s options.
Run indexer -S to view database statistics,
including the total and expired document counts for each status.
The -t, -u and -s filters can be used in this mode as well.
The status values mean:
- 0 - new (not yet indexed) URL
If the status is not 0, it is the HTTP response code.
Some of the HTTP codes are:
- 504 - "Gateway Timeout" (read timeout while retrieving the document)
If mnoGoSearch receives an HTTP 301, 302 or 303 response for a URL, it indexes
the URL given in the Location: field of the HTTP header instead
(for example, Location: http://www.somewhere.com). This feature is called redirection.
HTTP 401 means that the URL is password protected. You can
use the AuthBasic command in indexer.conf to set
login:password for this URL or these URLs.
HTTP 404 means that your document contains an incorrect reference
(a reference to a resource that does not exist). Check the referrer field
in the url table. You can also list such referrers with indexer -I -s 404.
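The status meanings above fit in a small lookup. The following is a minimal Python sketch and is not part of mnoGoSearch. `describe_status` is a hypothetical helper name, and the reason phrases come from Python's standard `http.HTTPStatus` enum rather than from indexer itself.

```python
from http import HTTPStatus

# Status 0 marks a URL that has not been indexed yet; any other value is the
# HTTP response code from the last retrieval (hypothetical helper, not indexer code).
REDIRECT_CODES = {301, 302, 303}


def describe_status(status: int) -> str:
    """Return a human-readable note for an indexer status value."""
    if status == 0:
        return "new (not indexed yet)"
    try:
        phrase = HTTPStatus(status).phrase  # e.g. 404 -> "Not Found"
    except ValueError:
        phrase = "unknown HTTP code"
    if status in REDIRECT_CODES:
        hint = "redirect: the URL from the Location: header is indexed instead"
    elif status == 401:
        hint = "password protected: set login:password with AuthBasic in indexer.conf"
    elif status == 404:
        hint = "broken link: check the referrer field, or run indexer -I -s 404"
    elif status == 504:
        hint = "read timeout while retrieving the document"
    else:
        hint = ""
    text = f"{status} {phrase}"
    return f"{text} ({hint})" if hint else text
```

For example, `describe_status(404)` explains a status value you might see in the indexer -S statistics. Codes the text does not discuss get only their standard reason phrase.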
If you have a bad connection to the HTTP server,
you can run several indexer processes simultaneously
with the same indexer.conf file.
We have successfully tested 30 simultaneous indexer processes.
Notes on running several indexer processes at the same time:
- You can run several indexer processes with different
configuration files on different MySQL databases.
- It is not recommended to use the same MySQL database with different
indexer.conf files! The first process could add something that the second
deletes, and this will never stop.
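This minimal Python sketch starts several indexer processes at once, all with the same configuration. It assumes `indexer` is on the PATH and reads its default indexer.conf. `run_indexers` is a hypothetical helper, and you choose the process count (the text above reports testing up to 30).

```python
import subprocess
from typing import List, Sequence


def run_indexers(count: int, command: Sequence[str] = ("indexer",)) -> List[int]:
    """Start `count` copies of `command` simultaneously and wait for all of them.

    Every copy gets identical arguments, so with the default command they all
    use the same indexer.conf. Returns the exit status of each process.
    """
    # Popen returns immediately, so all processes run concurrently.
    processes = [subprocess.Popen(list(command)) for _ in range(count)]
    # wait() blocks until a process exits and returns its exit status.
    return [p.wait() for p in processes]
```

For example, `run_indexers(5)` starts five indexers and returns their exit codes. If `indexer` is not installed, `Popen` raises `FileNotFoundError`.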
You can also run indexer from a crontab job.
Performing search
Open search.cgi in your browser:
http://your.web.server/path/to/search.cgi
Or, if you prefer PHP3, open
http://your.web.server/path/to/search.php3 if your HTTP server is configured
with a handler for php3 documents, or, if you run PHP3 as a CGI,
http://your.web.server/cgi-bin/php.cgi/path/to/search.php3
To find something, type the words you want to find and press the SUBMIT button.
For example, for mysql odbc, mnoGoSearch will find all documents
containing the word "mysql" or the word "odbc". The best matching documents are displayed
first.
If you want more advanced queries and use PHP, you can use the query language.
search.cgi does not support advanced search yet; it is on the TODO list.
search.php3 understands the following commands:
& - logical AND.
For example, "mysql & odbc": mnoGoSearch will find any URLs that
contain both "mysql" and "odbc".
| - logical OR. For example, "mysql|odbc" is the same as "mysql odbc",
because a space " " is equivalent to "|". mnoGoSearch will find any URLs that contain
the word "mysql" or the word "odbc".
~ - logical NOT. For example, "mysql & ~odbc":
mnoGoSearch will find URLs that contain the word "mysql" and at the same time do not contain
the word "odbc". Note that ~ only excludes a word from the result.
The query "~odbc" will find nothing! To keep the search fast, mnoGoSearch composes the WHERE
condition of the MySQL query from all the words in the search query:
.... WHERE word in ('all','of','the','words','have','been','typed')
Only documents containing at least one query word are candidates, so a query made only of
excluded words has nothing to exclude them from.
() - grouping, used to compose more complex queries.
For example, "(mysql | msql) & ~postgresql".
The query language is simple (and powerful):
just treat a query as an ordinary logical expression.
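The query language above can be modelled as a small boolean parser. This minimal Python sketch is not the search.php3 code. It assumes that ~ binds tighter than &, and & tighter than |, which the text does not state. It also uses an in-memory dict of URL to word set instead of MySQL, and it matches without ranking. The candidate filter mimics the `WHERE word in (...)` condition, which is why "~odbc" alone finds nothing. All function names are hypothetical.

```python
import re

OPERATORS = {"&", "|", "~", "(", ")"}


def tokenize(query):
    """Split a query into words and operators; a space between operands means '|'."""
    tokens = []
    for tok in re.findall(r"[&|~()]|[^\s&|~()]+", query):
        prev_is_operand = tokens and (tokens[-1] not in OPERATORS or tokens[-1] == ")")
        starts_operand = tok not in OPERATORS or tok in ("~", "(")
        if prev_is_operand and starts_operand:
            tokens.append("|")  # implicit OR, e.g. "mysql odbc"
        tokens.append(tok)
    return tokens


def parse(tokens):
    """Recursive-descent parser (assumed precedence: ~ over & over |)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1

    def parse_or():
        node = parse_and()
        while peek() == "|":
            take()
            node = ("or", node, parse_and())
        return node

    def parse_and():
        node = parse_not()
        while peek() == "&":
            take()
            node = ("and", node, parse_not())
        return node

    def parse_not():
        tok = peek()
        if tok == "~":
            take()
            return ("not", parse_not())
        if tok == "(":
            take()
            node = parse_or()
            if peek() != ")":
                raise ValueError("missing ')'")
            take()
            return node
        if tok is None or tok in OPERATORS:
            raise ValueError(f"unexpected token: {tok!r}")
        take()
        return ("word", tok)

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]!r}")
    return tree


def matches(node, words):
    """Evaluate the expression tree against one document's set of words."""
    kind = node[0]
    if kind == "word":
        return node[1] in words
    if kind == "not":
        return not matches(node[1], words)
    left, right = matches(node[1], words), matches(node[2], words)
    return (left and right) if kind == "and" else (left or right)


def query_words(node):
    """Collect every word in the query, including excluded (~) ones."""
    if node[0] == "word":
        return {node[1]}
    return set().union(*(query_words(child) for child in node[1:]))


def search(query, documents):
    """documents: dict mapping URL -> set of words. Returns matching URLs, sorted."""
    tree = parse(tokenize(query))
    wanted = query_words(tree)
    # Like "WHERE word in (...)": only documents with at least one query word are candidates.
    candidates = {url: w for url, w in documents.items() if w & wanted}
    return sorted(url for url, w in candidates.items() if matches(tree, w))
```

With `docs = {"a": {"mysql", "odbc"}, "b": {"mysql"}}`, `search("mysql & ~odbc", docs)` gives `["b"]`. `search("~odbc", docs)` gives `[]`, because the only candidate, "a", is then excluded.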
Bugs
Bug Reporting
If you think you've found a bug in mnoGoSearch, you can report it to the
mnoGoSearch Developers Team.
When reporting a bug, please fully specify your platform,
MySQL version and PHP version (if you use it). Database
statistics (record counts for all tables) and the contents of your indexer.conf
file are also helpful.