

Here is a somewhat ad hoc list of things to fix and features to add:

Some might have noticed that the X clients do not (yet) compile with the new X11 release. Porting should not be too difficult for someone with some X11 knowledge.
The code currently does not pass a strict ANSI compiler. We intend to switch to the prototyping scheme known from the WWW library:
#ifdef __STDC__
#define ARGS1(t,a) \
                (t a)
#else  /* not ANSI */
#define ARGS1(t,a) (a) \
                t a;
#endif /* __STDC__ (ANSI) */
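As a rough sketch of how a function definition might look under this scheme: the macro is repeated so the fragment compiles on its own, and `square_long` is a made-up name used only for illustration, not a freeWAIS-sf function.

```c
/* Same macro as above, repeated so this fragment compiles on its own. */
#ifdef __STDC__
#define ARGS1(t,a) \
                (t a)
#else  /* not ANSI */
#define ARGS1(t,a) (a) \
                t a;
#endif /* __STDC__ (ANSI) */

/* Hypothetical function, for illustration only.
 * An ANSI compiler sees:  long square_long (long n) { ... }
 * A K&R compiler sees:    long square_long (n) long n; { ... }
 */
long square_long ARGS1(long, n)
{
    return n * n;
}
```

The definition is written once, and the preprocessor picks the parameter syntax that the compiler understands.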
waisindex BUGS

Waisindex still has problems with filenames. For example, files with apostrophes or asterisks in their names are not handled properly. Filenames containing wildcards may enter the filename table even though the files do not exist.
The -a flag is not handled properly. Adding a file that contains only a subset of the declared fields causes the other fields to be ignored by the server until a ``complete'' document is added.

Compressed Indexes
There are several known methods for compressing inverted files which could save us disk space and significantly improve search speed.
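One well-known family of methods stores, for each term, the gaps between successive document numbers rather than the numbers themselves, and writes each gap with a variable-length byte code. The sketch below only illustrates that idea. The function names and the in-memory layout are assumptions and do not reflect the actual freeWAIS-sf index format.

```c
#include <stddef.h>

/* Encode a sorted list of document numbers as variable-byte gaps.
 * Each byte carries 7 payload bits, low-order bits first; the high bit
 * marks the last byte of a gap. Returns the number of bytes written.
 * The caller must provide room for 10 bytes per entry. */
size_t vbyte_encode_postings(const unsigned long *docs, size_t n,
                             unsigned char *out)
{
    size_t used = 0, i;
    unsigned long prev = 0;

    for (i = 0; i < n; i++) {
        unsigned long gap = docs[i] - prev;  /* small when postings are dense */
        prev = docs[i];
        while (gap >= 128) {
            out[used++] = (unsigned char)(gap & 0x7f);
            gap >>= 7;
        }
        out[used++] = (unsigned char)(gap | 0x80);  /* terminator byte */
    }
    return used;
}

/* Decode at most max document numbers from well-formed input.
 * Returns how many document numbers were decoded. */
size_t vbyte_decode_postings(const unsigned char *in, size_t len,
                             unsigned long *docs, size_t max)
{
    size_t pos = 0, n = 0;
    unsigned long prev = 0;

    while (pos < len && n < max) {
        unsigned long gap = 0;
        int shift = 0;
        while (pos < len && !(in[pos] & 0x80)) {
            gap |= (unsigned long)in[pos++] << shift;
            shift += 7;
        }
        if (pos == len)
            break;                           /* truncated input */
        gap |= (unsigned long)(in[pos++] & 0x7f) << shift;
        prev += gap;                         /* undo the gap encoding */
        docs[n++] = prev;
    }
    return n;
}
```

With this scheme the document numbers 3, 7, 130, 131 and 20000 occupy 7 bytes. Whether the smaller index also speeds up searching depends on how much of the search time is spent reading postings from disk.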
Spatial Indexes
(Notes from Doug Nebert)

We would like to add a field type to the SF software which would allow for the parsing and indexing of geographic coordinates that describe the outline of a data set or document. Software has been written outside of SF to do the parsing (using flex), and the indexing and overlay routines have been included in the freeWAIS-0.3 code. Now we need to integrate the code so that we can perform full field searching of text, dates, numbers, and geography in one indexing system.
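To give an idea of what the simplest kind of overlay test could look like, here is a minimal sketch that checks whether two bounding rectangles overlap. The `geo_box` type and the function name are hypothetical. They are not taken from the freeWAIS-0.3 routines mentioned above, which may work quite differently.

```c
/* Hypothetical bounding rectangle in decimal degrees. */
typedef struct {
    double west, south, east, north;
} geo_box;

/* Returns 1 if the two rectangles share at least one point, 0 otherwise.
 * Assumes west <= east and south <= north, so rectangles that cross the
 * date line are not handled. */
int geo_box_overlaps(const geo_box *a, const geo_box *b)
{
    return a->west  <= b->east  && b->west  <= a->east &&
           a->south <= b->north && b->south <= a->north;
}
```

A query rectangle would be compared against the stored rectangle of each candidate document. A real index would avoid testing every document.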

(Notes from Doug Nebert)

It seems to me that if the SF crowd can consistently use the .fde file incorporated into the available .src file, then a functionality like "explain" can be developed to allow the client to determine what attributes are being used and formulate a query window to match it. Probably easier would be to have a "form" resource file which could be retrieved from the server (e.g. query.html) by a "smart" http client...

Relevance Feedback
Note that the mechanism built into freeWAIS* is not ``Relevance Feedback''. It is rather some kind of query expansion. Real Relevance Feedback has been shown to produce much more effective ranking.
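For illustration, one classic formulation of relevance feedback is Rocchio's method, which reweights the query terms using the documents the user judged relevant and non-relevant. The sketch below is only an example of the general idea. Nothing here says that this particular method is the one intended for FreeWAIS-sf, and the function name, the dense vectors and the three weights are all assumptions.

```c
#include <stddef.h>

/* Rocchio-style update over dense term-weight vectors of length nterms.
 * rel and nonrel are row-major matrices holding nrel and nnonrel document
 * vectors. The query is modified in place:
 *   q = w_query * q + w_rel * mean(rel) - w_nonrel * mean(nonrel)
 * Negative weights are clipped to zero. */
void rocchio_update(double *query, size_t nterms,
                    const double *rel, size_t nrel,
                    const double *nonrel, size_t nnonrel,
                    double w_query, double w_rel, double w_nonrel)
{
    size_t t, d;

    for (t = 0; t < nterms; t++) {
        double pos = 0.0, neg = 0.0, w;

        for (d = 0; d < nrel; d++)
            pos += rel[d * nterms + t];
        for (d = 0; d < nnonrel; d++)
            neg += nonrel[d * nterms + t];
        if (nrel)
            pos /= (double)nrel;       /* mean weight in relevant docs */
        if (nnonrel)
            neg /= (double)nnonrel;    /* mean weight in non-relevant docs */

        w = w_query * query[t] + w_rel * pos - w_nonrel * neg;
        query[t] = w > 0.0 ? w : 0.0;
    }
}
```

Unlike plain query expansion, the update changes the weights of terms already in the query and can push down terms that occur mainly in documents judged non-relevant.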
(Notes from Marc Edgar)

How about having a script that could automatically update the database? That is, a record would be kept of which files were in the database. This record (.rec or some such) could be used instead of having to remember or find the command that created it. That is, it would support something like,

       waisindex -update filename.rec
to rebuild the database.

format files
(Notes from Marc Edgar)

Programs like this become immortal when they do not take any special knowledge to use. Format files for common data types would make FreeWAIS-sf more accessible. Maybe you've already done this, but format files like FAQ.fmt, email.fmt, usenet.fmt would be very helpful (and probably not that hard to write). Maybe creating an incoming directory on your ftp server would be useful, so that users could post their .fmt files and save you from having to do the work.

Z39.50 V2
(Notes from Doug Nebert)

It seems that the functionality you have provided matches very well the basic abilities of Z39.50 V2 and V3 in terms of fields and search. If there were a way to identify registered attributes then the construction of a gateway from ZDIST to a FreeWAIS-sf store of data would be possible, allowing people to keep their data in one format and serve the V1 and non-V1 communities.

My thoughts regarding a linkage between FreeWAIS-sf and a full Z39.50 V2-3 release such as ZDIST were to provide a link into the new capabilities and other "compliant" clients out there. But I think much of the API work could be done with the help of CNIDR personnel - their "linkage" back into freeWAIS-0.3 disabled some of the functionality whereas FreeWAIS-sf is more on the same level of sophistication as V2 and should be easier to connect to. If such a connection can be made it would allow you all to maintain and enhance the existing code and have some partners out here work on maintaining the API connection, taking the load off you except in consultation ...

Note from Alberto Accomazzi (Darin McKeever proposed similar features).

First of all, when indexing the documents, the user should be able to specify the following for each field to be indexed:

This could be done by allowing the entries in the format file to look like:
        <field> /^Authors: /
        au TEXT BOTH minchars 2 word /[^ ;\n=()]+/
            stop abstracts_field_au.stop syn abstracts_field_syn.syn
Other things such as headline length should be specifiable in the .fmt file as well.
Counting the mails I receive every day leads to the conclusion that there is a lack of documentation.
The online manuals are out of date.
document specs
Many people have difficulties in building document specifications. Either there should be a nicer input format or someone should provide a compiler (+checking and testing?) for some prettier specification format.
other systems
There should be more info on: How do I use FreeWAIS-sf with Gopher, Mosaic, httpd, perl, ...?



Ulrich Pfeifer
Thu May 25 16:37:04 MET DST 1995