Common Java classes for IR from our group



This unofficial project hosts common Java classes developed and used by our group. These classes mainly deal with information retrieval (IR) issues and include the PIRE retrieval system. The project is available as open-source software under the Apache License 2.0, which allows easy integration into other projects.

We also have a public mailing list for discussing anything related to "java-unidu" and for announcing new releases. PIRE has its own public mailing list.

Our project "java-unidu" contains code for the following application areas (only some are listed here; see the source code and the JavaDocs for the others):

IR engine PIRE:

One of the major components of java-unidu is PIRE, an extensible, logic-based probabilistic indexing and retrieval engine. PIRE can be extended to perform simple XML retrieval by implementing a generic interface for IR engines (accepting XML documents and XIRQL queries), so that different IR engines can be called with the same code (besides PIRE, HyREX can also be used this way).
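The idea of calling different engines through one interface can be illustrated with a minimal sketch. It uses only the standard Java library; the names SketchEngine and TermCountEngine are hypothetical and are not taken from java-unidu, and the toy engine matches plain keywords rather than XIRQL queries.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical interface (an assumption for illustration, not the java-unidu API).
interface SketchEngine {
    // Adds one XML document to the index.
    void index(String docId, String xml);

    // Returns at most maxResults document ids, best match first.
    List<String> retrieve(String query, int maxResults);
}

// Toy engine: scores a document by how often the query terms occur in it.
class TermCountEngine implements SketchEngine {
    private final Map<String, String> docs = new LinkedHashMap<>();

    @Override
    public void index(String docId, String xml) {
        // Crude tag removal; a real engine would parse the XML properly.
        docs.put(docId, xml.replaceAll("<[^>]*>", " ").toLowerCase(Locale.ROOT));
    }

    @Override
    public List<String> retrieve(String query, int maxResults) {
        String[] queryTerms = query.toLowerCase(Locale.ROOT).split("\\W+");
        Map<String, Integer> scores = new LinkedHashMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            int score = 0;
            for (String token : doc.getValue().split("\\W+")) {
                for (String term : queryTerms) {
                    if (!term.isEmpty() && term.equals(token)) {
                        score++;
                    }
                }
            }
            if (score > 0) {
                scores.put(doc.getKey(), score);
            }
        }
        List<String> ids = new ArrayList<>(scores.keySet());
        // Stable sort: documents with equal scores keep their indexing order.
        ids.sort(Comparator.comparing((String id) -> scores.get(id)).reversed());
        return new ArrayList<>(ids.subList(0, Math.min(maxResults, ids.size())));
    }
}
```

Client code written against SketchEngine would not need to change if a second implementation were plugged in, which is the benefit the generic interface is meant to provide.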

Property maps:

Property maps are an extension of the map implementations in the Java API. Values can be set and retrieved as strings, ints, longs, doubles and booleans. Values can also reference other values, and the property map can be configured to store more than one value per key. The package also contains classes for saving maps to, and loading them from, streams and files.
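To make these features concrete, here is a minimal sketch of a map with typed getters, value references and multiple values per key. It is built only on the standard Java library; the class SketchPropertyMap, its method names and the ${key} reference syntax are assumptions made for illustration and do not describe the actual java-unidu classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class (an assumption for illustration, not the java-unidu API).
class SketchPropertyMap {
    private final Map<String, List<String>> values = new HashMap<>();

    // Replaces all values stored for the key with a single value.
    void set(String key, String value) {
        List<String> list = new ArrayList<>();
        list.add(value);
        values.put(key, list);
    }

    // Appends a further value for the key (multi-value mode).
    void add(String key, String value) {
        values.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Returns all values for the key with references expanded.
    List<String> getAll(String key) {
        List<String> resolved = new ArrayList<>();
        for (String raw : values.getOrDefault(key, Collections.<String>emptyList())) {
            resolved.add(resolve(raw, 0));
        }
        return resolved;
    }

    // Returns the first value for the key, or null if the key is unknown.
    String getString(String key) {
        List<String> all = getAll(key);
        return all.isEmpty() ? null : all.get(0);
    }

    int getInt(String key, int defaultValue) {
        String s = getString(key);
        return s == null ? defaultValue : Integer.parseInt(s.trim());
    }

    double getDouble(String key, double defaultValue) {
        String s = getString(key);
        return s == null ? defaultValue : Double.parseDouble(s.trim());
    }

    boolean getBoolean(String key, boolean defaultValue) {
        String s = getString(key);
        return s == null ? defaultValue : Boolean.parseBoolean(s.trim());
    }

    // Expands ${other.key} references using the first value of the referenced key.
    private String resolve(String raw, int depth) {
        if (depth > 16) {
            throw new IllegalStateException("reference cycle or nesting too deep");
        }
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (true) {
            int start = raw.indexOf("${", pos);
            int end = start < 0 ? -1 : raw.indexOf('}', start);
            if (start < 0 || end < 0) {
                out.append(raw.substring(pos));
                return out.toString();
            }
            out.append(raw, pos, start);
            List<String> target = values.get(raw.substring(start + 2, end));
            // Unknown references expand to the empty string in this sketch.
            out.append(target == null ? "" : resolve(target.get(0), depth + 1));
            pos = end + 1;
        }
    }
}
```

Storing every value as a string and converting on read keeps the sketch small; a depth limit is used instead of explicit cycle detection to guard against self-referencing values.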

Text filters:

Text filters are used to modify objects (in most cases, strings) in a uniform way. Currently there are filters for parsing text, for splitting text into tokens, for stemming and stopword removal, for removing tags and for counting terms.
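A chain of such filters might look like the following sketch, which removes tags, splits text into tokens, drops stopwords and counts terms. It relies only on java.util.function.Function from the standard library; the class SketchFilters and all of its members are hypothetical and do not reflect the filter classes shipped with java-unidu. Stemming is left out to keep the example short.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical filter collection (an assumption for illustration, not the java-unidu API).
class SketchFilters {
    // Replaces anything that looks like a tag with a space.
    static final Function<String, String> STRIP_TAGS =
            text -> text.replaceAll("<[^>]*>", " ");

    // Lowercases the text and splits it at runs of non-letter, non-digit characters.
    static final Function<String, List<String>> TOKENIZE = text -> {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    };

    // Builds a filter that drops every token contained in the stopword set.
    static Function<List<String>, List<String>> removeStopwords(Set<String> stopwords) {
        return tokens -> {
            List<String> kept = new ArrayList<>(tokens);
            kept.removeAll(stopwords);
            return kept;
        };
    }

    // Counts how often each token occurs, keeping first-occurrence order.
    static final Function<List<String>, Map<String, Integer>> COUNT_TERMS = tokens -> {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    };

    // Composes the individual filters into one pipeline.
    static Map<String, Integer> termCounts(String text, Set<String> stopwords) {
        return STRIP_TAGS
                .andThen(TOKENIZE)
                .andThen(removeStopwords(stopwords))
                .andThen(COUNT_TERMS)
                .apply(text);
    }
}
```

Because every step is a Function, steps can be reordered or replaced (for example, by inserting a stemmer between stopword removal and counting) without touching the others.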

General utility classes:

This part contains classes for using different character encodings, for string processing, for locating files, for managing collections, and some other features which are not documented here.

IR evaluation:

TBA

GUI:

TBA

Database support:

PIRE has support for connecting to databases (class de.unidu.is.util.DB) and for formatting general SQL statements.

pDatalog++:

The pDatalog++ implementation is described under PIRE, see here.

General expressions:

The general expressions are described under PIRE, see here.

Gnuplot connection:

The class de.unidu.is.gnuplot.Gnuplot allows Gnuplot to be used from Java for plotting and for learning parameters. Plotting requires no knowledge of the Gnuplot syntax.

Parameter learning:

Parameters of functions can be learned via de.unidu.is.learning.LearnerFactory.

