de.unidu.is.retrieval.pire
Class PIRE

java.lang.Object
  extended by de.unidu.is.retrieval.pire.PIRE

public class PIRE
extends java.lang.Object

An IR engine based on probabilistic Datalog ("Probabilistic Datalog IR Engine").

This IR engine uses a document model consisting of attributes, each of which refers to a datatype. Datatypes are modelled by special classes in the package de.unidu.is.retrieval.pire.dt, where the class name is the datatype name followed by DT.
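The naming convention above can be illustrated with a small, self-contained sketch. It only shows how a datatype name maps to a fully qualified class name and how such a class could be loaded reflectively; the helper class DtLookup and its methods are hypothetical, and whether PIRE itself resolves datatypes by reflection is an assumption, not something this page states.

```java
import java.util.Optional;

/**
 * Hypothetical helper (not part of PIRE) illustrating the documented convention:
 * datatype "X" is implemented by the class de.unidu.is.retrieval.pire.dt.XDT.
 */
final class DtLookup {
    // Package holding the datatype implementations (taken from the class description).
    static final String DT_PACKAGE = "de.unidu.is.retrieval.pire.dt";

    private DtLookup() {}

    /** Builds the fully qualified class name: package + datatype name + "DT". */
    static String dtClassName(String datatype) {
        return DT_PACKAGE + "." + datatype + "DT";
    }

    /**
     * ASSUMPTION: reflective loading is only one possible way to resolve the class.
     * Returns empty when the class is not on the classpath (e.g. when this sketch runs alone).
     */
    static Optional<Class<?>> loadDtClass(String datatype) {
        try {
            return Optional.of(Class.forName(dtClassName(datatype)));
        } catch (ClassNotFoundException e) {
            return Optional.empty();
        }
    }
}
```

For a datatype named, say, Text, dtClassName("Text") yields de.unidu.is.retrieval.pire.dt.TextDT.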

Probabilistic Datalog is used within an Index. Currently a PDatalogIndex is used, which relies on the code in de.unidu.is.pdatalog. To use another pDatalog implementation, only the method newIndex(String) has to be overridden in a subclass.
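Swapping the pDatalog implementation through newIndex(String) is an instance of the factory-method pattern. The sketch below is self-contained and uses hypothetical stand-ins (SimpleIndex, DefaultIndex, AlternativeIndex, Engine, AlternativeEngine) instead of the real PIRE and Index classes; it only shows why overriding a single protected factory method is enough when all index creation goes through it.

```java
// Hypothetical stand-in for the library's Index type.
interface SimpleIndex {
    String name();
}

// Stand-in for the default, pDatalog-backed index.
record DefaultIndex(String name) implements SimpleIndex {}

// Stand-in for an index backed by another pDatalog implementation.
record AlternativeIndex(String name) implements SimpleIndex {}

class Engine {
    // Factory method: the only place where indexes are instantiated.
    protected SimpleIndex newIndex(String key) {
        return new DefaultIndex(key);
    }

    // Engine code never calls a constructor directly, only the factory method,
    // so the call is dispatched to an overriding subclass if there is one.
    SimpleIndex indexFor(String key) {
        return newIndex(key);
    }
}

// Replacing the implementation only requires overriding newIndex.
class AlternativeEngine extends Engine {
    @Override
    protected SimpleIndex newIndex(String key) {
        return new AlternativeIndex(key);
    }
}
```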

Since:
2003-08-16
Version:
$Revision: 1.30 $, $Date: 2005/03/18 22:02:33 $
Author:
Henrik Nottelmann

Field Summary
protected  RelationBase base
          The relation base.
protected  java.lang.String collectionName
          The name of this collection.
protected  java.util.Map conditions
          Map with temporary conditions for retrieval with weighted-sum queries.
protected  PropertyMap counts
          Property map with temporary counter for sub-queries.
protected  Index dummyIndex
          Dummy index used for retrieval.
protected  java.util.Map indexes
          Indexes, specified by their name.
protected  java.util.Map rules
          Map with temporary rules for retrieval with Boolean-style queries.
protected  Schema schema
          Schema.
 
Constructor Summary
PIRE(DB db, java.lang.String collectionName)
          Creates a new instance.
 
Method Summary
 void addCondition(java.lang.String queryID, java.lang.String attName, java.lang.String operator, double weight, java.lang.Object value)
          Adds a condition for a weighted sum query.
 void addCondition(java.lang.String queryID, WeightedQueryCondition cond)
          Adds a condition for a weighted sum query.
 void addConjunction(java.lang.String queryID, QueryCondition[] conditions)
          Adds a conjunction for a Boolean-style query in disjunctive form.
 void addMomentsCondition(java.lang.String queryID, java.lang.String attName, java.lang.String operator, double weight, java.lang.Object value)
          Adds a query condition for computing the moments (expectation and variance) of the RSVs.
 void addMomentsCondition(java.lang.String queryID, WeightedQueryCondition cond)
          Adds a query condition for computing the moments (expectation and variance) of the RSVs.
 void addToIndex(java.lang.String docID)
          Adds the document id to the corresponding table.
 void addToIndex(java.lang.String docID, java.lang.String attName, java.lang.Object value)
          Adds the document value of the specified attribute to the index.
 void closeQuery(java.lang.String queryID)
          Finishes the processing of this query and frees used resources.
 void computeIndex()
          Computes the index, based on the document values added.
 void computeMoments()
          Computes the moments of the indexing weights.
 void computeProbs(java.lang.String queryID)
          Computes probabilities of relevance based on the documents' RSV.
 java.lang.String getCollectionName()
          Returns the name of this collection.
protected  java.util.List getConditionList(java.lang.String queryID)
          Returns a list of conditions for the specified query id.
protected  DT getDT(SchemaElement element)
          Returns a data type object for the specified schema element.
protected  SchemaElement getElement(java.lang.String attName)
          Returns the attribute with the specified name.
protected  Index getIndex(java.lang.String key)
          Returns the index specified by its name.
 Index getIndex(java.lang.String attName, java.lang.String operator)
          Returns the index specified by its name (created based on the attribute name and the operator name).
 Moments getMoments(java.lang.String queryID)
          Returns the expectation and the variance of the RSVs.
 double getRD(java.lang.String attName, java.lang.String operator, java.lang.String key)
          Returns the value corresponding to the specified key in the rd relation in the specified index.
 java.util.List getResult(java.lang.String queryID, int numDocs)
          Returns the probabilities of relevance for the top-ranked documents in decreasing order.
protected  java.util.List getRuleList(java.lang.String queryID)
          Returns a list of rules for the specified query id.
 void initIndex()
          Initializes the index.
 void initQuery(java.lang.String queryID)
          Initializes the query.
protected  Index newIndex(java.lang.String key)
          Creates a new PDatalog index with the specified name.
 void registerAttribute(java.lang.String attName, java.lang.String datatype, java.util.List operators)
          Registers the specified attribute.
 void removeIndex()
          Removes the index.
 void setRD(java.lang.String attName, java.lang.String operator, java.lang.String key, double value)
          Sets the value corresponding to the specified key in the rd relation in the specified index.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

collectionName

protected java.lang.String collectionName
The name of this collection.


indexes

protected java.util.Map indexes
Indexes, specified by their name.


schema

protected Schema schema
Schema.


conditions

protected java.util.Map conditions
Map with temporary conditions for retrieval with weighted-sum queries. Keys are query IDs (as strings), values are lists of conditions (as WeightedQueryCondition objects).


rules

protected java.util.Map rules
Map with temporary rules for retrieval with Boolean-style queries. Keys are query IDs (as strings), values are lists of rules (as Rule objects).


counts

protected PropertyMap counts
Property map with temporary counter for sub-queries. Keys are query IDs (as strings), values are the counters (as integers).


dummyIndex

protected Index dummyIndex
Dummy index used for retrieval.


base

protected RelationBase base
The relation base.

Constructor Detail

PIRE

public PIRE(DB db,
            java.lang.String collectionName)
Creates a new instance.

Parameters:
db - database parameters
collectionName - name of the collection

Method Detail

getDT

protected DT getDT(SchemaElement element)
Returns a data type object for the specified schema element.

Parameters:
element - schema element
Returns:
data type object

getIndex

public Index getIndex(java.lang.String attName,
                      java.lang.String operator)
Returns the index specified by its name (created based on the attribute name and the operator name). If no index exists yet, a new one is created and added to the indexes map.

Parameters:
attName - attribute name
operator - operator name
Returns:
corresponding index

getIndex

protected Index getIndex(java.lang.String key)
Returns the index specified by its name. If no index exists yet, a new one is created and added to the indexes map.

Parameters:
key - index name
Returns:
corresponding index
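The lazy "create if absent, then add to the indexes map" behaviour of both getIndex variants maps directly onto Map.computeIfAbsent. The sketch below is self-contained; IndexRegistry and its nested Index record are hypothetical stand-ins, and the key format attName + "_" + operator is an assumption, since this page does not say how the two names are combined.

```java
import java.util.HashMap;
import java.util.Map;

final class IndexRegistry {
    // Hypothetical placeholder for the library's Index type.
    record Index(String name) {}

    // Index name -> index, like the protected "indexes" field.
    private final Map<String, Index> indexes = new HashMap<>();

    // ASSUMPTION: illustrative key format; the real one is not documented here.
    static String indexKey(String attName, String operator) {
        return attName + "_" + operator;
    }

    Index getIndex(String attName, String operator) {
        return getIndex(indexKey(attName, operator));
    }

    // Returns the existing index, or creates a new one via newIndex and stores it.
    Index getIndex(String key) {
        return indexes.computeIfAbsent(key, this::newIndex);
    }

    // Factory method, analogous to newIndex(String).
    Index newIndex(String key) {
        return new Index(key);
    }

    int size() {
        return indexes.size();
    }
}
```

Repeated calls with the same attribute/operator pair return the same instance; only the first call creates an index.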

newIndex

protected Index newIndex(java.lang.String key)
Creates a new PDatalog index with the specified name.

Parameters:
key - index name
Returns:
new, empty index

getElement

protected SchemaElement getElement(java.lang.String attName)
Returns the attribute with the specified name.

Parameters:
attName - attribute name
Returns:
attribute definition

registerAttribute

public void registerAttribute(java.lang.String attName,
                              java.lang.String datatype,
                              java.util.List operators)
Registers the specified attribute. This has to be called only once per attribute, after creating this IR engine.

Parameters:
attName - attribute name
datatype - corresponding datatype
operators - list of operators

getRD

public double getRD(java.lang.String attName,
                    java.lang.String operator,
                    java.lang.String key)
Returns the value corresponding to the specified key in the rd relation in the specified index.

Parameters:
attName - schema attribute name
operator - search operator
key - value key
Returns:
value

setRD

public void setRD(java.lang.String attName,
                  java.lang.String operator,
                  java.lang.String key,
                  double value)
Sets the value corresponding to the specified key in the rd relation in the specified index.

Parameters:
attName - schema attribute name
operator - search operator
key - value key
value - value

initIndex

public void initIndex()
Initializes the index.


addToIndex

public void addToIndex(java.lang.String docID)
Adds the document id to the corresponding table.

Parameters:
docID - document id

addToIndex

public void addToIndex(java.lang.String docID,
                       java.lang.String attName,
                       java.lang.Object value)
Adds the document value of the specified attribute to the index. Parsing the value is left to the datatype implementation; for text, for example, the value can be a fulltext string.

Parameters:
docID - document id
attName - attribute name
value - attribute value
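Because parsing is left to the datatype, an indexing method only needs to hand the raw value to the datatype object and store whatever terms come back. The following self-contained sketch shows that delegation. The Datatype interface, the TextDatatype tokenizer and the Indexer class are all hypothetical; in particular, the real addToIndex takes no datatype argument (the datatype is presumably looked up for the attribute, e.g. via getDT), and the real text datatype may tokenize differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical datatype interface: turns a raw attribute value into index terms.
interface Datatype {
    List<String> parse(Object value);
}

// Hypothetical text datatype: accepts a fulltext string and splits it into lower-case tokens.
final class TextDatatype implements Datatype {
    @Override
    public List<String> parse(Object value) {
        List<String> tokens = new ArrayList<>();
        // Split on runs of characters that are neither letters nor digits.
        for (String t : value.toString().toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }
}

// Hypothetical indexer: it never parses values itself, it delegates to the datatype.
final class Indexer {
    record Entry(String docID, String attName, String term) {}

    private final List<Entry> entries = new ArrayList<>();

    void addToIndex(String docID, String attName, Object value, Datatype dt) {
        for (String term : dt.parse(value)) {
            entries.add(new Entry(docID, attName, term));
        }
    }

    List<Entry> entries() {
        return entries;
    }
}
```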

computeIndex

public void computeIndex()
Computes the index, based on the document values added.


computeMoments

public void computeMoments()
Computes the moments of the indexing weights.


removeIndex

public void removeIndex()
Removes the index.


initQuery

public void initQuery(java.lang.String queryID)
Inits the query.

Parameters:
queryID - query id

addCondition

public void addCondition(java.lang.String queryID,
                         java.lang.String attName,
                         java.lang.String operator,
                         double weight,
                         java.lang.Object value)
Adds a condition for a weighted sum query. These conditions are collected and evaluated later in computeProbs(String), as conditions for the same attribute/operator pair have to be evaluated together.

Parameters:
queryID - query id
attName - schema attribute name
operator - search operator
weight - condition weight
value - comparison value

addCondition

public void addCondition(java.lang.String queryID,
                         WeightedQueryCondition cond)
Adds a condition for a weighted sum query. These conditions are collected and evaluated later in computeProbs(String), as conditions for the same attribute/operator pair have to be evaluated together.

Parameters:
queryID - query id
cond - weighted query condition
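Since conditions for the same attribute/operator pair have to be evaluated together, the collected conditions can be grouped by that pair before evaluation. The sketch below shows the grouping step and a weighted-sum score. The Condition and Pair records are hypothetical simplifications of WeightedQueryCondition, and the formula RSV(d) = sum over i of weight_i * score_i(d) is an assumption about what "weighted sum" means here; the per-condition scores are passed in rather than computed from an index.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class WeightedSum {
    // Hypothetical simplified condition, mirroring the parameters of addCondition.
    record Condition(String attName, String operator, double weight, Object value) {}

    // Key identifying an attribute/operator pair.
    record Pair(String attName, String operator) {}

    // Groups the collected conditions so that each pair can be evaluated in one pass.
    static Map<Pair, List<Condition>> groupByPair(List<Condition> conditions) {
        return conditions.stream().collect(Collectors.groupingBy(
                c -> new Pair(c.attName(), c.operator()),
                LinkedHashMap::new,
                Collectors.toList()));
    }

    // ASSUMPTION: RSV(d) = sum_i weight_i * score_i(d), with the scores already known.
    static double rsv(double[] weights, double[] scores) {
        if (weights.length != scores.length) {
            throw new IllegalArgumentException("weights and scores differ in length");
        }
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * scores[i];
        }
        return sum;
    }
}
```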

addConjunction

public void addConjunction(java.lang.String queryID,
                           QueryCondition[] conditions)
Adds a conjunction for a Boolean-style query in disjunctive form. The conjunction forms one term in the disjunctive form. This method computes both the RSV and the probabilities of relevance for the given conditions, and adds a rule for computing the conjunction (of the probabilities of relevance) to the dummy index. These rules are later evaluated together in computeProbs(String).

Parameters:
queryID - query id
conditions - conditions forming a conjunction
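The disjunctive form combines conjunctions (one per addConjunction call) with OR. Under the simplifying assumption that all events involved are independent, a conjunction's probability is the product of its conditions' probabilities, and the disjunction of terms is one minus the product of their complements. The sketch below shows only these two formulas; pDatalog itself evaluates event expressions, so the results coincide with it only when the independence assumption holds.

```java
final class BooleanCombination {
    private BooleanCombination() {}

    // Conjunction, ASSUMING independent events: P(c1 and ... and cn) = prod P(ci).
    static double and(double... probs) {
        double p = 1.0;
        for (double x : probs) {
            p *= x;
        }
        return p;
    }

    // Disjunction of conjunction terms, ASSUMING independent terms:
    // P(t1 or ... or tn) = 1 - prod (1 - P(ti)).
    static double or(double... probs) {
        double q = 1.0;
        for (double x : probs) {
            q *= (1.0 - x);
        }
        return 1.0 - q;
    }
}
```

For example, a query with the terms (a AND b) OR c would be scored as or(and(pa, pb), pc).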

computeProbs

public void computeProbs(java.lang.String queryID)
Computes probabilities of relevance based on the documents' RSV.

Parameters:
queryID - query id
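This page does not say which function turns an RSV into a probability of relevance. One common choice in probabilistic IR is a logistic function with two fitted parameters; the sketch below shows that choice only as an illustration. The class name RsvMapping and the parameters b0 (intercept) and b1 (slope) are hypothetical and are not taken from PIRE.

```java
final class RsvMapping {
    private RsvMapping() {}

    // ASSUMPTION: logistic mapping P(rel | rsv) = 1 / (1 + exp(-(b0 + b1 * rsv))).
    // b0 and b1 are hypothetical parameters that would have to be learned.
    static double logistic(double rsv, double b0, double b1) {
        return 1.0 / (1.0 + Math.exp(-(b0 + b1 * rsv)));
    }
}
```

With a positive slope b1 the mapping is monotonic, so ranking by probability gives the same order as ranking by RSV.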

getResult

public java.util.List getResult(java.lang.String queryID,
                                int numDocs)
Returns the probabilities of relevance for the top-ranked documents in decreasing order. computeProbs(String) has to be called before this method.

Parameters:
queryID - query id
numDocs - number of documents to retrieve
Returns:
list of ProbDoc instances
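Selecting the top-ranked documents in decreasing order of probability amounts to a descending sort followed by a cut-off at numDocs. The self-contained sketch below uses a hypothetical ProbDoc record as a stand-in for the library class of the same name, whose actual fields are not documented on this page.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class TopK {
    private TopK() {}

    // Hypothetical stand-in for ProbDoc: document id plus probability of relevance.
    record ProbDoc(String docID, double prob) {}

    // Returns at most numDocs documents, ordered by decreasing probability of relevance.
    static List<ProbDoc> topK(List<ProbDoc> docs, int numDocs) {
        return docs.stream()
                .sorted(Comparator.comparingDouble(ProbDoc::prob).reversed())
                .limit(numDocs)
                .collect(Collectors.toList());
    }
}
```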

closeQuery

public void closeQuery(java.lang.String queryID)
Finishes the processing of this query and frees used resources.

Parameters:
queryID - query id

addMomentsCondition

public void addMomentsCondition(java.lang.String queryID,
                                java.lang.String attName,
                                java.lang.String operator,
                                double weight,
                                java.lang.Object value)
Adds a query condition for computing moments. The results for this condition are added to the accumulated results for the expectation and the variance of the RSVs.

Parameters:
queryID - query id
attName - schema attribute name
operator - search operator
weight - condition weight
value - comparison value

addMomentsCondition

public void addMomentsCondition(java.lang.String queryID,
                                WeightedQueryCondition cond)
Adds a query condition for computing moments. The results for this condition are added to the accumulated results for the expectation and the variance of the RSVs.

Parameters:
queryID - query id
cond - weighted query condition

getMoments

public Moments getMoments(java.lang.String queryID)
Returns the expectation and the variance of the RSVs.

Parameters:
queryID - query id
Returns:
moments
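computeMoments, addMomentsCondition and getMoments together suggest a two-stage computation: first the moments of the indexing weights, then their combination per query condition. The sketch below shows one standard way to do this, under assumptions not stated on this page: population (not sample) variance, and an RSV of the form sum over i of w_i * X_i with independent X_i, which gives E[RSV] = sum w_i E[X_i] and Var[RSV] = sum w_i^2 Var[X_i]. MomentsSketch and its nested Moments record are hypothetical.

```java
final class MomentsSketch {
    private MomentsSketch() {}

    // Hypothetical stand-in for Moments: expectation and variance.
    record Moments(double expectation, double variance) {}

    // Population mean and variance of the indexing weights over all documents.
    static Moments of(double[] weights) {
        double mean = 0.0;
        for (double w : weights) {
            mean += w;
        }
        mean /= weights.length;
        double var = 0.0;
        for (double w : weights) {
            var += (w - mean) * (w - mean);
        }
        var /= weights.length;
        return new Moments(mean, var);
    }

    // ASSUMPTION: RSV = sum_i w_i * X_i with independent X_i, so
    // E[RSV] = sum_i w_i * E[X_i] and Var[RSV] = sum_i w_i^2 * Var[X_i].
    static Moments combine(double[] queryWeights, Moments[] perCondition) {
        double e = 0.0;
        double v = 0.0;
        for (int i = 0; i < queryWeights.length; i++) {
            e += queryWeights[i] * perCondition[i].expectation();
            v += queryWeights[i] * queryWeights[i] * perCondition[i].variance();
        }
        return new Moments(e, v);
    }
}
```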

getCollectionName

public java.lang.String getCollectionName()
Returns the name of this collection.

Returns:
name of this collection

getRuleList

protected java.util.List getRuleList(java.lang.String queryID)
Returns a list of rules for the specified query id.

Parameters:
queryID - query id
Returns:
list of temporary rules

getConditionList

protected java.util.List getConditionList(java.lang.String queryID)
Returns a list of conditions for the specified query id.

Parameters:
queryID - query id
Returns:
list of temporary conditions
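The fields conditions, rules and counts hold temporary per-query state keyed by query ID, and closeQuery(String) frees the resources used by a query. The self-contained sketch below shows one way such state could be managed. It assumes that getConditionList and getRuleList create an empty list on first access, that the sub-query counter hands out 0, 1, 2, ..., and that closing a query removes all three entries; none of this is confirmed by the page. Element types are simplified to String, and QueryState is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class QueryState {
    // Query ID -> temporary conditions / rules (element types simplified to String).
    private final Map<String, List<String>> conditions = new HashMap<>();
    private final Map<String, List<String>> rules = new HashMap<>();
    // Query ID -> counter for sub-queries (a plain map instead of PropertyMap).
    private final Map<String, Integer> counts = new HashMap<>();

    // ASSUMPTION: the list is created lazily on first access.
    List<String> getConditionList(String queryID) {
        return conditions.computeIfAbsent(queryID, id -> new ArrayList<>());
    }

    List<String> getRuleList(String queryID) {
        return rules.computeIfAbsent(queryID, id -> new ArrayList<>());
    }

    // ASSUMPTION: returns the next sub-query number for this query, starting at 0.
    int nextSubQuery(String queryID) {
        return counts.merge(queryID, 1, Integer::sum) - 1;
    }

    // Frees all temporary state for the query, in the spirit of closeQuery(String).
    void closeQuery(String queryID) {
        conditions.remove(queryID);
        rules.remove(queryID);
        counts.remove(queryID);
    }

    boolean holds(String queryID) {
        return conditions.containsKey(queryID)
                || rules.containsKey(queryID)
                || counts.containsKey(queryID);
    }
}
```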