de.unidu.is.retrieval.pire.dt
Class TextDT

java.lang.Object
  extended by de.unidu.is.retrieval.pire.dt.AbstractDT
      extended by de.unidu.is.retrieval.pire.dt.TextDT
All Implemented Interfaces:
DT

public class TextDT
extends AbstractDT

A class for the IR datatype "text", containing the operators:

  1. "contains": with stemming and stop word removal, BM25 indexing weights, experimental use only
  2. "stemen": English stemming and stop word removal, BM25 indexing weights
  3. "nostem": without stemming, but with stop word removal, BM25 indexing weights
  4. "stemen_tf": English stemming and stop word removal, normalised TF indexing weights
  5. "nostem_tf": without stemming, but with stop word removal, normalised TF indexing weights

By default, the retrieval status values (RSVs) are divided by the maximum RSV, and no logistic mapping function is applied. Both defaults can be changed through the static setter methods of this class.
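The two post-processing steps mentioned above can be sketched as follows. This is a minimal illustration, not the TextDT implementation: the class name RsvNormalisationSketch, its method names, and the logistic parameters b0 and b1 are assumptions made for this example.

```java
import java.util.Arrays;

/** Illustrative sketch only; all names here are hypothetical, not part of TextDT. */
final class RsvNormalisationSketch {

    /** Divides every RSV by the maximum RSV, so the best-scoring document gets 1.0. */
    static double[] scaleByMax(double[] rsvs) {
        double max = Arrays.stream(rsvs).max().orElse(0.0);
        if (max <= 0.0) {
            return rsvs.clone(); // nothing sensible to scale by
        }
        return Arrays.stream(rsvs).map(r -> r / max).toArray();
    }

    /** Maps an RSV into (0, 1) with a logistic function; b0 and b1 are assumed parameters. */
    static double logistic(double rsv, double b0, double b1) {
        return 1.0 / (1.0 + Math.exp(-(b0 + b1 * rsv)));
    }
}
```

Scaling by the maximum keeps the ranking unchanged and only bounds the values, whereas a logistic mapping would additionally reshape them into probability-like values.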

Since:
2003-08-16
Version:
$Revision: 1.21 $, $Date: 2005/03/14 17:33:13 $
Author:
Henrik Nottelmann

Field Summary
static java.lang.String CONTAINS_EXP
          Operator name "contains" for experiments.
static java.lang.String NAME
          The name of this datatype.
static java.lang.String NOSTEM
          Operator name "nostem" (no stemming, stopword removal, BM25).
static java.lang.String NOSTEM_TF
          Operator name "nostem_tf" (no stemming, stopword removal, normalised TF).
static java.lang.String NOSTEM_TFIDF
          Operator name "nostem_tfidf" (no stemming, stopword removal, normalised TF.IDF).
static java.lang.String PLAIN_EXP
          Operator name "contains" for experiments.
static java.lang.String STEMEN
          Operator name "stemen" (English stemming, stopword removal, BM25).
static java.lang.String STEMEN_TF
          Operator name "stemen_tf" (English stemming, stopword removal, normalised TF).
static java.lang.String STEMEN_TFIDF
          Operator name "stemen_tfidf" (English stemming, stopword removal, normalised TF.IDF).
 
Constructor Summary
TextDT()
           
 
Method Summary
 void computeIndex(Index index, java.lang.String operator)
          Computes the indexing weights for the specified index and the operator.
protected  Filter getFilter(java.lang.String operator)
          Returns a filter for converting a document value into tokens/token frequency tuples.
 java.lang.String getProbsTemplate(Index index, java.lang.String queryID, java.lang.String suffix, java.lang.String operator)
          Returns a template for computing probabilities of relevance.
protected  Filter getQueryFilter(java.lang.String operator)
          Returns a filter for converting a condition comparison value into tokens/token frequency tuples.
static WordSplitterFilter getWordSplitterFilter()
          Returns the word splitter filter.
 void removeIndex(Index index, java.lang.String operator)
          Removes the index.
static void setDoMapForBM25(boolean flag)
          Sets the flag for enabling or disabling the logistic mapping function.
static void setDoScaleForBM25(boolean flag)
          Sets the flag for enabling or disabling the RSV scaling.
static void setMap(boolean map)
          Deprecated.  
static void setScale(boolean scale)
          Deprecated.  
 
Methods inherited from class de.unidu.is.retrieval.pire.dt.AbstractDT
addProbRules, addRSVRules, addToIndex, convertOperator, getIndexTokens, storedRSVs
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NAME

public static final java.lang.String NAME
The name of this datatype.

See Also:
Constant Field Values

CONTAINS_EXP

public static final java.lang.String CONTAINS_EXP
Operator name "contains" for experiments.

See Also:
Constant Field Values

PLAIN_EXP

public static final java.lang.String PLAIN_EXP
Operator name "contains" for experiments.

See Also:
Constant Field Values

STEMEN

public static final java.lang.String STEMEN
Operator name "stemen" (English stemming, stopword removal, BM25).

See Also:
Constant Field Values

NOSTEM

public static final java.lang.String NOSTEM
Operator name "nostem" (no stemming, stopword removal, BM25).

See Also:
Constant Field Values

STEMEN_TF

public static final java.lang.String STEMEN_TF
Operator name "stemen_tf" (English stemming, stopword removal, normalised TF).

See Also:
Constant Field Values

NOSTEM_TF

public static final java.lang.String NOSTEM_TF
Operator name "nostem_tf" (no stemming, stopword removal, normalised TF).

See Also:
Constant Field Values

STEMEN_TFIDF

public static final java.lang.String STEMEN_TFIDF
Operator name "stemen_tfidf" (English stemming, stopword removal, normalised TF.IDF).

See Also:
Constant Field Values

NOSTEM_TFIDF

public static final java.lang.String NOSTEM_TFIDF
Operator name "nostem_tfidf" (no stemming, stopword removal, normalised TF.IDF).

See Also:
Constant Field Values

Constructor Detail

TextDT

public TextDT()

Method Detail

getFilter

protected Filter getFilter(java.lang.String operator)
Returns a filter for converting a document value into tokens/token frequency tuples.

Specified by:
getFilter in class AbstractDT
Parameters:
operator - operator name
Returns:
filter
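
As a rough illustration of what "tokens/token frequency tuples" could look like, the sketch below splits a document value into lower-case tokens, drops stop words, and counts the remaining tokens, roughly in the spirit of the "nostem" operator. It does not use the Filter API of this package; the class name, the splitting rule, and the tiny stop word list are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only; not the Filter implementation used by TextDT. */
final class TokenFrequencySketch {

    // Hypothetical, deliberately tiny stop word list.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "of", "and"));

    /** Returns token -> frequency in order of first occurrence, without stemming. */
    static Map<String, Integer> tokenFrequencies(String value) {
        Map<String, Integer> frequencies = new LinkedHashMap<>();
        // Split on any run of characters that are neither letters nor digits.
        for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            frequencies.merge(token, 1, Integer::sum);
        }
        return frequencies;
    }
}
```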

getQueryFilter

protected Filter getQueryFilter(java.lang.String operator)
Returns a filter for converting a condition comparison value into tokens/token frequency tuples.

Specified by:
getQueryFilter in class AbstractDT
Parameters:
operator - operator name
Returns:
filter

computeIndex

public void computeIndex(Index index,
                         java.lang.String operator)
Computes the indexing weights for the specified index and the operator.

The indexing weights are computed using a BM25 weighting scheme.

Specified by:
computeIndex in interface DT
Overrides:
computeIndex in class AbstractDT
Parameters:
index - underlying index
operator - operator name
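
For orientation, a common textbook form of the BM25 term weight is sketched below. The exact BM25 variant and the parameter values used by computeIndex are not stated on this page, so the class name, the non-negative IDF form, and the parameters k1 and b are assumptions.

```java
/** Illustrative sketch of a textbook BM25 term weight; not the computeIndex implementation. */
final class Bm25WeightSketch {

    /**
     * @param tf           frequency of the term in the document
     * @param docLength    number of tokens in the document
     * @param avgDocLength average number of tokens per document in the collection
     * @param docCount     number of documents in the collection
     * @param docFreq      number of documents containing the term
     * @param k1           term frequency saturation parameter (assumed, e.g. 1.2)
     * @param b            length normalisation parameter (assumed, e.g. 0.75)
     */
    static double weight(int tf, int docLength, double avgDocLength,
                         int docCount, int docFreq, double k1, double b) {
        // Non-negative IDF variant: ln(1 + (N - df + 0.5) / (df + 0.5)).
        double idf = Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
        // Documents longer than average are penalised, shorter ones are boosted.
        double lengthNorm = 1.0 - b + b * (docLength / avgDocLength);
        // The term frequency contribution saturates as tf grows.
        double tfPart = (tf * (k1 + 1.0)) / (tf + k1 * lengthNorm);
        return idf * tfPart;
    }
}
```

With these formulas, a term that does not occur in a document (tf = 0) receives weight 0, and the weight grows with tf but levels off, which is the main difference from a plain normalised TF weight.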

removeIndex

public void removeIndex(Index index,
                        java.lang.String operator)
Removes the index.

Specified by:
removeIndex in interface DT
Overrides:
removeIndex in class AbstractDT
Parameters:
index - underlying index
operator - operator name

getProbsTemplate

public java.lang.String getProbsTemplate(Index index,
                                         java.lang.String queryID,
                                         java.lang.String suffix,
                                         java.lang.String operator)
Returns a template for computing probabilities of relevance.

The template string is an expression which contains the key ${PROB}.

Overrides:
getProbsTemplate in class AbstractDT
Parameters:
index - underlying index
queryID - query id
suffix - relation suffix
operator - operator name
Returns:
template for computing probabilities of relevance
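
One way a caller might fill in such a template is plain string substitution of the ${PROB} key, as sketched below. The template syntax is not specified on this page, so the class name, the method name, and the sample template used in the usage note are hypothetical.

```java
/** Illustrative sketch only; the actual template handling in this package may differ. */
final class ProbTemplateSketch {

    /** The key named in the documentation of getProbsTemplate. */
    static final String KEY = "${PROB}";

    /** Replaces every occurrence of ${PROB} with the given expression (literal, not regex). */
    static String fill(String template, String probExpression) {
        if (!template.contains(KEY)) {
            throw new IllegalArgumentException("template does not contain " + KEY);
        }
        return template.replace(KEY, probExpression);
    }
}
```

For example, filling the hypothetical template "0.5 * ${PROB}" with the expression "w" yields "0.5 * w".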

setScale

public static void setScale(boolean scale)
Deprecated.  

Sets the scaling flag for experiments.

Parameters:
scale - scaling flag

setMap

public static void setMap(boolean map)
Deprecated.  

Sets the mapping function flag for experiments.

Parameters:
map - mapping function flag

setDoMapForBM25

public static void setDoMapForBM25(boolean flag)
Sets the flag for enabling or disabling the logistic mapping function.

Parameters:
flag - flag for enabling or disabling the logistic mapping function

setDoScaleForBM25

public static void setDoScaleForBM25(boolean flag)
Sets the flag for enabling or disabling the RSV scaling.

Parameters:
flag - flag for enabling or disabling the RSV scaling

getWordSplitterFilter

public static WordSplitterFilter getWordSplitterFilter()
Returns the word splitter filter.

Returns:
word splitter filter