de.unidu.is.text
Class StopwordFilter

java.lang.Object
  extended by de.unidu.is.text.AbstractFilter
      extended by de.unidu.is.text.AbstractSingleItemFilter
          extended by de.unidu.is.text.StopwordFilter
All Implemented Interfaces:
Filter, SingleItemFilter

public class StopwordFilter
extends AbstractSingleItemFilter

This filter is used for removing stop words.

By default, the stopwords are taken from conf/common_words.

Since:
2003-07-04
Version:
$Revision: 1.17 $, $Date: 2005/03/14 17:33:14 $
Author:
Henrik Nottelmann

Field Summary
 
Fields inherited from class de.unidu.is.text.AbstractFilter
nextFilter
 
Constructor Summary
StopwordFilter(Filter nextFilter)
          Creates a new instance and sets the next filter in the chain.
StopwordFilter(Filter nextFilter, java.util.Set stopwords)
          Creates a new instance and sets the next filter in the chain.
StopwordFilter(Filter nextFilter, java.lang.String fileName)
          Creates a new instance and sets the next filter in the chain.
 
Method Summary
static java.util.Set getDefaultStopwords()
          Returns the stopword list, and loads it if required.
static int getMinWordLength()
          Returns the minimum length for words used as stop words.
 java.util.Set getStopwordsSet()
          Returns a set containing all stopwords.
 boolean isStopword(java.lang.String term)
          Tests if term is (after stemming) a stopword.
 boolean isStopwordStemmed(java.lang.String term)
          Tests if term is a stopword.
static java.util.Set readStopwords(java.lang.String fileName)
          Returns the stopword list, and loads it if required.
 java.lang.Object run(java.lang.Object value)
          Returns null if the specified value is a stopword, and the specified value otherwise.
static void setMinWordLength(int minWordLength)
          Sets the minimum length for words used as stop words.
 
Methods inherited from class de.unidu.is.text.AbstractSingleItemFilter
filter
 
Methods inherited from class de.unidu.is.text.AbstractFilter
apply, apply
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

StopwordFilter

public StopwordFilter(Filter nextFilter)
Creates a new instance and sets the next filter in the chain.

Parameters:
nextFilter - next filter in the filter chain

StopwordFilter

public StopwordFilter(Filter nextFilter,
                      java.util.Set stopwords)
Creates a new instance and sets the next filter in the chain.

Parameters:
nextFilter - next filter in the filter chain
stopwords - set of stopwords used instead of the default set

StopwordFilter

public StopwordFilter(Filter nextFilter,
                      java.lang.String fileName)
Creates a new instance and sets the next filter in the chain.

Parameters:
nextFilter - next filter in the filter chain
fileName - name of file with stopwords

Method Detail

getDefaultStopwords

public static java.util.Set getDefaultStopwords()
Returns the stopword list, and loads it if required.


readStopwords

public static java.util.Set readStopwords(java.lang.String fileName)
Returns the stopword list, and loads it if required.

Parameters:
fileName - file name with stop word list
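
The format of the stopword file is not described here. As a rough illustration only, the sketch below assumes one stopword per line and skips blank lines; the class StopwordFileSketch and its read method are hypothetical stand-ins and are not part of de.unidu.is.text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in; not the library's readStopwords implementation.
final class StopwordFileSketch {

    // Assumption: the source holds one stopword per line; blank lines are ignored.
    static Set<String> read(Reader source) {
        Set<String> words = new HashSet<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        // An in-memory reader stands in for a file such as conf/common_words.
        Set<String> words = read(new StringReader("the\nof\n\n and \n"));
        System.out.println(words.contains("and"));
    }
}
```

Taking a Reader rather than a file name keeps the sketch testable without touching the file system; a java.io.FileReader could be passed in the same way.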

run

public java.lang.Object run(java.lang.Object value)
Returns null if the specified value is a stopword, and the specified value otherwise.

Parameters:
value - string to be tested
Returns:
null iff the value is a stopword, and the value otherwise
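
The null-or-value contract of run can be illustrated with a small self-contained sketch. RunSketch is a hypothetical stand-in, not the library class: it uses a plain set lookup and leaves out stemming and the filter chain.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in; not the de.unidu.is.text.StopwordFilter implementation.
final class RunSketch {
    private final Set<String> stopwords;

    RunSketch(Set<String> stopwords) {
        this.stopwords = stopwords;
    }

    // Mirrors the documented contract: null for a stopword, the value itself otherwise.
    // Assumption: exact-match lookup, with no stemming or case folding.
    Object run(Object value) {
        return stopwords.contains(value) ? null : value;
    }

    public static void main(String[] args) {
        RunSketch filter = new RunSketch(new HashSet<>(Arrays.asList("the", "of", "and")));
        for (String token : "the retrieval of documents".split(" ")) {
            Object result = filter.run(token);
            if (result != null) {
                System.out.println(result); // prints "retrieval", then "documents"
            }
        }
    }
}
```

A caller that processes tokens one by one can therefore treat a null result as "drop this token".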

getStopwordsSet

public java.util.Set getStopwordsSet()
Returns a set containing all stopwords.

Returns:
set containing all stopwords

isStopword

public boolean isStopword(java.lang.String term)
Tests if term is (after stemming) a stopword.

Parameters:
term - term to test
Returns:
true if term is (after stemming) a stopword

isStopwordStemmed

public boolean isStopwordStemmed(java.lang.String term)
Tests if term is a stopword.

Parameters:
term - term to test
Returns:
true if term is a stopword

getMinWordLength

public static int getMinWordLength()
Returns the minimum length for words used as stop words.

Returns:
minimum length for words used as stop words

setMinWordLength

public static void setMinWordLength(int minWordLength)
Sets the minimum length for words used as stop words. If the set of stopwords has already been loaded, then this set is loaded again (but this only works for new filters).

Parameters:
minWordLength - minimum length for words used as stop words
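
One possible reason why a reload only affects new filters is that each filter keeps a reference to the set that was current when it was constructed. The sketch below illustrates that idea under two loudly stated assumptions: that entries shorter than the minimum length are not used as stopwords, and that a reload builds a fresh set. MinLengthSketch and its word list are hypothetical and not taken from the library.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in; the real class may work differently.
final class MinLengthSketch {
    // Hypothetical word source standing in for conf/common_words.
    private static final List<String> SOURCE = Arrays.asList("a", "of", "the", "and");

    private static int minWordLength = 1;
    private static Set<String> defaults; // loaded lazily on first use

    private final Set<String> stopwords;

    MinLengthSketch() {
        // The reference is captured once, at construction time.
        this.stopwords = getDefaultStopwords();
    }

    static synchronized Set<String> getDefaultStopwords() {
        if (defaults == null) {
            defaults = load();
        }
        return defaults;
    }

    static synchronized void setMinWordLength(int length) {
        minWordLength = length;
        if (defaults != null) {
            defaults = load(); // reload into a new set; older filters keep the old one
        }
    }

    // Assumption: entries shorter than minWordLength are not used as stopwords.
    private static Set<String> load() {
        Set<String> words = new HashSet<>();
        for (String word : SOURCE) {
            if (word.length() >= minWordLength) {
                words.add(word);
            }
        }
        return words;
    }

    boolean isStopword(String term) {
        return stopwords.contains(term);
    }
}
```

In this sketch, a filter created before setMinWordLength(3) still treats "of" as a stopword, while a filter created afterwards does not.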