java.lang.Object
de.unidu.is.text.AbstractFilter
de.unidu.is.text.AbstractSingleItemFilter
de.unidu.is.text.StopwordFilter
This filter removes stopwords.
By default, the stopwords are taken from conf/common_words.
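The filtering behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use the library itself: `StopwordSketch` and its `filterAll` helper are hypothetical stand-ins that only mirror the documented contract of `run` (null for a stopword, the value itself otherwise), and the three-word stopword set is made up for the example. Stemming and the minimum word length are deliberately ignored here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in, NOT the de.unidu.is.text implementation.
class StopwordSketch {
    private final Set<String> stopwords;

    StopwordSketch(Set<String> stopwords) {
        this.stopwords = stopwords;
    }

    // Mirrors the documented contract of run(): null for a stopword,
    // the unchanged value otherwise.
    Object run(Object value) {
        return stopwords.contains(value) ? null : value;
    }

    // Assumed helper: applies run() to each term and keeps the survivors.
    List<String> filterAll(List<String> terms) {
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            Object result = run(term);
            if (result != null) {
                kept.add((String) result);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up stopword set for illustration only.
        Set<String> stopwords = new HashSet<>(Arrays.asList("the", "is", "a"));
        StopwordSketch filter = new StopwordSketch(stopwords);
        System.out.println(filter.filterAll(
                Arrays.asList("the", "cat", "is", "a", "mammal")));
        // prints [cat, mammal]
    }
}
```

With the real class, the equivalent call would presumably go through one of the constructors listed below, passing the next filter in the chain and, optionally, a custom stopword set.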
| Field Summary |
| Fields inherited from class de.unidu.is.text.AbstractFilter |
nextFilter |
| Constructor Summary | |
StopwordFilter(Filter nextFilter)
Creates a new instance and sets the next filter in the chain. |
|
StopwordFilter(Filter nextFilter,
java.util.Set stopwords)
Creates a new instance, sets the next filter in the chain, and uses the specified set of stopwords instead of the default set. |
|
StopwordFilter(Filter nextFilter,
java.lang.String fileName)
Creates a new instance, sets the next filter in the chain, and takes the stopwords from the specified file. |
|
| Method Summary | |
static java.util.Set |
getDefaultStopwords()
Returns the default stopword list, loading it if required. |
static int |
getMinWordLength()
Returns the minimum length for words used as stop words. |
java.util.Set |
getStopwordsSet()
Returns a set containing all stopwords. |
boolean |
isStopword(java.lang.String term)
Tests if term is (after stemming) a stopword. |
boolean |
isStopwordStemmed(java.lang.String term)
Tests if term is a stopword. |
static java.util.Set |
readStopwords(java.lang.String fileName)
Returns the stopword list from the specified file, loading it if required. |
java.lang.Object |
run(java.lang.Object value)
Returns null if the specified value is a stopword, and the specified value otherwise. |
static void |
setMinWordLength(int minWordLength)
Sets the minimum length for words used as stop words. |
| Methods inherited from class de.unidu.is.text.AbstractSingleItemFilter |
filter |
| Methods inherited from class de.unidu.is.text.AbstractFilter |
apply, apply |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
public StopwordFilter(Filter nextFilter)
nextFilter - next filter in the filter chain
public StopwordFilter(Filter nextFilter,
java.util.Set stopwords)
nextFilter - next filter in the filter chain
stopwords - set of stopwords used instead of the default set
public StopwordFilter(Filter nextFilter,
java.lang.String fileName)
nextFilter - next filter in the filter chain
fileName - name of file with stopwords
| Method Detail |
public static java.util.Set getDefaultStopwords()
public static java.util.Set readStopwords(java.lang.String fileName)
fileName - file name with stop word list
public java.lang.Object run(java.lang.Object value)
value - string to be tested
public java.util.Set getStopwordsSet()
public boolean isStopword(java.lang.String term)
term - term to test
public boolean isStopwordStemmed(java.lang.String term)
term - term to test
public static int getMinWordLength()
public static void setMinWordLength(int minWordLength)
minWordLength - minimum length for words used as stop words
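As a rough illustration of what loading a stopword list (as `readStopwords(fileName)` does) might involve, the sketch below reads words into a set. The file format is an assumption: this page does not say how conf/common_words is laid out, so one stopword per line is assumed, with blank lines skipped. `StopwordFileSketch` and its `read` method are hypothetical names, and the sketch reads from a `Reader` rather than a file name so that it runs without any file on disk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, NOT the library's readStopwords implementation.
class StopwordFileSketch {

    // Assumes one stopword per line; surrounding whitespace is trimmed
    // and empty lines are ignored.
    static Set<String> read(Reader source) {
        Set<String> words = new HashSet<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            // Rethrown unchecked to keep the sketch's signature simple.
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a stopword file.
        Set<String> words = read(new StringReader("the\nis\n\na\n"));
        System.out.println(words.size());         // prints 3
        System.out.println(words.contains("is")); // prints true
    }
}
```

A set built this way could, in principle, be handed to the `StopwordFilter(Filter nextFilter, java.util.Set stopwords)` constructor documented above.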