Package de.unidu.is.text

Provides filters for text processing.


Interface Summary
Filter A filter is used to modify objects (in most cases, strings) in a uniform way.
SingleItemFilter A filter which converts each object into exactly one object (or into null).
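The single-item contract (each object maps to exactly one object, or to null) combined with chaining can be sketched as follows. This is a minimal illustration only: the class SingleItemChainSketch and the method applyChain are hypothetical names, the steps are modelled with java.util.function.Function rather than the package's own interfaces, and treating null as "drop the item" is an assumption made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical sketch; this is NOT the de.unidu.is.text API.
class SingleItemChainSketch {

    // Runs every input item through the steps in order. Each step returns
    // exactly one object or null; in this sketch a null result drops the item
    // and skips the remaining steps for it.
    static List<Object> applyChain(List<?> input, List<Function<Object, Object>> steps) {
        List<Object> out = new ArrayList<>();
        for (Object item : input) {
            Object current = item;
            for (Function<Object, Object> step : steps) {
                if (current == null) {
                    break;
                }
                current = step.apply(current);
            }
            if (current != null) {
                out.add(current);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Function<Object, Object> lowercase = o -> o.toString().toLowerCase(Locale.ROOT);
        Function<Object, Object> dropShort = o -> o.toString().length() >= 3 ? o : null;
        // prints [text, filter]
        System.out.println(applyChain(List.of("Text", "IS", "Filter"), List.of(lowercase, dropShort)));
    }
}
```

Passing the steps as a list keeps the order of the chain explicit; the real abstract classes may organise chaining differently.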
 

Class Summary
AbstractFilter This is an abstract filter implementation which allows for chaining filters.
AbstractSingleItemFilter This is an abstract filter implementation which converts every object into exactly one object (or into null), and which allows for chaining filters.
CodecSoundexFilter This filter converts a specified string into its soundex representation, using code from Apache Jakarta Commons Codec.
CounterFilter This filter counts occurrences of objects, and returns a list of object-frequency pairs.
GermanStemmerFilter This filter converts a specified German string into its stemmed version.
HTMLFilter This filter extracts all text from a specified HTML string, and returns the text content in a single string.
LowercaseFilter This filter converts a specified string into lowercase.
ParserFilter This filter splits a string into tokens (all non-letter characters are converted into whitespace, the resulting string is split into tokens with whitespace as token boundaries, and only tokens with at least 3 characters are considered), converts the tokens into lowercase, computes the stems of the tokens, and removes stopwords.
SoundexFilter This filter converts a specified string into its soundex representation.
StemmerFilter This filter converts a specified string into its stemmed version.
StopwordFilter This filter is used for removing stop words.
TokenSplitterFilter This filter splits a string into tokens.
UntagFilter This filter removes XML/HTML tags from a specified string.
WordConcatenatorFilter This filter concatenates all values (separated by a space).
WordSetConcatenatorFilter This filter concatenates all values (separated by a space), where each word occurs only once.
WordSplitterFilter This filter splits a string into tokens.
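The processing steps listed for ParserFilter can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the class's actual implementation: ParserPipelineSketch and parse are hypothetical names, the stemmer and the stopword set are supplied by the caller, the toy stemmer in main only strips a trailing "s", and checking stopwords after stemming simply follows the order in which the summary lists the steps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the steps described for ParserFilter;
// this is NOT the de.unidu.is.text implementation.
class ParserPipelineSketch {

    static List<String> parse(String text, UnaryOperator<String> stemmer, Set<String> stopwords) {
        // 1. Convert every non-letter character into whitespace.
        String lettersOnly = text.replaceAll("[^\\p{L}]", " ");
        List<String> result = new ArrayList<>();
        // 2. Split on whitespace.
        for (String token : lettersOnly.trim().split("\\s+")) {
            // 3. Consider only tokens with at least 3 characters.
            if (token.length() < 3) {
                continue;
            }
            // 4. Lowercase the token, then 5. compute its stem.
            String stem = stemmer.apply(token.toLowerCase(Locale.ROOT));
            // 6. Remove stopwords.
            if (!stopwords.contains(stem)) {
                result.add(stem);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy stemmer (assumption): strips one trailing "s" from words longer than 3 characters.
        UnaryOperator<String> toyStemmer =
                s -> s.length() > 3 && s.endsWith("s") ? s.substring(0, s.length() - 1) : s;
        Set<String> stopwords = Set.of("the", "into");
        // prints [filter, split, text, token]
        System.out.println(parse("The filters split text, e.g. into 3 tokens.", toyStemmer, stopwords));
    }
}
```

In practice the stemmer would be a language-specific algorithm, which is the role StemmerFilter and GermanStemmerFilter play in this package.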
 
