de.unidu.is.text
Class TokenSplitterFilter

java.lang.Object
  extended by de.unidu.is.text.AbstractFilter
      extended by de.unidu.is.text.TokenSplitterFilter
All Implemented Interfaces:
Filter

public class TokenSplitterFilter
extends AbstractFilter

This filter splits a string into tokens. First, every character that is neither a letter nor a digit is converted into a space. Then, the resulting string is split into tokens, with whitespace acting as the token boundary.

Since:
2003-07-03
Version:
$Revision $, $Date: 2005/02/28 22:27:55 $
Author:
Henrik Nottelmann

Field Summary
 
Fields inherited from class de.unidu.is.text.AbstractFilter
nextFilter
 
Constructor Summary
TokenSplitterFilter(Filter nextFilter)
          Creates a new instance and sets the next filter in the chain.
TokenSplitterFilter(Filter nextFilter, int length)
          Creates a new instance with the specified length and sets the next filter in the chain.
 
Method Summary
protected  java.util.Iterator filter(java.lang.Object value)
          Applies only this filter to the specified object, without considering the other filters in the filter chain.
 int getLength()
           
protected  void handleBuffer(java.lang.StringBuffer buffer)
          Handles the specified buffer before splitting it into tokens.
 void setLength(int i)
           
 
Methods inherited from class de.unidu.is.text.AbstractFilter
apply, apply
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TokenSplitterFilter

public TokenSplitterFilter(Filter nextFilter)
Creates a new instance and sets the next filter in the chain.

Parameters:
nextFilter - next filter in the filter chain

TokenSplitterFilter

public TokenSplitterFilter(Filter nextFilter,
                           int length)
Creates a new instance with the specified length and sets the next filter in the chain.

Parameters:
nextFilter - next filter in the filter chain

Method Detail

filter

protected java.util.Iterator filter(java.lang.Object value)
Applies only this filter to the specified object, without considering the other filters in the filter chain.

This method splits a string into tokens. First, every character that is neither a letter nor a digit is converted into a space (see handleBuffer). Then, the resulting string is split into tokens, with whitespace acting as the token boundary. Only tokens with at least 3 characters are returned.

Specified by:
filter in class AbstractFilter
Parameters:
value - value to be modified by this filter
Returns:
iterator over the resulting objects
See Also:
AbstractFilter.filter(java.lang.Object)
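
The behaviour described for filter can be approximated by the following standalone sketch. It does not use the de.unidu.is.text classes; the class TokenSplitSketch, the method tokens and the minLength parameter are hypothetical names introduced for illustration. It assumes that Character.isLetterOrDigit is an acceptable test for "letter or digit" and that the 3-character threshold corresponds to a configurable minimum length. Neither assumption is confirmed by this page.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class TokenSplitSketch {

    // Hypothetical helper, not part of de.unidu.is.text.
    // Returns an iterator over the tokens of value that have at least minLength characters.
    static Iterator<String> tokens(String value, int minLength) {
        // Step 1: turn every character that is neither a letter nor a digit into a space.
        StringBuilder buffer = new StringBuilder(value);
        for (int i = 0; i < buffer.length(); i++) {
            if (!Character.isLetterOrDigit(buffer.charAt(i))) {
                buffer.setCharAt(i, ' ');
            }
        }
        // Step 2: split on runs of whitespace and keep only sufficiently long tokens.
        List<String> result = new ArrayList<>();
        for (String token : buffer.toString().trim().split("\\s+")) {
            if (!token.isEmpty() && token.length() >= minLength) {
                result.add(token);
            }
        }
        return result.iterator();
    }

    public static void main(String[] args) {
        // Prints "Hello", "world" and "test", one per line.
        Iterator<String> it = tokens("Hello, world: an IR-test 42!", 3);
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

With a minimum length of 3, the short tokens "an", "IR" and "42" are dropped, which matches the "at least 3 characters" rule stated above.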

getLength

public int getLength()
Returns:

setLength

public void setLength(int i)
Parameters:
i -

handleBuffer

protected void handleBuffer(java.lang.StringBuffer buffer)
Handles the specified buffer before splitting it into tokens.

The current implementation replaces every character that is neither a letter nor a digit with a space.

Parameters:
buffer - string buffer to be handled
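
As an illustration of the in-place replacement described for handleBuffer, here is a minimal standalone sketch. The class HandleBufferSketch and the method replaceNonAlphanumerics are hypothetical and are not part of de.unidu.is.text. The use of Character.isLetterOrDigit is an assumption about how "letter" and "digit" are decided.

```java
public class HandleBufferSketch {

    // Hypothetical stand-in for the documented behaviour of handleBuffer:
    // mutates the buffer in place and does not change its length.
    static void replaceNonAlphanumerics(StringBuffer buffer) {
        for (int i = 0; i < buffer.length(); i++) {
            if (!Character.isLetterOrDigit(buffer.charAt(i))) {
                buffer.setCharAt(i, ' ');
            }
        }
    }

    public static void main(String[] args) {
        StringBuffer buffer = new StringBuffer("foo_bar-42.txt");
        replaceNonAlphanumerics(buffer);
        System.out.println(buffer); // prints "foo bar 42 txt"
    }
}
```

Because handleBuffer is declared protected, a subclass could presumably override it to apply a different normalisation before the split; this page does not document such a use.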