de.unidu.is.text
Class WordSplitterFilter

java.lang.Object
  extended by de.unidu.is.text.AbstractFilter
      extended by de.unidu.is.text.WordSplitterFilter
All Implemented Interfaces:
Filter

public class WordSplitterFilter
extends AbstractFilter

This filter splits a string into tokens. First, every non-letter character is replaced by a space. Then the resulting string is split into tokens, with whitespace as the token boundary. Only tokens with at least 3 characters are returned.
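The splitting steps described here can be illustrated with a small standalone sketch. It is not the library's implementation: the class name WordSplitSketch, the method split and the parameter minLength are hypothetical, and the sketch assumes that "letter" means whatever Character.isLetter accepts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical illustration only; not part of de.unidu.is.text.
final class WordSplitSketch {

    /** Returns the tokens of text that have at least minLength characters. */
    static List<String> split(String text, int minLength) {
        // Step 1: replace every non-letter character with a space.
        StringBuilder buffer = new StringBuilder(text);
        for (int i = 0; i < buffer.length(); i++) {
            if (!Character.isLetter(buffer.charAt(i))) {
                buffer.setCharAt(i, ' ');
            }
        }
        // Step 2: split at spaces and keep only sufficiently long tokens.
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(buffer.toString(), " ");
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            if (token.length() >= minLength) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

Under these assumptions, WordSplitSketch.split("Hello, world! It is 2005.", 3) yields the two tokens "Hello" and "world": the punctuation and digits become spaces, and "It" and "is" are too short.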

Since:
2003-07-03
Version:
$Revision: 1.11 $, $Date: 2005/02/21 17:29:28 $
Author:
Henrik Nottelmann

Field Summary
 
Fields inherited from class de.unidu.is.text.AbstractFilter
nextFilter
 
Constructor Summary
WordSplitterFilter(Filter nextFilter)
          Creates a new instance and sets the next filter in the chain.
WordSplitterFilter(Filter nextFilter, int length)
          Creates a new instance with the given length setting and sets the next filter in the chain.
 
Method Summary
protected  java.util.Iterator filter(java.lang.Object value)
          Applies only this filter on the specified object, without considering the other filters from the filter chain.
 int getLength()
          Returns the length setting of this filter.
protected  void handleBuffer(java.lang.StringBuffer buffer)
          Handles the specified buffer before splitting it into tokens.
 boolean isAllowDigits()
          Returns whether digits are allowed in the output.
 void setAllowDigits(boolean allowDigits)
          Sets whether digits are allowed in the output.
 void setLength(int i)
          Sets the length setting of this filter.
 
Methods inherited from class de.unidu.is.text.AbstractFilter
apply, apply
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WordSplitterFilter

public WordSplitterFilter(Filter nextFilter)
Creates a new instance and sets the next filter in the chain.

Parameters:
nextFilter - next filter in the filter chain

WordSplitterFilter

public WordSplitterFilter(Filter nextFilter,
                          int length)
Creates a new instance with the given length setting and sets the next filter in the chain.

Parameters:
nextFilter - next filter in the filter chain
length - initial length setting (see setLength(int))

Method Detail

filter

protected java.util.Iterator filter(java.lang.Object value)
Applies only this filter on the specified object, without considering the other filters from the filter chain.

This method splits a string into tokens. First, every non-letter character is replaced by a space. Then the resulting string is split into tokens, with whitespace as the token boundary. Only tokens with at least 3 characters are returned.

Specified by:
filter in class AbstractFilter
Parameters:
value - value to be modified by this filter
Returns:
iterator over the resulting objects
See Also:
AbstractFilter.filter(java.lang.Object)
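The difference between a single filter step and the whole chain can be modelled with a small standalone sketch. Everything in it is an assumption made for illustration: the classes ChainStageSketch, SplitStageSketch and UpperCaseStageSketch are hypothetical, and the way apply hands each result of filter to the next stage is only one plausible reading of "next filter in the chain", not something this page confirms about AbstractFilter.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a chained filter; not the library's AbstractFilter.
abstract class ChainStageSketch {
    private final ChainStageSketch next; // null marks the end of the chain

    ChainStageSketch(ChainStageSketch next) {
        this.next = next;
    }

    /** Applies this stage only, ignoring the rest of the chain. */
    protected abstract Iterator<Object> filter(Object value);

    /** Applies this stage, then passes every result through the remaining stages. */
    Iterator<Object> apply(Object value) {
        List<Object> results = new ArrayList<>();
        Iterator<Object> own = filter(value);
        while (own.hasNext()) {
            Object item = own.next();
            if (next == null) {
                results.add(item);
            } else {
                Iterator<Object> rest = next.apply(item);
                while (rest.hasNext()) {
                    results.add(rest.next());
                }
            }
        }
        return results.iterator();
    }
}

// Stage that splits its input at whitespace.
final class SplitStageSketch extends ChainStageSketch {
    SplitStageSketch(ChainStageSketch next) {
        super(next);
    }

    @Override
    protected Iterator<Object> filter(Object value) {
        List<Object> parts = new ArrayList<>();
        for (String part : value.toString().trim().split("\\s+")) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts.iterator();
    }
}

// Stage that upper-cases its input and emits exactly one result.
final class UpperCaseStageSketch extends ChainStageSketch {
    UpperCaseStageSketch(ChainStageSketch next) {
        super(next);
    }

    @Override
    protected Iterator<Object> filter(Object value) {
        List<Object> one = new ArrayList<>();
        one.add(value.toString().toUpperCase(Locale.ROOT));
        return one.iterator();
    }
}
```

In this model, new SplitStageSketch(new UpperCaseStageSketch(null)).apply("ab cd") iterates over "AB" and "CD", whereas the split stage's own filter step would only produce "ab" and "cd".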

getLength

public int getLength()
Returns the length setting of this filter.

Returns:
the current length setting

setLength

public void setLength(int i)
Sets the length setting of this filter.

Parameters:
i - the new length setting

isAllowDigits

public boolean isAllowDigits()
Returns whether digits are allowed in the output.

Returns:
true iff digits are allowed in the output

setAllowDigits

public void setAllowDigits(boolean allowDigits)
Sets whether digits are allowed in the output.

Parameters:
allowDigits - if true, digits are allowed in the output
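One way such a switch could affect the character replacement step is sketched below. This is a guess at the behaviour, not the library's code: the class DigitOptionSketch and the method normalize are hypothetical, and the sketch assumes that allowing digits simply means digits are kept alongside letters instead of being replaced by spaces.

```java
// Hypothetical illustration only; not part of de.unidu.is.text.
final class DigitOptionSketch {

    /**
     * Replaces every character that is not a letter (or, when allowDigits
     * is true, not a letter or digit) with a space.
     */
    static String normalize(String text, boolean allowDigits) {
        StringBuilder result = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean keep = Character.isLetter(c)
                    || (allowDigits && Character.isDigit(c));
            result.append(keep ? c : ' ');
        }
        return result.toString();
    }
}
```

Under this assumption, normalize("mp3 player", true) leaves the text unchanged, while normalize("mp3 player", false) turns the digit into a space, so "mp3" would later fall apart into the too-short token "mp".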

handleBuffer

protected void handleBuffer(java.lang.StringBuffer buffer)
Handles the specified buffer before splitting it into tokens.

The current implementation replaces every non-letter character by a space.

Parameters:
buffer - string buffer to be handled
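Because handleBuffer is protected, it reads like a hook that subclasses may override to change which characters survive. The following standalone sketch models that idea. The classes BufferHookSketch and HyphenKeepingSketch and the helper prepare are hypothetical, and the default behaviour assumes Character.isLetter as the letter test.

```java
// Hypothetical model of an overridable buffer hook; not the library's class.
class BufferHookSketch {

    /** Default behaviour: replace every non-letter character with a space. */
    protected void handleBuffer(StringBuffer buffer) {
        for (int i = 0; i < buffer.length(); i++) {
            if (!Character.isLetter(buffer.charAt(i))) {
                buffer.setCharAt(i, ' ');
            }
        }
    }

    /** Copies the text into a buffer, runs the hook and returns the result. */
    final String prepare(String text) {
        StringBuffer buffer = new StringBuffer(text);
        handleBuffer(buffer);
        return buffer.toString();
    }
}

// Variant that also keeps hyphens, so hyphenated words stay in one piece.
class HyphenKeepingSketch extends BufferHookSketch {
    @Override
    protected void handleBuffer(StringBuffer buffer) {
        for (int i = 0; i < buffer.length(); i++) {
            char c = buffer.charAt(i);
            if (!Character.isLetter(c) && c != '-') {
                buffer.setCharAt(i, ' ');
            }
        }
    }
}
```

With this model, new BufferHookSketch().prepare("state-of-the-art, 2005").trim() gives "state of the art", while the HyphenKeepingSketch variant gives "state-of-the-art".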