de.unidu.is.text
Class HTMLFilter

java.lang.Object
  extended by de.unidu.is.text.AbstractFilter
      extended by de.unidu.is.text.AbstractSingleItemFilter
          extended by de.unidu.is.text.HTMLFilter
All Implemented Interfaces:
Filter, SingleItemFilter

public class HTMLFilter
extends AbstractSingleItemFilter

This filter extracts all text from a specified HTML string, and returns the text content in a single string.

Since:
2003-07-04
Version:
$Revision: 1.9 $, $Date: 2005/03/09 08:59:15 $
Author:
Henrik Nottelmann

Field Summary
 
Fields inherited from class de.unidu.is.text.AbstractFilter
nextFilter
 
Constructor Summary
HTMLFilter(Filter nextFilter)
          Creates a new instance and sets the next filter in the chain.
 
Method Summary
 java.lang.Object run(java.lang.Object value)
          Extracts the text from HTML: removes all tags, replaces well-known entities, removes all other entities, and returns a single string.
protected  boolean substringStartsWith(java.lang.StringBuffer buffer, int index, java.lang.String str)
          Tests whether the content of the specified string buffer, starting at the specified index, begins with the specified string (all letters are converted to lowercase).
 
Methods inherited from class de.unidu.is.text.AbstractSingleItemFilter
filter
 
Methods inherited from class de.unidu.is.text.AbstractFilter
apply, apply
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HTMLFilter

public HTMLFilter(Filter nextFilter)
Creates a new instance and sets the next filter in the chain.

Parameters:
nextFilter - next filter in the filter chain
Method Detail

run

public java.lang.Object run(java.lang.Object value)
Extracts the text from HTML: removes all tags, replaces well-known entities, removes all other entities, and returns a single string.

Parameters:
value - HTML string
Returns:
text content of the HTML string
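
The steps described for run (drop tags, replace well-known entities, drop the remaining entities) can be illustrated with a minimal, self-contained sketch. It is not the HTMLFilter implementation: the class HtmlTextSketch, the method extractText, the entity table, and the choices to replace each tag with a space and to collapse whitespace are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; not part of de.unidu.is.text.
final class HtmlTextSketch {

    // Assumed table of "well-known" entities; the real filter's list is not documented here.
    private static final Map<String, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("amp", "&");
        ENTITIES.put("lt", "<");
        ENTITIES.put("gt", ">");
        ENTITIES.put("quot", "\"");
        ENTITIES.put("nbsp", " ");
    }

    static String extractText(String html) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            if (c == '<') {
                int close = html.indexOf('>', i);
                if (close < 0) {
                    break; // unterminated tag: drop the rest (assumption)
                }
                out.append(' '); // assumption: a tag separates the words around it
                i = close + 1;
            } else if (c == '&') {
                int semi = html.indexOf(';', i);
                String name = semi > i ? html.substring(i + 1, semi) : "";
                if (name.matches("#?[A-Za-z0-9]{1,8}")) {
                    String replacement = ENTITIES.get(name);
                    if (replacement != null) {
                        out.append(replacement); // well-known entity: replace
                    }
                    // any other entity is removed
                    i = semi + 1;
                } else {
                    out.append(c); // a bare ampersand is kept as text
                    i++;
                }
            } else {
                out.append(c);
                i++;
            }
        }
        // Assumption: collapse runs of whitespace so the result is one clean string.
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(extractText("<p>Tom &amp; Jerry&trade;</p>"));
    }
}
```

With this sketch, the input `<p>Tom &amp; Jerry&trade;</p>` yields `Tom & Jerry`: both tags disappear, `&amp;` is replaced, and the entity missing from the table (`&trade;`) is removed.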

substringStartsWith

protected boolean substringStartsWith(java.lang.StringBuffer buffer,
                                      int index,
                                      java.lang.String str)
Tests whether the content of the specified string buffer, starting at the specified index, begins with the specified string (all letters are converted to lowercase).

Parameters:
buffer - string buffer to be tested
index - starting index
str - string to test
Returns:
true if the specified string buffer starts (from the specified index) with the specified string
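
The prefix test described above can be sketched as follows. This is an illustration, not the method's actual source: the class PrefixAtIndexSketch and the method startsWithAt are hypothetical names, and the sketch assumes that both the buffer characters and the test string are lowercased before comparison and that an out-of-range index simply yields false.

```java
// Hypothetical stand-in for illustration only; not part of de.unidu.is.text.
final class PrefixAtIndexSketch {

    static boolean startsWithAt(StringBuffer buffer, int index, String str) {
        // Assumption: positions outside the buffer are treated as "no match".
        if (index < 0 || index + str.length() > buffer.length()) {
            return false;
        }
        for (int k = 0; k < str.length(); k++) {
            // Assumption: both sides are lowercased, so the test ignores case.
            char inBuffer = Character.toLowerCase(buffer.charAt(index + k));
            char expected = Character.toLowerCase(str.charAt(k));
            if (inBuffer != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        StringBuffer html = new StringBuffer("x<SCRIPT>alert(1)</SCRIPT>");
        System.out.println(startsWithAt(html, 1, "<script"));
        System.out.println(startsWithAt(html, 0, "<script"));
    }
}
```

A check of this kind lets an HTML scanner recognise a tag such as `<SCRIPT` regardless of case without copying or lowercasing the whole buffer.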