de.unidu.is.util
Class StringUtilities

java.lang.Object
  extended by de.unidu.is.util.StringUtilities

public class StringUtilities
extends java.lang.Object

This class provides some convenient static methods for handling strings. Some of the methods simply use filters from de.unidu.is.text.

Since:
2003-07-05
Version:
$Revision: 1.8 $, $Date: 2005/02/21 17:29:29 $
Author:
Henrik Nottelmann

Constructor Summary
StringUtilities()
           
 
Method Summary
static java.lang.String extractFromHTML(java.lang.String content)
          Extracts the text from HTML: removes all tags, replaces well-known entities, and removes all remaining entities.
static java.lang.String fromXML(java.lang.String text)
          Resolves some entities in an XML string.
static java.lang.String getSoundex(java.lang.String text)
          Returns the soundex representation of a string.
static java.lang.String implode(java.util.Collection collection, java.lang.String separator)
          Implodes the collection by concatenating all elements, separated by the specified string.
static java.lang.String implode(java.lang.Object[] array, java.lang.String separator)
          Implodes the array by concatenating all elements, separated by the specified string.
static boolean isStopword(java.lang.String term)
          Tests whether the specified (unstemmed) term is a stopword.
static boolean isStopwordStemmed(java.lang.String term)
          Tests whether the specified (already stemmed) term is a stopword.
static java.util.Iterator parseText(java.lang.String text)
          Returns an iterator over all (stemmed) terms embedded in the specified string, after removing stopwords.
static java.lang.String remove(java.lang.String str, java.lang.String matchStr)
          Removes all occurrences of a string from another string.
static java.lang.String removeTags(java.lang.String str)
          Removes all tags from a string.
static java.lang.String replace(java.lang.String str, java.lang.String matchStr, java.lang.String replaceStr)
          Replaces all occurrences of a string with another string.
static java.lang.String stem(java.lang.String term)
          Returns the stemmed term.
static java.lang.String toString(int num, int length)
          Formats the specified number.
static java.lang.String toXML(java.lang.String text)
          Converts some characters in a string into entities. These characters are converted: ß " < > &
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

StringUtilities

public StringUtilities()
Method Detail

implode

public static java.lang.String implode(java.lang.Object[] array,
                                       java.lang.String separator)
Implodes the array by concatenating all elements, separated by the specified string.

Parameters:
array - array whose elements have to be concatenated
separator - string used for separating the array elements
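A minimal sketch of how implode(Object[], String) might behave, assuming each element is rendered with String.valueOf (so a null element becomes "null"). The class name ImplodeArraySketch is hypothetical and this is not the library's source.

```java
import java.util.StringJoiner;

// Hypothetical sketch of implode(Object[], String); not the library's actual code.
final class ImplodeArraySketch {

    static String implode(Object[] array, String separator) {
        StringJoiner joiner = new StringJoiner(separator);
        for (Object element : array) {
            // Assumption: elements are rendered via String.valueOf, so null prints as "null"
            joiner.add(String.valueOf(element));
        }
        return joiner.toString(); // an empty array yields ""
    }

    public static void main(String[] args) {
        System.out.println(implode(new Object[] {"a", 1, 'c'}, ", ")); // prints "a, 1, c"
    }
}
```

StringJoiner places the separator only between elements, so no trailing separator has to be trimmed.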

implode

public static java.lang.String implode(java.util.Collection collection,
                                       java.lang.String separator)
Implodes the collection by concatenating all elements, separated by the specified string.

Parameters:
collection - collection whose elements have to be concatenated
separator - string used for separating the collection elements
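A short sketch of the collection variant, assuming the elements are joined in the collection's iteration order and rendered with String.valueOf. The original signature takes a raw java.util.Collection; the wildcard generic type and the class name ImplodeCollectionSketch are additions for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.stream.Collectors;

// Hypothetical sketch of implode(Collection, String); not the library's actual code.
final class ImplodeCollectionSketch {

    static String implode(Collection<?> collection, String separator) {
        // Elements are joined in the collection's iteration order
        return collection.stream()
                .map(element -> String.valueOf(element))
                .collect(Collectors.joining(separator));
    }

    public static void main(String[] args) {
        System.out.println(implode(Arrays.asList("red", "green", "blue"), " | "));
        // prints "red | green | blue"
    }
}
```

For an unordered collection such as a HashSet, the order of the joined elements is unspecified as well.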

toXML

public static java.lang.String toXML(java.lang.String text)
Converts some characters in a string into entities. These characters are converted: ß " < > &

Parameters:
text - text to convert
Returns:
converted text
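The page names the characters toXML converts (ß " < > &) but not the entities it emits. The sketch below assumes the standard XML entities &amp;, &lt;, &gt;, &quot; and, for ß, the numeric character reference &#223;; the ß mapping in particular is a guess. It works in a single pass, so the & of an inserted entity is never escaped a second time. ToXmlSketch is a hypothetical name.

```java
// Hypothetical sketch of toXML; the entity chosen for U+00DF (sharp s) is an assumption.
final class ToXmlSketch {

    static String toXML(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':      out.append("&amp;");  break;
                case '<':      out.append("&lt;");   break;
                case '>':      out.append("&gt;");   break;
                case '"':      out.append("&quot;"); break;
                case '\u00DF': out.append("&#223;"); break; // sharp s; assumed numeric reference
                default:       out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toXML("a < b & \"c\"")); // prints a &lt; b &amp; &quot;c&quot;
    }
}
```

Chaining String.replace calls would also work, but only if & is replaced first; the character switch avoids that ordering pitfall.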

fromXML

public static java.lang.String fromXML(java.lang.String text)
Resolves some entities in an XML string. These entities are resolved:

Parameters:
text - XML text
Returns:
converted text
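The list of entities that fromXML resolves did not survive on this page. Assuming it reverses the toXML conversion for the characters listed in the summary (ß " < > &), with &#223; assumed as the form of ß, a minimal version resolves &amp; last, so an escaped entity such as &amp;lt; comes out as the literal text &lt; instead of being decoded twice. FromXmlSketch is a hypothetical name.

```java
// Hypothetical sketch of fromXML; the set of resolved entities is an assumption.
final class FromXmlSketch {

    static String fromXML(String text) {
        return text.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&#223;", "\u00DF") // assumed encoding of U+00DF (sharp s)
                .replace("&amp;", "&");      // last, so "&amp;lt;" yields "&lt;", not "<"
    }

    public static void main(String[] args) {
        System.out.println(fromXML("x &lt; y &amp;&amp; y &gt; z")); // prints x < y && y > z
    }
}
```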

remove

public static java.lang.String remove(java.lang.String str,
                                      java.lang.String matchStr)
Removes all occurrences of a string from another string.

Parameters:
str - string to modify
matchStr - string to remove
Returns:
modified string
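A minimal sketch of remove, assuming matchStr is matched literally rather than as a regular expression; RemoveSketch is a hypothetical name, and the method simply delegates to java.lang.String.replace(CharSequence, CharSequence).

```java
// Hypothetical sketch of remove; String.replace matches literally (no regular expressions).
final class RemoveSketch {

    static String remove(String str, String matchStr) {
        // One left-to-right pass over non-overlapping matches
        return str.replace(matchStr, "");
    }

    public static void main(String[] args) {
        System.out.println(remove("banana", "an")); // prints "ba"
    }
}
```

Because removal is a single pass, a match that only appears after a removal is not removed again: with this sketch, remove("aabb", "ab") returns "ab". Whether the library behaves the same way is not stated.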

replace

public static java.lang.String replace(java.lang.String str,
                                       java.lang.String matchStr,
                                       java.lang.String replaceStr)
Replaces all occurrences of a string with another string.

Parameters:
str - string to modify
matchStr - string to replace
replaceStr - replacement string
Returns:
modified string
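The loop below sketches a literal search-and-replace with indexOf, assuming non-overlapping matches scanned left to right. ReplaceSketch is a hypothetical name, and returning the input unchanged for an empty matchStr is an assumption (String.replace would instead insert replaceStr between every character).

```java
// Hypothetical sketch of replace using indexOf; not the library's source.
final class ReplaceSketch {

    static String replace(String str, String matchStr, String replaceStr) {
        if (matchStr.isEmpty()) {
            return str; // assumption: an empty pattern leaves the string unchanged
        }
        StringBuilder out = new StringBuilder(str.length());
        int from = 0;
        int hit;
        // Copy the text before each match, then the replacement; continue after the match
        while ((hit = str.indexOf(matchStr, from)) >= 0) {
            out.append(str, from, hit).append(replaceStr);
            from = hit + matchStr.length();
        }
        return out.append(str, from, str.length()).toString();
    }

    public static void main(String[] args) {
        System.out.println(replace("a.b.c", ".", "::")); // prints "a::b::c"
    }
}
```

Unlike String.replaceAll, the "." here is plain text, not a regular expression matching any character.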

getSoundex

public static java.lang.String getSoundex(java.lang.String text)
Returns the soundex representation of a string.

Parameters:
text - string to convert
Returns:
soundex representation
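The page does not say which Soundex variant getSoundex uses. This sketch follows the common American Soundex rules: keep the first letter, map consonants to the digits 1 to 6, collapse adjacent equal codes, let vowels (and Y) but not H or W separate equal codes, and pad or cut the result to four characters. Skipping non-letters and returning "" for input without letters are further assumptions; SoundexSketch is a hypothetical name.

```java
import java.util.Locale;

// Hypothetical sketch of American Soundex; not necessarily the variant the library uses.
final class SoundexSketch {

    // Codes for A..Z: '0' = vowel-like (A E I O U Y), '-' = ignored (H W)
    private static final String CODES = "0123012-02245501262301-202";

    static String getSoundex(String text) {
        String upper = text.toUpperCase(Locale.ROOT);
        StringBuilder code = new StringBuilder(4);
        char last = 0; // code of the previous relevant letter
        for (int i = 0; i < upper.length() && code.length() < 4; i++) {
            char c = upper.charAt(i);
            if (c < 'A' || c > 'Z') {
                continue; // assumption: non-letters are skipped
            }
            char digit = CODES.charAt(c - 'A');
            if (code.length() == 0) {
                code.append(c); // the first letter is kept as-is
                last = digit;
            } else if (digit == '-') {
                // H and W do not separate letters with the same code
            } else if (digit == '0') {
                last = '0'; // vowels do separate letters with the same code
            } else if (digit != last) {
                code.append(digit);
                last = digit;
            }
        }
        if (code.length() == 0) {
            return ""; // assumption: no letters, no code
        }
        while (code.length() < 4) {
            code.append('0'); // pad short codes with zeros
        }
        return code.toString();
    }

    public static void main(String[] args) {
        System.out.println(getSoundex("Robert") + " " + getSoundex("Rupert")); // prints "R163 R163"
    }
}
```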

removeTags

public static java.lang.String removeTags(java.lang.String str)
Removes all tags from a string.

Parameters:
str - string to modify
Returns:
modified string
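A regex-based sketch of removeTags, assuming a tag is everything from a "<" up to the next ">". This is a simplification rather than an HTML parser; a ">" inside an attribute value, for example, ends the match early. RemoveTagsSketch is a hypothetical name.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of removeTags; the tag pattern is an assumption.
final class RemoveTagsSketch {

    // '<', then any run of non-'>' characters, then '>'
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String removeTags(String str) {
        return TAG.matcher(str).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(removeTags("<p>Hello <b>world</b></p>")); // prints "Hello world"
    }
}
```

A lone "<" with no closing ">" is left alone, so text such as "a < b" survives unchanged.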

extractFromHTML

public static java.lang.String extractFromHTML(java.lang.String content)
Extracts the text from HTML: removes all tags, replaces well-known entities, and removes all remaining entities.

Parameters:
content - HTML string
Returns:
text embedded in the HTML
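The three steps described for extractFromHTML can be sketched as follows. Which entities count as "well-known" is not documented, so the five in KNOWN are an assumption, as are the tag and entity patterns and the class name ExtractFromHtmlSketch. Entities are handled in one regex pass, so "&amp;lt;" becomes the text "&lt;" instead of being decoded twice. Map.of and Matcher.replaceAll(Function) require Java 9 or later.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of extractFromHTML; the entity table is an assumption.
final class ExtractFromHtmlSketch {

    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern ENTITY = Pattern.compile("&[#A-Za-z0-9]+;");
    private static final Map<String, String> KNOWN = Map.of(
            "&amp;", "&", "&lt;", "<", "&gt;", ">", "&quot;", "\"", "&nbsp;", " ");

    static String extractFromHTML(String content) {
        // Step 1: remove all tags
        String noTags = TAG.matcher(content).replaceAll("");
        // Steps 2 and 3 in one pass: decode known entities, drop all others
        return ENTITY.matcher(noTags).replaceAll(
                match -> Matcher.quoteReplacement(KNOWN.getOrDefault(match.group(), "")));
    }

    public static void main(String[] args) {
        System.out.println(extractFromHTML("<b>5 &lt; 7</b>&trade;")); // prints "5 < 7"
    }
}
```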

stem

public static java.lang.String stem(java.lang.String term)
Returns the stemmed term.

Parameters:
term - string to be stemmed
Returns:
stemmed string

isStopword

public static boolean isStopword(java.lang.String term)
Tests whether the specified (unstemmed) term is a stopword.

Parameters:
term - term to be tested
Returns:
true iff the term is a stopword

isStopwordStemmed

public static boolean isStopwordStemmed(java.lang.String term)
Tests whether the specified (already stemmed) term is a stopword.

Parameters:
term - term to be tested
Returns:
true iff the term is a stopword
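One plausible reason for having both isStopword and isStopwordStemmed is that the stopword list can be stemmed once, so already-stemmed terms can be checked directly. The sketch below assumes exactly that. The tiny stopword list, the case-insensitive lookup, and TOY_STEM (a crude suffix stripper standing in for stem()) are all hypothetical and are not the library's data or stemmer.

```java
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical sketch of isStopword / isStopwordStemmed with made-up data.
final class StopwordSketch {

    // Hypothetical stopword list; the real list is not shown on this page
    private static final Set<String> STOPWORDS = Set.of("the", "and", "of", "is", "was", "being");

    // Hypothetical stand-in for stem(): strips a trailing "ing" or "s"
    static final UnaryOperator<String> TOY_STEM = t ->
            t.endsWith("ing") ? t.substring(0, t.length() - 3)
            : t.endsWith("s") ? t.substring(0, t.length() - 1) : t;

    // The list is stemmed once, so stemmed terms can be looked up directly
    private static final Set<String> STEMMED_STOPWORDS =
            STOPWORDS.stream().map(TOY_STEM).collect(Collectors.toSet());

    static boolean isStopword(String term) {
        return STOPWORDS.contains(term.toLowerCase(Locale.ROOT)); // lowercasing is an assumption
    }

    static boolean isStopwordStemmed(String term) {
        return STEMMED_STOPWORDS.contains(term.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isStopword("being") + " " + isStopwordStemmed("be")); // prints "true true"
    }
}
```

With this toy stemmer, "be" is recognised only by the stemmed check, and "being" only by the unstemmed one.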

parseText

public static java.util.Iterator parseText(java.lang.String text)
Returns an iterator over all (stemmed) terms embedded in the specified string, after removing stopwords. Every non-letter character is treated as whitespace, and terms with fewer than 3 characters are discarded.

Parameters:
text - text to be parsed
Returns:
iterator over the terms in the specified text
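The parsing rules above (non-letters act as whitespace, terms shorter than 3 characters are dropped, stopwords are removed, the remaining terms are stemmed) can be sketched like this. The stopword set, the identity placeholder used instead of a real stemmer, the lowercasing, and the use of Unicode letters (\p{L}) are assumptions; the original returns a raw Iterator, and the generic type and class name ParseTextSketch are added for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical sketch of parseText; stopwords and stemming are placeholders.
final class ParseTextSketch {

    private static final Set<String> STOPWORDS = Set.of("the", "and", "for", "with");
    private static final UnaryOperator<String> STEM = UnaryOperator.identity(); // no-op stand-in for stem()

    static Iterator<String> parseText(String text) {
        List<String> terms = new ArrayList<>();
        // Every run of non-letter characters acts as a separator
        for (String token : text.split("[^\\p{L}]+")) {
            String term = token.toLowerCase(Locale.ROOT); // lowercasing is an assumption
            if (term.length() < 3 || STOPWORDS.contains(term)) {
                continue; // drop short terms (including empty tokens) and stopwords
            }
            terms.add(STEM.apply(term));
        }
        return terms.iterator();
    }

    public static void main(String[] args) {
        parseText("The cat and 2 dogs, with x-ray vision!").forEachRemaining(System.out::println);
        // prints cat, dogs, ray, vision (one per line)
    }
}
```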

toString

public static java.lang.String toString(int num,
                                        int length)
Formats the specified number. If the resulting string is shorter than the specified length, it is padded at the beginning with zeros.

Parameters:
num - number to format
length - exact length of the resulting string
Returns:
formatted number as a string, filled with zeros if needed
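A minimal sketch of the zero padding, assuming num is non-negative (a minus sign would end up after the zeros) and that numbers already longer than length are returned unchanged rather than truncated; the page does not say which. ZeroPadSketch is a hypothetical name. A static toString(int, int) can coexist with the inherited toString() because the parameter lists differ.

```java
// Hypothetical sketch of toString(int, int); assumes num >= 0.
final class ZeroPadSketch {

    static String toString(int num, int length) {
        StringBuilder digits = new StringBuilder(Integer.toString(num));
        while (digits.length() < length) {
            digits.insert(0, '0'); // pad at the beginning
        }
        return digits.toString(); // longer numbers are not truncated
    }

    public static void main(String[] args) {
        System.out.println(toString(7, 3)); // prints "007"
    }
}
```

String.format("%0" + length + "d", num) gives the same result for length of at least 1, but throws for length 0 because the 0 flag requires a width.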