Citation key:
Klas/Fuhr:00
Title:
A new Effective Approach for Categorizing Web Documents
Author(s):
Claus-Peter Klas
Norbert Fuhr
In:
Proceedings of the 22nd BCS-IRSG Colloquium on IR Research
Year:
2000

Abstract:
Categorization of Web documents poses a new challenge for automatic classification methods. In this paper, we present the megadocument approach for categorization. For each category, all corresponding document texts from the training sample are concatenated into a megadocument, which is indexed using standard methods. In order to classify a new document, the most similar megadocument determines the category to be assigned. Our evaluations show that for Web collections, the megadocument method clearly outperforms other classification methods. In contrast, for the Reuters collection, we achieve only mediocre results. Thus, our method seems to be well suited for heterogeneous document collections.
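
The procedure described in the abstract can be illustrated with a minimal sketch. The abstract says only that megadocuments are indexed "using standard methods", so the tf-idf weighting and cosine similarity used here are assumptions, not the authors' actual indexing scheme. The function names (`tokenize`, `build_megadocuments`, `tfidf_vectors`, `cosine`, `classify`) and the toy training data are hypothetical and serve illustration only.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_megadocuments(training):
    """Concatenate all training texts of a category into one token list."""
    mega = defaultdict(list)
    for category, text in training:
        mega[category].extend(tokenize(text))
    return mega


def tfidf_vectors(mega):
    """Index each megadocument as a sparse tf-idf vector (assumed weighting)."""
    n = len(mega)
    df = Counter()
    for tokens in mega.values():
        df.update(set(tokens))
    # Smoothed idf so that terms occurring in every megadocument keep weight > 0.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = {
        category: {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for category, tokens in mega.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, vectors, idf):
    """Assign the category whose megadocument is most similar to the text."""
    counts = Counter(tokenize(text))
    # Terms unseen in training carry no weight and are dropped.
    query = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    return max(vectors, key=lambda category: cosine(query, vectors[category]))


# Toy training sample (invented for illustration).
training = [
    ("sports", "football match goal team"),
    ("sports", "team wins league match"),
    ("finance", "stock market shares bank"),
    ("finance", "bank interest rates market"),
]
vectors, idf = tfidf_vectors(build_megadocuments(training))
label = classify("the team scored a goal", vectors, idf)
```

In this sketch each category is represented by a single vector, so classifying a new document requires one similarity computation per category rather than one per training document.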
