Universität Duisburg-Essen
Arbeitsgruppe Informationssysteme (Information Systems Group)

XML Clustering and Browsing - Notes

Notes

Semantics of XML markup, slide 3

  • Order of elements:
    This is not so much about structure (the order of the element tags) as about the order of elements of the same type. For example, in the frames of an MPEG video the order is relevant.
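A minimal sketch of the distinction, assuming a hypothetical MPEG-like markup where each frame is a `<frame id="..."/>` child of `<video>`: two documents can have exactly the same order of element tags and still differ in the order of their same-type elements. The function names `tag_sequence` and `same_type_order` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: identical tag structure, frames in a different order.
DOC_A = "<video><frame id='f1'/><frame id='f2'/><frame id='f3'/></video>"
DOC_B = "<video><frame id='f2'/><frame id='f1'/><frame id='f3'/></video>"

def tag_sequence(root):
    """Order of the element tags among the children (the structural view)."""
    return [child.tag for child in root]

def same_type_order(root, tag, key):
    """Order of the children with the given tag, identified by attribute `key`.
    ElementTree's findall returns matches in document order."""
    return [child.get(key) for child in root.findall(tag)]

a, b = ET.fromstring(DOC_A), ET.fromstring(DOC_B)
print(tag_sequence(a) == tag_sequence(b))  # True: same tag order
print(same_type_order(a, "frame", "id") == same_type_order(b, "frame", "id"))  # False
```

Only the second comparison captures the order that matters for the video.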

Similarity: nesting, slide 5

  • Consider the structure only, i.e. the nesting as such; ignore the element names here.
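One possible way to compare nesting while ignoring element names is sketched below. It is an assumption for illustration, not the measure from the slides: each tree is reduced to a multiset of (depth, number of children) pairs, and two trees are compared by multiset Jaccard similarity. This is a coarse approximation; equal profiles do not guarantee isomorphic trees.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def shape_profile(elem, depth=0, profile=None):
    """Multiset of (depth, number of children) pairs; element names are ignored."""
    if profile is None:
        profile = Counter()
    children = list(elem)
    profile[(depth, len(children))] += 1
    for child in children:
        shape_profile(child, depth + 1, profile)
    return profile

def nesting_similarity(xml_a, xml_b):
    """Multiset Jaccard similarity of two shape profiles (1.0 = identical profiles)."""
    pa = shape_profile(ET.fromstring(xml_a))
    pb = shape_profile(ET.fromstring(xml_b))
    inter = sum((pa & pb).values())  # element-wise minimum of counts
    union = sum((pa | pb).values())  # element-wise maximum of counts
    return inter / union if union else 1.0

# Same nesting, completely different names -> treated as equal.
same = nesting_similarity("<a><b/><c><d/></c></a>", "<x><y/><z><w/></z></x>")
# Same names, different nesting -> no overlap at all.
different = nesting_similarity("<a><b/><c/></a>", "<a><b><c/></b></a>")
```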

Similarity: coordination, slide 6

  • Regard coordinated elements roughly as tuples.
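A minimal sketch of the tuple view, assuming that the coordinated elements are the children of one parent and that components are compared position by position. The record layout (`person`, `name`, `city`, `year`) and the matching rule are hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET

def as_tuple(elem):
    """View the coordinated children of `elem` as a tuple of (tag, text) components."""
    return tuple((child.tag, (child.text or "").strip()) for child in elem)

def tuple_similarity(t1, t2):
    """Fraction of positions whose components agree; missing positions count as mismatches."""
    n = max(len(t1), len(t2))
    if n == 0:
        return 1.0
    matches = sum(1 for x, y in zip(t1, t2) if x == y)
    return matches / n

p1 = ET.fromstring("<person><name>Smith</name><city>Essen</city><year>1999</year></person>")
p2 = ET.fromstring("<person><name>Smith</name><city>Duisburg</city><year>1999</year></person>")
sim = tuple_similarity(as_tuple(p1), as_tuple(p2))  # 2 of 3 components agree
```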

Similarity: importance, slide 9

  • There is no theoretical foundation for the weighted sum.
  • If importance is modeled from the user's point of view, we arrive at a product (rather than a sum) of the individual factors, with the weights acting as exponents of the factors.
  • Norbert once experimented with learning such weights by means of regression techniques.
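The contrast between the two aggregations can be sketched as follows, assuming all factor similarities lie in (0, 1]. Taking logarithms of the product form gives log(sim) = w_1 log(s_1) + ... + w_k log(s_k), which is linear in the weights, so ordinary least squares on log values is one plausible way to learn them. This is an illustrative assumption; the regression setup Norbert actually used is not documented here, and `fit_exponents` is a hypothetical helper.

```python
import math

def weighted_sum(sims, weights):
    """Ad-hoc aggregation: sum_i w_i * s_i."""
    return sum(w * s for w, s in zip(weights, sims))

def weighted_product(sims, weights):
    """Product form: prod_i s_i ** w_i (the weights act as exponents)."""
    return math.prod(s ** w for w, s in zip(weights, sims))

def fit_exponents(samples, targets):
    """Least-squares fit of w in log(t) = sum_i w_i * log(s_i), no intercept.
    All similarities and targets must be strictly positive."""
    X = [[math.log(s) for s in row] for row in samples]
    y = [math.log(t) for t in targets]
    k = len(X[0])
    # Normal equations (X^T X) w = X^T y, stored as an augmented matrix.
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Synthetic check: targets generated with known exponents (2.0, 0.5).
samples = [[0.9, 0.8], [0.5, 0.6], [0.7, 0.3], [0.4, 0.9]]
targets = [weighted_product(s, [2.0, 0.5]) for s in samples]
learned = fit_exponents(samples, targets)
```

Unlike the sum, the product drops toward zero whenever any heavily weighted factor is low, which matches the intuition that an important aspect cannot be compensated by the others.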

General considerations, slide 12

  • A clear partition is not always the most meaningful one. Thus, we should not state a clear partition as the goal, but rather finding partitions that have a semantic interpretation.

Clustering and perspectives, slide 17

  • Order:
    For example, in a list of ingredients for a dish, the order of the elements is relevant and should therefore also be taken into account for clustering.
  • Position:
    For papers from certain fields of research, the order of the authors is relevant. A user might, for example, want to cluster by the first author only.
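The two perspectives above can be sketched as interchangeable similarity functions over the same element list. Everything here is a hypothetical illustration: the `<paper>`/`<author>` markup, the perspective names, and the choice of Jaccard (order ignored), longest common subsequence (order respected) and first-position equality (position only). The same "order" measure would apply equally to a list of ingredients.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

PAPERS = """<papers>
  <paper id="p1"><author>Miller</author><author>Lee</author></paper>
  <paper id="p2"><author>Lee</author><author>Miller</author></paper>
  <paper id="p3"><author>Miller</author><author>Chen</author></paper>
</papers>"""

def authors(paper):
    # findall returns <author> children in document order, so positions are kept.
    return [a.text for a in paper.findall("author")]

def lcs_length(a, b):
    """Length of the longest common subsequence (standard dynamic programme)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def similarity(a, b, perspective):
    if perspective == "set":    # order ignored
        union = set(a) | set(b)
        return len(set(a) & set(b)) / len(union) if union else 1.0
    if perspective == "order":  # order respected
        longest = max(len(a), len(b))
        return lcs_length(a, b) / longest if longest else 1.0
    if perspective == "first":  # only the first position counts
        return 1.0 if a and b and a[0] == b[0] else 0.0
    raise ValueError(f"unknown perspective: {perspective}")

def cluster_by_first_author(root):
    """Group paper ids by their first author."""
    clusters = defaultdict(list)
    for paper in root.findall("paper"):
        names = authors(paper)
        if names:
            clusters[names[0]].append(paper.get("id"))
    return dict(clusters)

root = ET.fromstring(PAPERS)
p1, p2 = (authors(p) for p in root.findall("paper")[:2])
```

Papers p1 and p2 have the same authors, so they are identical under "set", half similar under "order", and unrelated under "first".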

Others

  • We did not consider links at all.