Universität Duisburg-Essen
Information Systems Group

Pepper

Peer-to-Peer Architectures for Federated Search of Complex Digital Libraries


Duration:
From November 1, 2003 to December 31, 2006
Contact Persons:
Involved Persons:
Sponsored by:
  • DFG
  • NSF
Reference number:
  • DFG: BIB47 DOuv 02-01
  • UDE: 15311523 (ka00043c)
Participating Institutions:

The set of providers of digital libraries (DLs) and services on the Web is growing, both in absolute numbers and in diversity. From the user's point of view, there should be a single virtual library (a "one-stop shop") comprising all sources relevant to their information needs. Peer-to-peer architectures have been effective at integrating large numbers of very simple DLs, for example for file sharing. This project will demonstrate the use of peer-to-peer architectures for federated search across large numbers of complex digital libraries that are integrated only very loosely.

The project is based on the assumption that it is neither possible nor desirable to enforce homogeneity in a large-scale federation of complex digital libraries. DL providers will differ in the schemas they use, in the quality of their data and in their degree of cooperation. We will develop transformation methods that take into account the intrinsic imprecision and vagueness of mappings between different schemas. For this purpose, appropriate methods for describing DL schemas and the (uncertain) mappings between them must be developed.
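To illustrate what an uncertain mapping can mean in practice, here is a minimal Python sketch that assumes a mapping is given as weighted rules, each linking a source attribute to a target attribute with the probability that the link is correct. The attribute names, the probabilities and the function `transform` are hypothetical. Combining several rules that produce the same value with noisy-or assumes the rules are independent. This is a simplified illustration, not the sPLMap approach listed under Publications.

```python
from collections import defaultdict

# Hypothetical mapping rules: (source attribute, target attribute, probability
# that a value of the source attribute is a correct value of the target attribute).
RULES = [
    ("creator", "author", 0.9),
    ("contributor", "author", 0.3),
    ("title", "title", 0.95),
    ("description", "abstract", 0.7),
]


def transform(record, rules):
    """Map a source record (attribute -> list of values) to the target schema.

    Each target value carries a probability. If several rules yield the same
    (target attribute, value) pair, their probabilities are combined with
    noisy-or, i.e. assuming the rules are independent.
    """
    # (target attribute, value) -> probability that no rule supports the pair
    not_supported = defaultdict(lambda: 1.0)
    for source_attr, target_attr, p in rules:
        for value in record.get(source_attr, []):
            not_supported[(target_attr, value)] *= 1.0 - p
    result = defaultdict(dict)
    for (target_attr, value), q in not_supported.items():
        result[target_attr][value] = 1.0 - q
    return dict(result)


record = {
    "creator": ["Smith, J."],
    "contributor": ["Smith, J.", "Jones, K."],
    "title": ["Federated search"],
}
print(transform(record, RULES))
```

In this example "Smith, J." is supported by two rules, so it becomes an author with probability 1 − (1 − 0.9)(1 − 0.3) = 0.93, while "Jones, K." gets only 0.3. A retrieval engine can then weight query matches on the target schema by these probabilities instead of treating the mapping as exact.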

There is a growing number of Web services that can be used to improve retrieval results from DLs: mapping services help bridge heterogeneity, and enhancing services provide functions for retrieving additional relevant documents. We will develop methods for incorporating these services dynamically into the P2P retrieval system, by developing appropriate methods for both service description and service selection.
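Mapping services can be chained, so service selection may involve choosing which chain of mappings to use. The minimal Python sketch below assumes that each mapping service is described only by its input schema, its output schema and an estimated probability that its mapping is correct. The service names, the schema pairs and `best_chain` are all hypothetical. The sketch picks the chain with the highest product of probabilities using Dijkstra's algorithm, and it ignores enhancing services and execution costs.

```python
import heapq

# Hypothetical service descriptions:
# (service name, input schema, output schema, probability that the mapping is correct)
SERVICES = [
    ("dc2bibdb", "DC", "BIBDB", 0.8),
    ("dc2marc", "DC", "MARC", 0.9),
    ("marc2bibdb", "MARC", "BIBDB", 0.95),
]


def best_chain(source, target, services):
    """Return (probability, [service names]) for the most reliable chain of
    mapping services from the source to the target schema, or None.

    Dijkstra's algorithm works here because multiplying by a probability in
    (0, 1] never increases the score of a path.
    """
    heap = [(-1.0, source, [])]  # negated probability: heapq is a min-heap
    settled = set()
    while heap:
        neg_p, schema, chain = heapq.heappop(heap)
        if schema in settled:
            continue
        settled.add(schema)
        if schema == target:
            return -neg_p, chain
        for name, src, dst, p in services:
            if src == schema and dst not in settled:
                heapq.heappush(heap, (neg_p * p, dst, chain + [name]))
    return None


print(best_chain("DC", "BIBDB", SERVICES))
```

With these made-up values, the two-step chain `dc2marc`, `marc2bibdb` (0.9 × 0.95 = 0.855) is preferred over the direct service `dc2bibdb` (0.8). A real service description would also need to capture costs and the kind of service (mapping or enhancing).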

Large-scale peer-to-peer networks require routing services that deliver messages efficiently to their desired destinations. We will develop content-based routing services (resource description, resource selection and data fusion) for peer-to-peer networks. Content-based routing raises a variety of new issues in the peer-to-peer environment, for example partial representations of DL contents and a more complex process for deciding whether to satisfy a message locally or route it to another node.
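The three routing building blocks named above can be sketched in a few lines of Python. This sketch makes several assumptions. A peer's resource description is taken to be its document frequencies plus its collection size in words. Resource selection uses the well-known CORI belief formula with its usual constants (CORI is also referenced in the publication list). The local-versus-forward rule in `route` is a hypothetical placeholder. Data fusion uses min-max normalisation followed by CombSUM. This is not the project's decision-theoretic routing model, and all class and function names are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class ResourceDescription:
    """Hypothetical compact summary that a peer publishes about its collection."""
    name: str
    df: dict  # term -> number of documents on the peer containing the term
    cw: int   # collection size in words


def cori_scores(query, descriptions, b=0.4):
    """Resource selection: CORI belief that each peer is relevant to the query."""
    c = len(descriptions)
    avg_cw = sum(d.cw for d in descriptions) / c
    # cf(t): number of peers whose description contains term t
    cf = {t: sum(1 for d in descriptions if d.df.get(t, 0) > 0) for t in query}
    scores = {}
    for d in descriptions:
        belief = 0.0
        for t in query:
            if cf[t] == 0:
                belief += b  # term unknown everywhere: same default belief for all peers
                continue
            df = d.df.get(t, 0)
            t_part = df / (df + 50 + 150 * d.cw / avg_cw)
            i_part = math.log((c + 0.5) / cf[t]) / math.log(c + 1.0)
            belief += b + (1 - b) * t_part * i_part
        scores[d.name] = belief / len(query)
    return scores


def select_peers(scores, k):
    """Return the names of the k highest-scoring peers (ties broken by name)."""
    ranked = sorted(scores.items(), key=lambda x: (-x[1], x[0]))
    return [name for name, _ in ranked[:k]]


def route(local_name, scores, k):
    """Hypothetical rule: answer locally if the local collection is among the
    top k, and forward the query to the other selected peers."""
    selected = select_peers(scores, k)
    return local_name in selected, [p for p in selected if p != local_name]


def merge_results(results):
    """Data fusion: min-max normalise each peer's scores, then sum the
    normalised scores of documents returned by several peers (CombSUM)."""
    fused = {}
    for ranked in results.values():
        if not ranked:
            continue
        values = [score for _, score in ranked]
        lo, hi = min(values), max(values)
        for doc, score in ranked:
            norm = 1.0 if hi == lo else (score - lo) / (hi - lo)
            fused[doc] = fused.get(doc, 0.0) + norm
    return sorted(fused.items(), key=lambda x: (-x[1], x[0]))


peers = [
    ResourceDescription("peerA", {"peer": 30, "retrieval": 12}, 40_000),
    ResourceDescription("peerB", {"retrieval": 2}, 10_000),
    ResourceDescription("peerC", {"music": 50}, 20_000),
]
scores = cori_scores(["peer", "retrieval"], peers)
print(route("peerB", scores, k=2))  # (True, ['peerA'])
print(merge_results({"peerA": [("d1", 10.0), ("d2", 5.0)],
                     "peerB": [("d2", 0.9), ("d3", 0.1)]}))
```

Here peer B ranks second, so it answers the query itself and forwards it only to peer A. Normalising scores before summing them matters because different peers' scores are generally not comparable. A real system would also have to cope with descriptions that cover only part of a peer's content, as noted above.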

To make our implementations of these methods available to other researchers and developers, we will implement all methods using the JXTA framework, which is currently used by a number of other projects in the DL and peer-to-peer areas.


Publications

Henrik Nottelmann; Gudrun Fischer (2007).
Search and browse services for heterogeneous collections with the peer-to-peer network Pepper. Information Processing & Management 43

Nottelmann, Henrik; Fuhr, Norbert (2007).
A Decision-Theoretic Model for Decentralised Query Routing in Hierarchical Peer-To-Peer Networks. In ECIR:07

Nottelmann, Henrik; Aberer, Karl; Callan, Jamie; Nejdl, Wolfgang (2006).
The CIKM 2005 Workshop on Information Retrieval in Peer-to-Peer Networks. SIGIR Forum 40(1)

Nottelmann, Henrik; Fuhr, Norbert (2006).
Comparing different architectures for query routing in peer-to-peer networks. In ECIR:06

Nottelmann, Henrik; Straccia, Umberto (2006).
A Probabilistic, Logic-based Framework for Automated Web Directory Alignment. In: Zongmin Ma (ed.):

Henrik Nottelmann; Umberto Straccia (2006).
Information retrieval and machine learning for probabilistic schema matching. Information Processing and Management 43

Gudrun Fischer; André Nurzenski (2005).
Towards Scatter/Gather Browsing in a Hierarchical Peer-to-Peer Network. In P2PIR:05

H. Nottelmann (2005).
PIRE: An extensible IR engine based on probabilistic Datalog. In ECIR:05

Henrik Nottelmann (2005).
Inside PIRE: An extensible, open-source IR engine based on probabilistic logics. Technical Report, University of Duisburg-Essen

Henrik Nottelmann; Gudrun Fischer; Alexej Titarenko; André Nurzenski (2005).
An integrated approach for searching and browsing in heterogeneous peer-to-peer networks. In HDIR:05

H. Nottelmann; N. Fuhr (2006).
Adding Probabilities and Rules to OWL Lite Subsets based on Probabilistic Datalog. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 14(1)

H. Nottelmann; U. Straccia (2005).
sPLMap: A probabilistic approach to schema matching. In ECIR:05

Henrik Nottelmann; Umberto Straccia (2005).
Information retrieval and machine learning for probabilistic schema matching (poster). In CIKM:05

Henrik Nottelmann; Karl Aberer; Jamie Callan; Wolfgang Nejdl (eds.) (2005).
Proceedings of the 2005 ACM Workshop on Information Retrieval in Peer-to-Peer Networks (P2PIR 2005), Bremen, Germany, November 4, 2005.

H. Nottelmann; N. Fuhr (2004).
Combining CORI and the decision-theoretic approach for advanced resource selection. In ECIR:04

Henrik Nottelmann; Norbert Fuhr (2004).
pDAML+OIL: A probabilistic extension to DAML+OIL based on probabilistic Datalog. In IPMU:04

H. Nottelmann; N. Fuhr (2004).
A logic-based approach for computing service executions plans in peer-to-peer networks. In P2PIR:04

N. Fuhr; C.-P. Klas (2001).
Combining RDF and Agent-Based Architectures for Semantic Interoperability in Digital Libraries. In DELOS-Interoperability:01

Jie Lu; Jamie Callan (2004).
Merging retrieval results in hierarchical peer-to-peer networks. (poster description) Proceedings of the Twenty-Seventh International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), Sheffield, UK, ACM.

Jie Lu; Jamie Callan (2004).
Federated search of text-based digital libraries in hierarchical peer-to-peer networks. Peer-to-Peer IR Workshop of the Twenty-Seventh International ACM SIGIR Conference on Research and Development in Information Retrieval, Sheffield, UK, ACM.

Jie Lu; Jamie Callan (2003).
Content-based information retrieval in peer-to-peer networks. Proceedings of the Twelfth International Conference on Information and Knowledge Management (CIKM), New Orleans, ACM.


Talks

Norbert Fuhr (2007).
A Decision-Theoretic Model for Decentralised Query Routing in Hierarchical Peer-To-Peer Networks. Talk at the European Conference on Information Retrieval Research, Rome, Italy

Norbert Fuhr (2006).
Comparing different architectures for query routing in peer-to-peer networks. Talk at the Max-Planck-Institute of Informatics (Saarbrücken, Germany)

Henrik Nottelmann (2005).
Pepper - Information Retrieval in hierarchical Peer-to-Peer networks with heterogeneous services. Talk at the 'P2PIR in Germany' workshop (Leipzig)

Henrik Nottelmann (2005).
Decision-theoretic resource selection in hierarchical peer-to-peer networks. Talk at the CMU LTI group meeting

Henrik Nottelmann; Gudrun Fischer; Alexej Titarenko; André Nurzenski (2005).
An integrated approach for searching and browsing in heterogeneous peer-to-peer networks. Talk at the HDIR 2005 workshop (co-located with SIGIR)

Henrik Nottelmann (2003).
Probabilistic logics for defining and using P2P service descriptions. Workshop on Metadata Management in Grid and Peer-to-Peer Systems (MMGPS), London

Henrik Nottelmann (2003).
Probabilistic logics for defining and using P2P service descriptions. QMIR Seminar, London


Diploma, master and bachelor theses

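The theses are available in German only.

Information retrieval in the Semantic Web (Information Retrieval im Semantic Web)
Finished diploma thesis
Service descriptions in peer-to-peer networks (Service-Beschreibungen in Peer-to-Peer-Netzen)
Finished master thesis
Cluster-based browsing in peer-to-peer networks (Cluster-basiertes Browsing in Peer-to-Peer-Netzen)
Finished diploma thesis
IR in the P2P network JXTA (IR im P2P-Netz JXTA)
Finished diploma thesis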

Related projects

DAFFODIL
Distributed Agents for User-Friendly Access of Digital Libraries
MIND
Resource Selection and Data Fusion for Multimedia International Digital Libraries

Project meetings

November 21/22, 2004, Pittsburgh:
Technical meeting
July 25, 2004, Sheffield:
Technical meeting
March 8/9, 2004, Duisburg:
Technical meeting
November 10/11, 2003, Pittsburgh:
Kick-off meeting

Testbeds

DTF in P2P networks:
Testbed used in the ECIR 2006 paper (300 KB)
Schema mapping:
BIBDB, OAI (3 MB) (down)