INEX 2003

Initiative for the Evaluation of XML Retrieval

Call for Participation

April 2003 - December 2003


The Initiative for the Evaluation of XML retrieval (INEX) invites organisations to participate in the INEX 2003 round. The invitation is open to all research groups with an interest in XML retrieval.
The INEX evaluation initiative is part of a large-scale effort to encourage research in information retrieval and digital libraries. The main goal of INEX is to promote the evaluation of content-oriented XML retrieval by providing a large test collection of XML documents, uniform scoring procedures, and a forum for organisations to compare their results. Now in its second year, INEX invites organisations to participate in this international, coordinated effort in XML retrieval evaluation.
In INEX 2003, participating organisations will be able to compare the retrieval effectiveness of their XML document retrieval systems and will contribute to the construction of a large XML test collection. The test collection will also provide participants with a means for future comparative and quantitative experiments. Due to copyright issues, only participating organisations will have access to the constructed test collection.

INEX test collection

The test collection will consist of a set of XML documents, topics and relevance assessments. We plan a collaborative effort to derive the topics and the relevance judgments. Detailed guides and on-line topic submission, retrieval result submission, relevance assessment, and evaluation systems will be provided by INEX.

Documents

The documents in the INEX test collection are scientific articles, marked up in XML, from publications of the IEEE Computer Society covering a range of topics in the field of computer science. The collection, approximately 500 megabytes, contains over twelve thousand articles from 18 magazines/transactions from the period of 1995-2002, where an article on average consists of 1500 XML nodes.

Topics

Each group will be asked to create a set of candidate topics, which are representative of the range of real user needs over the XML collection. The queries may be content-only (CO) or content-and-structure (CAS) queries, and broad or narrow topic queries. CO queries are free text queries, like those used in TREC, for which the retrieval system should retrieve relevant XML elements of varying granularity, while CAS queries contain explicit structural constraints, such as containment conditions. From the pooled set of candidate topics a final 50 topics will be selected to form part of the INEX test collection.

Ad-hoc retrieval

The general task, to be performed with the data and the final 50 topics, will be the ad-hoc retrieval of XML documents. Participants will be able to submit up to 3 runs, each containing the top 1000 retrieval results for each of the 50 topics.
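As a rough illustration of the submission limits above, the following sketch groups results by topic, keeps the highest-scoring 1000 per topic, and rejects more than 3 runs. The tuple layout (topic id, element id, score) and the function names are assumptions made for this example only; the official submission format is defined in the instructions distributed with the final topics.

```python
from collections import defaultdict

MAX_RUNS = 3         # from the call: up to 3 runs per participant
MAX_RESULTS = 1000   # from the call: top 1000 results per topic


def truncate_run(results, max_results=MAX_RESULTS):
    """Group one run by topic and keep the best-scoring results per topic.

    `results` is an iterable of (topic_id, element_id, score) tuples.
    This layout is hypothetical, not the official INEX run format.
    """
    by_topic = defaultdict(list)
    for topic_id, element_id, score in results:
        by_topic[topic_id].append((element_id, score))
    return {
        topic: sorted(items, key=lambda pair: pair[1], reverse=True)[:max_results]
        for topic, items in by_topic.items()
    }


def check_submission(runs, max_runs=MAX_RUNS):
    """Reject submissions with too many runs, then truncate each run."""
    if len(runs) > max_runs:
        raise ValueError(f"at most {max_runs} runs may be submitted, got {len(runs)}")
    return [truncate_run(run) for run in runs]
```

A participant could call `check_submission([run_a, run_b])` on two lists of result tuples before formatting them for upload.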

Relevance assessments

Relevance assessments will be provided by the participating groups using INEX's on-line assessment system. Each assessor will judge 1-2 topics: either the topics that they originally created or, if these were removed from the final set of topics, topics similar to their original queries. Please note that assessments will take about one person-week per topic. Participating groups will gain access to the completed INEX test collection only after they have completed their assessment task.

Evaluation

The evaluation of the retrieval effectiveness of the XML retrieval engines used by the participants will be based on the constructed INEX test collection and uniform scoring techniques, including recall/precision measures, which take into account the structural nature of XML documents, including possible overlap of answers.
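To make the notions of recall/precision and answer overlap concrete, here is a minimal sketch. It computes plain set-based precision and recall at a rank cutoff and counts pairs of returned elements where one contains the other. This is not the official INEX scoring procedure, which will be specified by the organisers; the element paths and function names are illustrative assumptions.

```python
def precision_recall(ranked, relevant, cutoff):
    """Plain set-based precision and recall over the top `cutoff` results.

    `ranked` is a list of element paths in rank order and `relevant`
    is a set of paths judged relevant (both hypothetical representations).
    """
    top = ranked[:cutoff]
    hits = sum(1 for element in top if element in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def overlaps(path_a, path_b):
    """True if the two paths are identical or one is an ancestor of the other."""
    return (
        path_a == path_b
        or path_a.startswith(path_b + "/")
        or path_b.startswith(path_a + "/")
    )


def count_overlapping_pairs(ranked):
    """Count result pairs in which one element contains the other."""
    return sum(
        1
        for i, first in enumerate(ranked)
        for second in ranked[i + 1:]
        if overlaps(first, second)
    )
```

A run that returns both a section and its enclosing body counts the same text twice, which is why a measure for XML retrieval needs to account for overlap rather than treat every returned element as independent.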
Participants will be able to present their approaches and final results at the INEX 2003 workshop in December. All results will be published in the INEX workshop proceedings and on the Web.

Schedule

March 31: Deadline for application to participate (see further down).
April 15: The collection of XML documents will be distributed to participants on receipt of the signed data handling agreement (see further down). Participants will also be provided with detailed instructions and formatting criteria for submitting candidate topics/queries for INEX 2003.
May 15: Submission deadline for candidate topics.
May 31: Distribution of the final set of topics to participants along with detailed information on the formatting requirements for the submission of the official retrieval results.
August 1: Submission deadline of the official retrieval results.
August 15: Distribution of the pooled retrieval results to participants for relevance assessments along with a detailed assessment guide.
October 1: Submission deadline for relevance assessments.
November 1: Distribution of the completed INEX test collection to participants, including all pooled assessments. Distribution of the evaluation scores to participants.
December 15-17: Workshop in Schloss Dagstuhl, Germany (http://www.dagstuhl.de/).

Organisers

Project Leaders

Professor Norbert Fuhr
University of Duisburg-Essen
Fak. 5/IIIS
Information Systems
Lotharstr. 65
47048 Duisburg
http://www.is.informatik.uni-duisburg.de
Email: fuhr@uni-duisburg.de

Mounia Lalmas
Department of Computer Science
Queen Mary University of London
Mile End Road
London, E1 4NS
http://www.dcs.qmul.ac.uk/research/imc/
http://qmir.dcs.qmw.ac.uk
Email: mounia@dcs.qmul.ac.uk

Contact person

Saadia Malik
University of Duisburg-Essen
Fak. 5/IIIS
Information Systems
Lotharstr. 65
47048 Duisburg
Email: malik@is.informatik.uni-duisburg.de
Phone: (+49) 203 379 3401
Fax: (+49) 203 379 2549

Application to participate

Organisations wishing to participate should register by the 31st of March.
Confirmation of the receipt of your application will be sent via email within 3 working days. Any questions should be sent to Saadia Malik (malik@is.informatik.uni-duisburg.de).

Data Handling Agreement

In order to have access to the data designated as the IEEE Computer Society XML Retrieval Research Collection, organizations participating in the INEX initiative that did not sign the agreement last year must first fill in a data release Application Form. The signed form must be sent (by express mail) to Saadia Malik at the address above (only the original copies of the forms are accepted, no electronic or fax versions). On receipt of the forms, you will be sent information on how to download the data.
Access to the data by an individual person is to be controlled by that person's organization. The organization may only grant access to people working under its control, i.e. its own members, consultants to the organization, or individuals providing service to the organization. All application forms by individuals to access the data must be signed by a person authorized by your organization for such signatures. The individual's form must be kept by the organization for every person involved at its site.