SFgate 5.0

SFgate and Heterogeneous Databases


Heterogeneous Databases

When several WAIS databases have to be queried in parallel, whoever sets up the query interface may be lucky: the available fields may coincide exactly across all databases. The more common case, however, is that different databases offer different fields, i.e. the schemas of the databases differ.

In most cases this is due to the different types of documents stored in the databases, e.g. one database holds references to literature while another holds product descriptions.

For databases whose documents do not differ too much in type, we present a means of dealing with the heterogeneity of the database schemas. SFgate 5.0 is suited to handling heterogeneous databases that hold references to literature, e.g. articles, books, and reports.

Mapping of Attributes

The problem with heterogeneous database schemas is to translate a query that refers to the external schema into a query formulation that refers to the conceptual schema of each database to be queried.

The external schema is manifested in a query form, where the user may choose one or more databases to query and, more crucially, is offered a set of attributes for formulating a query. There may be several input fields referring to some of these attributes, but in general the user should be able to use any attribute of the predefined set to formulate a query.

The conceptual level of a database is given by the database schema, i.e. by the fields and their types existing in the database.

Now the task for each database to be queried is to take the attributes from a given query and map them onto the most suitable attributes of the database schema.

A simple (but insufficient) solution is to rename the attributes within the databases to those used in queries. This suffices only if all databases include the same attributes (on the semantic level) as are used at the external level to formulate a query. Strictly speaking, this solution does not deal with heterogeneity at all.

The solution implemented in SFgate 5.0 is based on a predefined set of attributes for the external level. Mapping attributes requires knowledge of how the attributes are related to each other. We therefore introduced a lattice on these attributes which reflects the specialization relationships between them.

By means of this lattice, the mapping process for attributes can be defined by the four operations equality, specialization, generalization and ignorance, which are applied in the following order:

1. Equality: If a query condition refers to a lattice attribute which is in the schema of the database to be queried, no mapping is done. The original query condition becomes part of the translated query.
2. Specialization: If a query condition refers to a lattice attribute which is not in the schema of the database to be queried, but the schema contains more special attributes, one new query condition is generated for each of these more special attributes. This is done by replacing the original attribute from the external level with the more special one from the conceptual level. If there is more than one new query condition, they are connected with the Boolean OR to form one new query condition as part of the translated query.
3. Generalization: If a query condition refers to a lattice attribute which is not in the schema of the database to be queried, and the schema contains no more special attributes either, the attribute is mapped onto the nearest more general attribute which is part of the database schema.
4. Ignorance: If neither equality nor specialization nor generalization yields a translation of the original query condition, the whole condition is left out of the translated query for the database in question.
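
The four operations can be illustrated with a minimal Python sketch. It is not SFgate's actual implementation: the names `PARENT`, `ancestors`, `map_attribute` and `translate` are hypothetical, the lattice is reduced to a small fragment with a single parent per attribute, and the `attribute=term` query syntax is an assumption made only for this example. The sketch also assumes that specialization collects every more special attribute found in the schema, not just direct children.

```python
# Hypothetical fragment of the attribute lattice: child -> parent.
PARENT = {
    "title": "full-text",
    "book-title": "title",
    "article-title": "title",
    "series-title": "title",
    "abstract": "full-text",
    "author-name": "initiator",
    "editor-name": "initiator",
}


def ancestors(attr):
    """Yield the more general attributes of attr, nearest first."""
    while attr in PARENT:
        attr = PARENT[attr]
        yield attr


def map_attribute(attr, schema_attrs):
    """Map a query attribute onto attributes of one database schema.

    Returns a list of schema attributes; an empty list means the
    query condition is ignored for this database.
    """
    # 1. Equality: the attribute itself is part of the schema.
    if attr in schema_attrs:
        return [attr]
    # 2. Specialization: schema attributes that are more special than attr.
    special = sorted(a for a in schema_attrs if attr in ancestors(a))
    if special:
        return special
    # 3. Generalization: nearest more general attribute in the schema.
    for general in ancestors(attr):
        if general in schema_attrs:
            return [general]
    # 4. Ignorance: no translation possible.
    return []


def translate(attr, term, schema_attrs):
    """Build the translated condition (assumed syntax), or None if ignored."""
    targets = map_attribute(attr, schema_attrs)
    if not targets:
        return None
    return " OR ".join(f"{t}={term}" for t in targets)
```

For a database whose schema offers only `article-title` and `abstract`, a condition on `title` is specialized to `article-title`, while a condition on `editor-name` is ignored because neither it nor `initiator` is available.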

In future the mapping process will be refined by various means:

Predefined Attribute Lattice

The set of attributes used within the lattice is mainly taken from the Scientific and Technical Attribute Set (STAS) maintained by CNIDR. STAS defines standard identifiers for referring to searchable and retrievable fields within scientific and technical databases.

KL-ONE defines a diffs operator which allows for inheritance on attributes. Using the diffs construct a specialization hierarchy has been introduced on a (small) subset of the STAS attributes:

         |  |
         |  |-content
         |  |  |
         |  |  +-full-text
         |  |  |  |
         |  |  |  +-title
         |  |  |  |  |
         |  |  |  |  |-book-title
         |  |  |  |  |-article-title
         |  |  |  |  +-series-title
         |  |  |  |
         |  |  |  |-abstract
         |  |  |  +-subject-descriptor
         |  |  |
         |  |  +-journal-title
         |  |  
         |  |-initiator
         |  |  |
         |  |  |-author-name
         |  |  |-editor-name
         |  |  |-corporate
         |  |  +-conference
         |  |
         |  +-publisher
         |     |
         |     |-publisher-name
         |     +-publisher-address
         |  |
         |  |-entry-date
         |  +-publication-date

The attributes have the following meanings, listed in lattice order:

The top attribute, which subsumes all other attributes.
Subsumes attributes with direct regard to the document.
content: Information concerning the document content.
full-text: Full document text.
title: Information about the title of a document.
book-title: Title of the book.
article-title: Title of the article.
series-title: Title of the series.
abstract: Abstract of a document.
subject-descriptor: Descriptors of a document.
journal-title: Title of the journal.
initiator: Information concerning the initiator.
author-name: The name of an author.
editor-name: The name of an editor.
corporate: The name of a corporation or institution.
conference: The name of a conference.
publisher: A publishing organization.
publisher-name: The name of a publishing organization.
publisher-address: The address of a publishing organization.
Date information.
entry-date: The date a record was added to the database.
publication-date: The date a document was published.
Additional information.
International standard serial number.
isbn: International standard book number.
crc: Classification according to the ACM Computing Reviews Classification System.
volume: Volume number of a journal.
number: Number of a journal/serial.
Edition of a document.
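
As an illustration of how such a specialization hierarchy might be held in a program, the following sketch encodes the named subtrees of the tree above as nested dictionaries and derives a subsumption test from them. The names `LATTICE`, `parent_map` and `subsumes` are hypothetical, and nodes whose names are not shown in the tree are omitted, so this is only a partial model of the lattice.

```python
# Named subtrees of the lattice as nested dicts; a leaf is an empty dict.
# Nodes whose names are not shown in the tree are left out (assumption).
LATTICE = {
    "content": {
        "full-text": {
            "title": {
                "book-title": {},
                "article-title": {},
                "series-title": {},
            },
            "abstract": {},
            "subject-descriptor": {},
        },
        "journal-title": {},
    },
    "initiator": {
        "author-name": {},
        "editor-name": {},
        "corporate": {},
        "conference": {},
    },
    "publisher": {
        "publisher-name": {},
        "publisher-address": {},
    },
}


def parent_map(tree, parent=None, out=None):
    """Flatten the nested tree into a child -> parent dictionary."""
    out = {} if out is None else out
    for name, children in tree.items():
        if parent is not None:
            out[name] = parent
        parent_map(children, name, out)
    return out


PARENT = parent_map(LATTICE)


def subsumes(general, special):
    """True if general equals special or is more general than it."""
    while special is not None:
        if special == general:
            return True
        special = PARENT.get(special)
    return False
```

With this encoding, `subsumes("content", "article-title")` holds, whereas `subsumes("title", "journal-title")` does not, because `journal-title` is placed directly below `content` in the tree.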

WAIS Database Schemas

WAIS database schemas reflect the conceptual level of an application. The schema of a WAIS database lists the available fields together with their types. To translate a query from the external level to the conceptual level, each field of a database schema has to be mapped onto one or more attributes of the predefined attribute lattice. The schemas of some databases and their mappings are given below.


Bibliographic references on information retrieval
        py     (numeric)       : publication-date
        au     (text, soundex) : author-name
        ti     (stemming)      : title
        cc     (text)          : crc
        jt     (stemming)      : journal-title
        vo     (numeric)       : volume
        no     (numeric)       : number
        global (text)          : keywords


Contents of the computer science department library at the University of Dortmund.
        jahr   (numeric)       : publication-date
        verf   (text, soundex) : initiator
        titel  (stemming)      : title
        verl   (text)          : publisher
        isbn   (text)          : isbn
        global (text)          : keywords


Table of contents of about 180 computer science journals.
        py     (numeric)       : publication-date
        au     (text, soundex) : author-name
        ab     (stemming)      : abstract
        ti     (stemming)      : article-title
        vo     (numeric)       : volume
        no     (numeric)       : number
        jt     (stemming)      : journal-title
        ed     (numeric)       : entry-date
        global (text)          : keywords
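
The schema listings above can be read as plain lookup tables from field names to lattice attributes. The following sketch shows one way to hold them in Python and to find the fields that carry a given attribute. The database keys `ir-refs`, `library` and `journals` are hypothetical labels chosen for this example, as are the function names; field types are omitted, and each field is mapped to a single attribute although a field may in general map onto several.

```python
# Field -> lattice attribute, copied from the listings above.
# The database keys are hypothetical labels, not real database names.
SCHEMAS = {
    "ir-refs": {
        "py": "publication-date", "au": "author-name", "ti": "title",
        "cc": "crc", "jt": "journal-title", "vo": "volume",
        "no": "number", "global": "keywords",
    },
    "library": {
        "jahr": "publication-date", "verf": "initiator", "titel": "title",
        "verl": "publisher", "isbn": "isbn", "global": "keywords",
    },
    "journals": {
        "py": "publication-date", "au": "author-name", "ab": "abstract",
        "ti": "article-title", "vo": "volume", "no": "number",
        "jt": "journal-title", "ed": "entry-date", "global": "keywords",
    },
}


def fields_for(db, attribute):
    """Fields of database db that are mapped onto the given attribute."""
    return sorted(f for f, a in SCHEMAS[db].items() if a == attribute)


def databases_with(attribute):
    """Databases whose schema contains the attribute directly."""
    return sorted(db for db, schema in SCHEMAS.items()
                  if attribute in schema.values())
```

Here `databases_with("title")` finds only the first two databases. The journal database offers `article-title` instead, so a query on `title` has to be specialized for it, while a query on `author-name` has to be generalized to `initiator` for the library database.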


Fuhr, Norbert (1996)
Object-Oriented and Database Concepts for the Design of Networked Information Retrieval Systems
Fuhr, Norbert; Großjohann, Kai (1995)
Broker Architecture
Gövert, Norbert (1996)
Information Retrieval in vernetzten heterogenen Datenbanken

Norbert Gövert, January 30, 1998