SFgate 5.0

SFgate and Heterogeneous Databases



Heterogeneous Databases

When more than one WAIS database has to be queried in parallel, whoever sets up the query interface may be lucky: the available fields may coincide exactly across all databases. More commonly, however, different databases offer different fields; that is, the schemas of the databases differ.

In most cases this is because the databases store different types of documents, e.g. one database holds references to literature while another holds product descriptions.

For databases whose documents do not differ too much in type, we present a means of dealing with the heterogeneity of the database schemas. SFgate 5.0 is suited to handling heterogeneous databases that hold references to literature, e.g. articles, books, and reports.


Mapping of Attributes

The problem with heterogeneous database schemas is translating a query that refers to the external schema into a query formulation that refers to the conceptual schema of each database to be queried.

The external schema is manifested in a query form, where the user may choose one or more databases to query and, more crucially, a set of attributes which can be used to formulate a query. There may be several input fields referring to some of these attributes, but in general the user should be able to use any attribute of the predefined attribute set to formulate a query.

The conceptual level of a database is given by the database schema, i.e. by the fields that exist in the database and their types.

The task, for each database to be queried, is to take the attributes from a given query and map them onto the most suitable attributes of the database schema.

A simple (but insufficient) solution is to rename the attributes within the databases to those used in queries. This works only if all databases include the same attributes (on the semantic level) as those used at the external level to formulate a query. Strictly speaking, this solution does not deal with heterogeneity.

The solution implemented in SFgate 5.0 is based on a predefined set of attributes for the external level. Mapping attributes requires knowledge of how they are related to each other, so we introduced a lattice on these attributes which reflects the specialization relationships between them.

By means of this lattice, the mapping process for attributes can be defined by the four operations equality, specialization, generalization and ignorance, applied in the following order:

Equality
If a query condition refers to a lattice attribute that is in the schema of the database to be queried, no mapping is done. The original query condition becomes part of the translated query.
Specialization
If a query condition refers to a lattice attribute that is not in the database to be queried, but the database schema contains more special attributes, the mapping generates one new query condition for each of the more special attributes. This is done by replacing the original attribute from the external level with the more special one from the conceptual level. If there is more than one new query condition, they are connected with Boolean OR to form a single query condition as part of the translated query.
Generalization
If a query condition refers to a lattice attribute that is not in the database to be queried, and the schema of that database contains no more special attributes, the attribute is mapped onto the nearest more general attribute that is part of the database schema.
Ignorance
If neither equality nor specialization nor generalization yields a translation of the original query condition, the whole condition is ignored in the translated query for the database in question.
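
The order of the four operations can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SFgate's actual code: the names `PARENT`, `descendants` and `map_attribute` are hypothetical, and the lattice is reduced to a small excerpt stored as a child-to-parent table.

```python
# Hypothetical excerpt of the attribute lattice, stored as child -> parent.
PARENT = {
    "keywords": "TOP",
    "content": "keywords",
    "full-text": "content",
    "title": "full-text",
    "book-title": "title",
    "article-title": "title",
    "series-title": "title",
    "initiator": "keywords",
    "author-name": "initiator",
    "editor-name": "initiator",
    "meta": "TOP",
    "isbn": "meta",
}


def descendants(attr):
    """Return all attributes that are strictly more special than attr."""
    found = []
    frontier = [attr]
    while frontier:
        current = frontier.pop()
        for child, parent in PARENT.items():
            if parent == current:
                found.append(child)
                frontier.append(child)
    return found


def map_attribute(attr, schema_attrs):
    """Map one query attribute onto the attributes of a database schema.

    Returns (operation, attributes); several attributes mean that the
    generated query conditions are connected with Boolean OR.
    """
    # 1. Equality: the attribute itself is part of the schema.
    if attr in schema_attrs:
        return "equality", [attr]
    # 2. Specialization: use every more special attribute in the schema.
    special = sorted(a for a in descendants(attr) if a in schema_attrs)
    if special:
        return "specialization", special
    # 3. Generalization: walk upwards to the nearest attribute in the schema.
    ancestor = PARENT.get(attr)
    while ancestor is not None:
        if ancestor in schema_attrs:
            return "generalization", [ancestor]
        ancestor = PARENT.get(ancestor)
    # 4. Ignorance: the condition is dropped for this database.
    return "ignorance", []
```

With a schema offering `title`, `initiator` and `keywords`, a condition on `full-text` would be specialized to `title`, while a condition on `author-name` would be generalized to `initiator`.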

In future the mapping process will be refined by various means:


Predefined Attribute Lattice

The set of attributes used within the lattice is mainly taken from the Scientific and Technical Attribute Set (STAS) maintained by CNIDR. STAS defines standard identifiers for referring to searchable and retrievable fields within scientific and technical databases.

KL-ONE defines a diffs operator which allows for inheritance on attributes. Using the diffs construct, a specialization hierarchy has been introduced on a (small) subset of the STAS attributes:

        TOP
         |
         |-keywords
         |  |
         |  |-content
         |  |  |
         |  |  |-full-text
         |  |  |  |
         |  |  |  |-title
         |  |  |  |  |
         |  |  |  |  |-book-title
         |  |  |  |  |-article-title
         |  |  |  |  +-series-title
         |  |  |  |
         |  |  |  |-abstract
         |  |  |  +-subject-descriptor
         |  |  |
         |  |  +-journal-title
         |  |  
         |  |-initiator
         |  |  |
         |  |  |-author-name
         |  |  |-editor-name
         |  |  |-corporate
         |  |  +-conference
         |  |
         |  +-publisher
         |     |
         |     |-publisher-name
         |     +-publisher-address
         |
         |-date
         |  |
         |  |-entry-date
         |  +-publication-date
         |
         +-meta
            |
            |-issn
            |-isbn
            |-crc
            |-volume
            |-number
            +-edition
      
TOP
The top attribute which subsumes all other attributes.
keywords
Subsumes attributes with direct regard to the document.
content
Information concerning the document content.
full-text
Full document text.
title
Information about the title of a document.
book-title
Title of the book.
article-title
Title of the article.
series-title
Title of the series.
abstract
Abstract of a document.
subject-descriptor
Descriptors of a document.
journal-title
Title of the journal.
initiator
Information concerning the initiator.
author-name
The name of an author (a person).
editor-name
The name of an editor (a person).
corporate
The name of a corporation or institution.
conference
The name of a conference.
publisher
A publishing organization.
publisher-name
The name of a publishing organization.
publisher-address
The address of a publishing organization.
date
Date information.
entry-date
The date a record is added to the database.
publication-date
The date when a document has been published.
meta
Additional information.
issn
International Standard Serial Number.
isbn
International Standard Book Number.
crc
Classification according to the ACM Computing Reviews Classification System.
volume
Volume number of a journal.
number
Issue number of a journal or serial.
edition
Edition of a document.
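
As an illustration, the hierarchy above can be written down as a nested structure from which the chain of more general attributes is derived. This is a minimal sketch, not part of SFgate; the names `LATTICE`, `parent_map` and `generalizations` are hypothetical.

```python
# The attribute hierarchy as nested dictionaries: name -> sub-attributes.
LATTICE = {
    "TOP": {
        "keywords": {
            "content": {
                "full-text": {
                    "title": {
                        "book-title": {},
                        "article-title": {},
                        "series-title": {},
                    },
                    "abstract": {},
                    "subject-descriptor": {},
                },
                "journal-title": {},
            },
            "initiator": {
                "author-name": {},
                "editor-name": {},
                "corporate": {},
                "conference": {},
            },
            "publisher": {"publisher-name": {}, "publisher-address": {}},
        },
        "date": {"entry-date": {}, "publication-date": {}},
        "meta": {
            "issn": {}, "isbn": {}, "crc": {},
            "volume": {}, "number": {}, "edition": {},
        },
    }
}


def parent_map(tree, parent=None, out=None):
    """Flatten the nested hierarchy into a child -> parent table."""
    out = {} if out is None else out
    for name, children in tree.items():
        out[name] = parent
        parent_map(children, name, out)
    return out


PARENT = parent_map(LATTICE)


def generalizations(attr):
    """Return the attributes more general than attr, nearest first."""
    chain = []
    node = PARENT[attr]
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


print(generalizations("article-title"))
# ['title', 'full-text', 'content', 'keywords', 'TOP']
```

The nearest-first order is what generalization needs: the first entry of the chain that also occurs in a database schema is the attribute the query condition is mapped onto.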

WAIS Database Schemas

WAIS database schemas reflect the conceptual level of an application. The schema of a WAIS database contains the available fields with their types. To translate a query from the external level to the conceptual level, each field within a database schema has to be mapped onto one or more attributes of the predefined attribute lattice. The schemas of some databases, together with these mappings, follow.

bibdb

Bibliographic references on information retrieval
        py     (numeric)       : publication-date
        au     (text, soundex) : author-name
        ti     (stemming)      : title
        cc     (text)          : crc
        jt     (stemming)      : journal-title
        vo     (numeric)       : volume
        no     (numeric)       : number
        global (text)          : keywords
      

BI

Contents of the computer science department library at the University of Dortmund.
        jahr   (numeric)       : publication-date
        verf   (text, soundex) : initiator
        titel  (stemming)      : title
        verl   (text)          : publisher
        isbn   (text)          : isbn
        global (text)          : keywords
      

journals

Table of contents of about 180 computer science journals.
        py     (numeric)       : publication-date
        au     (text, soundex) : author-name
        ab     (stemming)      : abstract
        ti     (stemming)      : article-title
        vo     (numeric)       : volume
        no     (numeric)       : number
        jt     (stemming)      : journal-title
        ed     (numeric)       : entry-date
        global (text)          : keywords
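
The three field mappings above can be captured as plain tables, which makes the heterogeneity easy to see. The following is a minimal sketch under stated assumptions: `SCHEMAS`, `fields_for` and `shared_attributes` are hypothetical names, field types are omitted, and each field is mapped onto exactly one attribute, as in the listings.

```python
# Field -> lattice attribute, copied from the three schema listings.
SCHEMAS = {
    "bibdb": {
        "py": "publication-date", "au": "author-name", "ti": "title",
        "cc": "crc", "jt": "journal-title", "vo": "volume",
        "no": "number", "global": "keywords",
    },
    "BI": {
        "jahr": "publication-date", "verf": "initiator", "titel": "title",
        "verl": "publisher", "isbn": "isbn", "global": "keywords",
    },
    "journals": {
        "py": "publication-date", "au": "author-name", "ab": "abstract",
        "ti": "article-title", "vo": "volume", "no": "number",
        "jt": "journal-title", "ed": "entry-date", "global": "keywords",
    },
}


def fields_for(database, attribute):
    """Return the fields of a database that are mapped onto attribute."""
    return [f for f, a in SCHEMAS[database].items() if a == attribute]


def shared_attributes():
    """Return the lattice attributes that every database offers directly."""
    attribute_sets = [set(schema.values()) for schema in SCHEMAS.values()]
    return set.intersection(*attribute_sets)


print(sorted(shared_attributes()))   # ['keywords', 'publication-date']
print(fields_for("BI", "initiator"))  # ['verf']
print(fields_for("journals", "title"))  # []
```

Only `publication-date` and `keywords` are available in all three databases by equality. A condition on `title`, for instance, has no direct field in journals and would have to be handled by specialization to `article-title`.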
      

References

Fuhr, Norbert (1996)
Object-Oriented and Database Concepts for the Design of Networked Information Retrieval Systems
Fuhr, Norbert; Großjohann, Kai (1995)
Broker Architecture
Gövert, Norbert (1996)
Information Retrieval in vernetzten heterogenen Datenbanken


Norbert Gövert, January 30, 1998