DELOS:JPA 2 - WP7
A Digital Library Testbed Framework for the Evaluation of Architectures, Services and Execution Dynamics
Objective of the proposal
Most DL evaluations use specific systems, which are difficult to compare. The aim of this effort is to provide a standard testbed framework for comparative evaluation of DL systems. Based on a theoretical framework for DL evaluation, we will develop a framework system that can be easily adapted to new application domains or extended with new services. For analyzing applications of this framework as well as for comparison with other systems, we will develop a standard event model of DL services along with a logging standard and corresponding evaluation tools. The feasibility of our approach will be demonstrated in at least two different settings. The full proposal is available at: JPA2 proposal
The Evaluation Framework
Description of used Models
Challenges of Evaluating Digital Libraries (Saracevic/Covi)
A system can be considered a set of elements in interaction. A human-made system, such as a digital library, has an added aspect: it has certain objective(s). The elements, or the components, interact to perform certain functions, or processes, to achieve objectives. Furthermore, any system (digital libraries included) exists in an environment or environments (which can also be thought of as systems), and interacts with its environments. It is difficult and even arbitrary to set the boundaries of a system. In evaluation of digital libraries, as in evaluation of any system or process, these difficult questions arise that clearly affect the results: Where does a digital library under evaluation begin? Where does it end? What are the boundaries? What to include? What to exclude? This sets the questions of determining the construct of digital libraries, as discussed below.
In this context, by evaluation we mean an appraisal of the performance or functioning of a system or part thereof, in relation to some objective(s). The performance can be evaluated as to
- Effectiveness: How well does a system (or any of its parts) perform that for which it was designed?
- Efficiency: At what cost? (Costs can be financial, or counted in time or effort.)
An evaluation has to specify which of these will be evaluated. From now on, we will mostly discuss evaluation of effectiveness, with the realization that efficiency and cost-effectiveness can be involved in any evaluation as well. This sets the question of the criteria of evaluation for digital libraries, as discussed below.
As in all systems, objectives occur in hierarchies, and there may be several hierarchies representing different levels, sometimes even in conflict. While the objectives may be explicitly stated or implicitly derived or assumed, they have to be reflected in an evaluation. Evaluation is not one fixed thing. For the same system, evaluation can be done on different levels, in relation to different choices of objectives, using a variety of methods, and it can be oriented toward different goals and audiences.
To be considered as an evaluation, any evaluation has to meet certain requirements. It must involve selections and decisions related to:
- Construct - What to evaluate? What is actually meant by a digital library? What is encompassed? What elements (components, parts, processes, etc.) to involve in evaluation?
- Context - selection of a goal, framework, viewpoint or level(s) of evaluation. What is the level of evaluation? What is critical for a selected level? Ultimately: What objective(s) to select for what level?
- Criteria - reflecting performance as related to objectives. What parameters of performance to concentrate on? What dimension or characteristic to evaluate?
- Measures - reflecting selected criteria to record the performance. What specific measure(s) to use for a given criterion?
- Methodology - What measuring instruments to use? What samples? What procedures to use for data collection? For data analysis?
Onion Model
The Onion Model illustrates the layers of work that are addressed in CSE analysis of users' requirements for information systems. Each of these layers is addressed through a number of heuristic tools. The layers of work are:
- Means-ends analysis
- Analysis of work organization
- Task situations
- Decision tasks
- Mental strategies
- Actors' knowledge, resources and skills
The aim of analysis by the Onion model 1 is to reach an understanding of the system of work, which will then shape the basis for design and/or for planning evaluation and testing of an information system.
As can be seen, Onion model 2 consists of the same layers to address the perspective of users' work with prototype systems. However, for testing and empirical evaluation of prototypes, the evaluator will start from the core of the Onion: the actors' or users' knowledge, resources and skills. If, for instance, the users do not understand the options/affordances in the interface, the prototype will need refinement before the evaluation and testing can proceed to analysing how the prototype supports the users' mental strategies, decision tasks etc.
...
Interaction Triptych Framework
The Interaction Triptych Framework is a descriptive framework that indicates the main constructs of an evaluation procedure, as well as their associations. The interaction axes represent three main evaluation foci, which are discussed extensively in the literature. The Interaction Triptych Framework pays special attention to the relations between the components of a digital library, i.e. the relations User-Collection, Collection-Technology and User-Technology. The Collection-Technology pair is related to the performance attributes, the User-Technology pair is related to usability aspects, and the User-Collection pair is related to usefulness aspects.
Evaluation Computer - A Model for structuring evaluation activities
The Evaluation Computer is a systematic (faceted) approach for the description and analysis of digital library evaluation activities. This model is able to provide insights about the distance between different evaluation procedures and to locate white spots. Furthermore, the Evaluation Computer provides the "big picture" of the evaluation activity, as well as the context (horizontal direction), while the Interaction Framework goes further to suggest specific metrics (vertical direction).
The two models are connected at the following points:
- Supplement each other at the User, Content and System aspects of the Evaluation Computer and the corresponding components in the Interaction Framework.
- Complete the description of digital library application by demonstrating the contextual factors (Organizational aspects).
Standardized XML schema for Digital Library Logging
Evaluation tools based on Digital Library Logging
Stakeholder
General Interests
There are general statistics that are of interest to all stakeholders. The most common of these is interest in simply viewing the logged sessions of users of a DL. For this purpose, Daffodil provides a special tool that allows any stakeholder to visualize and analyze user behavior in a graph view (depicted in Figure 3) or in a tree view (depicted in Figure 4). This tool can also be used by end-users to analyse their own search tactics and to look for open or missed search directions that could be pursued. Several general questions are of interest to all stakeholders:
- When and how long is the DL used?
- What services are used, how often and for how long?
- Who uses the DL?
Strict rights management should be instituted to prevent misuse of logging data; this is of particular importance when attempting to identify the characteristics of DL end-users.
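The first and third of the general questions above can be answered from timestamps and identifiers alone. The following is a minimal sketch, not the Daffodil tooling: the record layout (session id, user id, service, ISO timestamp) and all sample values are assumptions made purely for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical log records: (session_id, user_id, service, ISO timestamp).
# The layout and values are illustrative, not the Daffodil log format.
LOG = [
    ("s1", "u1", "search", "2006-01-09T10:00:00"),
    ("s1", "u1", "inspect", "2006-01-09T10:04:30"),
    ("s1", "u1", "store", "2006-01-09T10:10:00"),
    ("s2", "u2", "search", "2006-01-10T15:00:00"),
    ("s2", "u2", "search", "2006-01-10T15:02:00"),
]

def session_durations(log):
    """Return {session_id: seconds between first and last logged event}."""
    first, last = {}, {}
    for session_id, _user, _service, stamp in log:
        t = datetime.fromisoformat(stamp)
        first[session_id] = min(first.get(session_id, t), t)
        last[session_id] = max(last.get(session_id, t), t)
    return {s: (last[s] - first[s]).total_seconds() for s in first}

def users_per_day(log):
    """Return {ISO date: set of user ids seen on that date}."""
    days = defaultdict(set)
    for _session, user, _service, stamp in log:
        days[datetime.fromisoformat(stamp).date().isoformat()].add(user)
    return dict(days)

durations = session_durations(LOG)
active = users_per_day(LOG)
```

Measuring a session as the span between its first and last event underestimates the real usage time, since the time spent after the last logged event is unknown. Reporting user ids per day is exactly the kind of analysis that the rights management mentioned above would need to govern.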
System Owners
The management of a DL system may not only have monetary interests, but may also be interested in collecting user and other statistics in order to establish and execute a business plan for the DL system. For example, one concern is the need to develop access and acquisitions policies that will encourage increased usage. The primary questions of interest to management that can be addressed by analysis of logging data include:
- How many users access the DL system in what time period?
- How many DL objects are delivered or sold?
- Which objects are most requested and which of these are not available?
Developers
The developers of a DL system are usually keen to provide a rich set of services in a user-friendly manner. From a system perspective, these services should be effective, efficient and error free. In addition, users should be supported in their attempts to achieve their information goals. Thus developers need information about the system that includes:
- When is the DL system used?
- How long is the DL used?
- Which services of the DL are used?
- How many users canceled a specific service?
- Which services are flawed?
- Are response times satisfactory?
Content Providers
Content providers frequently need to assess the way users manipulate information objects in order to mine current patterns of usage; based on this information, they can decide about necessary revisions of collection or pricing policies. Although content providers may focus on events that indicate inspection and storage of information objects, other research questions may also be posed. For example:
- What formats are most frequently used?
- What levels of information are preferred (e.g., citations, abstracts, full texts)?
- What levels of access are preferred (e.g., open, authorized)?
- What items of information, in terms of titles or subject categories, are preferred?
Librarians
The integrated visualization tools allow even inexperienced users of log files to understand crucial events and user tendencies that affect the performance of their activities. For librarians, these activities will include reference services as well as instructional programs [4]. Typical questions for librarians are:
- What classes of users exist?
- From where do they gain access (e.g., in the library, on campus, from home)?
- What are the predominant patterns of use?
- Do these patterns of use indicate progress in the development of information skills and/or growing understanding of appropriate search behaviours?
- Are users communicating with library personnel and to what degree?
End Users
Log analysis may provide suggestions for users in need of help. Transaction logs may also remind the user of certain work patterns she had previously performed. Collaborative filtering and recommendation tools also find collected log data valuable. Besides documents, work patterns and result screens may also be recommended to a user based on logging data.
Researcher
The experimental framework of Daffodil, including the logging scheme, gives researchers a baseline and a powerful toolbox for evaluation work. For example, at the system level, the efficiency of algorithms or the appropriateness of DL architectures can be evaluated and compared. Research on usability, including the GUI development, will concentrate on both the HCI level and the system level.
Use Cases
Statistics
- Times - frequencies by year, month, week, hour
- Involved Users
- Involved Systems - to analyse more than one system or parts of a system
- Events - frequencies of the most often raised events
- Services - frequencies of the most used services
- Terms - most used terms in a query
- Collections - most used collections
- Store Targets - which storage targets have been used (internal, external, printer)?
- Inspected Objects - which documents are inspected most often?
- Browse methods - what is the most used browse method (scrolling, moving, etc.)?
- Visualize methods - what is the most used visualization method, with reference to Shneiderman?
- Communicate - what type of communication has been used in the system?
- Help - how often and what kind of help has been used?
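Most of the statistics listed above are simple frequency counts over one attribute of the logged events. As a hedged sketch, assuming each event has already been parsed into a dictionary (the keys `hour`, `service` and `query` are hypothetical and not taken from the logging schema), they could be computed as follows:

```python
from collections import Counter

# Hypothetical parsed events; the keys "hour", "service" and "query"
# are assumptions for illustration, not part of the logging standard.
EVENTS = [
    {"hour": 10, "service": "search", "query": "digital library evaluation"},
    {"hour": 10, "service": "inspect", "query": None},
    {"hour": 11, "service": "search", "query": "library logging"},
    {"hour": 11, "service": "store", "query": None},
    {"hour": 11, "service": "search", "query": "evaluation framework"},
]

def frequency(events, key):
    """Count how often each value of `key` occurs, skipping missing values."""
    return Counter(e[key] for e in events if e.get(key) is not None)

def term_frequency(events):
    """Count lower-cased, whitespace-separated terms across all queries."""
    terms = Counter()
    for e in events:
        if e.get("query"):
            terms.update(e["query"].lower().split())
    return terms

by_hour = frequency(EVENTS, "hour")
by_service = frequency(EVENTS, "service")
top_terms = term_frequency(EVENTS).most_common(2)
```

The same `frequency` helper would cover collections, store targets or help usage once those attributes are available in the parsed events. Splitting queries on whitespace is a simplification that ignores phrases and field-specific query syntax.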
Complex Use Cases
- User Behavior - Which path did a user take in session 1 and in session 67?
  - Metrics:
    - # sessions
    - # included events
    - included event types
    - order of events
- Resultlist Visualizer - What are the most effective visual methods?
  - Metrics:
    - # inspected objects
    - # stored objects
    - time
- Canceled Events - What are the most canceled events? At what time?
  - Metrics:
    - # canceled events
    - event duration (start application -> cancel)
- Collaboration - Does collaboration lead to better results for the user?
  - Metrics:
    - # successful tasks
    - time
- Search vs. Navigate - Does a navigation paradigm provide better results than individually formulated queries?
  - Metrics:
    - # search events
    - # navigate events
    - # inspected/stored objects
    - time
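Two of these use cases, User Behavior and Canceled Events, can be illustrated with a short sketch. The record layout below (session id, event type, start and end in seconds since session start, canceled flag) is a hypothetical simplification and not the project's event model.

```python
from collections import Counter, defaultdict

# Hypothetical records: (session_id, event_type, start_s, end_s, canceled).
# Times are seconds since session start; all names are illustrative.
RECORDS = [
    (1, "search", 0, 12, False),
    (1, "inspect", 12, 40, False),
    (1, "search", 40, 45, True),
    (67, "search", 0, 30, True),
    (67, "navigate", 30, 50, False),
    (67, "store", 50, 52, False),
]

def event_paths(records):
    """Return {session_id: event types ordered by start time}."""
    by_session = defaultdict(list)
    for session, event, start, _end, _canceled in records:
        by_session[session].append((start, event))
    return {s: [e for _t, e in sorted(items)] for s, items in by_session.items()}

def canceled_summary(records):
    """Return {event_type: (number canceled, mean seconds until cancel)}."""
    counts, totals = Counter(), Counter()
    for _session, event, start, end, canceled in records:
        if canceled:
            counts[event] += 1
            totals[event] += end - start
    return {e: (counts[e], totals[e] / counts[e]) for e in counts}

paths = event_paths(RECORDS)
canceled = canceled_summary(RECORDS)
```

Comparing `paths[1]` with `paths[67]` gives the order of events for the two sessions named in the User Behavior use case, while `canceled` covers the "# canceled events" and "event duration" metrics. The remaining use cases (Collaboration, Search vs. Navigate) additionally need a notion of task success, which this sketch does not model.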
Current Activities & Plans
Evaluations
Ideas
Ideas: Possible Evaluations with Daffodil in Computer Science
Meetings
- Meeting in Duisburg in June 2005
- Meeting at ECDL September 2005 Vienna
- Meeting in Budapest on 9th-10th of January 2006
- Meeting via Skype conference call on 28th of November 2006
