
Table of Contents

Frontmatter

Invited Papers

Data Ring: Let Us Turn the Net into a Database!

Because of the ubiquity of information, one observes an important trend toward transferring information management tasks from database systems to networks. We introduce the notion of a Data Ring, which can be seen as a network version of a database or a content warehouse. A main goal is to achieve better performance for content management without requiring the acquisition of explicit control over information resources. We discuss the main traits of Data Rings and argue that Active XML provides an appropriate basis for such systems.

The collaborating peers that form the Data Ring are autonomous and heterogeneous, and their capabilities may vary greatly, e.g., from a sensor to a large database. To support this paradigm of loose integration effectively, the Data Ring enforces a seamless transition between data and metadata and between explicit and intensional data. It does not distinguish between data provided by web pages and data provided by web services, or between local (extensional) data and external data obtained via a Web service call. This is achieved using Active XML technology, which is based on exchanging XML documents with embedded service calls, for both the logical and the physical data model.

Serge Abiteboul, Neoklis Polyzotis

Future Data Management: “It’s Nothing Business; It’s Just Personal.”

Conventional data management occurs primarily in centralized servers or in well-interconnected distributed systems. These are removed from their end users, who interact with the systems mostly through static devices to obtain generic services around mainstream applications: banking, retail, business management, etc. Several recent technological advances, however, give rise to a new breed of applications, which altogether change the user experience and sense of data management. Very soon several such systems will be in our pockets, and many more in our homes, our kitchen appliances, our clothes, etc. How will these systems operate? Many system and user aspects must be approached in novel ways, while several new issues come up and need to be addressed for the first time. Highlights include personalization, privacy, information trading, annotation, new interaction devices and corresponding interfaces, visualization, etc. In this talk, we take a close look at and give a very personal guided tour of this emerging world of data management, offering some thoughts on how the new technical challenges might be approached.

Yannis Ioannidis

Scalable Similarity Search in Computer Networks

Similarity search in metric spaces represents an important paradigm for content-based retrieval in many applications. Existing centralized search structures can speed up retrieval, but they do not scale to large volumes of data because response time increases linearly with the size of the searched file. Four scalable and distributed similarity search structures will be presented. By exploiting parallelism in a dynamic network of computers, they all achieve practically constant search time for similarity range or nearest neighbor queries on data-sets of arbitrary size. Moreover, the small amount of routing information replicated on each server grows only logarithmically. At the same time, the potential for inter-query parallelism increases as data-sets grow, because the relative number of servers utilized by individual queries decreases. All these properties are verified by experiments on a prototype system using real-life data-sets. The results are used to establish specific pros and cons of the individual approaches in different situations.

Pavel Zezula

XML Databases and Semantic Web

An XML Algebra for XQuery

An XML algebra supporting the XQuery query language is presented. The use of expression-constructing operators, instead of higher-order operations that take functions as parameters, has allowed us to remain within the limits of first-order structures whose instance is a many-sorted algebra. The set of operators of the presented algebra differs substantially from the set of operators of relational algebra. This is caused by the complex nature of the XML data model compared with the relational one. In fact, only predicative selection is more or less the same in both algebras; the XML algebra additionally permits selection by node test. The relational projection operator is replaced by path expressions and navigating functions; the join operator is replaced by unnesting join expressions. In addition, a number of node-constructing expressions that permit updates of the algebra state are defined.

Leonid Novak, Alexandre Zamulin

Satisfiability-Test, Rewriting and Refinement of Users’ XPath Queries According to XML Schema Definitions

Writing correct and precise XPath queries requires considerable effort from users: the user must be familiar with the complex structure of the queried XML documents and has to compose queries that are syntactically and semantically correct and precise. Incorrect queries select no data and thus lead to highly inefficient query processing. Imprecise queries may select more data than the user really wants and thus may lead to unnecessarily high processing and transportation costs. Therefore, we propose a schema-based approach to the satisfiability test and to the refinement of users' XPath queries. Our approach checks whether an XPath query conforms to the constraints given in the schema, and rewrites and refines the XPath query according to the schema information. If an XPath query does not conform to the constraints given in the schema, its result will always be an empty node set, which is a hint of semantic errors in the XPath query. Our rewriting approach for XPath queries replaces wildcards with specific node tests, replaces recursive axes with non-recursive axes, and eliminates reverse axes and redundant location steps. Thus, our rewriting approach generates a query that contains more information and can be refined more easily by the user than the original query. Our performance analysis shows the optimization potential of avoiding the evaluation of unsatisfiable XPath queries and of processing rewritten and refined XPath queries.

Jinghua Groppe, Sven Groppe
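
As a rough illustration of the satisfiability test and wildcard rewriting described above, the following sketch checks a child-axis-only path against a parent-to-children map standing in for an XML Schema. The schema, element names, and simplifications are ours, not the authors':

```python
# Minimal sketch of schema-based satisfiability testing and wildcard
# rewriting, restricted to child-axis-only paths. The schema is a
# parent -> allowed-children map; all element names are illustrative.

SCHEMA = {"library": ["book"], "book": ["title", "author"]}

def check_and_rewrite(path, root, schema):
    """Return the rewritten path (wildcards specialised where the schema
    is unambiguous), or None if the path can never select any node."""
    contexts, out = {root}, []
    for step in path.strip("/").split("/"):
        allowed = set().union(*(set(schema.get(c, [])) for c in contexts))
        if step == "*":
            if not allowed:
                return None                  # no element can occur here
            if len(allowed) == 1:
                step = next(iter(allowed))   # unambiguous: replace '*'
        elif step not in allowed:
            return None                      # violates the schema
        contexts = allowed if step == "*" else {step}
        out.append(step)
    return "/" + root + "/" + "/".join(out)

print(check_and_rewrite("*/title", "library", SCHEMA))    # /library/book/title
print(check_and_rewrite("book/isbn", "library", SCHEMA))  # None (unsatisfiable)
```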

X-Warehousing: An XML-Based Approach for Warehousing Complex Data

XML is suitable for structuring complex data coming from different sources and supported by heterogeneous formats. It provides a flexible formalism capable of representing and storing different types of data. Therefore, integrating XML documents into data warehouses is becoming increasingly important. In this paper, we propose an XML-based methodology, named X-Warehousing, which designs warehouses at a logical level and populates them with XML documents at a physical level. Our approach is mainly oriented toward users' analysis objectives, which are expressed according to an XML Schema and merged with XML data sources. The resulting XML Schema represents the logical model of the data warehouse, whereas XML documents validated against the analysis objectives populate its physical model, called the XML cube.

Omar Boussaid, Riadh Ben Messaoud, Rémy Choquet, Stéphane Anthoard

SDQNET: Semantic Distributed Querying in Loosely Coupled Data Sources

Web communities involve networks of loosely coupled data sources. Members in those communities should be able to pose queries and gather results from all data sources in the network, where available. At the same time, data sources should have limited restrictions on how to organize their data. If a global schema is not available for such a network, query processing is strongly based on the existence of (hard to maintain) mapping rules between pairs of data sources. If a global schema is available, local schemas of data sources have to follow strict modelling restrictions posed by that schema.

In this paper, we suggest an architecture to provide better support for distributed data management in loosely coupled data sources. In our approach, data sources can maintain diverse schemas. No explicit mapping rules between data sources are needed to facilitate query processing. Data sources can join and leave the network any time, at no cost for the community. We demonstrate our approach, describing SDQNET, a prototype platform to support semantic query processing in loosely coupled data sources.

Eirini Spyropoulou, Theodore Dalamagas

Materialized Views

Multi-source Materialized Views Maintenance: Multi-level Views

In many information systems, the databases that make up the system are distributed across different modules or branch offices according to the requirements of the business enterprise. In these systems, it is often necessary to combine the information from all the organization's databases in order to perform analysis and make decisions about the global operation. This is the case with Data Warehouse Systems. From a conceptual point of view, a Data Warehouse can be considered a set of materialized views defined in terms of the tables stored in one or more databases. These materialized views store historical data that must be maintained either in real time or periodically by means of batch processes. During the maintenance process the system must perform selections, projections, joins, etc. that can affect several databases. This is a complex problem, since joining several tables requires (at least temporarily) having the information from these tables in the same place. This requires the Data Warehouse to store auxiliary materialized views that in many cases contain duplicated information. In this article, we study this problem and propose a method that minimizes the duplicated information in the auxiliary materialized views and also reduces the response time of the system.

Josep Silva, Jorge Belenguer, Matilde Celma

Clustering-Based Materialized View Selection in Data Warehouses

Materialized view selection is a non-trivial task. Hence, its complexity must be reduced. A judicious choice of views must be cost-driven and influenced by the workload experienced by the system. In this paper, we propose a framework for materialized view selection that exploits a data mining technique (clustering), in order to determine clusters of similar queries. We also propose a view merging algorithm that builds a set of candidate views, as well as a greedy process for selecting a set of views to materialize. This selection is based on cost models that evaluate the cost of accessing data using views and the cost of storing these views. To validate our strategy, we executed a workload of decision-support queries on a test data warehouse, with and without using our strategy. Our experimental results demonstrate its efficiency, even when storage space is limited.

Kamel Aouiche, Pierre-Emmanuel Jouve, Jérôme Darmont
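
The greedy selection step mentioned in the abstract can be illustrated with a minimal skeleton. The cost figures below are placeholders for the paper's cost models (query cost via views, storage cost), and all names are hypothetical:

```python
# Skeleton of a greedy materialized-view selection loop of the kind the
# paper builds on; the paper's cost models would plug in here.

def greedy_select(candidates, space_budget):
    """candidates: view -> (saved_query_cost, storage_cost).
    Repeatedly pick the view with the best benefit per unit of storage;
    a fuller implementation would skip oversized views rather than stop."""
    selected, used, remaining = [], 0.0, dict(candidates)
    while remaining:
        view, (saving, size) = max(remaining.items(),
                                   key=lambda kv: kv[1][0] / kv[1][1])
        if used + size > space_budget:
            break
        selected.append(view)
        used += size
        del remaining[view]
    return selected

views = {"v1": (100.0, 10.0), "v2": (80.0, 40.0), "v3": (30.0, 2.0)}
print(greedy_select(views, space_budget=15.0))  # ['v3', 'v1']
```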

Non-blocking Materialized View Creation and Transformation of Schemas

In existing systems, user transactions get blocked during materialized view creation and non-trivial database schema transformations. Blocking user transactions is not an option in systems with high availability requirements. A non-blocking method to perform such tasks is therefore needed.

In this paper, we present a method for non-blocking creation of derived tables, suitable for highly available databases. These derived tables can be used to create materialized views and to transform the database schema. Modified versions of well-known crash recovery techniques are used, thus making the method easy to integrate into existing DBMSs. Because the involved tables are not locked, the derived table creation may run as a low priority background process. As a result, the process has little impact on concurrent user transactions.

Jørgen Løland, Svein-Olaf Hvasshovd

Database Modelling

Relationship Design Using Spreadsheet Reasoning for Sets of Functional Dependencies

Entity-Relationship and other common database modeling tools have restricted capabilities for designing relationships of higher arity. Although a complete and unambiguous specification can be achieved with traditional functional dependencies for relational schemata, the traditional formal notation is rarely used in practice. We propose an alternative: designing or surveying the properties of a non-binary relationship among object classes or attributes using spreadsheet reasoning methods for functional dependencies. A second representation, by the semilattice of closed attribute sets, can be used in parallel thanks to convenient conversion facilities.

János Demetrovics, András Molnár, Bernhard Thalheim
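
The basic primitive behind any such reasoning over sets of functional dependencies, including the closed attribute sets forming the semilattice mentioned above, is the attribute-set closure. A standard textbook sketch (not the paper's spreadsheet notation) follows:

```python
# Attribute-set closure X+ under a set of functional dependencies:
# the classic fixpoint algorithm from relational database theory.

def closure(attrs, fds):
    """attrs: iterable of attribute names; fds: list of (lhs, rhs) pairs,
    each side a set of attributes. Returns the closure of attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs          # FD applies: add its right-hand side
                changed = True
    return result

fds = [({"A"}, {"B"}), ({"B", "C"}, {"D"})]
print(sorted(closure({"A", "C"}, fds)))  # ['A', 'B', 'C', 'D']
```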

Modeling and Storing Context-Aware Preferences

Today, the overwhelming volume of information that is available to an increasingly wider spectrum of users creates the need for personalization. In this paper, we consider a database system that supports context-aware preference queries, that is, preference queries whose result depends on the context at the time of their submission. We use data cubes to store the associations between context-dependent preferences and database relations and OLAP techniques for processing context-aware queries, thus allowing the manipulation of the captured context data at different levels of abstraction. To improve query performance, we use an auxiliary data structure, called context tree, which indexes the results of previously computed preference-aware queries based on their associated context. We show how these cached results can be used to process both exact and approximate context-aware preference queries.

Kostas Stefanidis, Evaggelia Pitoura, Panos Vassiliadis
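
The context tree can be pictured as a hierarchy keyed by context parameters, with cached query results at the leaves. A minimal sketch follows; the three-level context (location, time, weather) and the class layout are illustrative, not taken from the paper:

```python
# Sketch of a context-tree-like cache: results of previously computed
# preference queries are indexed by a fixed hierarchy of context
# parameters.

class ContextTree:
    def __init__(self, levels):
        self.levels = levels          # ordered context dimensions
        self.root = {}

    def insert(self, context, result):
        node = self.root
        for dim in self.levels[:-1]:
            node = node.setdefault(context[dim], {})
        node[context[self.levels[-1]]] = result

    def lookup(self, context):
        node = self.root
        for dim in self.levels:
            node = node.get(context[dim])
            if node is None:
                return None           # this exact context is not cached
        return node

tree = ContextTree(["location", "time", "weather"])
ctx = {"location": "home", "time": "evening", "weather": "rain"}
tree.insert(ctx, ["movie_a", "movie_b"])
print(tree.lookup(ctx))               # ['movie_a', 'movie_b']
```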

An Integrated Framework for Meta Modeling

Meta modeling is an essential means to systematize, formalize, standardize, integrate, analyze and compare models, techniques, methods and tools. Numerous fields, such as databases, software engineering, software architectures, semantic web, computer-aided tools and method engineering, have benefited from it. The importance of meta modeling is ever increasing along with the emergence of novel approaches, architectures, techniques and languages based on UML and MDA. This paper presents a framework to integrate and compare divergent conceptions of meta modeling in databases, software engineering, and information systems development. The framework is applied to analyze and compare conceptions of meta levels in the literature.

Mauri Leppänen

Implementation of UNIDOOR, a Deductive Object-Oriented Database System

This paper proposes the DJR approach for implementing deductive object-oriented database (DOOD) systems. This technique is based on classifying DOOD features into three abstract implementation levels. The classified features are then delegated to the DJR suite, which is built around the Data Model, Java, and Relational components. The use of the Java virtual machine (JVM) provides essential object-oriented features that were hard to implement and maintain. The implementation of many critical database management features is delegated to the relational back-end. As a result, only a minimal implementation effort is needed to build a very complex system. The DJR approach was used to implement our DOOD system UNIDOOR. The system was built rapidly and successfully, and it supports essential object-oriented features along with the major database management features that were hard to implement in previous DOOD prototypes.

Mohammed K. Jaber, Andrei Voronkov

Web Information Systems and Middleware

Preloading Browsers for Optimizing Automatic Access to Hidden Web: A Ranking-Based Repository Solution

As Web applications grow in quantity and quality, different vertical solutions could use them as an important source of information. Nevertheless, obtaining information from web sources is challenging because of their complex access, due to the hypertext browsing paradigm and HTML's semistructured format. Web Automation middleware navigates through web links and fills in web forms automatically, so as to extract information from the Hidden Web. The main optimization parameter is the time required to navigate through the intermediate pages that lead to the desired results. This work proposes a technique that improves browsing time by storing information from previous queries and using it to preload an adequate subset of the navigational sequence on a specific browser before the next sequence is launched. It also takes into account the most commonly used sequences, which are the ones preloaded most often.

Justo Hidalgo, Alberto Pan, José Losada, Manuel Álvarez
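
The ranking idea can be sketched simply: count how often each navigational sequence has been executed and preload the most frequent ones first. All names and the preload granularity below are illustrative:

```python
# Frequency-ranked selection of navigational sequences to preload.
from collections import Counter

def sequences_to_preload(history, pool_size):
    """Rank executed navigational sequences by frequency and return the
    top ones, up to the number of browsers available for preloading."""
    return [seq for seq, _ in Counter(history).most_common(pool_size)]

history = ["login>search>results", "login>search>results",
           "login>account", "login>search>results"]
for seq in sequences_to_preload(history, pool_size=2):
    print("preload browser with a prefix of:", seq)
```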

A Middleware-Based Approach to Database Caching

Database caching supports declarative query processing close to the application. Using a full-fledged DBMS as cache manager, it enables the evaluation of specific project-select-join queries in the cache. In this paper, we propose significant improvements and optimizations – as compared to the well-known DBCache approach – that make our caching concept truly adaptive. Furthermore, we describe an adaptive constraint-based cache system (ACCache) relying on middleware components as a DBMS-independent realization of this approach.

Andreas Bühmann, Theo Härder, Christian Merker

Integrating Caching Techniques on a Content Distribution Network

Web caching and replication balance capacity with performance, and they have become essential components of the Web. In practice, caching and replication techniques have been applied in proxy servers and Content Distribution Networks (CDNs), respectively. In this paper, we investigate the benefits of integrating caching policies into a CDN's infrastructure. Using a simulation testbed, our results indicate that there is much room for performance improvement in terms of perceived latency, hit ratio and byte hit ratio. Moreover, we show that the combination of caching with replication fortifies CDNs against flash crowd events.

Konstantinos Stamos, George Pallis, Athena Vakali

Interactive Discovery and Composition of Complex Web Services

Among the most important expected benefits of a global service oriented architecture leveraging web service standards is an increased level of automation in the discovery, composition, verification, monitoring and recovery of services for the realization of complex processes. Most existing works addressing this issue are based on the Ontology Web Language for Services (OWL-S) and founded on description logic. Because the discovery and composition tasks are designed to be fully automatic, the solutions are limited to the realization of rather simple processes. To overcome this deficiency, this paper proposes an approach in which service capability descriptions are based on full first order predicate logic and enable an interactive discovery and composition of services for the realization of complex processes. The proposed approach is well suited when automatic service discovery does not constitute an absolute requirement and the discovery can be done interactively (semi-automatically) with human expert intervention. Such applications are, for instance, often met in e-science. The proposed approach is an extension and adaptation of the compositional information systems development (CISD) method based on the SYNTHESIS language and previously proposed by some of the authors. The resulting method offers a canonical extensible object model with its formal automatic semantic interpretation in the Abstract Machine Notation (AMN) as well as reasoning capabilities applying AMN interactively to the discovery and composition of web services.

Sergey Stupnikov, Leonid Kalinichenko, Stephane Bressan

Query Processing and Indexing

Efficient Processing SAPE Queries Using the Dynamic Labelling Structural Indexes

A variety of structural indexes have been proposed to speed up path expression queries over XML data. They usually work by partitioning the nodes of the data graph into equivalence classes and storing these equivalence classes as index nodes. In most current structural indexes, the nodes in the same partition have the same label. Such indexes are not flexible with queries containing wildcards or alternations, and their size is sometimes larger than necessary.

In this paper, we introduce dynamic labelling structural indexes. These structural indexes support only a set of frequently used simple alternation path expressions (SAPE for short), in which expressions may contain wildcards or alternations. The labels of data nodes in the same partition may differ. Dynamic labelling not only decreases the size of the structural index, but also supports SAPEs better. Every static labelling structural index can be improved by using dynamic labelling. Owing to space limitations, in this paper we study only the DL-1-index, improved from the 1-index, and the DL-A*(k)-index, improved from the A(k)-index. The construction and refinement of these indexes are based on our results on the properties of partitions and the split operation. Our experiments show that the improved dynamic labelling structural indexes are smaller and that query processing on them is more efficient compared to the naive ones.

Attila Kiss, Vu Le Anh

ICB-Index: A New Indexing Technique for Continuous Time Sequences

Various application domains require databases to store time sequences. Time sequences often describe continuous processes at discrete time points. Many applications require queries that take into consideration not only the explicit values of time sequences, but also the values of the processes represented by them (these values can be derived from the explicit values by user-defined interpolation functions). For example, a user of an industrial process control system may ask the following query: "Find those time intervals during which a specified physical value, represented by a series of measurements, was greater than a given limit value". We show that conventional secondary indexes are not suitable to support such queries. We also investigate the properties of the IP-index, the first index structure supporting queries on time sequences that take the interpolation into account (so-called "queries on continuous time sequences"). We show that the IP-index improves the performance of such queries, but its size is enormous for many real-life sequences, which makes it nearly impossible to use in some application domains. In this paper we present a new indexing technique to support queries on continuous time sequences, the ICB-index. The ICB-index matches the query performance of the IP-index but requires substantially less space. The effectiveness of the ICB-index is verified by experiments on sensor-generated time sequences from a power plant.

Dmitry V. Maslov, Andrew A. Sidorov
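
The query class such indexes target can be made concrete with a brute-force baseline: given a time sequence under linear interpolation, find the intervals on which the interpolated value exceeds a limit. The sketch below scans the whole sequence, which is exactly what an index like this is meant to avoid:

```python
# Brute-force evaluation of a "value above limit" query on a continuous
# (linearly interpolated) time sequence.

def intervals_above(seq, limit):
    """seq: list of (time, value) pairs sorted by time. Return the maximal
    intervals on which the linearly interpolated value exceeds `limit`."""
    out = []
    start = seq[0][0] if seq[0][1] > limit else None
    for (t0, v0), (t1, v1) in zip(seq, seq[1:]):
        if (v0 > limit) != (v1 > limit):              # segment crosses the limit
            tc = t0 + (limit - v0) * (t1 - t0) / (v1 - v0)
            if v1 > limit:
                start = tc                            # interval opens
            else:
                out.append((start, tc))               # interval closes
                start = None
    if start is not None:
        out.append((start, seq[-1][0]))               # still above at the end
    return out

seq = [(0.0, 1.0), (1.0, 3.0), (2.0, 2.5), (3.0, 0.5)]
print(intervals_above(seq, limit=2.0))  # [(0.5, 2.25)]
```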

Multiple k Nearest Neighbor Query Processing in Spatial Network Databases

This paper concerns the efficient processing of multiple k nearest neighbor queries in a road-network setting. The assumed setting covers a range of scenarios, such as one where a large population of mobile service users constrained to a road network issue nearest-neighbor queries for points of interest that are accessible via the road network. Given multiple k nearest neighbor queries, the paper proposes progressive techniques that selectively cache query results in main memory and subsequently reuse these for query processing. The paper initially proposes techniques for the case where an upper bound on k is known a priori and then extends the techniques to the case where this is not so. Based on empirical studies with real-world data, the paper offers insight into the circumstances under which the different proposed techniques can be used to advantage for multiple k nearest neighbor query processing.

Xuegang Huang, Christian S. Jensen, Simonas Šaltenis
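
The core reuse idea for the known-upper-bound case can be sketched as follows: compute and cache the K_MAX nearest neighbors per query location once, then answer any later k-NN query with k <= K_MAX from the cached prefix. The toy 1-D search below is a stand-in for real road-network search:

```python
# Caching k-NN results under a known upper bound K_MAX on k.

K_MAX = 8
cache = {}  # query node -> list of (distance, poi), the K_MAX nearest

def knn(node, k, network_knn):
    """Answer a k-NN query, reusing a cached K_MAX-NN result when present.
    network_knn(node, k) stands in for the expensive road-network search."""
    assert k <= K_MAX
    if node not in cache:
        cache[node] = network_knn(node, K_MAX)  # one expensive search
    return cache[node][:k]                      # cheap prefix reuse

def toy_network_knn(node, k):
    """1-D stand-in: POIs live at fixed coordinates on a line."""
    pois = [2, 5, 9, 14, 20, 27, 33, 41]
    return sorted((abs(node - p), p) for p in pois)[:k]

print(knn(10, 3, toy_network_knn))  # computes and caches 8 neighbors
print(knn(10, 5, toy_network_knn))  # served entirely from the cache
```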

Searching for Similar Trajectories on Road Networks Using Spatio-temporal Similarity

In order to search for similar moving-object trajectories, previous methods focused on Euclidean distance and considered only spatial similarity. Euclidean distance is not appropriate for road network space, where movement is constrained to the space adjacent to the roads. In this paper, we consider the properties of moving objects in road network space and define temporal similarity, as well as spatio-temporal similarity, between trajectories based on POIs (Points of Interest) and TOIs (Times of Interest) on road networks. Based on these definitions, we propose methods for searching for similar trajectories in road network space. Experimental results show the accuracy of our methods and the average search time in query processing.

Jung-Rae Hwang, Hye-Young Kang, Ki-Joune Li

Efficient and Coordinated Checkpointing for Reliable Distributed Data Stream Management

Data Stream Management (DSM) addresses the continuous processing of sensor data. DSM requires the combination of stream operators, which may run on different distributed devices, into stream processes. Due to recent advances in sensor technologies and wireless communication, DSM is increasingly gaining importance in various application domains. Especially in healthcare, the continuous monitoring of patients at home (telemonitoring) can significantly benefit from DSM. A vital requirement in telemonitoring, however, is that DSM provide a high degree of reliability. In this paper, we present a novel approach to efficient and coordinated stream operator checkpointing that supports reliable DSM while maintaining the high result quality needed for healthcare applications. Furthermore, we present evaluation results of our checkpointing approach implemented within our process and data stream management infrastructure OSIRIS-SE. OSIRIS-SE supports flexible failure handling and efficient and coordinated checkpointing by means of consistent operator migration. This ensures complete and consistent continuous data stream processing even in the case of failures.

Gert Brettlecker, Heiko Schuldt, Hans-Jörg Schek

Data Mining and Clustering

Towards Automatic Eps Calculation in Density-Based Clustering

Many real-life applications use various kinds of clustering algorithms. Very popular and interesting are applications dealing with spatial data, like on-line map services or traffic tracking systems. A very important branch of spatial systems is telemetry. Our current research is focused on providing an efficient caching structure that accelerates spatial query evaluation and improves the way aggregates are stored and processed. We use a density-based clustering algorithm to create the structure levels. The clustering algorithm is fast and efficient, but it requires a user-defined Eps parameter. As we cannot obtain the Eps parameter from the user for every level of the structure, we propose an Automatic Eps Calculation (AEC) algorithm, which estimates the Eps parameter value based on the distribution characteristics of the points. The algorithm is not limited to telemetry-specific data and can be applied to any set of points located in a two-dimensional space. We describe the algorithm's operation in detail, together with test results and possible improvements.

Marcin Gorawski, Rafal Malczok
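
The paper derives Eps automatically from the distribution of the points. As a stand-in for comparison (this is the classic k-distance heuristic for density-based clustering, not necessarily the authors' AEC algorithm), one can sort each point's distance to its k-th nearest neighbor and pick a value near the upper end of the smooth region:

```python
# k-distance heuristic for estimating the Eps parameter of
# density-based clustering (a common baseline, not the paper's AEC).
import math

def estimate_eps(points, k=4, percentile=0.9):
    """For each point, compute the distance to its k-th nearest
    neighbour; return a high percentile of these distances."""
    kdists = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points if q is not p)
        kdists.append(dists[k - 1])
    kdists.sort()
    return kdists[int(percentile * (len(kdists) - 1))]

pts = [(float(x), float(y)) for x in range(5) for y in range(5)]  # 5x5 grid
print(round(estimate_eps(pts, k=4), 3))
```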

Symbolic Music Genre Classification Based on Note Pitch and Duration

This paper presents a music genre classification system that relies on note pitch and duration features derived from their respective histograms. Feature histograms provide a simple yet effective classifier for genre classification within intra-classical genres such as sonatas, fugues, mazurkas, etc. Detailed experimental results illustrate the significant performance gains due to the proposed features compared to existing baseline features.

Ioannis Karydis
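
The histogram features can be sketched as follows: fold note pitches into a normalized 12-bin pitch-class histogram and classify by nearest genre centroid. The data and centroids below are made up, and the paper's duration histograms are omitted:

```python
# Pitch-class histogram features with a nearest-centroid classifier.

def pitch_class_histogram(midi_pitches):
    """Normalised 12-bin histogram over pitch classes (MIDI pitch mod 12)."""
    hist = [0.0] * 12
    for p in midi_pitches:
        hist[p % 12] += 1.0
    total = sum(hist)
    return [h / total for h in hist]

def classify(piece, centroids):
    """Assign the genre whose centroid histogram is closest (squared L2)."""
    h = pitch_class_histogram(piece)
    return min(centroids, key=lambda g: sum((a - b) ** 2
                                            for a, b in zip(h, centroids[g])))

centroids = {  # hypothetical per-genre mean histograms
    "fugue":   pitch_class_histogram([60, 62, 64, 65, 67, 69, 71, 72]),
    "mazurka": pitch_class_histogram([62, 63, 66, 69, 74, 75, 78]),
}
print(classify([60, 64, 67, 72, 62, 65], centroids))
```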

PPPA: Push and Pull Pedigree Analyzer for Large and Complex Pedigree Databases

In this paper we introduce a novel push and pull technique to analyze pedigree data. We present the Push and Pull Pedigree Analyzer (PPPA) to organize large and complex pedigrees and investigate the development of genetic diseases. PPPA receives as input a pedigree (ancestry information) of different families. For each person, the pedigree contains information about the occurrence of a specific genetic disease. We propose a new solution to arrange and visualize the individuals of the pedigree based on the relationships between individuals and information about the disease. PPPA starts with random positions for the individuals, and iteratively pushes apart non-relatives with opposite disease patterns and pulls together relatives with identical disease patterns. The goal is a visualization that groups families with homogeneous disease patterns.

We investigate our solution experimentally with genetic data from people from South Tyrol, Italy. We show that the algorithm converges independently of the number of individuals n and the complexity of the relationships. The runtime of the algorithm is super-linear with respect to n, and its space complexity is linear with respect to n. The visual analysis of the method confirms that our push and pull technique successfully deals with large and complex pedigrees.

Arturas Mazeika, Janis Petersons, Michael H. Böhlen
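
One push/pull iteration can be sketched as a force-directed step in one dimension: relatives with identical disease patterns attract, non-relatives with opposite patterns repel. This is purely illustrative and much simpler than PPPA itself:

```python
# 1-D force-directed sketch of the push/pull step described above.
import random

def step(pos, relatives, disease, pull=0.1, push=0.1):
    """One iteration: move each person toward relatives with the same
    disease pattern and away from non-relatives with a different one."""
    new = dict(pos)
    for a in pos:
        for b in pos:
            if a == b:
                continue
            d = pos[b] - pos[a]
            if (a, b) in relatives and disease[a] == disease[b]:
                new[a] += pull * d      # pull together
            elif (a, b) not in relatives and disease[a] != disease[b]:
                new[a] -= push * d      # push apart
    return new

random.seed(1)
pos = {p: random.uniform(0.0, 10.0) for p in ("p1", "p2", "p3")}
relatives = {("p1", "p2"), ("p2", "p1")}        # symmetric kinship
disease = {"p1": 1, "p2": 1, "p3": 0}
for _ in range(10):
    pos = step(pos, relatives, disease)
print({p: round(x, 2) for p, x in pos.items()})  # p1, p2 converge; p3 drifts away
```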

Discovering Emerging Topics in Unlabelled Text Collections

As document collections accumulate over time, some of the discussion subjects in them become outdated, while new ones emerge. Old classification schemes should then be updated. In this paper, we address the challenge of finding emerging and persistent "themes", i.e. subjects that live long enough to be incorporated into a taxonomy or ontology describing the document collection. We focus on the identification of cluster labels that "survive" changes in the constitution of the underlying population of documents, including changes in the feature space of dominant words, because the terminology of the document archive also changes over time. We have conducted a set of promising experiments on the identification of themes that manifested themselves in section H2.8 of the ACM digital library, and we juxtapose them with the classes foreseen in the ACM taxonomy for this section.

Rene Schult, Myra Spiliopoulou

Modelling and Design Issues

Computational Database Technology Applied to Option Pricing Via Finite Differences

Computational database technology spans the two research fields of database technology and scientific computing. It involves the development of database capabilities that support computationally intensive applications found in science and engineering. This includes support for representing and processing mathematical models within the database environment without any significant performance loss compared to conventional implementations.

This paper describes how an existing database management system, AMOS II, is extended with capabilities to solve the Black–Scholes equation commonly used in option pricing. The numerical method used is finite differences, and a flexible database framework that can deal with complex mathematical objects and numerical methods is created. We describe how computational data representations and operations are adapted to the database management system and the approach is evaluated with respect to performance, extensibility, and ease of use.

Jöns Åkerlund, Krister Åhlander, Kjell Orsborn
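
The numerical kernel in question, an explicit finite-difference scheme for the Black–Scholes equation pricing a European call, can be sketched in a few lines. This is the generic textbook scheme; the paper's actual contribution, embedding such computations in AMOS II, is not shown here:

```python
# Explicit finite differences for the Black-Scholes PDE, marching
# backward in time from the European call payoff at expiry.
import math

def bs_call_fd(strike=100.0, rate=0.05, sigma=0.2, expiry=1.0,
               s_max=200.0, m=100, n=2000):
    """Return option values on the price grid S_i = i * s_max / m.
    n is chosen large enough to satisfy the explicit-scheme stability bound."""
    ds, dt = s_max / m, expiry / n
    v = [max(i * ds - strike, 0.0) for i in range(m + 1)]  # payoff at expiry
    for step in range(n):
        new = v[:]
        for i in range(1, m):
            s = i * ds
            gamma = (v[i + 1] - 2.0 * v[i] + v[i - 1]) / ds ** 2
            delta = (v[i + 1] - v[i - 1]) / (2.0 * ds)
            new[i] = v[i] + dt * (0.5 * sigma ** 2 * s ** 2 * gamma
                                  + rate * s * delta - rate * v[i])
        new[0] = 0.0                                       # worthless at S = 0
        new[m] = s_max - strike * math.exp(-rate * dt * (step + 1))
        v = new
    return v

print(round(bs_call_fd()[50], 2))  # value at S=100; analytic price is about 10.45
```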

A Framework for Merging, Repairing and Querying Inconsistent Databases

This paper presents a framework for merging, repairing and querying inconsistent databases in the presence of functional dependencies and foreign key constraints, and investigates the problems related to the satisfaction of general integrity constraints in the presence of null values. In more detail, the approach consists of i) merging the source databases to reduce the set of tuples inconsistent with respect to the constraints defined by the primary keys, ii) repairing the integrated database with respect to functional dependencies and foreign key constraints, and iii) computing consistent answers over the repaired database. The paper presents a system prototype, Rainbow, developed at the University of Calabria, that implements the proposed framework. The system receives as input an integration operator and a query, and outputs the answer to the query. It currently implements many of the integration operators proposed in the literature.

Luciano Caroprese, Ester Zumpano

An On-Line Reorganization Framework for SAN File Systems

While the cost per megabyte of magnetic disk storage is economical, organizations are alarmed by the increasing cost of managing storage. Storage Area Network (SAN) architectures strive to minimize this cost by consolidating storage devices. A SAN is a special-purpose network that interconnects different data storage devices with servers. While there are many definitions of a SAN, there is general consensus that it provides access at the granularity of a block and is typically used for database applications.

In this study, we focus on SAN switches that include embedded storage management software in support of virtualization. We describe an On-line Reorganization Environment, ORE, that controls the placement of data to improve the average response time of the system. ORE is designed for a heterogeneous collection of storage devices. Its key novel feature is its use of "time" to quantify the benefit and cost of a migration: it migrates a fragment only when its net benefit exceeds a pre-specified threshold. We describe a taxonomy of techniques for fragment migration and employ a trace-driven simulation study to quantify their tradeoffs. Our performance results demonstrate a significant improvement in response time (an order of magnitude) for those algorithms that employ ORE's cost/benefit feature. Moreover, a technique that employs the bandwidth of all devices intelligently is superior to one that simply migrates data to the fastest devices.

Shahram Ghandeharizadeh, Shan Gao, Chris Gahagan, Russ Krauss
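
ORE's distinguishing feature, expressing both the benefit and the cost of a migration in units of time, can be sketched as a simple rule; all numbers and field names below are illustrative:

```python
# Time-based cost/benefit rule for fragment migration, in the spirit of
# the approach described above.

def net_benefit(frag, src, dst, horizon):
    """Time saved over `horizon` time units of accesses, minus the
    one-off migration time. Positive values favour migrating."""
    saved_per_access = frag["size"] / src["bw"] - frag["size"] / dst["bw"]
    migration_cost = frag["size"] / min(src["bw"], dst["bw"])
    return frag["heat"] * horizon * saved_per_access - migration_cost

frag = {"size": 64.0, "heat": 0.3}   # MB; accesses per unit time
slow, fast = {"bw": 20.0}, {"bw": 80.0}  # MB/s
b = net_benefit(frag, slow, fast, horizon=100)
print(f"net benefit: {b:.1f}s ->", "migrate" if b > 10.0 else "stay")
```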

Towards Multimedia Fragmentation

Database fragmentation is a process for reducing irrelevant data accesses by grouping data frequently accessed together into dedicated segments. In this paper, we address multimedia database fragmentation by extending existing fragmentation algorithms to take into account key characteristics of multimedia objects. We particularly discuss multimedia primary horizontal fragmentation and provide a partitioning strategy based on low-level multimedia features. Our approach emphasizes the importance of multimedia predicate implications in optimizing multimedia fragments. To validate our approach, we have implemented a prototype that computes multimedia predicate implications. Experimental results are satisfactory.

Samir Saad, Joe Tekli, Richard Chbeir, Kokou Yetongnon

Content Is Capricious: A Case for Dynamic System Generation

Database modeling is based on the assumption of a high regularity of its application areas, an assumption which applies to both the structure of data and the behavior of users. Content modeling, however, is less strict, since it may treat one application entity substantially differently from another depending on the instance at hand, and content users may individually add descriptive or interpretive aspects depending on their knowledge and interests. Therefore, we argue that adequate content modeling has to be open to changes, and content management systems have to react to changes dynamically, thus making content management a case for dynamic system generation.

In our approach, openness and dynamics are provided through a compiler framework which is based on a conceptual model of the application domain. Using a conceptual modeling language, users can openly express their views on the domain's entities. Our compiler framework dynamically generates the components of a corresponding software system. Central to the compiler framework is the notion of generators, each generating a particular module for the intended application system. Based on the resulting modular architecture, the generated systems allow personalized model definition and seamless model evolution.

In this paper we give details of the system modules and describe how the generators which create them are coordinated in the compiler framework.

Hans-Werner Sehring, Sebastian Bossung, Joachim W. Schmidt

Backmatter
