
2005 | Book

Data Management in a Connected World

Essays Dedicated to Hartmut Wedekind on the Occasion of His 70th Birthday

Edited by: Theo Härder, Wolfgang Lehner

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


About this book

Data management systems play the most crucial role in building large application systems. Since modern applications are no longer single monolithic software blocks but highly flexible and configurable collections of cooperative services, the data management layer also has to adapt to these new requirements. Therefore, within recent years, data management systems have faced a tremendous shift from the central management of individual records in a transactional way to a platform for data integration, federation, search services, and data analysis. This book addresses these new issues in the area of data management from multiple perspectives, in the form of individual contributions, and it outlines future challenges in the context of data management.

These contributions are dedicated to Prof. em. Dr. Dr.-Ing. E. h. Hartmut Wedekind on the occasion of his 70th birthday, and were (co-)authored by some of his academic descendants. Prof. Wedekind is one of the most prominent figures of the database management community in Germany, and he enjoys an excellent international reputation as well. Over the last 35 years he greatly contributed to making relational database technology a success. As far back as the early 1970s, he covered—as the first author in Germany—the state of the art concerning the relational model and related issues in two widely used textbooks “Datenbanksysteme I” and “Datenbanksysteme II”. Without him, the idea of modeling complex-structured real-world scenarios in a relational way would be far less developed by now. Among Prof.

Table of Contents

Frontmatter

MOTIVATION AND MODELING ISSUES

Frontmatter
Databases: The Integrative Force in Cyberspace
Abstract
Database technology has come a long way. Starting from systems that were just a little more flexible than low-level file systems, they have evolved into powerful programming and execution environments by embracing the ideas of data independence, non-procedural query languages, extensible type systems, automatic query optimization (including parallel execution and load balancing), automatic control of parallelism, automatic recovery and storage management, transparent distributed execution–to just name a few. Even though database systems are (today) the only systems that allow normal application programmers to write programs that will be executed correctly and safely in a massively parallel environment on shared data, database technology is still viewed by many people as something specialized to large commercial online applications, with a rather static design, something substantially different from the “other” IT components. More to the point: Even though database technology is to the management of persistent data what communication systems are to message-based systems, one can still find many application developers who pride themselves in not using databases, but something else. This is astounding, given the fact that, because of the dramatic decrease in storage prices, the amount of data that needs to be stored reliably (and retrieved, eventually) is growing exponentially–it’s Moore’s law, after all. And what is more: Things that were thought to be genuinely volatile until recently, such as processes, turn into persistent objects when it comes to workflow management, for example.
The paper argues that the technological evolution of database technology makes database systems the ideal candidate for integrating all types of objects that need persistence one way or the other, supporting all the different types of execution that are characteristic of the various application classes. If database systems are to fulfill this integrative role, they will have to adapt to new roles vis-à-vis the other system components, such as the operating system, the communication system, the language runtime environment, etc., but those developments are under way as well.
Andreas Reuter
Federating Location-Based Data Services
Abstract
With the emerging availability of small and portable devices which are able to determine their position and to communicate wirelessly, mobile and spatially-aware applications become feasible. These applications rely on information that is bound to locations and managed by so-called location-based data services. Large-scale location-based systems have to cope efficiently with different types of data (mostly spatial or conventional). Each type poses its own requirements on the data server that is responsible for managing and provisioning the data. In addition to efficiency, it is of utmost importance to provide for a combined and integrated usage of that data by the applications.
In this paper we discuss various basic technologies to achieve a flexible, extensible, and scalable management of the context model and its data, which are organized and managed by the different data servers. Based on a classification of location-based data services, we introduce a service-oriented architecture that is built on a federation approach to efficiently support location-based applications. Furthermore, we report on the Nexus platform, which realizes a viable implementation of that approach.
Bernhard Mitschang, Daniela Nicklas, Matthias Grossmann, Thomas Schwarz, Nicola Hönle
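To make the federation idea concrete, the following sketch is a hypothetical illustration (not the Nexus API; all class and function names are invented): a federation layer forwards a spatial window query to every registered location-based data server and merges the results for the application.

```python
# Hypothetical federation sketch: each autonomous data server manages part of
# the context model; the federation fans a window query out and merges results.

class DataServer:
    def __init__(self, name, objects):
        self.name = name
        self.objects = objects  # list of dicts with 'pos' = (x, y)

    def window_query(self, xmin, ymin, xmax, ymax):
        return [o for o in self.objects
                if xmin <= o["pos"][0] <= xmax and ymin <= o["pos"][1] <= ymax]

class Federation:
    def __init__(self):
        self.servers = []

    def register(self, server):
        self.servers.append(server)

    def window_query(self, *bbox):
        # Ask every server and concatenate; a real platform would also prune
        # servers by their service area and resolve duplicate objects.
        result = []
        for s in self.servers:
            result.extend(s.window_query(*bbox))
        return result

fed = Federation()
fed.register(DataServer("map", [{"name": "cafe", "pos": (2, 3)}]))
fed.register(DataServer("sensors", [{"name": "bus-42", "pos": (2.5, 3.1)}]))
print(fed.window_query(0, 0, 5, 5))
```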
An Agent-Based Approach to Correctness in Databases
Abstract
When defining the schema of a relational database, integrity constraints are included to describe simple syntactic correctness conditions that can easily be tested in a centralized way when tuples are inserted, deleted, or updated. Complex dependencies, however, may exist between different tuples of a relation, and describing them with current formalisms can be difficult. An example of such an inconsistency is the problem of duplicates: the existence of different tuples describing the same real-world entity. Duplicates can occur when one makes typographic or other errors while transferring the representation of a real-world entity to the database. In this paper, we describe a new method to detect dependencies of that kind using continuously active agents that check the consistency of a database and propose steps to improve its content.
Herbert Stoyan, Stefan Mandl, Sebastian Schmidt, Mario Vogel
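As a rough illustration of agent-based duplicate detection (a generic sketch under assumed names, not the authors' agent framework), the following fragment scans a relation for tuple pairs that are likely duplicates and reports them as candidates for correction.

```python
# Illustrative duplicate-detection "agent": flag tuple pairs that probably
# describe the same real-world entity, e.g. because of a typo on entry.

from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    """Crude string similarity between two tuples' attribute values."""
    return SequenceMatcher(None, " ".join(a), " ".join(b)).ratio()

def find_probable_duplicates(tuples, threshold=0.85):
    """Return candidate duplicate pairs for a human (or rule) to confirm."""
    return [(t1, t2) for t1, t2 in combinations(tuples, 2)
            if similarity(t1, t2) >= threshold]

customers = [
    ("Miller", "42 Main Street", "Erlangen"),
    ("Miler",  "42 Main Street", "Erlangen"),   # typo while entering data
    ("Schmidt", "7 Oak Road", "Nuremberg"),
]
for pair in find_probable_duplicates(customers):
    print("possible duplicate:", pair)
```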

INFRASTRUCTURAL SERVICES

Frontmatter
Thirty Years of Server Technology — From Transaction Processing to Web Services
Abstract
Server technology started with transaction-processing systems in the sixties. Database Management Systems (DBMS) soon adopted mechanisms like multi-processing and multi-threading. In distributed systems, the remote procedure call also needed process structures at the server side. The same is true for file servers, object servers (CORBA), Web servers, application servers, EJB containers, and Web Services. All these systems support a request-response behavior, sometimes enhanced with a session concept. They face thousands of requests per second and must manage thousands of session contexts at the same time. While programming the applications that run on the servers and actually process the requests should be as simple as possible, efficiency must still be very high. So a general programming environment should be defined that is easy to use and, at the same time, allows for the efficient execution of thousands of program instances in parallel. This contribution will identify mechanisms that have been developed in the context of transaction processing and database management. It will then generalize them to server processing of any kind. This includes program structures, context management, multi-tasking and multi-threading, process structures, program management, naming, and transactions. The driving force behind the discussion is to avoid the re-invention of the wheel that far too often occurs in computer science, mostly in ignorance of older and presumably outdated systems.
Klaus Meyer-Wegener
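The request/response pattern with per-session context that the chapter generalizes can be sketched in a few lines (illustrative only; the thread pool and session table below are assumptions, not taken from the text).

```python
# Minimal server sketch: many concurrent requests share a pool of worker
# threads and a per-session context kept on the server between calls.

from concurrent.futures import ThreadPoolExecutor
from threading import Lock

sessions = {}            # session id -> context kept between requests
sessions_lock = Lock()

def handle(request):
    with sessions_lock:  # protect the shared session contexts
        ctx = sessions.setdefault(request["session"], {"calls": 0})
        ctx["calls"] += 1
        calls = ctx["calls"]
    return f"session {request['session']}: call #{calls}"

requests = [{"session": i % 3} for i in range(9)]
with ThreadPoolExecutor(max_workers=4) as pool:
    for reply in pool.map(handle, requests):
        print(reply)
```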
Caching over the Entire User-to-Data Path in the Internet
Abstract
A Web client request traverses four types of Web caches, before the Web server as the origin of the requested document is reached. This client-to-server path is continued to the backend DB server if timely and transaction-consistent data is needed to generate the document. Web caching typically supports access to single Web objects kept ready somewhere in caches up to the server, whereas database caching, applied in the remaining path to the DB data, allows declarative query processing in the cache. Optimization issues in Web caches concern management of documents decomposed into templates and fragments to support dynamic Web documents with reduced network bandwidth usage and server interaction. When fragment-enabled caching of fine-grained objects can be performed in proxy caches close to the client, user-perceived delays may become minimal. On the other hand, database caching uses a full-fledged DBMS as cache manager to adaptively maintain sets of records from a remote database and to evaluate queries on them. Using so-called cache groups, we introduce the new concept of constraint-based database caching. These cache groups are constructed from parameterized cache constraints, and their use is based on the key concepts of value completeness and predicate completeness. We show how cache constraints affect the correctness of query evaluations in the cache and which optimizations they allow. Cache groups supporting practical applications must exhibit controllable load behavior for which we identify necessary conditions. Finally, we comment on future research problems.
Theo Härder
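A minimal sketch of the value-completeness idea behind constraint-based database caching follows (the names and the simplified single-column cache group are my own shorthand, not the chapter's notation): only when all backend records with a given value of a cached control column are present may an equality predicate on that column be evaluated in the cache.

```python
# Sketch of value completeness: for a cached control column, either *all*
# backend rows with a given value are in the cache, or none are.

class CacheGroup:
    def __init__(self, control_column):
        self.control_column = control_column
        self.complete_values = set()   # values for which the cache is complete
        self.rows = []

    def load_value(self, backend_rows, value):
        """Fetch every backend row with this value -> value completeness."""
        self.rows.extend(r for r in backend_rows
                         if r[self.control_column] == value)
        self.complete_values.add(value)

    def can_answer_locally(self, column, value):
        return column == self.control_column and value in self.complete_values

    def query(self, backend_rows, column, value):
        if self.can_answer_locally(column, value):
            return [r for r in self.rows if r[column] == value]     # cache hit
        return [r for r in backend_rows if r[column] == value]      # fall back

backend = [{"cust": "A", "city": "KL"}, {"cust": "B", "city": "KL"},
           {"cust": "C", "city": "MA"}]
cg = CacheGroup("city")
cg.load_value(backend, "KL")
print(cg.query(backend, "city", "KL"))   # answered in the cache
print(cg.query(backend, "city", "MA"))   # routed to the backend DB
```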
Reweaving the Tapestry: Integrating Database and Messaging Systems in the Wake of New Middleware Technologies
Abstract
Modern business applications involve a lot of distributed data processing and inter-site communication, for which they rely on middleware products. These products provide the data access and communication framework for the business applications.
Integrated messaging seeks to integrate messaging operations into the database, so as to provide a single API for data processing and messaging. Client applications will be much easier to write, because all the logic of sending and receiving messages is within the database. System configuration, application deployment, and message warehousing are simplified, because we don’t have to manage and fine-tune multiple products.
Integrating messaging into a database also provides messages with features like backup, restore, transactionality, and recoverability. In this paper, we’ll look at some aspects of messaging systems and at the challenges involved in integrating messaging, such as message delivery semantics, transaction management, and the impact on query processing.
Sangeeta Doraiswamy, Mehmet Altinel, Lakshmikant Shrinivas, Stewart Palmer, Francis Parr, Berthold Reinwald, C. Mohan
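The core benefit of integrated messaging, that sending a message and updating business data can share one database transaction, can be sketched as follows (a generic illustration using SQLite; it does not represent any specific product's API).

```python
# Sketch: data update and message enqueue succeed or fail together because
# they run in the same database transaction.

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
db.execute("CREATE TABLE outbound_queue (id INTEGER PRIMARY KEY AUTOINCREMENT,"
           " payload TEXT, delivered INTEGER DEFAULT 0)")
db.execute("INSERT INTO orders VALUES (1, 'new')")
db.commit()

def ship_order(order_id):
    with db:  # one atomic transaction: business data update + message enqueue
        db.execute("UPDATE orders SET status='shipped' WHERE id=?", (order_id,))
        db.execute("INSERT INTO outbound_queue (payload) VALUES (?)",
                   (f"shipped order {order_id}",))

ship_order(1)
print(db.execute("SELECT payload FROM outbound_queue").fetchall())
```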
Data Management Support for Notification Services
Abstract
Database management systems are highly specialized to efficiently organize and process huge amounts of data in a transactional manner. During the last years, however, database management systems have been evolving into a central hub for the integration of mostly heterogeneous and autonomous data sources to provide homogenized data access. The next step in pushing database technology forward to play the role of an information marketplace is to actively notify registered users about incoming messages or changes in the underlying data set. Notification services may therefore be seen as a generic term for subscription systems or, more generally, data stream systems, which both enable the processing of standing queries over transient data.
This article gives a comprehensive introduction to the context of notification services by outlining their differences from the classical query/response-based communication pattern; it illustrates potential application areas, and it discusses requirements regarding the underlying data management support. In more depth, this article describes the core concepts of the PubScribe project from three different perspectives. From the first perspective, the subscription process and its mapping onto the primitive publish/subscribe communication pattern is explained. The second part focuses on a hybrid subscription data model by describing the basic constructs from a structural as well as an operational point of view. Finally, the PubScribe notification service project is characterized by a storage and processing model based on relational database technology.
To summarize, this contribution introduces the idea of notification services from an application point of view by inverting the database approach and dealing with persistent queries and transient data. Moreover, the article provides an insight into the database technology that must be exploited and adapted to provide a solid base for a scalable notification infrastructure, using the PubScribe project as an example.
Wolfgang Lehner
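The inversion of the database approach, persistent queries evaluated over transient data, can be illustrated with a minimal publish/subscribe loop (a hedged sketch; this is not PubScribe's actual subscription model).

```python
# Sketch: subscriptions are standing queries; each incoming publication is
# matched against all of them and matching subscribers are notified.

subscriptions = []  # list of (subscriber, predicate over a publication dict)

def subscribe(name, predicate):
    subscriptions.append((name, predicate))

def publish(event):
    """Transient data: evaluate every standing query against this event."""
    for name, predicate in subscriptions:
        if predicate(event):
            print(f"notify {name}: {event}")

subscribe("alice", lambda e: e["topic"] == "stock" and e["price"] > 100)
subscribe("bob",   lambda e: e["topic"] == "weather")

publish({"topic": "stock", "symbol": "XYZ", "price": 123})
publish({"topic": "stock", "symbol": "ABC", "price": 80})
publish({"topic": "weather", "city": "Dresden", "temp": 21})
```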
Search Support in Data Management Systems
Abstract
As a consequence of the change in the nature of data management systems, the requirements for search support have shifted. In the early days of data management systems, efficient access techniques and optimization strategies for exact-match queries were the main focus. Most of the problems in this field are satisfactorily solved today, and new types of applications for data management systems have turned the focus of current research to content-based similarity queries and queries on distributed databases. The present contribution addresses these two aspects. In the first part, algorithms and data structures supporting similarity queries are presented together with considerations about their integration into data management systems, whereas search techniques for distributed data management systems, and especially for peer-to-peer networks, are discussed in the second part. Here, techniques for exact-match queries and for similarity queries are addressed.
Andreas Henrich
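As a simple, assumed example of the content-based similarity queries discussed here (not code from the chapter), a brute-force k-nearest-neighbor search over feature vectors shows the query type that index structures for similarity search are designed to accelerate.

```python
# Brute-force k-NN: return the k objects whose feature vectors are closest
# to the query vector (Euclidean distance).

import math

def knn(query, objects, k=2):
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return sorted(objects, key=lambda o: dist(query, o["features"]))[:k]

images = [
    {"id": "img1", "features": (0.9, 0.1, 0.0)},
    {"id": "img2", "features": (0.8, 0.2, 0.1)},
    {"id": "img3", "features": (0.1, 0.9, 0.7)},
]
print(knn((1.0, 0.0, 0.0), images))   # img1 and img2 are most similar
```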

APPLICATION DESIGN

Frontmatter
Toward Automated Large-Scale Information Integration and Discovery
Abstract
The high cost of data consolidation is the key market inhibitor to the adoption of traditional information integration and data warehousing solutions. In this paper, we outline a next-generation integrated database management system that takes traditional information integration, content management, and data warehouse techniques to the next level: the system will be able to integrate a very large number of information sources and automatically construct a global business view in terms of “Universal Business Objects”. We describe techniques for discovering, unifying, and aggregating data from a large number of disparate data sources. Enabling technologies for our solution are XML, web services, caching, messaging, and portals for real-time dashboarding and reporting.
Paul Brown, Peter Haas, Jussi Myllymaki, Hamid Pirahesh, Berthold Reinwald, Yannis Sismanis
Component-Based Application Architecture for Enterprise Information Systems
Abstract
The paradigm of reuse is a traditional survival strategy of humanity that manifests itself in human languages: words (components) are taken from a lexicon (repository) and combined into sentences (applications) according to the rules of a specific syntax (grammar). The paper points out the parallels between the component-based approach of human languages on the one hand and component-based application-system design in the software-engineering discipline on the other hand. We describe some instruments (e.g., repositories, parts lists) for managing component-based system design, and introduce a language-critical middleware framework supporting the development and processing of component-oriented e-commerce applications (e.g., an electronic marketplace for trading software components). Furthermore, we present a classification of component types and a component specification framework. The existence of standards and exchange forums (e.g., marketplaces) is — besides a sophisticated component and configuration theory — a substantial prerequisite for superior component-based application development and system life-cycle management.
Erich Ortner
Processes, Workflows, Web Service Flows: A Reconstruction
Abstract
The last decade focused heavily on process-oriented approaches for application integration, starting with the advent of workflow management technology at the beginning of the nineties. This development has continued with the definition of flow concepts for Web services. In this article, we discuss the purpose and advantages of process-oriented concepts for application integration. To this end, workflow technology and Web service flows are briefly introduced. Then we assess both technologies with respect to their role in application integration. In particular, we will reconstruct the fundamental differences between workflows and Web services.
Stefan Jablonski
Pros and Cons of Distributed Workflow Execution Algorithms
Abstract
As an implementation of business processes, workflows are inherently distributed. Consequently, there is a considerable number of both commercial products and research prototypes that address distribution issues in workflow execution and workflow management systems (WfMS). However, most of these approaches provide only results focused on the properties of a specific workflow model, workflow application, and/or WfMS implementation. An analysis of generic requirements on distributed workflow execution algorithms and of their applicability, advantages, and disadvantages in different workflow scenarios is still missing and will be given in this paper. A comprehensive requirements analysis on distributed workflow execution forms the basis of our discussion. In contrast to existing work that primarily focuses on non-functional requirements, this paper explicitly considers issues that originate in the workflow model as well. Subsequently, four basic algorithms for distributed workflow execution are presented, namely remote access, workflow migration, workflow partitioning, and subworkflow distribution; existing WfMS approaches use combinations and/or variants of these basic algorithms. The properties of these algorithms with respect to the aforementioned requirements are discussed in detail. As a primary result, subworkflow distribution proves to be a well-suited, application-independent, and thus generally applicable distributed execution model. Nevertheless, application-specific optimizations can be accomplished by other models.
Hans Schuster
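A toy sketch of the subworkflow distribution idea follows (sites, engine names, and the assignment rule are invented for illustration): the parent workflow retains overall control while complete subworkflows are shipped to the engines at their execution sites.

```python
# Sketch: each subworkflow is handed as a whole to the workflow engine
# responsible for its site; the parent keeps the overall process structure.

workflow = {
    "name": "order_processing",
    "subworkflows": [
        {"name": "check_credit", "site": "finance"},
        {"name": "pick_and_pack", "site": "warehouse"},
        {"name": "billing", "site": "finance"},
    ],
}

engines = {"finance": [], "warehouse": []}

def execute_distributed(wf):
    for sub in wf["subworkflows"]:
        engines[sub["site"]].append(sub["name"])   # ship subworkflow to its site
    return engines

print(execute_distributed(workflow))
# {'finance': ['check_credit', 'billing'], 'warehouse': ['pick_and_pack']}
```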
Business-to-Business Integration Technology
Abstract
Business-to-Business (B2B) integration technology refers to software systems that enable the communication of electronic business events between organizations across computer networks like the Internet or specialized networks like SWIFT [19]. A typical example of a business event is a create purchase order sent from a buyer to a seller with the intent that the seller eventually delivers the ordered products, or a post invoice sent from a supplier to a buyer with the intent that the buyer fulfills his obligation to pay for the delivered products. Business events carry business data as well as the sender’s intent about what it expects the receiver to do. As business events are mission-critical for the success of private, public, and government organizations, their reliable and dependable processing and transmission is paramount.
Database technology is a platform technology that has proven to be reliable and dependable for the management of large sets of dynamic data across a huge variety of applications. In recent years, functionality beyond data management was added to database technology making it a feasible platform for business event processing in addition to data processing itself. New functionality like complex data types, audit trails, message queuing, remote message transmission or publish/subscribe communication fulfills basic requirements for business event processing and are all relevant for B2B integration technology.
This contribution investigates the use of database technology for business event processing between organizations. First, a high-level conceptual model for B2B integration is introduced from which basic business event processing requirements are derived. An outline of a B2B integration system architecture is provided that defines the B2B integration system boundaries, before specific database functionality is discussed as implementation technology for business event processing. Some future trends as well as some proposals for extended database functionality are presented as the conclusion of this chapter.
Christoph Bussler

APPLICATION SCENARIOS

Frontmatter
Information Dissemination in Modern Banking Applications
Abstract
Requirements for information systems, especially in the banking and finance industry, have changed drastically in the past few years to cope with phenomena like globalization and the growing impact of financial markets. Nowadays, flexibility and profitability in this segment of the economy depend on readily available, up-to-date, and accurate information at the workplace of every single employee. These theses are exemplified by outlining two modern, quite different real-life banking applications. Their business value is founded on the rapid dissemination of accurate information in a global, distributed working environment. To succeed technically, they employ a combination of modern database, networking, and software engineering concepts. One case study centers on the swift dissemination of structured financial data to hundreds of investment bankers; the other deals with the rapid dissemination of semi-structured and/or unstructured information in a knowledge retrieval context.
Peter Peinl, Uta Störl
An Intermediate Information System Forms Mutual Trust
Abstract
On the Internet, business transactions between anonymous parties are concluded every minute. How can trust between such business partners be established? For this purpose, an organization called a “credit bureau” exists in all countries with a functioning free market. In Germany, the leading credit bureau is the SCHUFA.
On the one hand, a credit bureau operates an information system that supplies the credit grantor with data about the creditworthiness of his clients. On the other hand, the credit bureau offers the customer the possibility to document his reliability to the contractor or the credit grantor, respectively. Of its own accord, the credit bureau strictly commits itself to neutrality and only gives credit grantors data that is relevant for the credit granting itself. This procedure prevents the system from being abused and thereby alienating customers.
In many sectors, the credit-granting process is highly automated. Using statistical methods, the data of the credit bureaus are condensed into scoring systems. By correlating scores, equivalence classes of customers are formed according to their non-payment risk.
The final credit decision is not only based on the data and the score of the customer in question but obviously also on the data that the credit grantor already possessed or has collected since the contract was concluded.
An integrated decision support system for credit processing starts at the point of sale. It supports an appropriate computer-based dialogue and includes a rule engine in which the rules for risk assessment are integrated. The information system of the credit bureau can be used in an interactive way.
During the lifetime of a credit, the non-payment risk and its probability are of substantial interest. For this purpose, a special monitoring process has to be established.
In summary, the credit-bureau system combines several techniques of computer science in an interesting way, from database technology through mathematical/statistical methods and rule-based systems to Web-based communication.
Dieter Steinbauer
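A purely illustrative scorecard (weights, attributes, and class boundaries are made up, not the SCHUFA's) shows how applicant data can be condensed into a score and mapped to risk equivalence classes.

```python
# Toy scorecard: condense applicant attributes into points, then map the
# point total to an equivalence class of similar non-payment risk.

def score(applicant):
    points = 0
    points += 30 if applicant["years_at_address"] >= 3 else 10
    points += 40 if not applicant["previous_default"] else 0
    points += min(applicant["income"] // 1000, 30)   # capped income points
    return points

def risk_class(points):
    if points >= 80:
        return "low risk"
    if points >= 50:
        return "medium risk"
    return "high risk"

applicant = {"years_at_address": 5, "previous_default": False, "income": 42000}
s = score(applicant)
print(s, risk_class(s))   # 100 low risk
```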
Data Refinement in a Market Research Applications’ Data Production Process
Abstract
In this contribution, we show how empirically collected field data for a market research application are refined in a stepwise manner and enriched into end-user market reports and charts. The collected data are treated by selections, transformations, enrichments, and aggregations to finally derive new market knowledge from the raw data material. Besides data-oriented aspects, process- and organization-related aspects have to be considered as well to ensure the required product quality for the customers of GfK Marketing Services, who have known GfK for decades as a top-10 player in international market research. Based on a running example from the panel-based Retail & Technology application domain, we show how decentrally collected and pre-processed data are transformed into integrated, global market knowledge in a worldwide network of companies.
Thomas Ruf, Thomas Kirsche
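The stepwise refinement described above can be pictured with a tiny, hedged toy pipeline (stages, field names, and figures are invented): raw field records are selected, transformed, enriched, and finally aggregated into a report.

```python
# Toy data production chain: selection -> transformation -> enrichment ->
# aggregation into a small market report.

raw = [
    {"shop": "S1", "product": "TV-A", "qty": 3, "price": 499.0, "valid": True},
    {"shop": "S2", "product": "TV-A", "qty": 5, "price": 489.0, "valid": True},
    {"shop": "S2", "product": "TV-B", "qty": 0, "price": 799.0, "valid": False},
]

selected    = [r for r in raw if r["valid"]]                         # selection
transformed = [{**r, "revenue": r["qty"] * r["price"]} for r in selected]
enriched    = [{**r, "segment": "television"} for r in transformed]  # enrichment

report = {}                                                          # aggregation
for r in enriched:
    report.setdefault(r["product"], 0.0)
    report[r["product"]] += r["revenue"]
print(report)   # {'TV-A': 3942.0}
```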
Information Management in Distributed Healthcare Networks
Abstract
Providing healthcare increasingly changes from isolated treatment episodes towards a continuous treatment process involving multiple healthcare professionals and various institutions. Information management plays a crucial role in this interdisciplinary process. The use of information technology (IT) focuses on different goals: decreasing the overall costs of healthcare, improving healthcare quality, and consolidating patient-related information from different sources.
Consolidation of patient data is ultimately aimed at a lifetime patient record which serves as the basis for healthcare processes involving multiple healthcare professionals and different institutions. To enable seamless integration of various kinds of IT applications into a healthcare network, a commonly accepted framework is needed. Powerful standards and middleware technology are already at hand to develop a technical and syntactical infrastructure for such a framework. Yet, semantic heterogeneity is a limiting factor for system interoperability. Existing standards do support semantic interoperability of healthcare IT systems to some degree, but standards alone are not sufficient to support an evidence-based cooperative patient treatment process across organizational borders.
Medicine is a rapidly evolving scientific domain, and medical experts develop and agree on new guidelines as new evidence emerges. Unfortunately, there is a gap between guideline development and guideline usage at the point of care. Medical treatment today is still characterized by a large diversity of opinions and treatment plans. Medical pathways and reminder systems are an attempt to reduce this diversity in medical treatment and to bring evidence to the point of care. Developing such pathways, however, is primarily a process of achieving consensus among the participating healthcare professionals. IT support for pathways thus requires a responsive IT infrastructure enabling demand-driven system evolution.
This article describes modern approaches for “integrated care” as well as the major challenges that are yet to be solved to adequately support distributed healthcare networks with IT services.
Richard Lenz
Data Management for Engineering Applications
Abstract
Current database technology has proven to fulfill the requirements of business applications, i.e., processing a high number of short transactions on more or less simply structured data. Unfortunately, the requirements of engineering applications are quite different. A car’s bill of materials, for example, is a deep tree with many branches at every level. Data objects become even more complex if we consider the engineered design objects themselves, for example a gear box with its parts and how they are related to each other. Supporting complex data objects has many implications for the underlying data management system; it needs to be reflected at nearly every layer, from the API down to the storage system. Besides complex objects, the way design objects are processed in engineering applications differs from business applications. Because engineering is an explorative task, the concept of short transactions does not fit here. Working with design objects is a task of days, which leads to a different programming model for engineering applications. In addition, the data management system needs to support versioning of objects and configuration management. Furthermore, engineering is done in collaborative teams. Hence, sharing of design objects within a team is necessary while, at the same time, the collaborative work has to be synchronized. All these special requirements have to be considered in data management systems for engineering applications. In this contribution, the special requirements sketched above are characterized, and the approaches developed to cope with them are described.
Hans-Peter Steiert
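One common way to support the long, exploratory design transactions mentioned above is a check-out/check-in model with explicit versions; the following sketch is a generic illustration (not a concrete engineering DBMS API).

```python
# Sketch: long design transactions as check-out/check-in with versioning, so a
# designer can work on an object for days while the team still shares it.

class DesignRepository:
    def __init__(self):
        self.versions = {}    # object name -> list of versions
        self.locked_by = {}   # object name -> designer holding the checkout

    def add(self, name, data):
        self.versions[name] = [data]

    def check_out(self, name, designer):
        if self.locked_by.get(name):
            raise RuntimeError(f"{name} is checked out by {self.locked_by[name]}")
        self.locked_by[name] = designer
        return dict(self.versions[name][-1])   # private working copy

    def check_in(self, name, designer, new_data):
        assert self.locked_by.get(name) == designer
        self.versions[name].append(new_data)   # new version, old ones remain
        del self.locked_by[name]

repo = DesignRepository()
repo.add("gearbox", {"ratio": 3.5})
copy = repo.check_out("gearbox", "alice")
copy["ratio"] = 3.7
repo.check_in("gearbox", "alice", copy)
print(repo.versions["gearbox"])   # [{'ratio': 3.5}, {'ratio': 3.7}]
```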
Backmatter
Metadata
Title
Data Management in a Connected World
Edited by
Theo Härder
Wolfgang Lehner
Copyright year
2005
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-31654-1
Print ISBN
978-3-540-26295-4
DOI
https://doi.org/10.1007/b137346