
2009 | Book

Next Generation Information Technologies and Systems

7th International Conference, NGITS 2009, Haifa, Israel, June 16-18, 2009. Revised Selected Papers

Editors: Yishai A. Feldman, Donald Kraft, Tsvi Kuflik

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science


About this book

Information technology is a rapidly changing field in which researchers and developers must continuously set their vision on the next generation of technologies and the systems that they enable. The Next Generation Information Technologies and Systems (NGITS) series of conferences provides a forum for presenting and discussing the latest advances in information technology. NGITS conferences are international events held in Israel; previous conferences took place in 1993, 1995, 1997, 1999, 2002, and 2006. In addition to 14 reviewed papers, the conference featured two keynote lectures and an invited talk by notable experts. The selected papers may be roughly classified into five broad areas:

  • Middleware and Integration
  • Modeling
  • Healthcare/Biomedical
  • Service and Information Management
  • Applications

NGITS 2009 also included a demonstration session and an industrial track focusing on how to make software development more efficient by cutting expenses with technology and infrastructures. This event is the culmination of efforts by many talented and dedicated individuals.

Table of Contents

Frontmatter

Keynote Lectures

Searching in the “Real World”
(Abstract of Invited Plenary Talk)
Abstract
For many, "searching" is considered a mostly solved problem. Indeed, for text processing, this belief is well founded. The problem is that most "real world" search applications involve "complex documents", and such applications are far from solved. Complex documents, or less formally, "real world documents", comprise a mixture of images, text, signatures, tables, logos, watermarks, stamps, etc., and are often available only in scanned hardcopy formats. Search systems for such document collections are currently unavailable.
We describe our efforts at building a complex document information processing (CDIP) prototype. This prototype integrates mature "point solution" technologies, such as OCR, signature matching, handwritten word spotting, and search and mining approaches, among others, to yield a system capable of searching "real world documents". The described prototype demonstrates the adage that "the whole is greater than the sum of its parts".
Ophir Frieder
Structured Data on the Web
Abstract
Though search on the World-Wide Web has focused mostly on unstructured text, there is an increasing amount of structured data on the Web and growing interest in harnessing such data. I will describe several current projects at Google whose overall goal is to leverage structured data and better expose it to our users.
The first project is on crawling the deep web. The deep web refers to content that resides in databases behind forms, but is unreachable by search engines because there are no links to these pages. I will describe a system that surfaces pages from the deep web by guessing queries to submit to these forms, and entering the results into the Google index [1]. The pages that we generated using this system come from millions of forms, hundreds of domains and over 40 languages. Pages from the deep web are served in the top-10 results on google.com for over 1000 queries per second.
The second project considers the collection of HTML tables on the web. The WebTables Project [2] built a corpus of over 150 million tables from HTML tables on the Web. The WebTables System addresses the challenges of extracting these tables from the Web, and offers search over this collection of tables. The project also illustrates the potential of leveraging the collection of schemas of these tables.
Finally, I’ll discuss current work on computing aspects of queries in order to better organize search results for exploratory queries.
Alon Y. Halevy
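The deep-web surfacing strategy described above, guessing queries to submit to forms and indexing the informative result pages, can be sketched roughly as follows. This is a simplified illustration, not Google's system: the `submit` and `is_informative` callbacks, the field enumeration, and the query cap are hypothetical stand-ins for the real crawler's machinery.

```python
from itertools import product

def candidate_queries(form_fields, max_queries=100):
    """Enumerate input combinations for a form.

    form_fields maps each field name to a list of candidate values,
    e.g. harvested from <select> options or seed keywords.
    """
    names = list(form_fields)
    combos = product(*(form_fields[n] for n in names))
    for i, values in enumerate(combos):
        if i >= max_queries:  # cap the number of probes per form
            break
        yield dict(zip(names, values))

def surface(form_fields, submit, is_informative):
    """Submit candidate queries and keep result pages worth indexing."""
    surfaced = []
    for query in candidate_queries(form_fields):
        page = submit(query)        # fetch the result page for this input
        if is_informative(page):    # skip empty or error results
            surfaced.append((query, page))
    return surfaced
```

In the real setting, `is_informative` must filter out "no results" pages so that only distinct, content-bearing pages enter the index.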

Middleware and Integration

Worldwide Accessibility to Yizkor Books
Abstract
Yizkor Books contain firsthand accounts of events that occurred before, during, and after the Holocaust. These books were published with parts in thirteen languages, across six continents, spanning a period of more than 60 years, and are an important resource for research of Eastern European Jewish communities, Holocaust studies, and genealogical investigations. Numerous Yizkor Book collections span the globe. One of the largest collections of Yizkor Books is housed within the United States Holocaust Memorial Museum. Due to their rarity and often fragile condition, the original Yizkor Books are vastly underutilized. Ongoing efforts to digitize and reprint Yizkor Books increase the availability of the books; however, the capability to search information about their content is nonexistent. We established a centralized index for Yizkor Books and developed a detailed search interface accessible worldwide, capable of efficiently querying the data. Our interface offers unique features and provides novel approaches to foreign name and location search. Furthermore, we describe and demonstrate a rule set to assist searches based on inaccurate terms. This system is currently under the auspices of the United States Holocaust Memorial Museum.
Rebecca Cathey, Jason Soo, Ophir Frieder, Michlean Amir, Gideon Frieder
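The paper's actual rule set for inaccurate terms is not reproduced here, but the general idea of matching transliterated name variants can be illustrated with a small sketch that combines normalization rules with edit distance. The specific substitution rules and distance threshold below are invented for illustration.

```python
def edit_distance(a, b):
    """Classic Levenshtein dynamic program (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # delete from a
                           cur[j - 1] + 1,         # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def normalize(name):
    """Collapse a few common transliteration variants (illustrative only)."""
    rules = [("cz", "ch"), ("w", "v"), ("tz", "ts")]
    name = name.lower()
    for old, new in rules:
        name = name.replace(old, new)
    return name

def matches(query, candidate, max_dist=1):
    """Treat two names as a match if their normal forms are nearly equal."""
    return edit_distance(normalize(query), normalize(candidate)) <= max_dist
```

A production system would use a much richer, language-aware rule set; the point is only that normalization before comparison lets one canonical form stand in for many spellings.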
Biomedical Information Integration Middleware for Clinical Genomics
Abstract
Clinical genomics, the marriage of clinical information and knowledge about the human or pathogen genome, holds enormous promise for the healthcare and life sciences domain. Based on a more in-depth understanding of human and pathogen molecular interaction, clinical genomics can be used to discover new targeted drugs and provide personalized therapies with fewer side effects, at reduced costs, and with higher efficacy. A key enabler of clinical genomics is a sound standards-based biomedical information integration middleware. This middleware must be able to de-identify, integrate, and correlate clinical, clinical-trial, genomic, and imaging metadata from the various systems. We describe MedII, a novel biomedical information integration research technology, some of whose components were integrated into the IBM Clinical Genomics solution. We also introduce the need for biomedical information preservation, to help ensure that the integrated biomedical information can be read and interpreted decades from now.
Simona Rabinovici-Cohen

Modeling

Interpretation of History Pseudostates in Orthogonal States of UML State Machines
Abstract
Inconsistencies and semantic variation points of the UML specification are a source of problems during code generation and execution of behavioral models. We discuss the interpretation of the history concepts of UML 2.x state machines, especially history in complex states with orthogonal regions. We propose a clarification of this interpretation and explain it with an example. The history issues and other variation points had to be resolved within the Framework for eXecutable UML (FXU), the first framework supporting all elements of UML 2.x behavioral state machines in code generation and execution for C# code.
Anna Derezińska, Romuald Pilitowski
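The shallow-history semantics the paper clarifies can be illustrated with a minimal sketch: a region that re-enters its last active substate, falling back to its initial substate on first entry. This is a generic illustration of UML history pseudostates, not FXU's implementation, and it ignores deep history and orthogonal-region subtleties, which are exactly where the paper's clarifications apply.

```python
class Region:
    """A composite-state region with a shallow history pseudostate."""

    def __init__(self, initial):
        self.initial = initial
        self.history = None   # last active substate, if the region was ever entered
        self.active = None

    def enter_via_history(self):
        # UML shallow-history semantics: restore the last active substate,
        # falling back to the initial substate on first entry.
        self.active = self.history if self.history is not None else self.initial
        return self.active

    def exit(self):
        # Leaving the region records the substate for later re-entry.
        self.history = self.active
        self.active = None
```

Deep history would additionally restore the nested configuration of every subregion, which is where orthogonal regions make the interpretation non-obvious.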
System Grokking – A Novel Approach for Software Understanding, Validation, and Evolution
Abstract
The complexity of software systems is continuously growing across a wide range of application domains. System architects are often faced with large complex systems and systems whose semantics may be difficult to understand, hidden, or even still evolving. Raising the level of abstraction of such systems can significantly improve their usability.
We introduce System Grokking - a software architect assistance technology designed to support incremental and iterative user-driven understanding, validation, and evolution of complex software systems through higher levels of abstraction. The System Grokking technology enables semi-automatic discovery, manipulation, and visualization of groups of domain-specific software elements and the relationships between them to represent high-level structural and behavioral abstractions.
Maayan Goldstein, Dany Moshkovich
Refactoring of Statecharts
Abstract
Statecharts are an important tool for specifying the behavior of reactive systems, and development tools can automatically generate object-oriented code from them. As the system is refactored, it is necessary to modify the associated statecharts as well, performing operations such as grouping or ungrouping states, extracting part of a statechart into a separate class, and merging states and transitions. Refactoring tools embedded in object-oriented development environments are making it much easier for developers to modify their programs. However, tool support for refactoring statecharts does not yet exist. As a result, developers avoid making certain changes that are too difficult to perform manually, even though design quality deteriorates.
Methodologically, statecharts were meant to enable a systems engineer to describe a complete system, which would then be refined into a concrete implementation (object-oriented or other). This process is not supported by object-oriented development environments, which force each statechart to be specified as part of a class. Automated tool support for refactoring statecharts will also make this kind of refinement possible.
This paper describes a case study that shows the usefulness of refactoring support for statecharts, and presents an initial catalog of relevant refactorings. We show that a top-down refinement process helps identify the tasks and classes in a natural way.
Moria Abadi, Yishai A. Feldman

Healthcare/Biomedical

Towards Health 2.0: Mashups to the Rescue
Abstract
Over the past few years, we have witnessed a rise in the use of the web for health purposes. Patients have begun to manage their own health data online, use health-related services, search for information, and share it with others. The cooperation of healthcare constituents towards making collaboration platforms available is known today as Health 2.0. The significance of Health 2.0 lies in the transformation of the patient from a healthcare consumer to an active participant in a new environment. We analyze the trend and propose mashups as a leading technology for the integration of relevant data, services, and applications. We present Medic-kIT, a mashup-based patient-centric Extended Personal Health Record system, which adheres to web 2.0 standards. We conclude by highlighting unique aspects that will have to be addressed to enable the development of such systems in the future.
Ohad Greenshpan, Ksenya Kveler, Boaz Carmeli, Haim Nelken, Pnina Vortman
Semantic Warehousing of Diverse Biomedical Information
Abstract
One of the main challenges of data warehousing within biomedical information infrastructures is to enable semantic interoperability between its various stakeholders as well as other interested parties. Promoting the adoption of worldwide accepted information standards along with common controlled terminologies is the right path to achieve that. The HL7 v3 Reference Information Model (RIM) is used to derive consistent health information standards such as laboratory, clinical health record data, problem- and goal-oriented care, public health, and clinical research. In this paper we describe a RIM-based warehouse which provides (1) the means for integrating data gathered from disparate and diverse data sources, (2) a mixture of XML and relational schemas, and (3) uniform abstract access and query capabilities serving both healthcare and clinical research users. Through the use of constrained standards (templates), we facilitate semantic interoperability which would be harder to achieve if we only used generic standards in use cases with unique requirements. Such semantic warehousing also lays the groundwork for harmonized representations of data, information, and knowledge, and thus enables a single infrastructure to serve analysis tools, decision support applications, clinical data exchange, and point-of-care applications. In addition, we describe the implementation of this semantic warehousing within Hypergenes, a European Commission funded project focused on Essential Hypertension, to illustrate the unique concepts and capabilities of our warehouse.
Stefano Bianchi, Anna Burla, Costanza Conti, Ariel Farkash, Carmel Kent, Yonatan Maman, Amnon Shabo
InEDvance: Advanced IT in Support of Emergency Department Management
Abstract
Emergency Departments (EDs) are highly dynamic environments comprising complex multi-dimensional patient-care processes. In recent decades, there has been increased pressure to improve ED services, while taking into account various aspects such as clinical quality, operational efficiency, and cost performance. Unfortunately, the information systems in today's EDs cannot access the data required to provide a holistic view of the ED in a complete and timely fashion. What does exist is a set of disjoint information systems that provide some of the required data, without any additional structured tools to manage the ED processes. We present a concept for the design of an IT system that provides advanced management functionality to the ED. The system is composed of three major layers: data collection, analytics, and the user interface. The data collection layer integrates the IT systems that already exist in the ED and newly introduced systems such as sensor-based patient tracking. The analytics component combines methods and algorithms that turn the data into valuable knowledge. An advanced user interface serves as a tool to help make intelligent decisions based on that knowledge. We also describe several scenarios that demonstrate the use and impact of such a system on ED management. Such a system can be implemented in gradual stages, enabling incremental and ongoing improvements in managing the ED care processes. The multi-disciplinary vision presented here is based on the authors' extensive experience and their collective records of accomplishment in emergency departments, business optimization, and the development of IT systems.
Segev Wasserkrug, Ohad Greenshpan, Yariv N. Marmor, Boaz Carmeli, Pnina Vortman, Fuad Basis, Dagan Schwartz, Avishai Mandelbaum

Service and Information Management

Enhancing Text Readability in Damaged Documents
Abstract
Documents can be damaged for various reasons: attempts to destroy the document, aging, natural causes such as floods, etc. In the preliminary work reported herein, we present some results of processes that enhance the visibility of lines, and therefore the readability of text, in such documents. No attempt is made to interpret the contents; rather, the work intends to aid an analyst who will eventually process the information that is now easier to see and acquire.
Gideon Frieder
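The abstract does not detail the enhancement processes used. As a generic illustration of one common visibility-enhancement step (not necessarily the paper's method), a linear contrast stretch maps a chosen gray-level window onto the full intensity range, making faint strokes darker relative to the background:

```python
def stretch_contrast(image, low, high):
    """Linearly map pixel values in [low, high] to [0, 255], clipping outside.

    image is a list of rows of gray levels (0-255); low/high define the
    window of interest, e.g. the range where faded ink actually lives.
    """
    return [[min(255, max(0, round((p - low) * 255 / (high - low))))
             for p in row]
            for row in image]
```

Real enhancement pipelines would combine such steps with denoising, binarization, and line detection; this sketch only shows the basic windowing idea.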
ITRA under Partitions
Abstract
In Service Oriented Architecture (SOA), web services may span several sites or logical tiers, each responsible for some part of the service. Most services need to be highly reliable and should allow no data corruption. A known problem in distributed systems that may lead to data corruption or inconsistency is the partition problem, also known as the split-brain phenomenon. A split-brain occurs when a network, hardware, or software malfunction breaks a cluster of computers into several separate sub-clusters that reside side by side, unaware of each other. When, during a session, two or more of these sub-clusters serve the same client, the data may become inconsistent or corrupted.
ITRA – Inter Tier Relationship Architecture [1] enables web services to transparently recover from multiple failures in a multi-tier environment and to achieve continuous availability. However, the ITRA protocol does not handle partitions. In this paper we propose an extension to ITRA that supports continuous availability under partitions. Our unique approach, discussed in this paper, deals with partitions in multi-tier environments using the collaboration of neighboring tiers.
Aviv Dagan, Eliezer Dekel
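ITRA's partition extension relies on collaboration between neighboring tiers, which the abstract does not detail. The basic safeguard that any split-brain protocol must provide, refusing to serve updates from a sub-cluster that cannot prove it is the authoritative one, can be sketched as a majority-quorum check. This is a simplification for illustration, not the ITRA protocol itself; the class and method names are hypothetical.

```python
def has_quorum(reachable, cluster_size):
    """A sub-cluster may serve updates only if it sees a strict majority."""
    return reachable > cluster_size // 2

class Tier:
    """One logical tier of a multi-tier service (illustrative)."""

    def __init__(self, cluster_size):
        self.cluster_size = cluster_size

    def handle(self, request, reachable_nodes):
        # Refusing the minority side prevents two sub-clusters from
        # serving the same client with divergent state.
        if has_quorum(len(reachable_nodes), self.cluster_size):
            return "served:" + request
        return "rejected: partition suspected"
```

Quorum sacrifices availability on the minority side; ITRA's contribution is to preserve continuous availability by exploiting information held by neighboring tiers instead.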
Short and Informal Documents: A Probabilistic Model for Description Enrichment
Abstract
While lexical statistics of formal text play a central role in many statistical Natural Language Processing (NLP) and Information Retrieval (IR) tasks, little is known about the lexical statistics of informal and short documents. To learn the unique characteristics of informal text, we conduct an N-gram study on P2P data, and present the insights, problems, and differences from formal text. Consequently, we apply a probabilistic model for detecting and correcting spelling problems (not necessarily errors) and propose an enrichment method that makes many P2P files better accessible to relevant user queries. Our enrichment results show an improvement in both recall and precision with only a slight increase in the collection size.
Yuval Merhav, Ophir Frieder
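The abstract does not give the paper's exact probabilistic model. A standard noisy-channel-style sketch conveys the general idea: learn term statistics from the collection itself, then map a nonstandard spelling to its most frequent known variant within a small edit distance. The corpus, threshold of one edit, and uniform error model below are simplifying assumptions for illustration.

```python
from collections import Counter
import re

def train(corpus_text):
    """Count term frequencies over the collection's own vocabulary."""
    return Counter(re.findall(r"[a-z]+", corpus_text.lower()))

def edits1(word):
    """All strings within one edit (delete/transpose/replace/insert)."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(word, counts):
    """Pick the most frequent known variant within one edit of the word."""
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    return max(candidates, key=counts.get) if candidates else word
```

For enrichment, the corrected variant would be added alongside the original filename term rather than replacing it, since P2P "misspellings" are often deliberate.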

Applications

Towards a Pan-European Learning Resource Exchange Infrastructure
Abstract
The Learning Resource Exchange (LRE) is a new service that provides schools with access to educational content from many different origins. From a technical standpoint, it consists of an infrastructure that:
  • Federates systems that provide learning resources – e.g., learning resource repositories, authoring tools – and
  • Offers a seamless access to these resources by educational systems that enable their use – e.g., educational portals, virtual learning environments (VLEs).
As the number of connected systems increased over time, this infrastructure had to evolve in order to improve the quality of its search service.
This paper describes the current LRE infrastructure and explains the rationale behind its evolution.
David Massart
Performance Improvement of Fault Tolerant CORBA Based Intelligent Transportation Systems (ITS) with an Autonomous Agent
Abstract
ITS is a state-of-the-art system that maximizes mobility, safety, and usefulness by combining existing transport systems with information, communication, computer, and control technologies. The core functions of ITS are the collection, management, and provision of real-time transport information, and it can be deployed efficiently on the Common Object Request Broker Architecture (CORBA) of the Object Management Group (OMG) because it consists of many interconnected heterogeneous systems. Fault Tolerant CORBA (FT-CORBA) reliably supports the real-time requirements of transport information through redundancy by replication of server objects. However, object replication, management, and the related protocols of FT-CORBA require extra system CPU and memory resources, and can degrade system performance both locally and as a whole. This paper proposes an improved architecture that enhances the performance of FT-CORBA based ITS by generating and managing object replicas autonomously and dynamically during system operation with an autonomous agent. The proposed architecture is expected to be applicable to other FT-CORBA based systems.
Woonsuk Suh, Soo Young Lee, Eunseok Lee
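The agent's actual policy is not given in the abstract. The idea of autonomously and dynamically sizing a replica group during operation, instead of FT-CORBA's static replication, can be sketched as a simple feedback rule; the thresholds, signals, and class name here are invented for illustration.

```python
class ReplicationAgent:
    """Autonomously sizes an object's replica group from observed conditions.

    Illustrative sketch: real FT-CORBA replica management involves the
    ReplicationManager and fault detectors, not shown here.
    """

    def __init__(self, min_replicas=2, max_replicas=6):
        self.min_replicas = min_replicas
        self.max_replicas = max_replicas
        self.replicas = min_replicas

    def observe(self, cpu_util, failure_rate):
        target = self.replicas
        if failure_rate > 0.05 and target < self.max_replicas:
            target += 1   # faults rising: add a replica for safety
        elif cpu_util > 0.8 and target > self.min_replicas:
            target -= 1   # resources scarce: shed a spare replica
        self.replicas = target
        return target
```

The trade-off the paper targets is visible even in this toy rule: every extra replica buys fault tolerance at the cost of CPU and memory overhead, so the count should track conditions rather than stay fixed.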
A Platform for LifeEvent Development in a eGovernment Environment: The PLEDGE Project
Abstract
Providing eGovernment solutions is becoming a matter of great importance for governments all over the world. To meet the special requirements of this sort of project, several attempts have been made and are currently under development. This paper proposes its own approach, which takes advantage of resources derived from the use of Semantics and of an artifact, discussed in depth in the paper, called a LifeEvent. On the basis of these premises, an entire software platform is described and a prototype developed, as shown in the paper. Some conclusions and hints for future projects in this area are also provided.
Luis Álvarez Sabucedo, Luis Anido Rifón, Ruben Míguez Pérez
Online Group Deliberation for the Elicitation of Shared Values to Underpin Decision Making
Abstract
Values have been shown to underpin our attitudes and behaviour and to motivate our decisions. Values do not exist in isolation but have meaning in relation to other values. However, values are not solely the purview of individuals: communities and organisations have core values implicit in their culture, policies, and practices. Values for a group can be determined by a minority in power, derived by algorithmically merging the values each group member holds, or set by deliberative consensus. Eliciting group values by deliberation is likely to lead to widespread acceptance of the values arrived at; however, enticing individuals to engage in face-to-face discussion about values has been found to be very difficult. We present an online deliberative communication approach for the anonymous deliberation of values and claim that the framework has the elements required for the elicitation of shared values.
Faezeh Afshar, Andrew Stranieri, John Yearwood
Backmatter
Metadata
Title
Next Generation Information Technologies and Systems
Editors
Yishai A. Feldman
Donald Kraft
Tsvi Kuflik
Copyright Year
2009
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-04941-5
Print ISBN
978-3-642-04940-8
DOI
https://doi.org/10.1007/978-3-642-04941-5