
About this Book

The LNCS journal Transactions on Large-Scale Data- and Knowledge-Centered Systems focuses on data management, knowledge discovery, and knowledge processing, which are core and timely topics in computer science. Since the 1990s, the Internet has become the main driving force behind application development in all domains. An increase in the demand for resource sharing (e.g., computing resources, services, metadata, data sources) across different sites connected through networks has driven the evolution of data- and knowledge-management systems from centralized to decentralized systems that enable large-scale distributed applications with high scalability.

This, the 45th issue of Transactions on Large-Scale Data- and Knowledge-Centered Systems, contains eight revised selected regular papers. Topics covered include data analysis, information extraction, blockchains, and big data.

Table of Contents

Frontmatter

Interoperable Data Extraction and Analytics Queries over Blockchains

Abstract
The explosion of interest among diverse organisations in deploying their services on blockchains to exploit decentralized transaction governance and advanced cryptographic protocols is fostering the emergence of new challenges for distributed ledger technologies (DLTs). The next generation of blockchain services now extends well beyond cryptocurrencies, accumulating and storing vast amounts of data. Therefore, the need to efficiently extract data from blockchains and subsequently foster data analytics is more evident than ever. However, despite wide public interest and the release of several frameworks, efficiently accessing and processing data from blockchains still poses significant challenges. This article first introduces the key limitations faced by organisations in need of efficiently accessing and managing data over DLTs. It then introduces Datachain, a lightweight, flexible and interoperable framework deliberately designed to ease the extraction of data hosted on DLTs. Through high-level query abstractions, users connect to underlying blockchains, perform data requests, extract transactions, manage data assets and derive high-level analytic insights. Most importantly, due to the inherently interoperable nature of Datachain, queries and analytic jobs are reusable and can be executed without alteration on different underlying blockchains. To illustrate the wide applicability of Datachain, we present a realistic use case on top of Hyperledger and BigchainDB.
Demetris Trihinas
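
Datachain's API is not reproduced in this volume overview, so the following minimal Python sketch only illustrates the interoperability idea the abstract describes: the same high-level query runs unchanged against different ledger backends. All names (LedgerAdapter, run_query, and both adapters) are hypothetical, not Datachain's actual interface.

```python
# Hypothetical sketch of a ledger-agnostic query layer; not Datachain's API.
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable, List


class LedgerAdapter(ABC):
    """Backend-specific access to raw transactions."""

    @abstractmethod
    def fetch_transactions(self) -> Iterable[Dict]:
        ...


class HyperledgerAdapter(LedgerAdapter):
    def fetch_transactions(self) -> Iterable[Dict]:
        # A real adapter would call the Hyperledger SDK; this is stub data.
        return [{"asset": "sensor-1", "value": 21.5},
                {"asset": "sensor-2", "value": 19.0}]


class BigchainDBAdapter(LedgerAdapter):
    def fetch_transactions(self) -> Iterable[Dict]:
        # A real adapter would query BigchainDB's HTTP API; stub data here.
        return [{"asset": "sensor-1", "value": 22.1}]


def run_query(ledger: LedgerAdapter,
              predicate: Callable[[Dict], bool]) -> List[Dict]:
    """The same high-level query runs unchanged on any backend."""
    return [tx for tx in ledger.fetch_transactions() if predicate(tx)]


if __name__ == "__main__":
    hot = lambda tx: tx["value"] > 20.0
    for backend in (HyperledgerAdapter(), BigchainDBAdapter()):
        print(type(backend).__name__, run_query(backend, hot))
```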

Exploiting Twitter for Informativeness Classification in Disaster Situations

Abstract
Disaster management urgently requires mechanisms for achieving situation awareness (SA) in a timely manner, allowing authorities to react in an appropriate way to reduce the impact on affected people and infrastructure. In such situations, no matter whether they are human-induced like shootings or natural ones like earthquakes or floods, social media such as Twitter are frequently used communication channels, making them a highly valuable additional data source for enhancing SA. The challenge, however, is to identify, out of the tremendous mass of irrelevant and non-informative social media data, those messages that are really “informative”, i.e., that contribute to SA in a certain disaster situation. Existing approaches to machine-learning-driven informativeness classification most often focus on specific disaster types, such as shootings or floods, thus lacking general applicability and falling short in the classification of new disaster events. Therefore, this article puts forward the following three contributions: First, in order to better understand the underlying social media data source, an in-depth analysis of existing Twitter data on 26 different disaster events is provided along temporal, spatial, linguistic, and source dimensions. Second, based thereupon, a cross-domain informativeness classifier is proposed that is not focused on specific disaster types but rather allows for classification across different types. Third, the applicability of this cross-domain classifier is demonstrated, showing its accuracy compared to other disaster-type-specific approaches.
David Graf, Werner Retschitzegger, Wieland Schwinger, Birgit Pröll, Elisabeth Kapsammer
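
The abstract does not specify the classifier, so the sketch below uses a generic scikit-learn pipeline to illustrate the cross-domain setup it describes: train on tweets from some disaster types and evaluate on a type unseen during training. The inline tweets and labels are fabricated for illustration only.

```python
# Minimal cross-domain informativeness classification sketch (scikit-learn).
# Dataset is fabricated; label 1 = informative, 0 = not informative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_tweets = [
    ("Bridge collapsed on Route 9, avoid the area", 1),   # flood
    ("Praying for everyone tonight", 0),                  # flood
    ("Shelter open at Main St gym, water and food", 1),   # earthquake
    ("Can't believe this is happening", 0),               # earthquake
]
test_tweets = [  # tweets from a disaster type unseen during training
    ("Road blocked by debris near the school", 1),
    ("So sad watching the news", 0),
]

X_train, y_train = zip(*train_tweets)
X_test, y_test = zip(*test_tweets)

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print(clf.predict(X_test))  # predicted informativeness labels
```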

COTILES: Leveraging Content and Structure for Evolutionary Community Detection

Abstract
Most community detection algorithms for online social networks rely solely either on the structure of the network or on its contents. Both extremes ignore valuable information that influences cluster formation. We propose COTILES, an evolutionary community detection algorithm that leverages both structural and content-based criteria so as to derive densely connected communities with similar contents. Specifically, we extend a fast online structural community detection algorithm by applying additional content-based constraints. We further explore the effect of structural and content-based criteria on the clustering result by introducing three tunable variations of COTILES that either tighten or relax these criteria. Through our experimental evaluation, we show that the proposed method derives more cohesive communities than the original structural one, and we highlight when the proposed variations should be deployed.
Nikolaos Sachpenderis, Georgia Koloniari, Alexandros Karakasidis
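
COTILES itself is not given here; the sketch below shows one plausible way to blend a structural criterion (neighborhood overlap) with a content criterion (cosine similarity) behind a single tunable weight, echoing the tighten/relax variations the abstract mentions. The scoring formula and the alpha knob are assumptions, not the paper's algorithm.

```python
# Illustrative blend of structural and content-based similarity; not COTILES.
import networkx as nx
import numpy as np

def content_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between user content vectors (e.g. TF-IDF)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def structural_similarity(g: nx.Graph, u, v) -> float:
    """Jaccard overlap of neighborhoods: a simple structural criterion."""
    nu, nv = set(g[u]), set(g[v])
    return len(nu & nv) / len(nu | nv) if nu | nv else 0.0

def combined_score(g, content, u, v, alpha=0.5):
    """alpha tunes how much structure vs. content matters."""
    return (alpha * structural_similarity(g, u, v)
            + (1 - alpha) * content_similarity(content[u], content[v]))

g = nx.karate_club_graph()
rng = np.random.default_rng(0)
content = {n: rng.random(8) for n in g}   # dummy content vectors
print(round(combined_score(g, content, 0, 1, alpha=0.7), 3))
```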

A Weighted Feature-Based Image Quality Assessment Framework in Real-Time

Abstract
Nowadays, social media plays a significant role in people’s daily lives, and millions of people use social media applications to share photos. The massive volume of images shared on social media presents serious challenges and requires large computational infrastructure to ensure successful data processing. However, an image may become distorted during processing, transmission, or sharing, or through a combination of such factors. There is thus a need to guarantee acceptable content delivery, especially for image-processing applications. In this paper, we present a framework developed to process a large number of images in real time while estimating their quality. Our quality evaluation is based on four methods: Perceptual Coherence Measure, Semantic Coherence Measure, Content-Based Image Retrieval, and the Structural Similarity Index. A weighted quality score is then calculated from these four methods while providing a way to optimize execution latency. Lastly, a set of experiments is conducted to evaluate our proposed approach.
Zahi Al Chami, Chady Abou Jaoude, Bechara Al Bouna, Richard Chbeir
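
Of the four measures listed in the abstract, only the Structural Similarity Index has a standard library implementation; the sketch below computes SSIM with scikit-image and stubs out the other three, combining everything with illustrative weights. The weighting scheme is hypothetical.

```python
# Hypothetical weighted quality score; only SSIM is computed for real.
import numpy as np
from skimage.metrics import structural_similarity as ssim

def weighted_quality(reference: np.ndarray, distorted: np.ndarray,
                     weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    scores = (
        0.5,  # placeholder: Perceptual Coherence Measure
        0.5,  # placeholder: Semantic Coherence Measure
        0.5,  # placeholder: Content-Based Image Retrieval score
        ssim(reference, distorted,
             data_range=distorted.max() - distorted.min()),
    )
    return float(np.dot(weights, scores))

rng = np.random.default_rng(1)
ref = rng.random((64, 64))
noisy = np.clip(ref + rng.normal(0, 0.05, ref.shape), 0, 1)
print(round(weighted_quality(ref, noisy), 3))
```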

Sharing Knowledge in Digital Ecosystems Using Semantic Multimedia Big Data

Abstract
The use of formal representations is of fundamental importance in the era of big data. This need is even more evident in the context of multimedia big data due to the intrinsic complexity of this type of data. Furthermore, the relationships between objects should be clearly expressed and formalized to give the right meaning to the correlation of data. For this reason, the design of formal models to represent and manage information is a necessary task in implementing intelligent information systems. Approaches based on the semantic web need to improve the data models that are the basis for implementing big data applications. Using these models, data and information visualization becomes an intrinsic and strategic task for the analysis and exploration of multimedia big data. In this article we propose the use of a semantic approach to formalize the structure of a multimedia big data model. Moreover, the identification of multimodal features to represent concepts, and of linguistic-semantic properties to relate them, is an effective way to bridge the gap between target semantic classes and low-level multimedia descriptors. The proposed model has been implemented in a NoSQL graph database populated from different knowledge sources. We explore a visualization strategy for this large knowledge base, and we present and discuss a case study for sharing information represented by our model according to a peer-to-peer (P2P) architecture. In this digital ecosystem, agents (e.g. machines, intelligent systems, robots) act like interconnected peers exchanging and delivering knowledge with each other.
Antonio M. Rinaldi, Cristiano Russo
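
As a loose illustration of the kind of model the abstract describes, the sketch below builds a tiny multimedia knowledge graph in networkx: concept nodes linked by a linguistic-semantic relation, and a media node carrying a low-level descriptor. The labels and the toy query are invented; the paper itself uses a NoSQL graph database.

```python
# Toy multimedia knowledge graph; labels invented for illustration.
import networkx as nx

kg = nx.MultiDiGraph()
kg.add_node("dog", kind="concept")
kg.add_node("animal", kind="concept")
kg.add_node("img_001", kind="media", descriptor=[0.12, 0.87, 0.44])
kg.add_edge("dog", "animal", relation="hypernym")    # linguistic-semantic
kg.add_edge("img_001", "dog", relation="depicts")    # media-to-concept

# A peer could answer "which media depict animals?" by following edges:
animals = {s for s, t, d in kg.edges(data=True)
           if d["relation"] == "hypernym" and t == "animal"} | {"animal"}
media = [s for s, t, d in kg.edges(data=True)
         if d["relation"] == "depicts" and t in animals]
print(media)  # ['img_001']
```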

Facilitating and Managing Machine Learning and Data Analysis Tasks in Big Data Environments Using Web and Microservice Technologies

Abstract
Driven by the current advances of machine learning in a wide range of application areas, the need for easy-to-use frameworks that make machine learning accessible to novices and non-experts in data analytics has increased dramatically. Furthermore, building machine learning models in the context of Big Data environments still represents a great challenge. In the present article, those challenges are addressed by introducing a new generic framework for efficiently facilitating the training, testing, managing, storing and retrieving of machine learning models in the context of Big Data. The framework makes use of a powerful Big Data software stack, web technologies and a microservice architecture for a fully manageable and highly scalable solution. A highly configurable user interface that hides platform details from the user is introduced, giving the user the ability to easily train, test and manage machine learning models. Moreover, the framework automatically indexes and characterizes models and allows flexible exploration of them in the visual interface. The performance and usability of the new framework are evaluated on state-of-the-art machine learning algorithms: it is shown that executing, storing and retrieving machine learning models via the framework incurs an acceptably low overhead, demonstrating that the framework provides an efficient approach to facilitating machine learning in Big Data environments. It is also evaluated how configuration options (e.g. caching of RDDs in Apache Spark) affect runtime performance. Furthermore, the evaluation provides indicators for when distributed computing (i.e. parallel computation) based on Apache Spark on a cluster outperforms single-computer execution of a machine learning model.
Shadi Shahoud, Sonja Gunnarsdottir, Hatem Khalloof, Clemens Duepmeier, Veit Hagenmeyer
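
One concrete configuration option the abstract evaluates is caching of RDDs in Apache Spark. The minimal PySpark sketch below shows the mechanism via the DataFrame API: materialize a cached dataset once, then reuse it across iterative passes instead of recomputing its lineage. The workload and cluster settings are placeholders.

```python
# PySpark caching sketch; workload is a placeholder, not the paper's setup.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cache-demo").getOrCreate()

df = spark.range(10_000_000).withColumnRenamed("id", "x")
df = df.selectExpr("x", "x * x as x2")   # some derived feature
df.cache()                               # keep it in memory for reuse
df.count()                               # materialize the cache

# Iterative workloads (e.g. model training) now reread from memory
# instead of recomputing the lineage on every pass.
for _ in range(3):
    df.agg({"x2": "avg"}).collect()

spark.stop()
```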

Stable Marriage Matching for Homogenizing Load Distribution in Cloud Data Center

Abstract
Running a fully virtualized data center with the help of virtual machines (VMs) is the de facto standard in modern data centers. Live migration offers immense flexibility, as it endows system administrators with tools to seamlessly move VMs across physical machines. Several studies have shown that resource utilization within a data center is not homogeneous across the physical servers: load-imbalance situations are observed in which a significant portion of servers is either overloaded or underloaded. Apart from leading to inefficient energy usage by underloaded servers, this can cause serious QoS degradation on the overloaded servers.
In this paper, we propose a lightweight decentralized solution for homogenizing the load across the machines in a data center by mapping the problem to the Stable Marriage matching problem. The algorithm judiciously chooses pairs of overloaded and underloaded servers for matching, and VM migrations are subsequently performed to reduce the load imbalance. For comparison, three different greedy matching algorithms are also introduced. To verify the feasibility of our approach in real-life scenarios, we implement our solution on a small test-bed. For larger-scale scenarios, we provide simulation results that demonstrate the efficiency of the algorithm and its ability to yield a near-optimal solution compared to the other algorithms. The results are promising, given the low computational footprint of the algorithm.
Disha Sangar, Ramesh Upreti, Hårek Haugerud, Kyrre Begnum, Anis Yazidi
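
The paper's core primitive is Stable Marriage matching. Below is a standard Gale-Shapley implementation applied to toy lists of overloaded and underloaded servers; the preference heuristic (prefer the counterpart whose pairwise average load is closest to balanced) is an illustrative assumption, not the paper's exact criterion.

```python
# Classic Gale-Shapley stable matching; server data and preference
# heuristic are toy examples, not the paper's criterion.

def gale_shapley(proposer_prefs, acceptor_prefs):
    """Stable-marriage algorithm; prefs map name -> ordered counterparts."""
    rank = {a: {p: i for i, p in enumerate(prefs)}
            for a, prefs in acceptor_prefs.items()}
    free = list(proposer_prefs)
    next_idx = {p: 0 for p in proposer_prefs}
    engaged = {}  # acceptor -> proposer
    while free:
        p = free.pop()
        a = proposer_prefs[p][next_idx[p]]
        next_idx[p] += 1
        if a not in engaged:
            engaged[a] = p
        elif rank[a][p] < rank[a][engaged[a]]:
            free.append(engaged[a])  # previous partner becomes free
            engaged[a] = p
        else:
            free.append(p)           # rejected; tries next preference
    return {p: a for a, p in engaged.items()}

overloaded = {"s1": 0.9, "s2": 0.8}    # CPU loads above threshold
underloaded = {"u1": 0.2, "u2": 0.4}   # loads below threshold

def balance_badness(o, u):
    """How far the pair's average load is from a balanced 0.5."""
    return abs((overloaded[o] + underloaded[u]) / 2 - 0.5)

o_prefs = {o: sorted(underloaded, key=lambda u: balance_badness(o, u))
           for o in overloaded}
u_prefs = {u: sorted(overloaded, key=lambda o: balance_badness(o, u))
           for u in underloaded}
print(gale_shapley(o_prefs, u_prefs))  # overloaded -> underloaded pairs
```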

A Sentiment Analysis Software Framework for the Support of Business Information Architecture in the Tourist Sector

Abstract
In recent years, the increased use of digital tools within the Peruvian tourism industry has produced a corresponding increase in revenues. However, both factors have also intensified competition in the sector, which in turn puts pressure on the revenues and profitability of small and medium-sized enterprises (SMEs). This study aims to apply neural-network-based sentiment analysis of social networks to create a new information-search channel that provides a global understanding of user trends and preferences in the tourism sector. A working data-analysis framework is developed and integrated with cloud tools to allow a visual assessment of high-probability outcomes based on historical data, helping SMEs estimate the number of arriving tourists and the places they want to visit, so that they can assemble desirable travel packages in advance, reduce logistics costs, increase sales, and ultimately improve both the quality and precision of customer service.
Javier Murga, Gianpierre Zapata, Heyul Chavez, Carlos Raymundo, Luis Rivera, Francisco Domínguez, Javier M. Moguerza, José María Álvarez
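
The abstract does not describe the neural architecture, so as a stand-in the sketch below trains a small scikit-learn MLP on a few fabricated tourism posts, just to show the basic text-to-sentiment flow such a framework would build on.

```python
# Stand-in neural sentiment classifier; dataset and architecture are
# illustrative, not the paper's actual model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

posts = [
    ("Machu Picchu was breathtaking, best trip ever", "positive"),
    ("Hotel was dirty and the staff was rude", "negative"),
    ("Loved the food tour in Lima", "positive"),
    ("Long waits and overpriced tickets", "negative"),
]
texts, labels = zip(*posts)

model = make_pipeline(
    TfidfVectorizer(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
)
model.fit(texts, labels)
print(model.predict(["The guided tour was wonderful"]))
```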

Backmatter
