Transactions on Large-Scale Data- and Knowledge-Centered Systems XXIII

Selected Papers from FDSE 2014

Editors: Abdelkader Hameurlain, Josef Küng, Roland Wagner, Tran Khanh Dang, Nam Thoai

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science


About this book

This volume, the 23rd issue of Transactions on Large-Scale Data- and Knowledge-Centered Systems, focuses on information and security engineering. It contains five revised and extended papers selected from the proceedings of the First International Conference on Future Data and Security Engineering, FDSE 2014, held in Ho Chi Minh City, Vietnam, November 19-21, 2014. The titles of the five papers are as follows: A Natural Language Processing Tool for White Collar Crime Investigation; Data Leakage Analysis of the Hibernate Query Language on a Propositional Formulae Domain; An Adaptive Similarity Search in Massive Datasets; Semantic Attack on Anonymised Transactions; and Private Indexes for Mixed Encrypted Databases.


A Natural Language Processing Tool for White Collar Crime Investigation

Abstract

In today’s world we are confronted every day with increasing amounts of information coming from a large variety of sources. People and corporations produce data on a large scale, and since the rise of the internet, e-mail and social media, the amount of data produced has grown exponentially. From a law enforcement perspective, we have to deal with these huge amounts of data when a criminal investigation is launched against an individual or a company. Relevant questions need to be answered: who committed the crime, who was involved, what happened and when, and who was communicating, and about what? Not only has the amount of data available for investigation increased enormously, but so has its complexity. When these communication patterns need to be combined with, for instance, a seized financial administration or corporate document shares, a complex investigation problem arises. Criminal investigators now face a huge challenge when evidence of a crime must be found in a Big Data environment, where they have to deal with large and complex datasets, especially in financial and fraud investigations. To tackle this problem, a financial and fraud investigation unit of a European country has developed a new tool named LES that uses Natural Language Processing (NLP) techniques to help criminal investigators handle large amounts of textual information more efficiently and quickly. In this paper, we present this tool and focus on evaluating its performance against the requirements of forensic investigation: it should be faster, smarter, and easier to use for investigators. To evaluate LES, we use different performance metrics. We also present experimental results of our evaluation on large and complex datasets from a real-world application.

Maarten van Banerveld, Mohand-Tahar Kechadi, Nhien-An Le-Khac

Data Leakage Analysis of the Hibernate Query Language on a Propositional Formulae Domain

Abstract

This paper presents an information flow analysis of the Hibernate Query Language (HQL). We define a concrete semantics of HQL and lift it to an abstract domain of propositional formulae. In this way, we capture variable dependences at each program point. This allows us to identify illegitimate information flows by checking the satisfiability of propositional formulae with respect to a truth value assignment based on the variables' security levels.

Raju Halder, Angshuman Jana, Agostino Cortesi
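The abstract above describes capturing dependences as propositional formulae and detecting leaks by evaluating those formulae under a truth assignment derived from security levels. The Python sketch below illustrates that idea on a toy program of assignments rather than real HQL. The encoding is an assumption for illustration, not necessarily the authors' exact construction: a statement such as an HQL `UPDATE ... SET` clause is abstracted as a hypothetical `(target, sources)` pair, each variable `x` with dependence set `D` yields the formula `x → ∧D`, public (L) variables are mapped to `True` and secret (H) variables to `False`, and a formula that evaluates to `False` signals a flow from secret to public data.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical abstraction of one statement: target := f(sources),
# e.g. the fields touched by an HQL "UPDATE ... SET target = ..." clause.
Assignment = Tuple[str, List[str]]


def dependences(program: List[Assignment], variables: Set[str]) -> Dict[str, Set[str]]:
    """Return, for each variable, the initial variables its final value depends on."""
    deps: Dict[str, Set[str]] = {v: {v} for v in variables}  # each starts depending on itself
    for target, sources in program:
        new_deps: Set[str] = set()
        for s in sources:
            new_deps |= deps[s]  # transitive: inherit what each source depends on
        deps[target] = new_deps  # a constant assignment (no sources) clears dependences
    return deps


def implies(a: bool, b: bool) -> bool:
    return (not a) or b


def leaking_variables(program: List[Assignment], levels: Dict[str, str]) -> List[str]:
    """Evaluate x -> AND(deps[x]) with L mapped to True and H to False (assumed convention)."""
    deps = dependences(program, set(levels))
    truth = {v: level == "L" for v, level in levels.items()}
    leaks = [x for x, ds in deps.items() if not implies(truth[x], all(truth[d] for d in ds))]
    return sorted(leaks)


levels = {"salary": "H", "bonus": "L", "tmp": "L", "report": "L"}
program = [
    ("tmp", ["salary"]),           # secret copied into a public variable
    ("report", ["tmp", "bonus"]),  # the leak propagates through tmp
    ("bonus", ["bonus"]),
]
print(leaking_variables(program, levels))  # ['report', 'tmp']
```

Under this convention, a public variable that depends only on public data, or a secret variable that depends on anything, evaluates to `True`; only a public target reached by secret data falsifies its implication.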

An Adaptive Similarity Search in Massive Datasets

Abstract

Similarity search is an important task in many fields of study and in various application domains. The era of big data, however, poses challenges to existing information systems in general and to similarity search in particular. Aiming at large-scale data processing, we propose an adaptive similarity search in massive datasets with MapReduce. Our proposed scheme is both applicable and adaptable to popular similarity search cases such as pairwise similarity, search-by-example, range queries, and k-Nearest Neighbour queries. Moreover, we embed our collaborative refinements to effectively minimize irrelevant data objects as well as unnecessary computations. Furthermore, we evaluate our proposed methods with two different document models, known as shingles and terms. Finally, we conduct extensive empirical experiments on real datasets, not only to verify these methods themselves but also to compare them with previous related work. The results confirm the effectiveness of our proposed methods and show that they outperform the previous work in terms of query processing.

Trong Nhan Phan, Josef Küng, Tran Khanh Dang
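To make the shingle document model and the MapReduce setting more concrete, the following Python sketch computes pairwise Jaccard similarity over character shingles, with the map, shuffle, and reduce phases simulated in a single process, and then applies a range-query threshold. It follows a common inverted-index formulation of MapReduce pairwise similarity and is not the authors' adaptive scheme; the function names, the shingle length `k = 3`, and the threshold are illustrative assumptions, and the collaborative refinements mentioned in the abstract are not modelled.

```python
from collections import defaultdict
from itertools import combinations
from typing import DefaultDict, Dict, Iterator, List, Set, Tuple


def shingles(text: str, k: int = 3) -> Set[str]:
    """Character k-shingles of a lower-cased, whitespace-normalised document."""
    t = " ".join(text.lower().split())
    if len(t) < k:
        return {t}
    return {t[i:i + k] for i in range(len(t) - k + 1)}


def map_phase(docs: Dict[str, Set[str]]) -> Iterator[Tuple[str, str]]:
    # Emit (shingle, doc_id); grouping by key yields an inverted index.
    for doc_id, sh in docs.items():
        for s in sh:
            yield s, doc_id


def shuffle(pairs: Iterator[Tuple[str, str]]) -> Dict[str, List[str]]:
    groups: DefaultDict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[str]]) -> Dict[Tuple[str, str], int]:
    # Each shingle shared by two documents contributes 1 to their intersection size.
    overlap: DefaultDict[Tuple[str, str], int] = defaultdict(int)
    for doc_ids in groups.values():
        for a, b in combinations(sorted(doc_ids), 2):
            overlap[(a, b)] += 1
    return overlap


def range_query(docs: Dict[str, Set[str]], threshold: float) -> Dict[Tuple[str, str], float]:
    """Return document pairs whose Jaccard similarity is at least `threshold` (> 0)."""
    result = {}
    for (a, b), inter in reduce_phase(shuffle(map_phase(docs))).items():
        jaccard = inter / (len(docs[a]) + len(docs[b]) - inter)
        if jaccard >= threshold:
            result[(a, b)] = jaccard
    return result


docs = {
    "d1": shingles("the cat sat on the mat"),
    "d2": shingles("the cat sat on a mat"),
    "d3": shingles("stock prices fell sharply"),
}
print(range_query(docs, 0.5))
```

Because only documents that share at least one shingle are ever paired in the reduce phase, pairs with zero overlap (Jaccard similarity 0) are skipped without any computation, a simple form of the pruning such schemes aim for.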

Semantic Attack on Anonymised Transactions

Abstract

A transaction is a data record that contains items associated with an individual. For example, a set of movies rated by an individual forms a transaction. Transaction data are important to applications such as marketing analysis and medical studies, but they may contain sensitive information about individuals which must be sanitised before being used. One popular approach to anonymising transaction data is set-based generalisation, which attempts to hide an original item by replacing it with a set of items. In this paper, we study how well this method can protect transaction data. We propose an attack that aims to reconstruct original transaction data from its set-generalised version by analysing semantic relationships that exist among the items. Our experiments show that set-based generalisation may not provide adequate protection for transaction data: about 50% of the items added to the transactions during generalisation can be detected by our method with a precision greater than 80%.

Jianhua Shao, Hoang Ong
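The following Python sketch illustrates set-based generalisation and a simplified semantic check against it. The category labels, the movie titles, and the scoring rule are hypothetical assumptions; this is not the attack proposed in the paper. Each original item is replaced by a set containing it, so the published transaction includes decoy items; the heuristic then flags, within each set, the items whose semantic category is less supported by the rest of the transaction than that of their set-mates.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

# Hypothetical background knowledge an attacker might hold: one category per item.
CATEGORY = {
    "Alien": "sci-fi", "Blade Runner": "sci-fi", "Dune": "sci-fi",
    "Notting Hill": "romance", "Love Actually": "romance",
}


def generalise(transaction: List[str], mapping: Dict[str, FrozenSet[str]]) -> List[FrozenSet[str]]:
    """Set-based generalisation: replace each item by a set that contains it."""
    return [mapping.get(item, frozenset({item})) for item in transaction]


def flag_added_items(gen: List[FrozenSet[str]], category: Dict[str, str]) -> Set[str]:
    """Flag items whose category is less supported by the other sets than their set-mates'."""
    flagged: Set[str] = set()
    for i, group in enumerate(gen):
        if len(group) < 2:
            continue  # a singleton set hides nothing
        # Category counts over all items published in the *other* sets.
        context = Counter(category[item] for j, other in enumerate(gen) if j != i for item in other)
        scores = {item: context[category[item]] for item in group}
        best = max(scores.values())
        flagged |= {item for item, score in scores.items() if score < best}
    return flagged


mapping = {
    "Alien": frozenset({"Alien", "Notting Hill"}),
    "Blade Runner": frozenset({"Blade Runner", "Love Actually"}),
}
published = generalise(["Alien", "Blade Runner", "Dune"], mapping)
print(sorted(flag_added_items(published, CATEGORY)))  # ['Love Actually', 'Notting Hill']
```

In this toy example, a transaction of science-fiction films is generalised with two romance decoys, and the semantic check flags exactly those decoys.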

Private Indexes for Mixed Encrypted Databases

Abstract

Data privacy and query performance are two closely linked and conflicting challenges for outsourced databases. Using mixed encryption methods on data attributes can partially achieve a trade-off between the two. However, encryption cannot always hide the correlations between attribute values. When the data tuples are accessed selectively, inferences based on comparing encrypted values could be launched, and some sensitive values may be disclosed. In this paper, we explore intra-attribute and inter-attribute inferences in mixed encrypted databases. We develop a method to construct private indexes on encrypted values that defends against these inferences while supporting efficient selective access to encrypted data. We conducted experiments to validate the proposed method.

Yi Tang, Xiaolei Zhang, Ji Zhang
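The abstract notes that comparing encrypted values can enable inferences. The Python sketch below shows why, and illustrates one generic defence, bucketization, which is a well-known technique and not necessarily the private index construction proposed by the authors. Deterministic tags (here an HMAC, standing in for deterministic encryption) reveal equality and therefore value frequencies to the server; a bucket index maps distinct values into fewer labels, so the server sees coarser information and returns a superset of matching rows that the client filters. The key, the bucket count, and the plaintext `rows` list that simulates client-side decryption are assumptions for illustration.

```python
import hashlib
import hmac
from collections import Counter
from typing import Dict, List

KEY = b"hypothetical-client-key"  # assumption: secret held only by the data owner


def det_tag(value: str) -> str:
    """Deterministic keyed tag: equal plaintexts give equal tags (stand-in for deterministic encryption)."""
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()


def bucket(value: str, n_buckets: int) -> int:
    """Keyed hash reduced modulo n_buckets; distinct values may share a bucket."""
    digest = hmac.new(KEY, b"bucket|" + value.encode(), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


def selective_access(rows: List[str], query: str, n_buckets: int) -> List[int]:
    """Return ids of rows equal to `query`, using only bucket labels on the server side."""
    # Server side: index from bucket label to row ids.
    index: Dict[int, List[int]] = {}
    for rid, value in enumerate(rows):
        index.setdefault(bucket(value, n_buckets), []).append(rid)
    candidates = index.get(bucket(query, n_buckets), [])
    # Client side: decrypt candidates (simulated by reading the plaintext list) and drop false positives.
    return [rid for rid in candidates if rows[rid] == query]


rows = ["flu", "flu", "cancer", "flu", "hiv", "cancer"]
# Deterministic tags expose the exact frequency profile of the column.
print(sorted(Counter(det_tag(v) for v in rows).values()))  # [1, 2, 3]
# With 2 buckets the server sees at most 2 labels, yet exact matches are still recovered.
print(len({bucket(v, 2) for v in rows}), selective_access(rows, "flu", 2))
```

Fewer buckets blur the frequency profile further but enlarge the candidate sets the client must filter, which mirrors the privacy versus query performance trade-off described in the abstract.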


Title: Transactions on Large-Scale Data- and Knowledge-Centered Systems XXIII
Editors: Abdelkader Hameurlain, Josef Küng, Roland Wagner, Tran Khanh Dang, Nam Thoai
Publisher: Springer Berlin Heidelberg
Electronic ISBN: 978-3-662-49175-1
Print ISBN: 978-3-662-49174-4
DOI: https://doi.org/10.1007/978-3-662-49175-1
