
2015 | Book

Future Data and Security Engineering

Second International Conference, FDSE 2015, Ho Chi Minh City, Vietnam, November 23-25, 2015, Proceedings

Edited by: Tran Khanh Dang, Prof. Roland Wagner, Josef Küng, Nam Thoai, Makoto Takizawa, Erich Neuhold

Publisher: Springer International Publishing

Book series: Lecture Notes in Computer Science


About this book

This book constitutes the refereed proceedings of the Second International Conference on Future Data and Security Engineering, FDSE 2015, held in Ho Chi Minh City, Vietnam, in November 2015. The 20 revised full papers and 3 short papers presented were carefully reviewed and selected from 88 submissions. They have been organized in the following topical sections: big data analytics and massive dataset mining; security and privacy engineering; crowdsourcing and social network data analytics; sensor databases and applications in smart home and city; emerging data management systems and applications; context-based analysis and applications; and data models and advances in query processing.

Table of contents

Frontmatter
Erratum to: Facilitating the Design/Evaluation Process of Web-Based Geographic Applications: A Case Study with WINDMash
The Nhan Luong, Christophe Marquesuzaà, Patrick Etcheverry, Thierry Nodenot, Sébastien Laborie

Big Data Analytics and Massive Dataset Mining

Frontmatter
Random Local SVMs for Classifying Large Datasets
Abstract
We propose a new parallel ensemble learning algorithm of random local support vector machines, called krSVM, for effective non-linear classification of large datasets. The random local SVM in the krSVM learning strategy uses the kmeans algorithm to partition the data into k clusters, after which it constructs a non-linear SVM in each cluster to classify the data locally and in parallel on multi-core computers. The krSVM algorithm is faster than the standard SVM for non-linear classification of large datasets while maintaining classification correctness. Numerical test results on 4 datasets from the UCI repository and 3 handwritten-letter recognition benchmarks show that our proposed algorithm is efficient compared to the standard SVM.
Thanh-Nghi Do, François Poulet
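The krSVM strategy described above (k-means partitioning followed by one non-linear SVM per cluster, trained in parallel) can be illustrated with a minimal scikit-learn sketch. This is not the authors' code; the cluster count, kernel, and single-class-cluster fallback are illustrative choices.

```python
# Illustrative sketch (not the authors' code): partition the data with k-means,
# then fit one RBF SVM per cluster in parallel; prediction routes each sample
# to its nearest cluster's local model.
import numpy as np
from joblib import Parallel, delayed
from sklearn.cluster import KMeans
from sklearn.dummy import DummyClassifier
from sklearn.svm import SVC

def train_local_svms(X, y, k=10, n_jobs=-1):
    X, y = np.asarray(X), np.asarray(y)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)

    def fit_one(c):
        idx = np.where(km.labels_ == c)[0]
        if np.unique(y[idx]).size < 2:          # single-class cluster: constant model
            clf = DummyClassifier(strategy="most_frequent")
        else:
            clf = SVC(kernel="rbf", C=1.0, gamma="scale")
        return clf.fit(X[idx], y[idx])

    svms = Parallel(n_jobs=n_jobs, prefer="threads")(delayed(fit_one)(c) for c in range(k))
    return km, svms

def predict_local_svms(km, svms, X_new):
    X_new = np.asarray(X_new)
    clusters = km.predict(X_new)
    return np.array([svms[c].predict(x.reshape(1, -1))[0]
                     for c, x in zip(clusters, X_new)])
```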
An Efficient Document Indexing-Based Similarity Search in Large Datasets
Abstract
In this paper, we devote our effort to proposing a novel MapReduce-based approach for efficient similarity search in big data. Specifically, we address the drawbacks of using an inverted index for similarity search with MapReduce and then propose a simple yet efficient redundancy-free MapReduce scheme, which not only offers advantages over the baseline inverted-index-based procedures but also adapts to various similarity measures and similarity searches. Additionally, we present other strategic methods that help eliminate unnecessary data and computations. Finally, empirical evaluations are conducted with real massive datasets and the Hadoop framework on a cluster of commodity machines to verify the proposed methods; the promising results show how beneficial they are when dealing with big data.
Trong Nhan Phan, Markus Jäger, Stefan Nadschläger, Josef Küng, Tran Khanh Dang
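To make the inverted-index idea concrete, the following is a minimal single-machine sketch of a map/shuffle/reduce flow that builds a term index and answers cosine-similarity queries from it. The paper's redundancy-free scheme and its Hadoop deployment are not reproduced, and the document contents are made up.

```python
# Single-machine sketch of inverted-index-based similarity search in a
# MapReduce style: map emits (term, posting), reduce groups postings by term,
# and queries only touch documents that share at least one term.
from collections import Counter, defaultdict
import math

def map_phase(docs):                              # docs: {doc_id: text}
    for doc_id, text in docs.items():
        tf = Counter(text.lower().split())
        norm = math.sqrt(sum(v * v for v in tf.values()))
        for term, count in tf.items():
            yield term, (doc_id, count / norm)    # emit (term, posting)

def reduce_phase(pairs):                          # group postings by term
    index = defaultdict(list)
    for term, posting in pairs:
        index[term].append(posting)
    return index

def cosine_search(index, query, top_k=5):
    q_tf = Counter(query.lower().split())
    q_norm = math.sqrt(sum(v * v for v in q_tf.values()))
    scores = defaultdict(float)
    for term, q_count in q_tf.items():
        for doc_id, weight in index.get(term, []):
            scores[doc_id] += (q_count / q_norm) * weight
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

docs = {"d1": "big data similarity search", "d2": "mapreduce inverted index"}
index = reduce_phase(map_phase(docs))
print(cosine_search(index, "similarity search with mapreduce"))
```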
Using Local Rules in Random Forests of Decision Trees
Abstract
We propose to use local labeling rules in random forests of decision trees for effective data classification. The classical decision rules use a majority vote for labeling at the terminal nodes of decision trees, which can degrade the classification performance of the classical random forest algorithm. Our investigation aims at replacing the majority rules with local ones, i.e. support vector machines, to improve the prediction correctness of decision forests. Numerical test results on 8 datasets from the UCI repository and 2 handwritten-letter recognition benchmarks show that our proposal is more accurate than the classical random forest algorithm.
Thanh-Nghi Do
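As a rough illustration of replacing majority voting at terminal nodes with local classifiers, the sketch below trains one linear SVM per leaf of a single decision tree; this is a simplified, hypothetical implementation, not the authors' (the paper uses forests of such trees).

```python
# Illustrative sketch (not the authors' implementation): a decision tree whose
# terminal-node majority vote is replaced by a local linear SVM trained on the
# samples falling into that leaf; a forest would bag several such trees.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

class TreeWithLocalSVMs:
    def __init__(self, min_leaf=20):
        self.tree = DecisionTreeClassifier(min_samples_leaf=min_leaf, random_state=0)
        self.leaf_models = {}            # leaf id -> fitted LinearSVC or constant label

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.tree.fit(X, y)
        leaves = self.tree.apply(X)
        for leaf in np.unique(leaves):
            idx = leaves == leaf
            if np.unique(y[idx]).size > 1:
                self.leaf_models[leaf] = LinearSVC(max_iter=5000).fit(X[idx], y[idx])
            else:                        # pure leaf: keep its single label
                self.leaf_models[leaf] = y[idx][0]
        return self

    def predict(self, X):
        X = np.asarray(X)
        leaves = self.tree.apply(X)
        preds = []
        for leaf, x in zip(leaves, X):
            model = self.leaf_models[leaf]
            preds.append(model.predict(x.reshape(1, -1))[0]
                         if hasattr(model, "predict") else model)
        return np.array(preds)
```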
A Term Weighting Scheme Approach for Vietnamese Text Classification
Abstract
The term weighting scheme, which is used to convert documents into vectors in the term space, is a vital step in automatic text categorization. Previous studies showed that the term weighting scheme has a dominant effect on classification performance. There have been extensive studies on term weighting for English text classification; however, few works have studied Vietnamese text classification. In this paper, we propose a term weighting scheme called normalize(tf.rf_max), which is based on the tf.rf term weighting scheme, one of the most effective term weighting schemes to date. We conducted experiments comparing our proposed normalize(tf.rf_max) term weighting scheme to tf.rf and tf.idf on a Vietnamese text classification benchmark. The results show that our proposed term weighting scheme achieves about 3 %–5 % better accuracy than the other term weighting schemes.
Vu Thanh Nguyen, Nguyen Tri Hai, Nguyen Hoang Nghia, Tuan Dinh Le
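For orientation, the sketch below computes the underlying tf.rf weighting of Lan et al. with an L2 (cosine) normalization of each document vector; the exact definition of the paper's normalize(tf.rf_max) variant may differ and is not reproduced here.

```python
# Illustrative sketch of the tf.rf weighting with L2 normalization of each
# document vector (documents are token lists); not the paper's exact variant.
import math
from collections import Counter

def rf_weights(docs_pos, docs_neg):
    """rf(t) = log2(2 + a / max(1, c)), where a (resp. c) counts positive
    (resp. negative) training documents containing term t."""
    a, c = Counter(), Counter()
    for doc in docs_pos:
        a.update(set(doc))
    for doc in docs_neg:
        c.update(set(doc))
    return {t: math.log2(2 + a[t] / max(1, c[t])) for t in set(a) | set(c)}

def tf_rf_vector(doc, rf):
    tf = Counter(doc)
    vec = {t: tf[t] * rf.get(t, 1.0) for t in tf}     # unseen terms get rf = log2(2) = 1
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}      # L2-normalized tf.rf vector

rf = rf_weights([["bóng", "đá", "việt", "nam"]], [["chứng", "khoán", "tăng"]])
print(tf_rf_vector(["bóng", "đá", "đá"], rf))
```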

Security and Privacy Engineering

Frontmatter
Fault Data Analytics Using Decision Tree for Fault Detection
Abstract
Monitoring events on communication and computing systems is becoming more and more challenging due to the increasing complexity and diversity of these systems. Several supporting tools have been created to assist system administrators in monitoring an enormous number of events daily. The main function of these tools is to filter out as many events as possible and present non-trivial events to the administrators for fault analysis and detection. However, the number of non-trivial events never decreases on large systems, such as cloud computing systems, while investigating events is time consuming. This paper proposes an approach for evaluating the severity level of an event using a classification and regression decision tree. The approach builds a decision tree based on the features of past events and then uses this tree to decide the severity level of new events. The administrators can take advantage of this decision to determine proper actions for the non-trivial events. We have implemented and evaluated the approach on software bug datasets obtained from bug tracking systems. The experimental results reveal that the accuracy scores for different decision trees are above 70 %, and some detailed analyses are provided.
Ha Manh Tran, Sinh Van Nguyen, Son Thanh Le, Quy Tran Vu
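A minimal sketch of the idea: fit a CART tree on features of historical bug events and use it to predict the severity of new events. The file name and feature columns below are hypothetical, not taken from the paper.

```python
# Minimal sketch: learn a CART decision tree on features of past events and
# predict the severity level of new events (columns are hypothetical).
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

events = pd.read_csv("bug_events.csv")          # assumed columns below
X = pd.get_dummies(events[["component", "reporter_role", "num_comments", "reopened"]])
y = events["severity"]                          # e.g. trivial / minor / major / critical

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
tree = DecisionTreeClassifier(max_depth=8, min_samples_leaf=10).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, tree.predict(X_te)))
```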
Evaluation of Reliability and Security of the Address Resolution Protocol
Abstract
This article shows how to execute ARP poisoning in order to highlight the insecurity of the protocol, and compares it against alternative protocols to assess the safety of each. It concludes that ES-ARP and S-ARP are good choices for improving the safety of the ARP protocol, although neither is 100 % secure: if the poisoned ARP reply is sent before the genuine one is received and placed in the cache, the victim stores the wrong response in its cache and discards the genuine one. When the first ARP request is sent, both the victim and the attacker receive the message, and whoever answers first gets into the victim's ARP cache.
Elvia León, Brayan S. Reyes Daza, Octavio J. Salcedo Parra

Crowdsourcing and Social Network Data Analytics

Frontmatter
Establishing a Decision Tool for Business Process Crowdsourcing
Abstract
The integration of crowdsourcing in organisations fosters new managerial and business capabilities, especially regarding flexibility and agility of external human resources. However, a crowdsourcing project involves considering multiple contextual factors and choices and dealing with the novelty of the strategy, which makes managerial decisions difficult. This research addresses the problem by proposing a tool supporting business decision-makers in the establishment of crowdsourcing projects. The proposed tool is based on an extensive review of prior research in crowdsourcing and an ontology that standardises the fundamental crowdsourcing concepts, processes, dependencies, constraints, and managerial decisions. In particular, we discuss the architecture of the proposed tool and present two prototypes, one supporting what-if analysis and the other supporting detailed establishment of crowdsourcing processes.
Nguyen Hoang Thuan, Pedro Antunes, David Johnstone, Nguyen Huynh Anh Duy
Finding Similar Artists from the Web of Data: A PageRank Based Semantic Similarity Metric
Abstract
Since its commencement, the Linked Open Data cloud has quickly become popular and offers rich data sources for a wide range of applications. The potential for application development using Linked Data is immense and calls for intensive research efforts. Until now, the issue of how to efficiently exploit the data provided by this new platform has remained an open research question. In this paper we present our investigation of using a well-known encyclopedic dataset, DBpedia, for finding similar musical artists. Our approach exploits a PageRank-based semantic similarity metric for computing similarity in an RDF graph. From the data provided by DBpedia, the similarity results help find similar artists for a given artist. By doing this, we are also able to examine the suitability of DBpedia for this type of recommendation task. Experimental results show that the outcomes are encouraging.
Phuong T. Nguyen, Hong Anh Le
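One way to approximate a PageRank-based similarity over DBpedia-style data is a personalized PageRank on a graph built from RDF triples, as in the sketch below; the triples, artist names and parameter choices are made up, and the paper's actual metric may differ.

```python
# Illustrative sketch: personalized PageRank over a graph built from RDF-style
# triples as a stand-in for a PageRank-based similarity metric.
import networkx as nx

triples = [
    ("dbr:Artist_A", "dbo:genre", "dbr:Rock"),
    ("dbr:Artist_B", "dbo:genre", "dbr:Rock"),
    ("dbr:Artist_C", "dbo:genre", "dbr:Jazz"),
    ("dbr:Artist_A", "dbo:associatedBand", "dbr:Artist_B"),
]

G = nx.Graph()
for s, _, o in triples:
    G.add_edge(s, o)

seed = "dbr:Artist_A"
scores = nx.pagerank(G, alpha=0.85,
                     personalization={n: 1.0 if n == seed else 0.0 for n in G})
similar = sorted((n for n in G if n.startswith("dbr:Artist_") and n != seed),
                 key=lambda n: -scores[n])
print(similar)            # artists ranked by similarity to the seed artist
```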
Opinion Analysis in Social Networks Using Antonym Concepts on Graphs
Abstract
In sentiment analysis a text is usually classified as positive, negative or neutral; in this work we propose a method for obtaining the relatedness or similarity that an opinion about a particular subject has with regard to a pair of antonym concepts. In this way, a particular opinion is analyzed in terms of a set of features that can vary depending on the field of interest. With our method it is possible, for example, to determine the balance of honesty, cleanliness, interestingness, or expensiveness that is expressed in an opinion. We used the standard WordNet similarity measures Hirst-St-Onge, Jiang-Conrath and Resnik; however, finding that these measures are not well suited to all parts of speech, we additionally propose a new graph-based measure to properly handle adjectives. We validated our results with a survey of a sample of 20 individuals, obtaining a precision above 82 % with our method.
Hiram Calvo

Sensor Databases and Applications in Smart Home and City

Frontmatter
Traffic Speed Data Investigation with Hierarchical Modeling
Abstract
This paper presents a novel topic model for traffic speed analysis in the urban environment. Our topic model is distinctive in that it introduces parameters encoding the following two domain-specific aspects of traffic speeds. First, traffic speeds are measured by sensors, each having a fixed location; therefore, similar measurements are likely to be given by sensors located close to each other. Second, traffic speeds show a 24-hour periodicity; therefore, similar measurements are likely to be given at the same time point on different days. We model these two aspects with Gaussian process priors and make topic probabilities location- and time-dependent. In this manner, our model utilizes the metadata of the traffic speed data. We offer a slice sampling inference to achieve less approximation than variational Bayesian inference. We present an experimental result using the traffic speed data provided by New York City.
Tomonari Masada, Atsuhiro Takasu
An Effective Approach to Background Traffic Detection
Abstract
Background (BG) traffic detection is an important task in network traffic analysis and management that helps to improve the QoS and QoE of network services. Quickly detecting BG traffic in a huge amount of live traffic travelling through the network is a challenging research topic. This paper proposes a novel approach, namely the periodicity detection map (PDM), to quickly identify BG traffic based on periodicity analysis, as BG traffic is commonly generated periodically by applications. However, not every BG traffic flow is periodic, hence periodicity-analysis-based approaches cannot detect non-periodic BG flows. This paper also discusses the capability of applying a machine-learning-based classification method, whose training dataset is collected from the results of the PDM method, to solve this issue. Evaluation analysis and experimental results reveal the effectiveness and efficiency of the proposed approaches compared to conventional methods in terms of computational cost, memory usage, and ratio of BG flows detected.
Quang Tran Minh
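As a rough illustration of periodicity analysis on a single flow (not the paper's PDM structure), the sketch below bins packet arrival times and checks the autocorrelation of the per-second counts.

```python
# Illustrative periodicity check for one traffic flow via autocorrelation of
# its binned packet counts; the threshold and bin size are arbitrary choices.
import numpy as np

def looks_periodic(timestamps, bin_size=1.0, threshold=0.5):
    """timestamps: packet arrival times (seconds) of a single flow."""
    t = np.asarray(timestamps, dtype=float)
    bins = np.arange(t.min(), t.max() + bin_size, bin_size)
    counts, _ = np.histogram(t, bins=bins)
    x = counts - counts.mean()
    if not x.any():                                 # constant rate: nothing to detect
        return False, None
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    acf = acf / acf[0]
    lag = 1 + np.argmax(acf[1:])                    # strongest non-zero lag
    return acf[lag] >= threshold, lag * bin_size

# e.g. a keep-alive flow sending one packet every 10 s for 5 minutes
ts = np.arange(0, 300, 10.0)
print(looks_periodic(ts))
```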
An Approach for Developing Intelligent Systems in Smart Home Environment
Abstract
Smart home systems have received considerable attention recently. Detecting abnormal usage in these systems may help users/organizations to better understand the usage of their home appliances and to distinguish unnecessary usage as well as abnormal problems that can cause waste, damage, or even fire. In this work, we first present an overview of Smart Home Environments (SHEs), including their classification, architecture, and the techniques that can be used in SHEs, as well as their applications in practice. We then propose a framework including methods for abnormal usage detection using home appliance usage logs. The proposed methods are validated using a real dataset. Experimental results show that these methods perform well and are promising for practice.
Tran Nguyen Minh-Thai, Nguyen Thai-Nghe
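A minimal illustration of abnormal-usage detection from appliance usage logs, assuming a simple rule (a usage lasting more than three times the appliance's median duration is flagged); the paper's actual detection methods are not reproduced and the log below is made up.

```python
# Flag appliance usages whose duration is far above the appliance's typical
# duration (not the paper's methods; the log is made up).
import pandas as pd

logs = pd.DataFrame({
    "appliance":    ["oven", "oven", "oven", "oven", "iron"],
    "start":        pd.to_datetime(["2015-01-01 18:05", "2015-01-02 18:10",
                                    "2015-01-03 18:00", "2015-01-04 03:00",
                                    "2015-01-01 07:30"]),
    "duration_min": [35, 40, 30, 240, 10],
})

typical = logs.groupby("appliance")["duration_min"].transform("median")
abnormal = logs[logs["duration_min"] > 3 * typical]       # e.g. the 240-minute oven run
print(abnormal[["appliance", "start", "duration_min"]])
```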

Emerging Data Management Systems and Applications

Frontmatter
Modelling Sensible Business Processes
Abstract
In this paper we develop the concept of the sensible business process, which stands in opposition to the more traditional concept of the mechanistic business process currently supported by most business process modelling languages and tools. A sensible business process is founded on a rich model and affords predominant human control. Having developed a modelling tool supporting this concept, we report on a set of experiments with the tool. The obtained results show that sensible business processes (1) capture richer information about business processes; (2) contribute to knowledge sharing in organisations; and (3) support better process models.
David Simões, Nguyen Hoang Thuan, Lalitha Jonnavithula, Pedro Antunes
Contractual Proximity of Business Services
Abstract
Business services arguably play a central role in service-based information systems as they would fill in the gap between the technicality of Service-Oriented Architecture and the business aspects captured in Enterprise Architecture. Business services have distinctive features that are not typically observed in Web services, e.g. significant portions of the functionality of business services might be executed in a human-mediated fashion. The representation of business services requires that we view human activity and human-mediated functionality through the lens of computing and systems engineering.
Given the specification of a relatively complex business service, practitioners can deal with its complexity either by breaking it down into constituent services through common practices such as outsourcing or delegation, or by picking an existing group of services (e.g. from a service catalog) that best realizes that functionality. To address these challenges, we devise formal machinery to (a) verify whether a group of services contractually matches the specification of the larger service in question, and (b) assess the contractual proximity of service groups relative to a contractual service specification, to help decide which combination of services from a catalog best realizes the desired functionality.
Lam-Son Lê
Energy-Efficient VM Scheduling in IaaS Clouds
Abstract
This paper investigates energy-aware virtual machine (VM) scheduling problems in IaaS clouds. Each VM requires multiple resources over a fixed, non-preemptible time interval. Many previous studies proposed to use a minimum number of physical machines; however, this is not necessarily a good solution for minimizing total energy consumption in VM scheduling with multiple resources, fixed starting times and durations. We observe that minimizing the total energy consumption of physical machines in these scheduling problems is equivalent to minimizing the sum of the total busy time of all active physical machines when the machines are homogeneous. Based on these observations, we propose the ETRE algorithm to solve the scheduling problems. The ETRE algorithm's swapping step swaps an allocating VM with a suitable overlapped VM, which is of the same VM type and is allocated on the same physical machine, to minimize the total busy time of all physical machines. ETRE uses resource utilization during the execution time period of a physical machine as the evaluation metric, and then chooses the host that minimizes this metric to allocate a new VM. In addition, this work studies some heuristics for sorting the list of virtual machines to allocate (e.g., by the earliest starting time, or the longest duration first). Using log traces from Feitelson's Parallel Workloads Archive, our simulation results show that the ETRE algorithm reduces total energy consumption on average by 48 % compared to power-aware best-fit decreasing (PABFD [6]) and by 49 % compared to vector bin-packing norm-based greedy algorithms (VBP-Norm-L1/L2 [15]).
Nguyen Quang-Hung, Nam Thoai
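To give a feel for the busy-time objective, the sketch below greedily places each VM (earliest start first) on the host whose total busy time grows the least, with a crude capacity limit standing in for multi-resource constraints; this is a simplification for illustration, not the ETRE algorithm.

```python
# Simplified greedy sketch of the busy-time idea (not the authors' algorithm):
# a host's busy time is the total length of the union of its VM intervals.
from dataclasses import dataclass, field

@dataclass
class Host:
    capacity: int                                 # max concurrently running VMs
    vms: list = field(default_factory=list)       # list of (start, end)

    def fits(self, start, end):
        overlap = sum(1 for s, e in self.vms if s < end and start < e)
        return overlap < self.capacity

    def busy_time(self, extra=None):
        ivs = sorted(self.vms + ([extra] if extra else []))
        total, cur_s, cur_e = 0, None, None
        for s, e in ivs:
            if cur_e is None or s > cur_e:        # start a new busy interval
                total += (cur_e - cur_s) if cur_e is not None else 0
                cur_s, cur_e = s, e
            else:                                 # extend the current busy interval
                cur_e = max(cur_e, e)
        return total + ((cur_e - cur_s) if cur_e is not None else 0)

def schedule(vms, hosts):
    for start, end in sorted(vms):                # earliest starting time first
        best = min((h for h in hosts if h.fits(start, end)),
                   key=lambda h: h.busy_time((start, end)) - h.busy_time(),
                   default=None)
        if best is None:
            raise RuntimeError("no host can accept this VM")
        best.vms.append((start, end))
    return hosts

hosts = schedule([(0, 4), (1, 3), (5, 8), (2, 6)], [Host(2), Host(2)])
print([h.busy_time() for h in hosts])
```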
Multi-diagram Representation of Enterprise Architecture: Information Visualization Meets Enterprise Information Management
Abstract
Modeling Enterprise Architecture (EA) requires the representation of multiple views of an enterprise. This may be done by a team of stakeholders having different backgrounds. The enterprise model built by the team consists of a large number of model elements capturing various aspects of the enterprise. To deal with this high complexity, each stakeholder of the team may want to view only the aspect of the enterprise model of interest to her. Essentially, the stakeholders need a modeling framework for their EA modeling. We devise a visual modeling language and develop a supporting tool called SeamCAD. Instead of managing a list of ill-related diagrams, the tool manages a coherent enterprise model and generates diagrams on demand, i.e. based on the stakeholders' modeling scope and interests. The notation of the SeamCAD language is based on the Unified Modeling Language and comes with a distinctive layout for visually and explicitly showing hierarchical containment in the diagrams. We also report industrial applications of our tool and language in this paper. We position our work at the intersection of information visualization and enterprise information management.
Lam-Son Lê
Enhancing the Quality of Medical Image Database Based on Kernels in Bandelet Domain
Abstract
Diagnostic imaging has contributed significantly to improving the accuracy, timeliness and efficiency of healthcare. Many medical images suffer from blur combined with noise, for a variety of reasons. This problem creates difficulties for health professionals because every small detail is useful in doctors' treatment processes. In this paper, we propose a new method to improve the quality of medical images. The proposed method includes two steps: denoising by Bayesian thresholding in the bandelet domain, and deblurring using a set of kernels. We evaluated the proposed method by calculating PSNR and MSE values. This method gives better results than other recent methods available in the literature.
Nguyen Thanh Binh
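The abstract evaluates restored images with MSE and PSNR; these standard metrics can be computed as follows for 8-bit images (the tiny synthetic example is only for illustration).

```python
# Standard image-quality metrics used in the evaluation: mean squared error
# and peak signal-to-noise ratio (8-bit images assumed).
import numpy as np

def mse(reference, restored):
    ref = reference.astype(np.float64)
    out = restored.astype(np.float64)
    return np.mean((ref - out) ** 2)

def psnr(reference, restored, max_value=255.0):
    err = mse(reference, restored)
    return float("inf") if err == 0 else 10 * np.log10(max_value ** 2 / err)

a = np.full((64, 64), 120, dtype=np.uint8)
b = a.copy(); b[0, 0] = 130                    # one slightly perturbed pixel
print(mse(a, b), psnr(a, b))
```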
Information Systems Success: A Literature Review
Abstract
Information systems (IS) success is a significant topic of interest, not only for scholars and practitioners but also for managers. This paper reviews IS success research with a multidimensional approach. Articles in academic journals and international conferences on this theme published between 1992 and 2015 were investigated. The findings indicate that (i) methodological, empirical studies are dominant, (ii) the notion of "success" is chiefly represented through individual (e.g., user/customer) benefits, and (iii) the DeLone & McLean model is heavily employed throughout this period. Some research avenues are discussed, and the identified research gaps present opportunities for further development and research.
Thanh D. Nguyen, Tuan M. Nguyen, Thi H. Cao

Context-Based Data Analysis and Applications

Frontmatter
Facilitating the Design/Evaluation Process of Web-Based Geographic Applications: A Case Study with WINDMash
Abstract
Web-based geographic applications are continuously evolving and becoming increasingly widespread. Many Web-based geographic applications have been developed in various domains, such as tourism, education, surveillance and the military. However, designing these applications is still a cumbersome task because it requires multiple high-level technical skills related not only to recent Web technologies but also to technologies dedicated to geographic information systems (GIS). For instance, it requires several components (e.g. maps, multimedia contents, indexing services, databases) that have to be assembled together. Hence, developers have to deal with different technologies and application behaviour models. In order to shield designers from this complexity and thus facilitate the design/evaluation of Web-based geographic applications, we propose a framework that focuses on both designers' creativity and model executability. This framework has been implemented in a prototype named WINDMash, a Web mashup environment that designers can use both to create and to assess interactive Web-based applications that handle geographical information.
The Nhan Luong, Christophe Marquesuzaà, Patrick Etcheverry, Thierry Nodenot, Sébastien Laborie
A Context-Aware Recommendation Framework in E-Learning Environment
Abstract
The explosion of the World Wide Web has offered people a large number of online courses, e-classes and e-schools. Such e-learning applications contain a wide variety of learning materials, which can leave learners confused about what to select. To address this problem, in this paper we propose a context-aware recommendation framework to suggest suitable learning materials to learners. In the proposed approach, we first present a method to determine contextual information implicitly. We then describe a technique to obtain ratings from learners' study-result data. Finally, we propose two methods to predict and recommend potential items to active users. The first, STI-GB, performs context-aware collaborative filtering (CACF) with a contextual modeling approach that combines graph-based clustering and matrix factorization (MF). The second, AVG, predicts ratings based on an average calculation method. Experimental results reveal that the proposed methods consistently outperform ISMF (a combination of Item Splitting and MF) and context-aware matrix factorization (CAMF) in terms of prediction accuracy.
Phung Do, Hung Nguyen, Vu Thanh Nguyen, Tran Nam Dung
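For context, a bare-bones matrix-factorization recommender (SGD on observed ratings) looks like the sketch below; the paper's STI-GB and AVG methods add contextual modeling and graph-based clustering on top of ideas like this and are not reproduced here. The rating triples are made up.

```python
# Bare-bones matrix factorization via SGD: learn user and item factor vectors
# from observed (user, item, rating) triples and predict missing ratings.
import numpy as np

def train_mf(ratings, n_users, n_items, k=8, lr=0.01, reg=0.05, epochs=200):
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(n_users, k))     # learner (user) factors
    Q = rng.normal(scale=0.1, size=(n_items, k))     # learning-material (item) factors
    for _ in range(epochs):
        for u, i, r in ratings:
            p_u = P[u].copy()
            err = r - p_u @ Q[i]
            P[u] += lr * (err * Q[i] - reg * p_u)
            Q[i] += lr * (err * p_u - reg * Q[i])
    return P, Q

ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 2.0)]
P, Q = train_mf(ratings, n_users=2, n_items=3)
print(P[1] @ Q[1])        # predicted rating of learner 1 for material 1
```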
Automatic Evaluation of the Computing Domain Ontology
Abstract
Ontologies have played an important role in recent years, and their applications are now more popular and varied. Ontologies are used in different areas related to Information Technology, Biology, and Medicine, especially in Information Retrieval, Information Extraction, and Question Answering. Ontologies capture background knowledge by providing relevant terms and the formal relations between them, so that they can be used in a machine-processable way. Depending on the application, the structure of ontologies has been built and designed with different models. Good ontologies lead directly to a higher degree of reuse and better cooperation across the boundaries of applications and domains. However, a number of challenges must be faced when evaluating ontologies. In this paper, we propose a novel approach, based on a data-driven method and an information extraction system, for evaluating the lexicon/vocabulary and consistency of a domain-specific ontology. Furthermore, we evaluate the ontological structure and the relations of some terms of the ontology.
Chien D. C. Ta, Tuoi Phan Thi

Data Models and Advances in Query Processing

Frontmatter
Comics Instance Search with Bag of Visual Words
Abstract
Comics are rapidly developing and attracting many people around the world. The problem is how a reader can find a translated version of a comic in his or her favorite language when he or she sees a certain comics page in another language. Therefore, in this paper we propose a comics instance search based on Bag of Visual Words, so that readers can search a collection of translated versions of various comics with a single instance, namely a comics page in an arbitrary language. Our method is based on visual information and does not rely on the textual information of comics. Our proposed system uses Apache Lucene to handle the inverted-index process to find comics pages with visual words, and spatial verification using RANSAC to eliminate bad results. Experimental results on our dataset of 20 comics containing more than 270,000 images achieve an accuracy of up to 77.5 %. This system can be improved toward a commercial system that allows a reader to easily search a multi-language collection of comics with a comics page as an input query.
Duc-Hoang Nguyen, Minh-Triet Tran, Vinh-Tiep Nguyen
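A minimal Bag-of-Visual-Words sketch (ORB descriptors, k-means vocabulary, normalized histograms, dot-product ranking) is shown below; the paper's Lucene inverted index and RANSAC spatial verification are omitted, and the image paths are placeholders.

```python
# Illustrative Bag-of-Visual-Words pipeline: extract local descriptors, cluster
# them into a visual vocabulary, represent each page as a word histogram, and
# rank pages by similarity to a query page.
import cv2
import numpy as np
from sklearn.cluster import KMeans

orb = cv2.ORB_create(nfeatures=500)

def descriptors(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, des = orb.detectAndCompute(img, None)
    return des if des is not None else np.empty((0, 32), dtype=np.uint8)

def bovw_histogram(des, vocab):
    words = vocab.predict(des.astype(np.float32))
    hist, _ = np.histogram(words, bins=np.arange(vocab.n_clusters + 1))
    return hist / (np.linalg.norm(hist) or 1.0)

pages = ["page_en_001.png", "page_vi_001.png"]          # placeholder dataset
all_des = np.vstack([descriptors(p) for p in pages]).astype(np.float32)
vocab = KMeans(n_clusters=100, n_init=4, random_state=0).fit(all_des)
index = {p: bovw_histogram(descriptors(p), vocab) for p in pages}

query = bovw_histogram(descriptors("query_page.png"), vocab)
ranked = sorted(index, key=lambda p: -float(query @ index[p]))
print(ranked[:5])
```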
Defining Membership Functions in Fuzzy Object-Oriented Database Model
Abstract
In this paper, we study the characteristics of fuzzy attributes, fuzzy object/class relationships, and fuzzy class/superclass relationships based on an approximate-semantics approach to hedge algebras (HA). On this basis, we present methods for determining the membership degrees of these fuzzy characteristics.
Doan Van Thang, Dang Cong Quoc
Backmatter
Metadata
Title
Future Data and Security Engineering
Edited by
Tran Khanh Dang
Prof. Roland Wagner
Josef Küng
Nam Thoai
Makoto Takizawa
Erich Neuhold
Copyright year
2015
Electronic ISBN
978-3-319-26135-5
Print ISBN
978-3-319-26134-8
DOI
https://doi.org/10.1007/978-3-319-26135-5