
2017 | Book

Future Data and Security Engineering

4th International Conference, FDSE 2017, Ho Chi Minh City, Vietnam, November 29 – December 1, 2017, Proceedings

Edited by: Tran Khanh Dang, Prof. Dr. Roland Wagner, Josef Küng, Nam Thoai, Makoto Takizawa, Erich J. Neuhold

Publisher: Springer International Publishing

Book series: Lecture Notes in Computer Science


About this book

This book constitutes the refereed proceedings of the 4th International Conference on Future Data and Security Engineering, FDSE 2017, held in Ho Chi Minh City, Vietnam, from November 29 to December 1, 2017.
The 28 revised full papers and 7 short papers presented were carefully reviewed and selected from 128 submissions. The accepted papers were grouped into the following sessions:
Advances in query processing and optimization
Big data analytics and applications
Blockchains and emerging authentication techniques
Data engineering tools in software development
Data protection, data hiding, and access control
Internet of Things and applications
Security and privacy engineering
Social network data analytics and recommendation systems

Table of Contents

Frontmatter

Invited Keynotes

Frontmatter
Eco Models of Distributed Systems

It is critical to reduce the electric energy consumed by information systems, especially servers in scalable clusters, in order to realize an eco-society. In this paper, we take a macro-level approach to reducing the electric energy consumption of servers. Here, we aim at reducing the total energy consumed by a server to perform application processes. First, we discuss power consumption models which give the electric power consumed by a server to perform application processes. We also discuss computation models of a server which give the execution time of application processes. Based on these models, we discuss algorithms to select an energy-efficient server to perform an application process issued by a client. We also discuss algorithms to migrate processes on host servers to more energy-efficient servers by using virtual machines.

Dilawaer Duolikun, Ryo Watanabe, Makoto Takizawa
Estimating the Assessment Difficulty of CVSS Environmental Metrics: An Experiment

[Context] The CVSS framework provides several dimensions for scoring vulnerabilities. The environmental metrics allow security analysts to downgrade or upgrade vulnerability scores based on a company’s computing environments and security requirements. [Question] How difficult is it for a human assessor to change the CVSS environmental score due to changes in security requirements (let alone technical configurations) for PCI-DSS compliance, for network and system vulnerabilities of different types? [Results] A controlled experiment with 29 MSc students shows that, given a segmented network, it is significantly more difficult to apply the CVSS scoring guidelines on security requirements than for a flat network layout, both before and after the network has been changed to meet the PCI-DSS security requirements. The network configuration also impacts the correctness of vulnerability assessment at the system level but not at the application level. [Contribution] This paper is the first attempt to empirically investigate the guidelines for the CVSS environmental metrics. We discuss the theoretical and practical key aspects needed to move vulnerability assessment forward for large-scale systems.

Luca Allodi, Silvio Biagioni, Bruno Crispo, Katsiaryna Labunets, Fabio Massacci, Wagner Santos

Advances in Query Processing and Optimization

Frontmatter
Fast Top-Q and Top-K Query Answering

Efficient retrieval of the most relevant (e.g. top-k, k-NN) tuples is an important requirement in information systems which access large amounts of data. Top-k (or k-nearest-neighbors) queries retrieve the k objects which score best for a specified objective function. But retrieving the closest objects does not tell the user how close or similar the objects are to the ideal object described by the input query. To support the query issuer more appropriately, we introduce top-q query answering (TQQA), which does not return a fixed number of result tuples but all tuples that are similar to the searched optimum with at least some minimum degree q. We show how to combine top-q queries with top-k queries, enabling the user to pose a large number of interesting queries. To the best of our knowledge, neither such a top-q query answering approach nor a combination with top-k has been proposed before. We implemented our approach and evaluated it against the best position algorithm BPA-2, which proved to be among the fastest threshold-based top-k query answering approaches. Our experiments showed an improvement of one to two orders of magnitude in time and memory requirements.

Claus Dabringer, Johann Eder
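A minimal Python sketch of the combined top-q/top-k idea described in the abstract: keep only tuples whose similarity to the ideal object reaches the threshold q, optionally capped at the k best. The scoring function and data below are illustrative, not the authors' BPA-2-based implementation.

```python
def top_qk(tuples, score, q, k=None):
    """Combined top-q / top-k query: return tuples whose similarity to the
    ideal object is at least q, optionally capped at the k best."""
    hits = [(score(t), t) for t in tuples]
    hits = [(s, t) for s, t in hits if s >= q]      # top-q filter
    hits.sort(key=lambda st: st[0], reverse=True)   # best first
    if k is not None:
        hits = hits[:k]                             # top-k cap
    return [t for _, t in hits]

# Illustrative usage: similarity in [0, 1] relative to an ideal value of 4.
items = [1, 2, 3, 4]
print(top_qk(items, lambda x: x / 4, q=0.5))        # [4, 3, 2]
print(top_qk(items, lambda x: x / 4, q=0.5, k=2))   # [4, 3]
```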
Low-Latency Radix-4 Multiplication Algorithm over Finite Fields

Multiplication over finite fields is the most basic and important arithmetic operation. In this paper, we propose a low-latency radix-4 multiplication algorithm based on the shifted polynomial basis (SPB) over finite fields. The existing multiplication algorithm using SPB has a critical path delay of one 2-input AND gate, one 2-input XOR gate, and one 1-bit latch, and a latency of about 0.5m clock cycles. Our proposed radix-4 multiplication algorithm has a critical path delay of two 2-input AND gates, two 2-input XOR gates, and one 1-bit latch, and a latency of 0.25m clock cycles. Our radix-4 multiplication algorithm saves about 20% of the time complexity compared to the existing multiplication algorithm based on SPB. Therefore, we expect that the proposed algorithm can lead to a hardware architecture with considerably low latency. A multiplier applying the proposed algorithm will have a highly modular architecture and thus be well suited for VLSI implementations.

Kee-Won Kim, Hyun-Ho Lee, Seung-Hoon Kim
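A software illustration of the radix-4 (digit-serial) idea: processing two multiplier bits per step halves the number of iterations compared with a bit-serial multiplier. This is a plain polynomial-basis Python model, not the authors' SPB hardware algorithm; the field GF(2^8) with the AES polynomial x^8+x^4+x^3+x+1 is used only as a convenient test vector source.

```python
def xtime(a, f, m):
    """Multiply a by x in GF(2^m) = GF(2)[x]/(f); f includes the x^m term."""
    a <<= 1
    if (a >> m) & 1:
        a ^= f          # reduce: clears bit m, folds in lower terms of f
    return a

def gf2m_mul_radix4(a, b, f, m):
    """Digit-serial (radix-4) field multiplication: consume two bits of b
    per iteration, so about m/2 steps instead of m (MSB-first Horner)."""
    xa = xtime(a, f, m)
    table = [0, a, xa, a ^ xa]          # digit d = d1*x + d0 -> d * a
    acc = 0
    for i in reversed(range((m + 1) // 2)):
        acc = xtime(xtime(acc, f, m), f, m)   # shift accumulator by x^2
        acc ^= table[(b >> (2 * i)) & 3]      # add this digit's contribution
    return acc

# FIPS-197 example in GF(2^8): {57} * {83} = {C1}
print(hex(gf2m_mul_radix4(0x57, 0x83, 0x11B, 8)))  # 0xc1
```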
An Iterative Algorithm for Computing Shortest Paths Through Line Segments in 3D

A version of the geometric shortest path problem is to compute a shortest path connecting two points and passing through a finite set of line segments in three dimensions. This problem arises in the pursuit path problem and can also be used as a tool for finding shortest paths on polyhedral surfaces. This paper presents an iterative algorithm for dealing with the problem, particularly with large data. The idea is to iteratively determine on each segment a point such that the length of the path successively connecting the points is decreased. We show that after a finite number of iterations, the algorithm converges to give an approximate solution. The algorithm is implemented in C++ and tested on large datasets. The numerical results are shown and discussed.

Le Hong Trang, Quynh Chi Truong, Tran Khanh Dang
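A small sketch of the iterative scheme the abstract describes: with the neighboring points held fixed, the best point on a segment minimizes a convex one-dimensional function, so each segment's point can be updated by a line search, and sweeps repeat until the path length stabilizes. The ternary-search inner step is an assumption on my part (the paper is in C++ and may update points differently).

```python
import math

def _dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def _point(seg, t):
    A, B = seg
    return tuple(a + t * (b - a) for a, b in zip(A, B))

def _best_t(prev, seg, nxt, iters=60):
    # |prev p(t)| + |p(t) nxt| is convex in t on [0, 1]: ternary search
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        f1 = _dist(prev, _point(seg, m1)) + _dist(_point(seg, m1), nxt)
        f2 = _dist(prev, _point(seg, m2)) + _dist(_point(seg, m2), nxt)
        lo, hi = (lo, m2) if f1 < f2 else (m1, hi)
    return (lo + hi) / 2

def shortest_path_via_segments(src, dst, segments, sweeps=50):
    """Iteratively adjust one point per segment so the polyline
    src -> p1 -> ... -> pn -> dst keeps shrinking."""
    ts = [0.5] * len(segments)
    for _ in range(sweeps):
        for i, seg in enumerate(segments):
            prev = src if i == 0 else _point(segments[i - 1], ts[i - 1])
            nxt = dst if i == len(segments) - 1 else _point(segments[i + 1], ts[i + 1])
            ts[i] = _best_t(prev, seg, nxt)
    pts = [src] + [_point(s, t) for s, t in zip(segments, ts)] + [dst]
    return pts, sum(_dist(p, q) for p, q in zip(pts, pts[1:]))
```

For a single vertical segment at x = 1 between (0,0,0) and (2,0,0), the method recovers the straight-line crossing (1,0,0) with path length 2.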
On Transformation-Based Spatial Access Methods with Monotone Norms

A norm-induced distance metric for spatially extended objects provides a formal basis for analytical work in transformation-based multidimensional spatial access methods such as locality preservation of the underlying transformation. We study a monotone-normed distance metric on the space of multidimensional polytopes, and prove a tight relationship between the distance metrics on the original space of k-dimensional hyperrectangles and the transform space of 2k-dimensional points via an arbitrary monotone norm under the corner transformation.

H. K. Dai
Query Answering System as a Tool in Incomplete Distributed Information System Optimization Process

We assume there is a group of connected distributed information systems (DIS). They work under the same ontology. Each information system creates its own knowledgebase. Values of attributes in an information system S form atomic expressions of a language used for communication with others. Collaboration among systems is initiated when one of them (called a client) is asked to resolve a query containing attributes nonlocal to S. In such a case, the client has to ask other information systems for help to have that query answered. As the result of its request, knowledge is extracted locally in each information system and sent back to the client. The outcome of this step is a knowledgebase created at the client site, which can be used to answer the given query. In this paper we present a method of identifying which information system is semantically the closest to the client.

Agnieszka Dardzinska, Katarzyna Ignatiuk, Małgorzata Zdrodowska
Using a Genetic Algorithm in a Process of Optimizing the Deployment of Radio Stations

The article deals with an optimization issue in radio station deployment. This deployment is very important for the accuracy of a multilateration positioning system. However, deployment of stations is in many cases addressed empirically, which results in varying quality of the obtained results. This process still works by selecting multiple candidate locations and then testing how the deployment performs. The aim of this research is to design an algorithm for the optimization of radio station deployment. To achieve this goal, information and computing support from the Czech company ERA a.s., which has been dealing with this issue for many years, was used as well.

Barbora Tesarova, Andrea Vokalova
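A toy genetic algorithm for station placement, to make the optimization loop concrete. The fitness here (worst squared distance from any monitored point to its nearest station) is a stand-in of my own; real multilateration accuracy models (e.g. GDOP-based) are more involved, and all parameters are illustrative.

```python
import random

def deploy_stations_ga(points, n_stations, pop=30, gens=80, seed=1):
    """Toy GA: evolve station layouts to minimize the worst squared
    distance from any monitored point to its nearest station."""
    rng = random.Random(seed)
    xs = [p[0] for p in points]; ys = [p[1] for p in points]
    lo, hi = (min(xs), min(ys)), (max(xs), max(ys))

    def rand_station():
        return (rng.uniform(lo[0], hi[0]), rng.uniform(lo[1], hi[1]))

    def cost(layout):
        return max(min((px - sx) ** 2 + (py - sy) ** 2 for sx, sy in layout)
                   for px, py in points)

    popn = [[rand_station() for _ in range(n_stations)] for _ in range(pop)]
    for _ in range(gens):
        popn.sort(key=cost)
        elite = popn[:pop // 4]                       # elitism
        children = []
        while len(elite) + len(children) < pop:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_stations) if n_stations > 1 else 0
            child = a[:cut] + b[cut:]                 # one-point crossover
            if rng.random() < 0.3:                    # mutation
                child[rng.randrange(n_stations)] = rand_station()
            children.append(child)
        popn = elite + children
    best = min(popn, key=cost)
    return best, cost(best)
```

Elitism guarantees the best layout never gets worse between generations, which is why the loop can simply run for a fixed number of generations.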

Big Data Analytics and Applications

Frontmatter
IFIN+: A Parallel Incremental Frequent Itemsets Mining in Shared-Memory Environment

In an effort to increase throughput for IFIN, a frequent itemsets mining algorithm, in this paper we introduce a solution, called IFIN+, for parallelizing IFIN with shared-memory multithreading. Our motivation is that today's commodity processors are equipped with multiple physical computational units; fully exploiting them is therefore a potential solution for improving performance in single-machine environments. Some portions of the serial version are changed in ways that increase efficiency and computational independence, for convenience in designing parallel computation with the Work-Pool model, known as a good model for load balancing. We conducted experiments to evaluate IFIN+ against its serial version IFIN, the well-known algorithm FP-Growth, and two other state-of-the-art algorithms, FIN and PrePost+. The experimental results show that the running time of IFIN+ is the most efficient, especially when mining at different support thresholds in the same running session. Compared to its serial version, IFIN+'s performance is improved significantly.

Van Quoc Phuong Huynh, Josef Küng, Markus Jäger, Tran Khanh Dang
Parallel Algorithm of Local Support Vector Regression for Large Datasets

We propose a new parallel algorithm of local support vector regression (local SVR), called kSVR, for effectively dealing with large datasets. The learning strategy of kSVR performs the regression task in two main steps. The first is to partition the training data into k clusters; the second is to learn the SVR model from each cluster to predict the data locally, in parallel, on multi-core computers. The kSVR algorithm is faster than the standard SVR for non-linear regression on large datasets while maintaining high prediction correctness. Numerical test results on datasets from the UCI repository showed that our proposed kSVR is efficient compared to the standard SVR.

Le-Diem Bui, Minh-Thu Tran-Nguyen, Yong-Gi Kim, Thanh-Nghi Do
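A minimal sketch of the "partition, then fit local models in parallel" strategy. To stay dependency-free, it uses 1-D k-means and per-cluster least-squares lines as stand-ins for the paper's clustering and SVR models; only the overall structure (cluster, fit per cluster in parallel, predict with the nearest cluster's model) mirrors kSVR.

```python
from concurrent.futures import ThreadPoolExecutor

def kmeans1d(xs, k, iters=25):
    """Plain 1-D k-means with evenly spread initial centroids."""
    lo, hi = min(xs), max(xs)
    cs = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[min(range(k), key=lambda j: abs(x - cs[j]))].append(x)
        cs = [sum(g) / len(g) if g else cs[j] for j, g in enumerate(groups)]
    return cs

def fit_line(pairs):
    """Closed-form least-squares line (stand-in for an SVR model)."""
    if not pairs:
        return 0.0, 0.0
    n = len(pairs)
    sx = sum(x for x, _ in pairs); sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs); sxy = sum(x * y for x, y in pairs)
    denom = n * sxx - sx * sx
    a = (n * sxy - sx * sy) / denom if denom else 0.0
    return a, (sy - a * sx) / n

def local_regression(data, k=2):
    """Partition training data, fit one local model per cluster in
    parallel, and predict with the nearest cluster's model."""
    cs = kmeans1d([x for x, _ in data], k)
    clusters = [[] for _ in cs]
    for x, y in data:
        clusters[min(range(k), key=lambda j: abs(x - cs[j]))].append((x, y))
    with ThreadPoolExecutor() as ex:                 # parallel local fits
        models = list(ex.map(fit_line, clusters))
    def predict(x):
        a, b = models[min(range(k), key=lambda j: abs(x - cs[j]))]
        return a * x + b
    return predict
```

On piecewise-linear data the two local models recover each piece exactly, which is the intuition behind fitting locally rather than globally.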
On Semi-supervised Learning with Sparse Data Handling for Educational Data Classification

An educational data classification task at the program level is investigated in this paper. This task concentrates on predicting the final study status of each student from the second year to the fourth year of their study path. By doing so, in-trouble students can be identified as soon as possible. However, the task faces two main problems. The first is the existence of incomplete data when we conduct an early prediction, and the second is the lack of labeled data for a supervised learning process. To overcome these difficulties, our work proposes a robust semi-supervised learning method with sparse data handling in either a sequential or an iterative approach. The sparse data handling process provides k-nearest-neighbors-based data imputation, and the semi-supervised learning process, with a random forest model as a base learner, can exploit the availability of a larger set of unlabeled data. These two processes can be conducted in sequence or integrated with each other for robustness and effectiveness in educational data classification. The experimental results show that our resulting robust random forest-based self-training algorithm with the iterative approach to sparse data handling outperforms the other algorithms with different sequential and traditional approaches. This algorithm provides us with a more effective classifier as a practical solution on educational data over time.

Vo Thi Ngoc Chau, Nguyen Hua Phung
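A compact sketch of the two processes the abstract combines: k-nearest-neighbors imputation of missing feature values, followed by self-training that promotes the most confidently labeled unlabeled points into the training set. A 1-NN base learner stands in for the paper's random forest, and "confidence" is approximated by distance to the nearest labeled point.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest
    complete rows (distance over mutually observed features)."""
    def dist(r, s):
        ds = [(a - b) ** 2 for a, b in zip(r, s)
              if a is not None and b is not None]
        return math.sqrt(sum(ds) / len(ds)) if ds else float("inf")
    complete = [r for r in rows if None not in r]
    out = []
    for r in rows:
        if None not in r:
            out.append(list(r))
            continue
        nbrs = sorted(complete, key=lambda s: dist(r, s))[:k]
        out.append([v if v is not None
                    else sum(s[i] for s in nbrs) / len(nbrs)
                    for i, v in enumerate(r)])
    return out

def self_train(X, y, X_unlab, rounds=3):
    """Self-training with a 1-NN base learner: repeatedly label the
    unlabeled point closest to the labeled set and add it to training."""
    X, y, X_unlab = list(X), list(y), list(X_unlab)
    def nn_label(x):
        j = min(range(len(X)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(x, X[i])))
        return y[j]
    for _ in range(min(rounds, len(X_unlab))):
        j = min(range(len(X_unlab)),
                key=lambda i: min(sum((a - b) ** 2 for a, b in zip(X_unlab[i], xl))
                                  for xl in X))
        x_new = X_unlab.pop(j)
        y.append(nn_label(x_new))   # pseudo-label before adding to X
        X.append(x_new)
    return nn_label
```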
Logistic Regression Methods in Selected Medical Information Systems

This paper presents the process of building a new logistic regression model, which aims to support the decision-making process in medical databases. The developed logistic regression model defines the probability of the disease and indicates the statistically significant changes that affect the onset of the disease. The value of the probability can be treated as one of the features in the decision process for a patient's future treatment.

Anna Kasperczuk, Agnieszka Dardzinska
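For concreteness, a from-scratch logistic regression fit by stochastic gradient descent, returning the predicted probability of the positive (disease) class. This is a generic sketch, not the authors' statistical modeling pipeline; the toy data and learning rate are illustrative.

```python
import math

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Gradient-descent logistic regression; returns a function giving
    P(positive class | x) = sigmoid(w . x + b)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi                     # gradient of log-loss w.r.t. z
            b -= lr * g
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
    return lambda x: 1.0 / (1.0 + math.exp(-(b + sum(
        wj * xj for wj, xj in zip(w, x)))))

# Toy 1-D example: the model learns a decision boundary near x = 1.5.
prob = fit_logistic([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])
```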

Blockchains and Emerging Authentication Techniques

Frontmatter
Mapping Requirements Specifications into a Formalized Blockchain-Enabled Authentication Protocol for Secured Personal Identity Assurance

The design and development of novel security and authentication protocols is a challenging task. Design flaws, security and privacy issues, as well as incomplete specifications pose risks for their users. Authcoin is a blockchain-based validation and authentication protocol for secure identity assurance. Formal methods, such as Colored Petri Nets (CPNs), are suitable for designing, developing, and analyzing such new protocols in order to detect flaws and mitigate identified security risks. In this work, the Authcoin protocol is formalized using Colored Petri Nets, resulting in a verifiable CPN model. An Agent-Oriented Modeling (AOM) methodology is used to create goal models and corresponding behavior models. Next, these models are used to derive the Authcoin CPN models. The modeling strategy as well as the required protocol semantics are explained in detail. Furthermore, we conduct a state-space analysis on the resulting CPN model and derive specific model properties. The result is a complete and correct formal specification that is used to guide future implementations of Authcoin.

Benjamin Leiding, Alex Norta
Gait Recognition with Multi-region Size Convolutional Neural Network for Authentication with Wearable Sensors

As inertial sensors are low-cost, easy to use, and can be integrated in wearable devices, they can be established as a new modality for user authentication in smart environments, in which computing systems can recognize persons implicitly by their walking patterns. This motivates our proposal to use a multi-region size Convolutional Neural Network to recognize users from their gait patterns recorded by accelerometers and gyroscopes in mobile and wearable devices. Experiments on the Inertial Sensor Dataset of the OU-ISIR Gait Database, the largest inertial sensor-based gait database, demonstrate that our best CNN models provide an accuracy of 96.84% and an EER of 10.43%, better than those of existing methods. Furthermore, we also show by experiments that, using only a subset of subjects in the OU-ISIR dataset to train the CNN models, our method can achieve an accuracy and EER of approximately (95.53 ± 0.82)% and (11.60 ± 0.98)%, respectively.

Khac-Tuan Nguyen, Thanh-Luong Vo-Tran, Dat-Thanh Dinh, Minh-Triet Tran
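The "multi-region size" idea amounts to applying convolution kernels of several sizes to the same signal and pooling each, so the network sees gait patterns at several temporal scales. A dependency-free 1-D toy version of that feature-extraction step (the actual paper uses a trained CNN over multi-channel sensor data):

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D convolution (no padding, stride 1)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def multi_region_features(signal, kernels):
    """Apply kernels of several sizes and global-max-pool each response,
    yielding one feature per kernel size/region."""
    return [max(conv1d(signal, k)) for k in kernels]

# Two kernel sizes (1 and 2) over a toy signal -> two pooled features.
print(multi_region_features([0, 1, 0, 2, 0], [[1], [1, 1]]))  # [2, 2]
```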

Data Engineering Tools in Software Development

Frontmatter
Agile Software Engineering Methodology for Information Systems’ Integration Projects

In this paper, first the notions are defined that are important for running integration projects – system of systems and sociotechnical system – and it is then argued that integrated systems should be treated as sociotechnical systems of systems. This is followed by defining the conceptual framework – the viewpoint framework – required for agile engineering of such systems. Based on the viewpoint framework, the agile software engineering methodology for engineering sociotechnical systems of systems is then defined, proceeding by different viewpoint aspects. The methodology is illustrated by examples from an ongoing large-scale integration project by the European Union.

Kuldar Taveter, Alex Norta
Effectiveness of Object Oriented Inheritance Metrics in Software Reusability

Inheritance is a key feature of the object-oriented paradigm. It is the sharing of attributes and operations among classes based on a hierarchical relationship. Software reusability is a basic concept of software engineering that is affected by the sophistication of the inheritance hierarchy. In order to determine the complexity of inheritance, which in turn has an impact on software reusability, we have proposed class inheritance metrics and explained them in an elaborate manner. In the work presented here, we propose different class inheritance metrics, compare them with existing ones, and attempt to present an alternate solution with some extended features to find out the intricacy of class inheritance, which significantly concerns reusability.

Muhammad Ilyas, Josef Küng, Van Quoc Phuong Huynh

Data Protection, Data Hiding, and Access Control

Frontmatter
Security Analysis of Administrative Role-Based Access Control Policies with Contextual Information

In many ubiquitous systems, Role-Based Access Control (RBAC) is often used to restrict system access to authorized users. Spatial-Temporal Role-Based Access Control (STRBAC) is an extension of RBAC with contextual information (such as time and space) and has been adopted in real-world applications. In a large organization, the RBAC policy may be complex and managed by multiple collaborative administrators to satisfy the evolving needs of the organization. Collaborative administrative actions may interact with each other in unintended ways, which may result in undesired effects on the security requirements of the organization. The analysis of these RBAC security concerns has been studied, especially for Administrative Role-Based Access Control (ARBAC97). However, the analysis of its extension with contextual information, e.g., STRBAC, has not been considered in the literature. In this paper, we introduce a security analysis technique for the safety of Administrative STRBAC (ASTRBAC) policies. We leverage first-order logic and Satisfiability Modulo Theories (SMT) by translating ASTRBAC policies to decidable reachability problems. An extensive experimental evaluation confirms the correctness of our proposed solution, which supports the analysis of finite ASTRBAC policies without prior knowledge of the number of users.

Khai Kim Quoc Dinh, Tuan Duc Tran, Anh Truong
Metamorphic Malware Detection by PE Analysis with the Longest Common Sequence

Metamorphic malware detection is one of the most challenging tasks for antivirus software because the signatures of new variants differ from the preceding ones [1]. This paper proposes a method for metamorphic malware detection by Portable Executable (PE) analysis with the Longest Common Sequence (LCS). The proposed method contains the following phases: Raw feature extraction obtains valuable features such as Windows PE file information, namely PE header information, import dependencies, and API call functions, as well as the code segments inside each Windows PE file. Next, these segments are used to generate detectors, which are later used to determine affinities with code segments of executable files via the longest common sequence algorithm. Finally, header, import, and API call information and affinities are combined into vectors used as input for classifiers after a dimensionality reduction. The experimental results showed that the proposed method can achieve up to 87.1% precision and 63.3% recall for benign files, and 92.6% precision and 93.7% recall on average for malware.

Thanh Nguyen Vu, Toan Tan Nguyen, Hieu Phan Trung, Thao Do Duy, Ke Hoang Van, Tuan Dinh Le
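The affinity computation at the core of the method is the classic longest-common-subsequence dynamic program. A minimal sketch (the normalization to [0, 1] is my own illustrative choice, not necessarily the paper's exact affinity definition):

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of sequences a and b,
    via the standard O(len(a) * len(b)) dynamic program."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            dp[i][j] = (dp[i - 1][j - 1] + 1 if ca == cb
                        else max(dp[i - 1][j], dp[i][j - 1]))
    return dp[len(a)][len(b)]

def affinity(segment, detector):
    """Normalized LCS affinity in [0, 1] between a code segment and a
    detector (both given as token or byte sequences)."""
    return lcs_len(segment, detector) / max(len(segment), len(detector))

# Classic example: LCS of "AGGTAB" and "GXTXAYB" is "GTAB" (length 4).
print(lcs_len("AGGTAB", "GXTXAYB"))  # 4
```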
A Steganography Technique for Images Based on Wavelet Transform

One of the possible ways of hiding secret information is by using images. Images are the most common type of payload in terms of availability and usage in steganographic applications. They are capable of hiding secret information because the human eye is less sensitive to minor changes in an image. In this paper, we propose a steganographic technique based on the Haar discrete wavelet transformation, where data is hidden in the frequency domain. We apply the method to Computer Vision Group (CVG) image data at two scales (512 and 1024) and report the results per colour channel after embedding different capacities of text. This allows us to observe the different behaviours of each image across the colour planes and the main measures of its evaluation.

Ayidh Alharbi, M-Tahar Kechadi

Internet of Things and Applications

Frontmatter
Activity Recognition from Inertial Sensors with Convolutional Neural Networks

Human Activity Recognition is one of the attractive topics for developing smart interactive environments in which computing systems can understand human activities in natural contexts. Besides traditional approaches with visual data, inertial sensors in wearable devices provide a promising approach to human activity recognition. In this paper, we propose novel methods to recognize human activities from raw inertial sensor data using convolutional neural networks with either 2D or 3D filters. We also take advantage of hand-crafted features, combined with features learned from Convolution-Pooling blocks, to further improve the accuracy of activity recognition. Experiments on the UCI Human Activity Recognition dataset with six different activities demonstrate that our method can achieve an accuracy of 96.95%, higher than existing methods.

Quang-Do Ha, Minh-Triet Tran
Accuracy Improvement for Glucose Measurement in Handheld Devices by Using Neural Networks

In this study, an approach for improving the accuracy of glucose measurement with handheld devices is presented. The proposed approach is based on reducing the effects of hematocrit. The hematocrit is estimated by a neural network which is trained with a non-iterative learning algorithm. The inputs for the neural network are sampled from the transduced current curve. This current curve is generated during the chemical reactions of the glucose measurement process in the handheld device. Experiments performed on a real dataset show that the accuracy of glucose measurement using handheld devices is improved by the proposed approach.

Hieu Trung Huynh, Ho Dac Quan, Yonggwan Won
Towards a Domain Specific Framework for Wearable Applications in Internet of Things

Generating source code from a software specification to automate software development is arguably one of the most challenging tasks due to, for example, the complexity of the software domain, the richness of user interfaces, and the heterogeneity of development platforms. Domain-specific approaches make code generation technically possible by narrowing down the software domain. The Internet of Things is a paradigm shift in computing that might eventually give rise to the proliferation of dedicated software methods and tools. Domain-specific software engineering in this new computing paradigm leaves a lot to be desired. In this paper, we propose an approach to semi-automatically generate C code from a visual design for the software module controlling wearable devices. The visual design consists of (i) an input panel describing components that receive input data and how they are wired to the module; (ii) an output panel describing components that produce output data and how they are wired to the module; (iii) connectivity and data storage; and (iv) the state machine of the module. We have tested our domain-specific framework in a case study where wearable devices used for ordering (i.e., to serve a restaurant's clients) and delivering (i.e., to assist a restaurant's waiters) food at a restaurant need to be developed.

Long-Phuoc Tôn, Lam-Son Lê, Hoang-Anh Pham
Development and Implementation of a Web Application for the Management of Data Recorded by a Carbon Monoxide Sensor

The purpose of this work is to develop and implement a web application that can store, analyze, and graph the data recorded by a carbon monoxide (CO) sensor. An artificial ventilation system is also proposed to reduce the carbon monoxide concentration by activating fans. In this paper, four levels of carbon monoxide concentration were identified, starting from the 20 to 449 parts per million (ppm) range, which is the normal level. Depending on the level of CO recorded by the sensor, a certain number of fans will be activated. The project is based on the work in [1], where a CO sensor sends data to the web application, which then stores it in a database. The user can access the application via any device with internet access to be aware in real time of the data recorded by the sensor. The information recorded by the sensor can be detailed through charts and reports; this data includes the date and time of the last dangerous event, the number of registered users, the day of execution of the query, and the time when the sensor is connected to the app.

Deisy Dayana Zambrano Soto, Octavio José Salcedo Parra, Diana Stella García Miranda

Security and Privacy Engineering

Frontmatter
Privacy-Aware Data Analysis Middleware for Data-Driven EHR Systems

Privacy preservation is an essential requirement for information systems, and it is also regulated by law. However, existing solutions for privacy protection during data analysis have some limitations when applied to data-driven electronic health record (EHR) systems, such as data distortion and limited flexibility. This paper presents a novel approach to deal with this issue: a privacy-aware protocol for healthcare data analysis. This approach uses special secured views. For compatibility with data-driven EHR systems, the protocol is proposed together with a high-level middleware architecture. The suggested solution is discussed based on system requirements and specifications to demonstrate its advantages.

Thien-An Nguyen, Nhien-An Le-Khac, M-Tahar Kechadi
An Exact Consensus-Based Network Intrusion Detection System

In a recent work, Toulouse et al. [1] introduced a fully distributed network intrusion detection system (NIDS) based on an average consensus algorithm. In that initial work, modules of the NIDS repeatedly average their state with the state of their neighbors to converge asymptotically to the same value, which in turn is used as a measurement of some relevant state of the network-wide monitored traffic. In the present work, local averaging is used to implement a finite-convergence procedure for the consensus-based NIDS in [1]. We call this implementation exact consensus, as local averaging computes exactly, in a finite number of steps, a function of the initial NIDS states. Furthermore, unlike asymptotic consensus, which computes only the average sum function, this new distributed protocol can compute almost any function of the initial NIDS states. Tests are performed that compare asymptotic consensus with the new exact consensus protocol. In particular, we compare the convergence speed of the two methods given the same pre-defined level of accuracy in the decisions computed by the intrusion detection system.

Michel Toulouse, Quang Tran Minh, Thao Nguyen
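The asymptotic scheme the paper improves on is standard average consensus: each node repeatedly moves toward its neighbors' values and all nodes converge to the global mean. A minimal synchronous sketch (the step size and graph below are illustrative; the NIDS itself exchanges detection-state vectors, not scalars):

```python
def consensus_round(values, adj, eps=0.25):
    """One synchronous round: each node i moves toward its neighbors'
    values with weight eps per edge (needs eps < 1 / max_degree)."""
    return [v + eps * sum(values[j] - v for j in adj[i])
            for i, v in enumerate(values)]

def run_consensus(values, adj, rounds=200):
    """Iterate local averaging; values converge to the global average."""
    for _ in range(rounds):
        values = consensus_round(values, adj)
    return values

# Ring of 4 nodes: every node ends up near the mean (2.5) using only
# neighbor-to-neighbor communication.
print(run_consensus([1.0, 2.0, 3.0, 4.0], [[1, 3], [0, 2], [1, 3], [2, 0]]))
```

With symmetric links the per-round sum is preserved, which is why the common limit is exactly the average; the paper's "exact consensus" reaches a function of the initial states in finitely many steps instead of asymptotically.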
Binary Tree Based Deterministic Positive Selection Approach to Network Security

Positive selection is one of the artificial immune approaches which find application in network security. It relies on building detectors for protecting self cells, i.e. positive-class objects. The random selection used to find candidates for detectors gives good results if the data is represented in a low-dimensional space. For higher dimensions, many attempts may be needed to find a detector; in an extreme case, the approach may fail by not building any detector. This paper proposes an improved version of the positive selection approach. Detectors are constructed from self cells in a deterministic way and stored in a binary tree structure. Thanks to this, each cell is protected by at least one detector regardless of the data dimension and size. Results of experiments conducted on network intrusion data (KDD Cup 1999 Data) and other datasets show that the proposed approach produces detectors of similar or better quality in considerably shorter time compared with the probabilistic version. Furthermore, the number of detectors needed to cover the whole self space can be clearly smaller.

Piotr Hońko
Application of Rough Sets to Negative Selection Algorithms

Immune-based algorithms are mainly used for detecting anomalies in datasets from various domains. One of the main areas in which they are applied is computer security. Due to the increased number of victim connections, new effective approaches are still needed. Negative Selection Algorithms seem very interesting, as they have a unique feature which allows for detecting new types of attacks. This paper presents the possibility of applying rough set inspirations to improve their efficiency and deal with uncertainty and inconsistency in data.

Andrzej Chmielewski
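For readers unfamiliar with the immune-inspired algorithms discussed in the two preceding abstracts, here is the classic probabilistic negative selection loop over binary strings with r-contiguous-bits matching: random candidates that match any self string are discarded, and the survivors become detectors for non-self. All parameters are illustrative.

```python
import random

def r_contiguous_match(detector, s, r):
    """Detector matches s if they agree on r contiguous aligned bits."""
    return any(detector[i:i + r] == s[i:i + r]
               for i in range(len(s) - r + 1))

def generate_detectors(self_set, n, length, r, seed=0, max_tries=10000):
    """Probabilistic negative selection: keep random candidates that
    match no self string; anything they later match is flagged non-self.
    (This random search is exactly what can fail in high dimensions.)"""
    rng = random.Random(seed)
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == n:
            break
        cand = "".join(rng.choice("01") for _ in range(length))
        if not any(r_contiguous_match(cand, s, r) for s in self_set):
            detectors.append(cand)
    return detectors
```

With self = {"0000"} and r = 2, valid detectors are exactly the 4-bit strings containing no "00" window, so none of them can ever flag the self string.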
Focusing on Precision- and Trust-Propagation in Knowledge Processing Systems

In knowledge processing systems, when gathered data and knowledge from several (external) sources is used, the trustworthiness and quality of the information and data have to be evaluated before processing continues with these values. We try to address the problem of evaluating and calculating possible trust values by considering established methods from the known literature and recent research. After the calculation, the obtained values have to be processed, depending on the complexity of the system where the values are used and needed. Here the way of trust propagation, precision propagation, and their aggregation or fusion is crucial when multiple input values come together in one processing step. We discuss elaborated trust definitions already available and corresponding options for trust and precision aggregation and propagation in units of knowledge processing.

Markus Jäger, Jussi Nikander, Stefan Nadschläger, Van Quoc Phuong Huynh, Josef Küng
MITIS - An Insider Threats Mitigation Framework for Information Systems

Cloud computing is now among the most extensively used means for resource sharing, as SaaS, PaaS, and IaaS. Computing scenarios have moved from distributed computing to cloud computing, which provides an efficient and flexible way to deliver dynamic services meeting the needs and challenges of the time in a cost-effective manner. Virtual environments have provided the opportunity to migrate traditional systems to the cloud. Cloud service providers and administrators generally have full access to Virtual Machines (VMs), whereas tenants have limited access to their respective VMs. Cloud admins as well as remote administrators also have full access rights to their respective resources and may pose severe insider threats, about which tenants have shown their concerns. Securing these resources is a key issue. In this paper, available practices for cloud security are investigated, and a self-managed framework is introduced to mitigate malicious insider threats posed to these virtual environments.

Ahmad Ali, Mansoor Ahmed, Muhammad Ilyas, Josef Küng

Social Network Data Analytics and Recommendation Systems

Frontmatter
Integrating Knowledge-Based Reasoning Algorithms and Collaborative Filtering into E-Learning Material Recommendation System

This paper proposes a new method that combines knowledge-based reasoning algorithms and collaborative filtering to create an e-learning material recommendation system. Major problems in recommendation systems (RS) are considered, including data preprocessing, feature extraction, the combination of knowledge-based reasoning and collaborative filtering algorithms, and a method of forming a weighted hybrid RS for better prediction. The experimental results show that the proposed method achieves better prediction accuracy than rule-based reasoning (RBR), case-based reasoning (CBR), and matrix factorization (MF).

Phung Do, Kha Nguyen, Thanh Nguyen Vu, Tran Nam Dung, Tuan Dinh Le
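A weighted hybrid RS of the kind this abstract describes is often realized as a convex combination of the component predictors' scores. The sketch below is an illustrative assumption, not the authors' implementation; the function name and the example weights are hypothetical.

```python
def hybrid_predict(predictions, weights):
    """Weighted hybrid: combine component ratings with normalized weights."""
    total = sum(weights)
    return sum(p * w for p, w in zip(predictions, weights)) / total

# Hypothetical component predictions for one (user, item) pair,
# e.g. from knowledge-based reasoning, collaborative filtering, and MF.
rating = hybrid_predict([4.0, 3.5, 4.5], [0.5, 0.3, 0.2])
print(rating)  # 4.0*0.5 + 3.5*0.3 + 4.5*0.2 = 3.95
```

In practice the weights would be tuned on held-out data so that the stronger component dominates the final prediction.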
A Semantic-Based Recommendation Approach for Cold-Start Problem

Recommender systems (RS) predict a list of items appropriate for users by using collaborative or content-based filtering methods. The former is more popular than the latter, but it suffers from the cold-start problem, also known as the new-user or new-item problem. When a user or item first appears in the system, the RS has no data (feedback) to learn from and thus cannot provide any recommendation. In this work, we propose a semantic-based approach to tackle the cold-start problem in recommender systems. With this approach, we create a semantic model to retrieve past similarity data for a new user. Experimental results show that the proposed approach works well for the cold-start problem.

Huynh Thanh-Tai, Nguyen Thai-Nghe
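The core idea of retrieving past similarity data for a new user can be sketched as matching the new user's attributes against existing user profiles. The code below uses Jaccard similarity as a stand-in for the paper's semantic model; the profile data and function names are illustrative assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity between two attribute collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def most_similar_user(new_attrs, profiles):
    """profiles: user -> attribute list; returns the closest past user,
    whose feedback could seed recommendations for the cold-start user."""
    return max(profiles, key=lambda u: jaccard(new_attrs, profiles[u]))

# Toy profiles of existing users.
profiles = {
    "u1": ["student", "math", "beginner"],
    "u2": ["teacher", "physics", "advanced"],
}
print(most_similar_user(["student", "physics", "advanced"], profiles))
```

The new user shares two of four attributes with "u2" (similarity 0.5) but only one of five with "u1" (0.2), so "u2"'s past feedback would be reused.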
Identifying Key Player Using Sum of Influence Probabilities in a Social Network

There have been a number of studies on finding key players. This paper proposes a new approach to discovering key players based on probability theory, applying the probability of successful diffusion of an innovation through a social network. This work presents a formula for measuring the influence probability, i.e., the probability of successfully propagating an innovation from one person to another through the network, and proposes a definition of a key player based on the sum of influence probabilities. The proposed definition fits completely with the Independent Cascade Model of the diffusion process in a social network. It is easy to understand and apply, especially in the marketing domain, where the effectiveness of a marketing campaign is often measured as the total number of adoptions of a new product or innovation through the campaign.

Ngo Thanh Hung, Huynh Thanh Viet
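The sum-of-influence-probabilities criterion can be sketched as follows: under the Independent Cascade Model each directed edge (u, v) carries a propagation probability, and a candidate key player is a node maximizing the sum over its edges. The graph representation and edge probabilities below are illustrative assumptions, not the authors' formula.

```python
# Toy directed graph: node -> list of (neighbor, propagation probability).
def influence_sum(graph, node):
    """Sum of influence probabilities over a node's outgoing edges."""
    return sum(p for _, p in graph.get(node, []))

def key_player(graph):
    """Node with the largest total influence probability."""
    return max(graph, key=lambda n: influence_sum(graph, n))

graph = {
    "a": [("b", 0.4), ("c", 0.3)],
    "b": [("c", 0.2)],
    "c": [],
}
print(key_player(graph))  # "a": 0.4 + 0.3 = 0.7 beats 0.2 and 0.0
```

This direct-neighbor sum is the simplest reading of the criterion; accounting for multi-hop diffusion would require simulating cascades rather than summing edge probabilities.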

Emerging Data Management Systems and Applications

Frontmatter
Key Success Factors in Introducing National e-Identification Systems

The following article investigates the main success factors in implementing national e-identification systems as part of e-governance strategies. The article reviews the case of Ukraine, which is currently at the beginning of deploying an e-identification management system. Within the paper, the positive experience of foreign countries in electronic identity management is examined, aiming to outline lessons that Ukraine can learn. The article seeks to identify the main issues and problems that inhibit the development of a successful e-identification system in Ukraine, assuming citizens' awareness to be one of the key success factors. Positioning it as a crucial factor is underpinned by a survey conducted among Ukrainian citizens. Based on interviews with officials, a local-government e-identity solution is discussed as a project that could potentially be applied at the national level. The authors' personal vision of improving and raising citizens' awareness of e-government and e-identification is presented as a recommendation for stakeholders' consideration, serving at the same time as a hypothesis for future studies.

Valentyna Tsap, Ingrid Pappel, Dirk Draheim
The Digital Archiving Process in Estonia – Assessment and Future Perspectives

Although the National Archive of Estonia (NAE) has a full-function digital archive (reception, storage, use), not much digital content has yet been submitted for digital preservation in the Estonian national archives. This raises problems concerning the digital preservation of valuable content for future generations. Institutions complain about the complexity of the archiving process even though the Estonian digital archiving system has a very user-friendly design. The main purpose of the research project was, first, to assess the current status of the digital archiving process in Estonia; secondly, to identify bottlenecks; and, thirdly, to provide solutions to simplify the systems. This paper discusses the results of this project, presenting answers to the issues raised as well as recommendations.

Ingrid Pappel, Karin Oolu, Koit Saarevet, Mihkel Lauk, Dirk Draheim
Is There a Need for a New Generation of Congestion Control Algorithms in Point-to-Point Networks?

The main purpose of this paper is to answer the following question: is there a need for a new generation of congestion control algorithms in point-to-point networks? To answer it, the current behavior of distinct congestion control algorithms was studied over different point-to-point network scenarios, using several metrics such as segment size, buffer size, transfer speed, and acknowledgement (ACK) time. After the failures were discovered, the results were analyzed and various recommendations were established with the aim of decreasing congestion in TCP point-to-point networks.

Andrés Felipe Hernández Leon, Octavio José Salcedo Parra, Danilo Alfonso López Sarmiento
Backmatter
Metadata
Title
Future Data and Security Engineering
edited by
Tran Khanh Dang
Prof. Dr. Roland Wagner
Josef Küng
Nam Thoai
Makoto Takizawa
Erich J. Neuhold
Copyright Year
2017
Electronic ISBN
978-3-319-70004-5
Print ISBN
978-3-319-70003-8
DOI
https://doi.org/10.1007/978-3-319-70004-5