Sphere Decoding Algorithm Based on a New Radius Definition

The integer least squares problem (ILSP) plays a significant role in cryptography. ILSP is equivalent to finding the closest lattice point to a given point and is known to be NP-hard. Sphere decoding (SD) is an effective method for solving the ILSP. In this paper, we mainly solve ILSP with the sphere decoding algorithm (SDA). One of the key issues in SDA is how to implement a more effective tree pruning strategy. We propose a new definition of the sphere radius as a tree pruning strategy to reduce the computational complexity. The core idea of our algorithm (K-SE-SD) is to trade a relatively small loss of accuracy for a relatively large reduction in time complexity. Our experiments demonstrate that, on average, the proposed idea reduces running time by about 70.8% with an accuracy of 86.8% when $k = \lceil n/3 \rceil$, where n is the lattice dimension.

Ping Wang, DongDong Shang, Zhiwei Sun
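
As background for the abstract above, the following is a minimal sketch of a plain depth-first sphere decoder with Schnorr-Euchner style candidate ordering. It is not the K-SE-SD algorithm: the abstract does not give the new radius definition, so this sketch simply shrinks the radius whenever a closer lattice point is found. The function name `sphere_decode` and the assumption that the lattice basis has already been reduced to an upper-triangular matrix `R` (for example by a QR factorisation) are illustrative choices.

```python
import math

def sphere_decode(R, y, radius_sq=math.inf):
    """Find the integer vector x minimising ||y - R x||^2.

    R is an n x n upper-triangular matrix (list of rows) with a nonzero
    diagonal; y is a length-n list. Returns (x, squared_distance), or
    (None, radius_sq) if no lattice point lies strictly inside the sphere.
    """
    n = len(R)
    x = [0] * n
    best = {"x": None, "d": radius_sq}

    def search(level, partial):
        if level < 0:
            # Every coordinate is fixed: record the point and shrink the sphere.
            best["x"], best["d"] = x[:], partial
            return
        # Residual of this row once the already-fixed coordinates are substituted.
        r = y[level] - sum(R[level][j] * x[j] for j in range(level + 1, n))
        diag = R[level][level]
        nearest = math.floor(r / diag + 0.5)
        k = 0
        while True:
            # Zigzag outwards from the nearest integer: c, c+1, c-1, c+2, ...
            candidates = (nearest,) if k == 0 else (nearest + k, nearest - k)
            any_inside = False
            for c in candidates:
                cost = partial + (r - diag * c) ** 2
                if cost < best["d"]:  # prune branches outside the sphere
                    any_inside = True
                    x[level] = c
                    search(level - 1, cost)
            if not any_inside:
                break  # costs only grow with k, so nothing further can fit
            k += 1

    search(n - 1, 0.0)
    return best["x"], best["d"]

# Hypothetical 3-dimensional instance, chosen only for illustration.
R = [[2.0, 0.5, -0.3], [0.0, 1.5, 0.7], [0.0, 0.0, 1.2]]
y = [3.1, -2.4, 4.9]
x, d = sphere_decode(R, y)
```

Because the search starts with an infinite radius, the first leaf it reaches is the point obtained by rounding each coordinate in turn, and every later branch is pruned against the best distance found so far. A pruning rule that gives up exactness for speed, as the abstract describes, would replace the `cost < best["d"]` test with a tighter bound.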

Inferring Travel Purposes for Transit Smart Card Data Using

Understanding travel purposes is crucial for urban transportation planning and resource allocation. Conventionally, travel purposes are obtained from household travel surveys. However, such surveys are usually conducted only every 5–10 years and sample only 1–2% of urban dwellers. The information on travel purposes is therefore very limited and usually biased, which causes mismatches in urban and transportation planning. Meanwhile, many cities have accumulated a large amount of transit data, such as those from transit smart cards. Such data contain many individual travel records, but have not been used to generate travel survey data because they lack information on travel purposes. To make full use of the data and to generate more comprehensive travel data, this study attempts to infer travel purposes for smart card data with a naïve Bayes probabilistic model. Experimental results demonstrate that the proposed method can infer commuting activities with an accuracy of more than 95%, while the accuracy of predicting other activities is about 60%. This is a promising approach to integrating big data into transportation work routines.

Zhenzhen Liu, Qing-Quan Li, Yan Zhuang, Jiacheng Xiong, Shuiquan Li
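
To make the naïve Bayes step in the abstract above concrete, here is a minimal sketch of a categorical naïve Bayes classifier with add-one smoothing. The abstract does not list the features or the purpose categories, so the two features used here (boarding period and day type), the labels `commute` and `other`, and the handful of toy records are all hypothetical and invented purely for illustration.

```python
from collections import Counter, defaultdict
import math

def train(records):
    """records: list of (feature_tuple, label). Returns the count tables."""
    label_counts = Counter(label for _, label in records)
    feature_counts = defaultdict(Counter)  # (label, position) -> value counts
    values = defaultdict(set)              # position -> distinct values seen
    for features, label in records:
        for i, v in enumerate(features):
            feature_counts[(label, i)][v] += 1
            values[i].add(v)
    return label_counts, feature_counts, values

def predict(model, features):
    """Return the label with the highest posterior log-probability."""
    label_counts, feature_counts, values = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        log_p = math.log(n / total)  # prior P(label)
        for i, v in enumerate(features):
            # Add-one (Laplace) smoothing keeps unseen values from zeroing the product.
            count = feature_counts[(label, i)][v]
            log_p += math.log((count + 1) / (n + len(values[i])))
        scores[label] = log_p
    return max(scores, key=scores.get)

# Toy smart-card trips: (boarding period, day type) -> purpose. Made-up data.
records = (
    [(("am_peak", "weekday"), "commute")] * 3
    + [(("pm_peak", "weekday"), "commute")] * 2
    + [(("midday", "weekend"), "other")] * 2
    + [(("midday", "weekday"), "other"), (("am_peak", "weekend"), "other")]
)
model = train(records)
guess = predict(model, ("am_peak", "weekday"))
```

The conditional-independence assumption is what keeps the model this small: each feature contributes one smoothed ratio per label, and the log-probabilities are simply added.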

A Hybrid Music Recommendation System Based on Scene-State Perception Model

In recent years, recommendation for mobile users and context-aware recommendation have become popular topics in the field of recommendation systems. However, most music platforms need manual annotation of the user scene, which means that if the user forgets to do that, the recommendation system may fail to work. In this paper, we propose a scene-sensing model based on Naive Bayesian classification which can be used to automatically locate the users’ scene and predict their state in real time. On the basis of user scene and life state, we propose a hybrid music recommendation system which combines the recommendation results of an SVD++ collaborative filtering model and a logistic regression model that is used to predict the most recent popular music. Experimental results indicate that the hybrid recommendation system performs well for mobile users.

Zhixuan Liang, Zehao Tan, Zhenyue Zhuo, Xi Zhang

Research on Parallel Architecture of OpenCL-Based FPGA

Moore’s law has hit a bottleneck, and the computing power of general purpose processors is restricted. At the same time, new types of enterprise computing such as big data management and analysis bring more challenges to the computational performance and scalability of the data center. Research efforts have been devoted to accelerating algorithms on Field Programmable Gate Arrays (FPGAs), due to their high performance and reprogrammability. In this paper, we first study the heterogeneous platform of OpenCL-based FPGA, and propose a novel parallel acceleration framework that combines multiple computing units with an internal hardware flow. Then, we evaluate the influence of the number of computing units on performance and resource utilization with a high performance computing application (the AES algorithm) implemented through the proposed framework. We also compare the performance with a CPU implementation. The results show that our proposed framework offers high performance and scalability for implementing a class of algorithms suitable for parallelization, and suits the demands of data center and high performance computing (HPC) applications.

Yi Zhang, Ye Cai, Qiuming Luo

Smart Resource Allocation Using Reinforcement Learning in Content-Centric Cyber-Physical Systems

The exponential growth of networking technologies has dramatically enlarged the scope of the connected computing environment. As a novel computing deployment, Cyber-Physical Systems (CPSs) are considered an alternative for achieving high performance through enhanced capabilities in system control, resource allocation, data exchange, and flexible adoption. However, current CPSs encounter a bottleneck in resource allocation due to mismatched networking service quality and complicated service offering environments. The concept of Quality of Experience (QoE) in networks further increases the demand for intelligent resource allocation that satisfies distinct user groups in a dynamic manner. This paper concentrates on the issue of resource allocation in CPS and also considers the satisfaction of QoE in content-centric computing systems. We propose a novel approach that uses reinforcement learning to obtain highly accurate QoE in resource allocation. The proposed approach is assessed by both theoretical proofs and experimental evaluations.

Keke Gai, Meikang Qiu, Meiqin Liu, Hui Zhao

Effective Malware Detection Based on Behaviour and Data Features

Malware is one of the most serious security threats on the Internet today. Traditional detection methods become ineffective as malware continues to evolve. Recently, various machine learning approaches have been proposed for detecting malware. However, they either focus on behaviour information and leave the data information out of consideration, or pay little attention to new malware with different behaviours and new malware versions obtained by obfuscation techniques. In this paper, we propose an effective approach for malware detection using machine learning. Different from most existing work, we take into account not only the behaviour information but also the data information, namely, the opcodes, data types and system libraries used in executables. We employ various machine learning methods in our implementation. Several experiments are conducted to evaluate our approach. The results show that (1) the classifier trained by Random Forest performs best, with an accuracy of 0.9788 and an AUC of 0.9959; (2) all the features (including data types) are effective for malware detection; (3) our classifier is capable of detecting some fresh malware; (4) our classifier is resistant to some obfuscation techniques.

Zhiwu Xu, Cheng Wen, Shengchao Qin, Zhong Ming

Load Pattern Shape Clustering Analysis for Manufacturing

Manufacturing is a dominant sector among electricity consumers. AMI is now widely deployed to collect real-time customer electricity consumption data. Knowledge of customer electricity consumption profiles, obtained by analyzing and mining load profiles across industries, seasons and time periods, can be used for smart grid dispatching, marketing and pricing. This paper investigates an auto fuzzy K-means clustering algorithm for load pattern shape data, which can find the optimal number of power behavior patterns. A data-preprocessing framework for load pattern shape retrieval is shown to reduce the dimensions efficiently. In addition, a validation index that balances the scattering within clusters against the separation between clusters is applied in the algorithm to discover the real electricity consumption patterns automatically, based on the density of load time series data. The experimental results show that this algorithm can efficiently discover electricity consumption behaviors, such as order, continual load and continual low load profiles, and different overtime working patterns on weekends and public holidays. The results can predict the electricity consumption behavior of different types of industry, which will benefit Demand Side Management smart grid dispatching and marketing.

Mark Junjie Li, Weiguang Liu

A New Architecture of Smart House Control System

Smart house control systems are very important in practice and are a current research focus. This paper proposes a new architecture for smart house control systems based on an Ad-Hoc wireless sensor network and cloud service. The architecture is open and flexible. A smart house control system named Airome is developed on this architecture to show its effectiveness.

Lianghai Yang, Feiqiao Mao, Jiaqi Tan

Big Data Analysis of TV Dramas Based on Machine Learning

Currently, the large number of TV dramas exceeds the demand of TV stations, which causes a massive waste of resources. This article offers several practical solutions to this problem by building models based on machine learning. First, we build a TV score prediction model with regression and machine learning to rank the most popular TV dramas. Second, we write a web crawler to collect data from the Internet, build a star popularity index prediction model with machine learning and regression, and list the most acclaimed stars based on the popularity index. In conclusion, the predicted TV drama scores can provide a reference for TV stations in managing TV programs, and the star ranking can help TV drama production teams to produce high-quality TV dramas.

Jiaqi Tan, Feiqiao Mao, Lianghai Yang, Jiahui Wang

Face Based Advertisement Recommendation with Deep Learning: A Case Study

Recently, the offline advertising industry has grown massively. To improve the performance of offline advertising, researchers have proposed several methodologies. However, existing advertisement serving schemes tend to focus on traditional print media, resulting in a lack of personalization and impression. Meanwhile, we find that facial features such as age and gender can help us classify consumers intuitively and rapidly, which can raise recommendation accuracy in a short time. Motivated by this idea, we offer a Face Based Advertisement Recommendation System (FBARS). We find that FBARS works well in offline scenarios, where it performs about 4 times better than the classic method using collaborative filtering.

Xiaozhe Yao, Yingying Chen, Rongjie Liao, Shubin Cai

Ensemble Learning for Crowd Flows Prediction on Campus

Campus security has attracted increasing attention in recent years. Crowd flows prediction on campus is helpful for monitoring people and can help avoid potential risks. In this paper, based on distributed visiting data collection on campus, we propose a crowd flows prediction method with ensemble learning. For feature selection, we introduce information beyond people visiting data, such as vocation and weather, and evaluate the importance of the features as well as their combinations. For the prediction model, we use a stacking method with Random Forest, Gradient Boosting Tree and XGBoost for better prediction performance. Experimental results show that our method obtains high accuracy for crowd flows prediction with low extra cost.

Chuting Wu, Tianshu Yin, Shuaijun Ge, Ke Yu

Impact of Probability Distribution Selection on RVFL Performance

The initialization of input weights and hidden biases plays an important role in random vector functional link networks (RVFL). Although some optimization algorithms for initialization have been proposed in recent years, their initialization strategies all assume the uniform distribution. In this paper, ten benchmark datasets are used to study the impact of initialization with different probability distributions (e.g., Uniform, Gaussian, and Gamma distributions) on the performance of RVFL. The experimental results present some interesting observations and valuable instructions: (1) No matter whether we use Uniform, Gaussian, or Gamma distributions, RVFL initialized by the distribution with smaller variances always gets lower training and testing RMSE; (2) Compared with the Uniform distribution, the Gaussian and Gamma distributions with smaller variances usually give the RVFL model better performance; (3) Regardless of the distribution, RVFL with the direct link from the input layer to the output layer has better performance than RVFL without the link; (4) RVFL initialized by the distribution with larger variances generally needs more hidden nodes to achieve accuracy equivalent to that of RVFL initialized with smaller variances; (5) With the increase of distribution variances, the performance of RVFL decreases first and then remains stable.

Weipeng Cao, Jinzhu Gao, Zhong Ming, Shubin Cai, Hua Zheng

Joint Sparse Locality Preserving Projections

Manifold learning and feature selection have been widely studied in face recognition in the past two decades. This paper focuses on making use of the manifold structure of datasets for feature extraction and selection. We propose a novel method called Joint Sparse Locality Preserving Projections (JSLPP). In order to preserve the manifold structure of datasets, we first propose a manifold-based regression model by using a nearest-neighbor graph; then the $L_{2,1}$-norm regularization term is imposed on the model to perform feature selection. Finally, an efficient iterative algorithm is designed to solve the sparse regression model. The convergence analysis and computational complexity analysis of the algorithm are presented. Experimental results on two face datasets indicate that JSLPP outperforms six classical and state-of-the-art dimensionality reduction algorithms.

Haibiao Liu, Zhihui Lai, Yudong Chen

Research on Dynamic Safe Loading Techniques in Android Application Protection System

Android is a widely used embedded system, and the number of Android applications has been growing rapidly. Because of Android’s open source policy and limited application security mechanisms, Android applications are confronted with many serious security threats. Through malicious reverse engineering and illegal tampering, thousands of Android applications have been infected and millions of users have been exposed to danger. In this paper, we propose an improved Android application protection system based on DEX block encryption and multi-file feature checksums. Experimental results show that the proposed system is more reliable than commonly used Android application protection systems when faced with attack tools such as APK Tools and IDA pro.

Shubin Cai, Rongjie Huang, Ningsheng Yang, Jinwen Jiang, Zhong Ming, Zhengping Liang, Zhiguang Shan

Research on Optimizing Last Level Cache Performance for Hybrid Main Memory

Hybrid main memory, which combines DRAM with non-volatile memory (NVM) such as phase change memory (PCM), has become a perfect substitute for DRAM-based main memory because of its high performance and energy efficiency in embedded systems. Effective management of the last level cache is very important: it reduces cache misses and has practical significance for improving overall system performance. In last level caches, the commonly used cache replacement algorithm Least Recently Used (LRU) may cause cache pollution by inserting non-reusable data into the cache. Moreover, existing cache policies fail to fully address the asymmetry between the operations of NVM and DRAM in hybrid main memory. To solve these problems, we propose a Process-based Pollute Region Isolation (PPRI) algorithm, which improves the efficiency of last level cache utilization by eliminating competition between reusable and non-reusable cache lines. We also propose an improved last level cache management scheme, ILRU, for hybrid main memory, which improves the cache hit ratio and minimizes write-backs to PCM. Experimental results show that the proposed framework achieves better performance (an average improvement of 17.39%) and greater energy savings (an average decrease of 12.46%) compared with the latest cache management schemes for hybrid main memory architectures.

Hua Zheng, Zhong Ming, Meikang Qiu, Xi Zhang
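
The cache pollution that the abstract above attributes to plain LRU can be illustrated with a small simulation. This is only a sketch of the problem, not of PPRI or ILRU, whose details the abstract does not give. The class name `LRUCache`, the four-line capacity and the two access traces are hypothetical choices made for the example.

```python
from collections import OrderedDict

class LRUCache:
    """Fully associative cache with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # oldest entry first
        self.hits = 0
        self.misses = 0

    def access(self, address):
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)  # mark as most recently used
            return
        self.misses += 1
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)   # evict the least recently used line
        self.lines[address] = True

def run(trace, capacity):
    """Replay a trace of addresses and return (hits, misses)."""
    cache = LRUCache(capacity)
    for address in trace:
        cache.access(address)
    return cache.hits, cache.misses

hot = ["A", "B", "C"]                    # reusable lines
scan_1 = [f"S{i}" for i in range(4)]     # streamed once, never reused
scan_2 = [f"T{i}" for i in range(4)]

clean = run(hot * 3, capacity=4)
polluted = run(hot + scan_1 + hot + scan_2 + hot, capacity=4)
```

In the clean trace the three hot lines stay resident after their first use. In the polluted trace each one-off scan is as large as the cache, so LRU evicts every hot line before it is reused. Isolating non-reusable lines, which is the stated goal of PPRI, is meant to avoid exactly this competition.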

Quality of Service (QoS) in Lan-To-Lan Environments Through Modification of Packages

In this paper we show how the Python libraries Scapy and Net Filter Queue (nfqueue) can be used to capture packets from a Skype video call, modify the value of the ToS field so that a high priority is set, and finally replace the original packet. Tests are carried out on a congested channel, where changing the ToS field yields both qualitative and quantitative improvements in the call. Modifying the ToS field of the call packets gives an improvement of between 5.87% and 23.5% for voice, and between 9.18% and 27.55%.

Cesar Andrés Hernández, Gabriel Felipe Diaz, Octavio José Salcedo Parra
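
The abstract above performs the rewrite with Scapy and nfqueue on live traffic. The sketch below uses only the Python standard library to show what rewriting the ToS field of an IPv4 header involves at the byte level: changing byte 1 and recomputing the header checksum. The helper names `ipv4_checksum` and `set_tos`, the sample header, and the choice of DSCP 46 (Expedited Forwarding) as the "high priority" value are assumptions made for illustration; the abstract does not state which value was used.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """One's-complement checksum over the 16-bit words of an IPv4 header."""
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def set_tos(header: bytes, tos: int) -> bytes:
    """Return a copy of an IPv4 header with a new ToS byte and checksum."""
    patched = bytearray(header)
    patched[1] = tos                        # byte 1 holds ToS (DSCP + ECN bits)
    patched[10:12] = b"\x00\x00"            # checksum field is zero while summing
    struct.pack_into("!H", patched, 10, ipv4_checksum(bytes(patched)))
    return bytes(patched)

# Assumed priority marking: DSCP 46 shifted past the two ECN bits gives 0xB8.
EF_TOS = 46 << 2

# Sample 20-byte header (UDP, 192.168.0.1 to 192.168.0.199) with a valid checksum.
original = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
prioritised = set_tos(original, EF_TOS)
```

Running `ipv4_checksum` over a header that already carries a correct checksum yields zero, which gives a quick validity check for both the original and the rewritten header. A packet-manipulation library hides this bookkeeping, but the checksum still has to be refreshed whenever the ToS byte changes.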

Heuristic Algorithm for Flexible Optical Networks OTN

This paper presents the concept of flexible optical networks, focusing on routing and wavelength assignment (“λ”) by means of a heuristic algorithm called EEM. The algorithm computes the optical paths according to the demands of the topologies (NSFnet and EON), with variable transponder configurations and 100% traffic protection via disjoint channels. The topologies were simulated with the Net2plan software, which showed that the traffic offered by the network (in Tbps) could increase by 20%–80% before the blocking probability increased for the routes assigned by the algorithm.

Diego Fernando Aguirre Moreno, Octavio José Salcedo Parra, Danilo Alfonso López Sarmiento

VR3DMaker: A 3D Modeling System Based on Vive

The capability of conventional Computer Aided Design tools can be improved with the intuitiveness of virtual reality (VR). VR environments can provide designers with ample creative space, any desired objects and an intuitive experience, which differs from the conventional way of working in front of a monitor. The key technology for VR equipment is SLAM (simultaneous localization and mapping). One VR manufacturer, HTC, provides a position and orientation tracking system, a product called Tracker, to solve this problem. With it, we can identify our motion and track its trail. With a set of such information collected, the models in virtual space can be manipulated and deformed as they would be in the real world. In this paper, we present a 3D modeling platform that builds models on the Unity3D platform using virtual reality technology. The platform includes four modules: a workspace module, a base-model module, a transformation module and a model import and export module.

Shubin Cai, Jinchun Wen, Zhong Ming, Zhiguang Shan

A Knowledge Graph Based Solution for Entity Discovery and Linking in Open-Domain Questions

Named entity discovery and linking is a fundamental and core component of question answering. In the Question Entity Discovery and Linking (QEDL) problem, traditional methods are challenged because multiple entities in one short question are difficult to discover entirely and the incomplete information in short text makes entity linking hard to implement. To overcome these difficulties, we propose a knowledge graph based solution for QEDL and develop a system consisting of a Question Entity Discovery (QED) module and an Entity Linking (EL) module. The QED module is a tradeoff between, and an ensemble of, two methods. One is based on knowledge graph retrieval, which can extract more entities in questions and guarantees the recall rate; the other is based on Conditional Random Field (CRF), which improves the precision rate. The EL module treats linking as a ranking problem and uses a Learning to Rank (LTR) method with features such as semantic similarity, text similarity and entity popularity to extract and make full use of the information in short texts. On the official dataset of a shared QEDL evaluation task, our approach obtains a 64.44% F1 score for QED and 64.86% accuracy for EL, which ranks 2nd and indicates its practical use for the QEDL problem.

Kai Lei, Bing Zhang, Yong Liu, Yang Deng, Dongyu Zhang, Ying Shen

Predicting App Usage Based on Link Prediction in User-App Bipartite Network

Nowadays the explosion of smartphone Apps has created fertile ground for studying the behavior of mobile users. In this paper, we use network footprint (NFP) data, which consists of DPI data from ISPs and crawler data from the Web, to predict App usage. We construct a User-App bipartite network from the network footprint data and propose an App usage prediction method based on link prediction. We extract three categories of features, calculated from the bipartite network, the projection network and the original NFP data respectively, and apply supervised machine learning models to the proposed features. We compare the results of the App usage prediction model with different feature combinations in our experiments. The results show that the proposed link-prediction-based method is very effective for App usage prediction.

Yaowen Tan, Ke Yu, Xiaofei Wu, Di Pan, Yang Liu

Sentiment Classification of Reviews on Automobile Websites by Combining Word2Vec and Dependency Parsing

Online product reviews have become one of the most useful and vast information sources for guiding customers’ decisions and helping companies improve the quality of their products and services. It is therefore valuable to automatically identify sentiments from comment texts, which is the concern of sentiment classification. In this paper, we propose a novel machine learning-based method called ADSSR to classify the sentiments of reviews on popular automobile websites in China. We extract features based on dependency parsing, which reveals the syntactic structure of a sentence, to avoid obtaining the same vectors for sentences that contain the same words but have different grammatical structures. To reduce the dimensionality of the feature vectors while keeping the contributions of low-frequency words, we obtain distributed vectors learned by Word2Vec, group semantically similar words into clusters with K-means to pair each word with its corresponding cluster, and then replace every word with its cluster label. Experiments show the efficiency of the proposed sentiment classification method.

Feifei Liu, Fang Wei, Ke Yu, Xiaofei Wu

Data Quality Evaluation: Methodology and Key Factors

Data quality evaluation is becoming an institutionalized stage in the data quality lifecycle. More and more practice is promoted by data management and user organizations in specific fields, especially in well-informatized application environments. To improve the ability of data quality evaluation, the paper presents the key factors for data quality assessment and measurement. Based on an analysis of the main methodologies and standards for data quality management, the key factors include objectives, general principles, characteristics, measurement functions, etc.

Ying Yang, Yuan Yuan, Bo Li

SecTube: SGX-Based Trusted Transmission System

Trusted communication is a key component of the trusted computing paradigm. Sensitive data usually has to be migrated between two applications or platforms over an open network. In this case, not only file integrity monitoring tools but also trusted transmission is needed. However, existing trusted transmission solutions run on the user’s application platform or operating system. The lack of isolation makes such security software easy to subvert. In this paper, we present a novel approach called SecTube to protect data safety in transmission. It uses Intel’s new security technology SGX to give user applications a safer execution environment. We also present the design and implementation of an enclave socket. We implement SecTube on Ubuntu 14.04 and conduct several experiments. The experimental results show the effectiveness of SecTube and demonstrate that the average performance overhead of SecTube is only about 15%.

Jian Chen, Bo Dai, Yanbo Wang, Yiyang Yao, Bo Li

The Research How to Judge Social Vehicles Driving into ART

This paper presents a new algorithm, using advanced Internet of Things (IoT) and location technology, to prevent ambulances from being stuck for a long time during first aid. First, with reference to BRT at home and abroad, the paper presents ART, which will be a kind of public service program in many important cities in China, and studies the design idea of ART in detail. Second, it presents an algorithm for judging whether social vehicles are driving into ART, in order to avoid ambulances being blocked and to gain time for rescuing patients. Finally, the feasibility of the proposed algorithm is illustrated by simulation experiments.

Weihu Wang, Zenggang Xiong, Yanshen Liu, Fang Xu, Conghuan Ye

PSVA: A Content-Based Publish/Subscribe Video Advertising Framework

Online video technology has developed rapidly in the past few decades. Among all applications, video advertising is a rising approach to improving commercial advertisement revenue due to its diverse presentations, interactive GUI, tailored media publishing, measurable subscription features, etc. However, accurately matching commercial adverts from the publisher’s media source with the content consumers desire, as well as effective real-time video streaming transmission in a media processing system, remains a problem. This paper proposes a content-based Publish/Subscribe Video Advertising (PSVA) framework. In this framework, we construct a high-performance scalable communication subsystem to carry out real-time video streaming transmission, and handle event matching with a stream computation subsystem. The experimental results show that the PSVA framework is effective.

Feiyang Wang, Dongyu Zhang, Yuming Lu, Kai Lei

Practice and Research on Private Cloud Platform Based on OpenStack

In recent years, cloud computing has developed rapidly and has become an increasingly popular research area in the IT field. This paper elaborates on the concept and structure of cloud computing, cloud storage and private cloud, and discusses how to implement a private cloud. It introduces the important role that OpenStack plays in private cloud construction, and uses OpenStack as an example to illustrate the construction of a private cloud platform.

Zhe Diao, Youwei Zhu

Approach for Semi-automatic Construction of Anti-infective Drug Ontology Based on Entity Linking

Ontology can be used for the interpretation of natural language. To construct an anti-infective drug ontology, one needs to design and deploy a methodological step to carry out the entity discovery and linking. Medical synonym resources have been an important part of medical natural language processing (NLP). However, there are problems such as low precision and low recall rate. In this study, an NLP approach is adopted to generate candidate entities. Open ontology is analyzed to extract semantic relations. Six-word vector features and word-level features are selected to perform the entity linking. The extraction results of synonyms with a single feature and different combinations of features are studied. Experiments show that our selected features have achieved a precision rate of 86.77%, a recall rate of 89.03% and an F1 score of 87.89%. This paper finally presents the structure of the proposed ontology and its relevant statistical data.

Ying Shen, Yang Deng, Kaiqi Yuan, Li Liu, Yong Liu

Constructing Ontology-Based Cancer Treatment Decision Support System with Case-Based Reasoning

Decision support is a probabilistic and quantitative method designed for modeling problems in situations with ambiguity. Computer technology can be employed to provide clinical decision support and treatment recommendations. The problem of natural language applications is that they lack formality and the interpretation is not consistent. Conversely, ontologies can capture the intended meaning and specify modeling primitives. Disease Ontology (DO) that pertains to cancer’s clinical stages and their corresponding information components is utilized to improve the reasoning ability of a decision support system (DSS). The proposed DSS uses Case-Based Reasoning (CBR) to consider disease manifestations and provides physicians with treatment solutions from similar previous cases for reference. The proposed DSS supports natural language processing (NLP) queries. The DSS obtained 84.63% accuracy in disease classification with the help of the ontology.

Ying Shen, Joël Colloc, Armelle Jacquet-Andrieu, Ziyi Guo, Yong Liu

Using Virtualization for Blockchain Testing

Blockchain technology is flourishing. A wide variety of open source blockchains have emerged in recent years, differing in architecture design or protocol parameters. When developers want to compare the runtime performance of different blockchains, they can either run a simulation with an event simulator or run the application on physical machines. The former tends to be unconvincing and the latter costly. Containers, a lightweight virtualization technique, are suitable for solving this dilemma. Tens of containers can run simultaneously on a single-core CPU computer, each running a blockchain client. A simulated blockchain network with hundreds of nodes can easily be established in this way. In this paper, we first overview the blockchain architecture choices that significantly affect performance. Then we introduce our framework for testing blockchains using containerization, which balances the authenticity and the high cost of P2P application testing. Finally, we implement our framework to run a demo testing how bitcoin’s network parameters affect system reliability.

Chen Chen, Zhuyun Qi, Yirui Liu, Kai Lei

DiPot: A Distributed Industrial Honeypot System

Recent years have witnessed the prosperity of the Internet and Cyber Physical Systems (CPS). More and more industrial devices and systems are connected to the Internet and thus become targets for attackers. This paper proposes a distributed industrial honeypot system called DiPot to monitor Internet scanning and attacking behaviors against industrial control systems (ICS). DiPot offers attack clustering and visualization services and helps users stay aware of the current ICS security situation. Compared with existing honeypot systems, DiPot has two advantages: high-degree simulation and deep data analysis. DiPot is also equipped with an advanced visualization frontend and provides users with a good experience. Over 6 months of running, DiPot has obtained plenty of data and captured some real-world attack samples from the Internet. The experimental results demonstrate the effectiveness and efficiency of DiPot.

Jianhong Cao, Wei Li, Jianjun Li, Bo Li

MSA vs. MVC: Future Trends for Big Data Processing Platforms

The design of big data processing systems is a high-priority concern for both academia and industry. The conventional MVC architecture exposes limitations in system scalability and consistency. Integrating new services into an existing commercial application platform has become an impossible task and a nightmare for the system development team. The innovative MSA architecture aims to solve this problem. The main contribution of this paper is a comparison between the MSA and MVC system design and development architectures and a summary of future research and development issues and challenges. The paper first discusses the problems and challenges of big data management, and compares the characteristics of MVC and MSA patterned big data processing (BDP) platforms. We then use an experimental BDP platform to verify MSA big data management systems, distributed data storage and the progress of the large data storage architecture. Finally, we list future research and development directions to provide a valuable reference for further work.

Yuming Lu, Wei Liu, Haoxiang Cui

Attention-Aware Path-Based Relation Extraction for Medical Knowledge Graph

The task of entity relation extraction discovers new relation facts and enables broader applications of knowledge graphs. Distant supervision is widely adopted for relation extraction, but it requires large amounts of text containing entity pairs as training data. However, in some specific domains such as medical applications, entity pairs that have certain relations might not appear together, so it is difficult to meet the requirements of distantly supervised relation extraction. In light of this challenge, we propose a novel path-based model to discover new entity relation facts. Instead of finding texts for relation extraction, the proposed method extracts path-only information for entity pairs from the current knowledge graph. For each pair of entities, multiple paths can be extracted, and some of them are more useful for relation extraction than others. To capture this observation, we employ an attention mechanism that assigns different weights to different paths, highlighting the paths that are useful for entity relation extraction. To demonstrate the effectiveness of the proposed method, we conduct various experiments on a large-scale medical knowledge graph. Compared with state-of-the-art relation extraction methods that use the structure of the knowledge graph, the proposed method significantly improves the accuracy of extracted relation facts and achieves the best performance.

Desi Wen, Yong Liu, Kaiqi Yuan, Shangchun Si, Ying Shen
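
The attention step described in the abstract above (different weights for different paths between an entity pair) can be illustrated with a minimal sketch. The dot-product scoring against a relation query vector, the function names `softmax` and `attend`, and the toy vectors are assumptions for illustration; the abstract does not specify how path scores are computed.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(path_vectors, query):
    """Weight each path embedding by its similarity to a query vector.

    path_vectors: list of equal-length float lists, one per extracted path
    query: float list of the same length (assumed relation representation)
    Returns (weights, pooled), where pooled is the weighted sum of the paths.
    """
    # Assumed scoring function: plain dot product between path and query.
    scores = [sum(p_i * q_i for p_i, q_i in zip(path, query))
              for path in path_vectors]
    weights = softmax(scores)
    pooled = [sum(w * path[d] for w, path in zip(weights, path_vectors))
              for d in range(len(query))]
    return weights, pooled


if __name__ == "__main__":
    paths = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    weights, pooled = attend(paths, [2.0, 0.0])
    print(weights, pooled)
```

Paths whose embeddings align with the query receive larger weights, so they dominate the pooled representation that a downstream relation classifier would consume.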

Information Centric Networking Media Streaming Experiment Platform Design

Information Centric Networking (ICN) is a revolutionary concept that treats networking as the interconnection of data content rather than as data transmission between interconnected equipment. Designing and optimizing such a system requires not only a software simulation environment but also an experimental platform for testing new types of big data processing architecture. This paper puts forward the idea of a new distributed publish-subscribe media big data processing platform based on the ICN architecture. The first part discusses the components included in an ICN media streaming platform and then describes the design of a distributed publish-subscribe (P/S) system, which manages subscription queries, storage, transmission and content scheduling in compliance with the named data networking (NDN) framework. The simulation uses a software defined network (SDN) controller to carry out artificial-intelligence-supported computing in the cloud and generate a global view of the distributed media content.

Yuming Lu, Tao Hu, Xiaojun Wang

An Implementation of Content-Based Pub/Sub System via Stream Computation

The sheer volume of data delivered via the Internet requires a more flexible and powerful communication model. As an expressive, loosely coupled, asynchronous messaging model, the Publish-Subscribe (Pub/Sub) system has been widely used. A traditional topic-based Pub/Sub system does not understand the content of the messages it delivers, so all messages must be classified into a set of topics in advance. A content-based Pub/Sub system can dynamically choose the subscribers for each message according to its metadata. Existing distributed Pub/Sub systems are built on an overlay network consisting of message brokers, which can adapt to heterogeneous networks but inevitably impairs performance. In this paper, we design a novel centralized tiered content-based Pub/Sub system with a four-layer architecture. In the access layer, a customized naming strategy is proposed to achieve high availability. Internal message routing is handled in the routing layer, where a sharding scheme is used to lower routing overhead. In the computation layer, a two-step streaming computation model is used to boost performance. In the storage layer, we adopt the column-oriented database HBase for persistence. A set of comprehensive experiments was conducted to verify that our system achieves excellent performance, linear scalability and high availability.

Lei Huang, Li Liu, Jiayu Chen, Kai Lei
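
The difference between topic-based and content-based delivery mentioned in the abstract above can be made concrete with a small sketch of content-based matching, in which subscribers are selected per message from its metadata. This is not the four-layer system of the paper; the `ContentBroker` class, the constraint format and the metadata keys are hypothetical and chosen only for illustration.

```python
import operator

# Comparison operators a subscription constraint may use (assumed format).
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


class ContentBroker:
    """Keeps subscriptions as metadata constraints and matches messages."""

    def __init__(self):
        # subscriber id -> list of (metadata key, operator symbol, value)
        self._subscriptions = {}

    def subscribe(self, subscriber, constraints):
        self._subscriptions[subscriber] = list(constraints)

    def match(self, metadata):
        """Return the sorted ids of subscribers whose constraints all hold."""
        matched = []
        for subscriber, constraints in self._subscriptions.items():
            if all(key in metadata and OPS[op](metadata[key], value)
                   for key, op, value in constraints):
                matched.append(subscriber)
        return sorted(matched)


if __name__ == "__main__":
    broker = ContentBroker()
    broker.subscribe("alice", [("type", "==", "trade"), ("price", ">", 100)])
    broker.subscribe("bob", [("type", "==", "news")])
    print(broker.match({"type": "trade", "price": 120}))
```

No topic is assigned in advance: the recipient set is computed from each message's metadata at delivery time. The linear scan over subscriptions is the simplest possible matcher and would be the first thing to replace in a system that needs to scale.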

Security Message Broadcast Mechanism Research in Vehicular Network

In a vehicular ad hoc network, vehicles act as network nodes and move rapidly, so the network topology changes frequently and the communication links between nodes are unstable. As a direct result, security messages cannot be transmitted in a timely and reliable manner. To address these problems and practical requirements, this paper proposes SMLP, a security message broadcast mechanism based on location prediction. The algorithm has three steps. First, the future location of each neighbor node is predicted from its current location and direction. Second, the optimal relay node is selected. Finally, the relay node broadcasts the security message. When selecting the optimal relay node, the algorithm overcomes the shortcoming of simply taking the neighbor nearest to the destination node as the next-hop forwarding node: it also considers the continuous connection time between nodes, the direction coefficient and the distance coefficient of each node, and defines a stability coefficient that combines these factors. SMLP chooses the neighbor node with the largest stability coefficient as the next-hop relay node. Simulation results show that, compared with the GPSR routing protocol, the SMLP routing protocol reduces the packet loss rate and the end-to-end delay of packet transmission to a certain extent. It is well suited to the transmission of security messages in vehicular ad hoc networks.

Yanlin Zhao, Kena Dong, Xiumei Fan
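
The three steps named in the SMLP abstract above (predict locations, pick the neighbor with the largest stability coefficient, let it relay) can be sketched as follows. The abstract does not give the prediction model or the formula for the stability coefficient, so the constant-velocity prediction, the equal-weight sum, the assumption that all three factors are already normalized to [0, 1], and every name in the code are illustrative assumptions rather than the authors' definitions.

```python
import math


def predict_position(x, y, speed, heading_rad, dt):
    """Assumed constant-velocity model: where a node will be after dt seconds."""
    return (x + speed * math.cos(heading_rad) * dt,
            y + speed * math.sin(heading_rad) * dt)


def stability(link_time, direction, distance, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine the three factors from the abstract into one coefficient.

    All inputs are assumed to be normalized to [0, 1], larger meaning better.
    The weighted sum and the default weights are assumptions.
    """
    w_time, w_dir, w_dist = weights
    return w_time * link_time + w_dir * direction + w_dist * distance


def select_relay(neighbors):
    """Return the id of the neighbor with the largest stability coefficient."""
    best = max(
        neighbors,
        key=lambda n: stability(n["link_time"], n["direction"], n["distance"]),
    )
    return best["id"]


if __name__ == "__main__":
    neighbors = [
        {"id": "a", "link_time": 0.9, "direction": 0.8, "distance": 0.7},
        {"id": "b", "link_time": 0.5, "direction": 0.9, "distance": 0.4},
    ]
    print(predict_position(0.0, 0.0, 10.0, 0.0, 2.0))
    print(select_relay(neighbors))
```

In a fuller version the predicted positions would feed the distance and connection-time factors; here they are passed in directly to keep the selection rule visible.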

Efficient Algorithm for Traffic Engineering in Multi-domain Networks

Current communication and network infrastructures have created billions of petabytes of data on the network every second. Handling the resulting traffic demands is a major research problem. This paper proposes a scheme for Cloud-of-Things and Edge Computing (CoTEC) traffic management in multi-domain networks. To direct traffic flows through the service nodes in multi-domain networks, we assign a critical egress point to each traffic flow in a CoTEC network with multiple egress routers; this optimization of CoTEC traffic flows is referred to as Egress-Topology (ET). The proposed ET incorporates traditional Multi-Topology Routing (MTR) in the CoTEC network to address the inconsistencies between service overlay routing and Border Gateway Protocol (BGP) policies. Furthermore, ET introduces a number of programmable nodes that can be configured to ease ongoing traffic on the network and re-align services among the other nodes in multi-domain networks. Results show that our optimization algorithm achieves lower execution time and better QoS than operating without it, which allows us to meet the flexibility and efficiency demands of multi-domain networks.

Jian Sun, Siyu Sun, Ke Li, Dan Liao, Victor Chang

Reliable and Efficient Deployment for Virtual Network Functions

Network function virtualization (NFV) is a promising technique aimed at reducing capital expenditures (CAPEX) and operating expenditures (OPEX) and improving the flexibility and scalability of an entire network. However, this emerging technique faces some challenges. A major problem is reliability: because the substrate network remains vulnerable to earthquakes, floods and other natural disasters, the availability of deployed service function chains (SFCs) must be ensured, namely, the probability of successfully chaining a series of big-data-based virtual network functions (VNFs) while considering both the feasibility and the specific requirements of clients. Taking users’ SFC requirements as the premise, we present an Ensure Reliability Cost Saving (ER_CS) algorithm to reduce the CAPEX and OPEX of telecommunication service providers (TSPs) by reducing the reliability of the SFC deployments. We employ big-data-based arbitrary topologies as the substrate network. The results of extensive experiments indicate that the proposed algorithms perform efficiently in terms of the blocking ratio, resource consumption and time consumption.

Jian Sun, Gang Sun, Dan Liao, Yao Li, Muthu Ramachandran, Victor Chang

Empirical Study of Data Allocation in Heterogeneous Memory

With the rapid development of data-driven technologies, heterogeneous memories have become an alternative for processing large data tasks or performing efficient computations while taking economic factors into account. Many previous studies have explored the adoption of heterogeneous memories in algorithm design. One vital component of using heterogeneous memory is creating effective data allocation plans. However, it is challenging to discern which method for generating data allocation plans is superior, because application scenarios and constraints vary. In this work, we present an empirical study of recent advanced data allocation mechanisms for heterogeneous memories. We use experimental evaluations to examine a number of representative strategies, and the main findings include analyses and syntheses derived from these evaluations.

Hui Zhao, Meikang Qiu, Keke Gai

NEM: A NEW In-VM Monitoring with High Efficiency and Strong Isolation

Virtual machine introspection (VMI) technology was proposed to protect virtual machines and prevent them from being attacked by malware. Although VMI can provide out-of-VM isolation to ensure the security of monitors, the overhead of context switching between the guest VMs and the hypervisor at each monitoring point makes this approach wasteful in many application scenarios. In addition, the semantic gap in extracting meaningful information from the guest is a problem that VMI still needs to address. In this paper, we present None-Exit Monitoring (NEM), a framework that performs monitoring inside the guest to avoid the overhead of VM-exit and VM-entry switching while still providing strong isolation between the guest and the monitoring tools. NEM uses two new hardware-assisted virtualization features: Intel VT VMFUNC and #VE. NEM provides isolated memory views and strict privilege limits, and uses EPTP-switching to realize world switches instead of root/non-root switching, which reduces the overhead of invoking the monitor. Moreover, in-VM monitoring can obtain richer information about a virtual machine, which enhances the capability of the monitor. To support the EPTP-switching function of VMFUNC and the #VE exception, we patch the open source KVM. We also implement NEM in KVM and evaluate its functionality and efficiency. Experimental results show that NEM can satisfy the security requirements of a virtual machine monitor and can greatly improve efficiency.

Jingjie Qin, Bin Shi, Bo Li

k-CoFi: Modeling k-Granularity Preference Context in Collaborative Filtering

Collaborative filtering (CF) is a highly applicable technology for predicting a user’s rating of a certain item. Recently, some works have gradually switched from modeling users’ rating behaviors alone to modeling both users’ behaviors and the preference context beneath those rating behaviors, such as the set of other items rated by user u. In this paper, we go one step further and propose a novel perspective, k-granularity preference context, which absorbs existing preference contexts as special cases. Based on this new perspective, we develop a novel and generic recommendation method called k-CoFi that models k-granularity preference context in collaborative filtering in a principled way. Empirically, we study the effectiveness of factorization with coarse granularity, fine granularity and smooth granularity, and their complementarity, by applying k-CoFi to three real-world datasets. The experiments also yield some interesting and promising results and useful guidance for practitioners.

Yunfeng Huang, Zixiang Chen, Lin Li, Weike Pan, Zhiguang Shan, Zhong Ming

Implementation Maximum Overall Coverage Constraint Non-negative Matrix Factorization for Hyperspectral Mixed Pixels Analysis Using MapReduce

As an effective blind source separation method, non-negative matrix factorization (NMF) has been widely adopted to analyze mixed data in hyperspectral images. However, because of the existence of local optima, some constraints have to be added to the objective function to obtain more accurate estimates. In this paper, a new NMF-based mixed data analysis algorithm is presented, in which a maximum overall coverage constraint is introduced into traditional NMF; it is referred to as MOCC-NMF. Furthermore, to handle the huge computation involved, a parallel implementation of the proposed algorithm using MapReduce is described, and the new partitioning strategy for computing matrix multiplications and determinant values is discussed in detail. Numerical experiments conducted on real hyperspectral and synthetic datasets of different sizes confirm the efficiency and scalability of the proposed algorithm.

Ying Wang, Qian Zhou, Yunfeng Kong

Improved Three-Dimensional Model Feature of Non-rigid Based on HKS

The recognition and retrieval of 3D models have been a hot spot in the field of computer vision. Since non-rigid shapes can undergo various deformations, the recognition and retrieval of non-rigid 3D models are more complex and challenging than those of rigid ones. Therefore, the key to the recognition and retrieval of non-rigid 3D models is to extract a feature with substantial descriptive ability and stability. In this paper, an improved HKS feature named NSIHKS (new scale invariance heat kernel signature) is used to describe the shape of models. NSIHKS offers intrinsic invariance, scale transformation invariance and robustness, among other properties. Moreover, it remains resistant to faint noise. Firstly, the NSIHKS features of each model are extracted and processed with a clustering algorithm. Secondly, an efficient similarity measurement algorithm is designed on the basis of Ming distance. Finally, the NSIHKS features of each model in the standard data set are compared via the aforementioned distance algorithm. Experimental results on a standard data set in this field show that this feature performs well in non-rigid 3D model retrieval.

Fanzhi Zeng, Jiechang Qian, Yan Zhou, Changqing Yuan, Chen Wu

An Object Detection Algorithm for Deep Learning Based on Batch Normalization

Building on the advantages of deep learning in object extraction, in this paper we design a deep network that adds Batch-Normalization to the convolution layer. Batch-Normalization has three main advantages. Firstly, it normalizes the input data, which can speed up the fitting of parameters. Secondly, Batch-Normalization can reconstruct the distribution of the input data, so that the features of the input data are not lost. Thirdly, Batch-Normalization is able to prevent over-fitting, so it can replace Dropout and Local Response Normalization and thereby simplify the network. The network in this paper adopts region proposals to obtain regions of interest. Classification and position adjustment are trained at the same time to improve accuracy. Comprehensive experimental results demonstrate the efficacy of the proposed network for object detection.

Yan Zhou, Changqing Yuan, Fanzhi Zeng, Jiechang Qian, Chen Wu
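
The first two properties of Batch-Normalization listed in the abstract above (normalizing the input, then re-scaling it so that information is not lost) correspond to the standard training-time forward pass, sketched here in plain Python. This is a generic illustration of the well-known operation, not the network of the paper; the function name `batch_norm` and the toy batch are assumptions, and running statistics for inference are omitted.

```python
import math


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Training-time batch normalization over a list of feature vectors.

    batch: list of samples, each a list of floats with the same length
    gamma, beta: per-feature scale and shift (learnable in a real network)
    eps: small constant that avoids division by zero
    """
    n = len(batch)
    dims = len(batch[0])
    out = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        column = [row[d] for row in batch]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n  # biased variance
        scale = gamma[d] / math.sqrt(var + eps)
        for i, v in enumerate(column):
            # Normalize to zero mean and unit variance, then scale and shift.
            out[i][d] = scale * (v - mean) + beta[d]
    return out


if __name__ == "__main__":
    batch = [[1.0, 10.0], [3.0, 30.0]]
    print(batch_norm(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0]))
```

With `gamma` set to 1 and `beta` set to 0 each feature ends up with zero mean and roughly unit variance across the batch; learning other values lets the layer restore whatever scale and offset the following layer needs.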

Towards a Novel Protocol Analysis Framework for Industrial Control Systems

Nowadays industrial control systems (ICS) are becoming more and more robust and intelligent, owing to the development of industrial networking technology. On the other hand, security issues arise and pose great challenges. Among these issues, the security of ICS protocols has received attention from both academia and industry in recent years. Due to the closed and proprietary nature of industrial protocols, it is difficult to analyze and protect them. To address this issue, we propose a novel protocol analysis framework, named ICS-PAS, for ICS protocols. ICS-PAF can differentiate unknown protocols and their command types, extract the protocol format and recognize the data types of protocol payloads. In addition, ICS-PAF can infer and model the state transitions of ICS protocols. ICS-PAS requires no prior knowledge and can deal with binary protocols. We also conduct comprehensive experiments to verify the performance of ICS-PAS. The results show that ICS-PAS outperforms traditional approaches in terms of recognition accuracy and efficiency.

Jiye Wang, Liang Zhou, Xindai Lu, Huan Ying, Haixiang Wang
