
About This Book

This six-volume set, LNCS 11063–11068, constitutes the thoroughly refereed proceedings of the 4th International Conference on Cloud Computing and Security, ICCCS 2018, held in Haikou, China, in June 2018. The 386 full papers in these six volumes were carefully reviewed and selected from 1,743 submissions. The papers cover ideas and achievements in the theory and practice of all areas of inventive systems, including control, artificial intelligence, automation systems, computing systems, and electrical and informative systems. The six volumes are arranged according to subject area as follows: cloud computing, cloud security, encryption, information hiding, IoT security, and multimedia forensics.



Cloud Computing


3D Airway Tree Centerline Extraction Algorithm for Virtual Bronchoscope

Centerline extraction is the basis for understanding the three-dimensional structure of the lung. Since the bronchus has a complex tree structure, bronchoscopists easily become disoriented on the path to a target location. In this paper, an automatic centerline extraction algorithm for 3D virtual bronchoscopy is presented. The algorithm has three main components. First, a new airway tree segmentation method based on region growing is applied to extract major airway branches and sub-branches. Second, the original center is adjusted according to the geometric features of the Jacobian matrix, and a modified Dijkstra shortest-path algorithm is applied to yield the centerline of the bronchus. Then, the airway tree structure is represented and its features are calculated. Our algorithm was tested on various CT image data and performs efficiently.
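
The shortest-path step at the core of such centerline algorithms can be sketched with a plain Dijkstra implementation over a weighted graph (a minimal illustration, not the paper's modified variant; the airway node names are hypothetical):

```python
import heapq

def dijkstra(graph, start):
    """Shortest-path distances from `start` over a weighted adjacency dict
    mapping node -> list of (neighbor, edge_weight)."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical airway-tree skeleton: edges weighted by branch length.
airway = {
    "trachea": [("left_main", 2.0), ("right_main", 1.0)],
    "right_main": [("right_lobe", 2.0)],
    "left_main": [("right_lobe", 5.0)],
}
```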

Xiang Yu, Yanbo Li, Hui Lu, Le Wang

A Collective Computing Architecture Supporting Heterogeneous Tasks and Computing Devices

Abowd proposed a new vision of computing framework: collective computing. In this vision, various remote computing devices, including people, who are regarded as a kind of computing device, connect with each other in a group to complete a complex task. In this way, the computing capacity of the various devices is fully exploited across different tasks. However, most current research focuses on dedicated systems, and the performance of heterogeneous tasks and computing devices in the infrastructure has not received enough attention. This paper presents a collective computing architecture that supports heterogeneous tasks and computing devices, using a series of centralized managers to analyse and distribute tasks and to control heterogeneous computing devices. The whole architecture is layered in order to obtain load balance, centralized dispatch, and low-delay communication. The architecture provides a common infrastructure for processing heterogeneous tasks with heterogeneous devices, rather than serving specialized systems or functions. Finally, we implement a prototype system with virtual machines and Android phones to show that the architecture can use heterogeneous devices to perform heterogeneous tasks well.

Yang Li, Yunlong Zhao, Zhenhua Zhang, Qian Geng, Ran Wang

A Control Approach Using Network Latency Interval to Preserve Real-Time Causality

A Distributed Virtual Environment (DVE) simulates the real world and offers fidelity under real-time constraints. However, existing causal order control methods cannot function well given the large and dynamic network transmission latency of big-data-scale environments. In this paper, a novel control approach using network latency intervals is proposed to preserve real-time causality; it effectively selects the causal control information, dynamically adapts to the network latency, and is independent of the computing-node scale. The results of several groups of experiments indicate that the proposed approach is more efficient at preserving causal order delivery of events in large-scale networks while meeting the real-time constraint of causality preservation in a DVE.

Hangjun Zhou, Guang Sun, Shuyang Du, Feng Liu, Bo Yang, Yaqian Zhuo

A Co-occurrence Matrix Based Multi-keyword Ranked Search Scheme over Encrypted Cloud Data

Searchable encryption has become a very important technique for secure data search in cloud systems. It can perform search functions over encrypted cloud data while protecting data privacy. Many searchable encryption schemes support basic search functions such as single-keyword search, multi-keyword search, and similarity search. To enrich the search functionality, several searchable encryption schemes supporting semantic search have been proposed recently. However, these schemes only focus on the existence of keywords during the search process. They do not consider the semantic relation between keywords, which has a significant influence on the search results. To solve this problem, we propose a new co-occurrence matrix based Semantic Multi-keyword Ranked Search scheme (SMRS), which considers both the semantic relation of keywords and keyword weight to obtain more accurate search results over encrypted cloud data. We design a term co-occurrence matrix to quantify the semantic correlation of keywords and adopt the widely used TF-IDF rule to measure keyword weight. In particular, our scheme can effectively retrieve data items that better match users' intentions. The indexes and queries are encrypted with the secure kNN algorithm. We also add randomness to the encryption process for stronger protection. Security analysis proves that SMRS is secure under the known background model. The performance evaluation shows that our scheme achieves high search accuracy and practical search efficiency.
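
As an illustration of the two building blocks named above, a document-level term co-occurrence count and plain TF-IDF weights can be computed as follows (a generic sketch; SMRS's actual matrix construction and encryption are not shown):

```python
import math
from collections import Counter
from itertools import combinations

def cooccurrence(docs):
    """Count, over all documents, how often each unordered keyword pair
    appears together in the same document."""
    co = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            co[(a, b)] += 1
    return co

def tf_idf(docs):
    """Per-document TF-IDF weight for every term (natural-log IDF)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return [{t: (c / len(doc)) * math.log(n / df[t])
             for t, c in Counter(doc).items()} for doc in docs]
```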

Nan Jia, Shaojing Fu, Dongsheng Wang, Ming Xu

A Cooperative Spectrum Sensing Method Based on Clustering Algorithm and Signal Feature

To solve the problem that the decision threshold is difficult to calculate in random-matrix-based spectrum sensing, this paper proposes a spectrum sensing method that combines a clustering algorithm with signal features. To improve feature estimation and detection performance when the number of cooperative users is small, a stochastic matrix splitting and reorganization scheme is introduced to logically increase the number of cooperative users. To extract further information from the signal matrix and improve feature accuracy, the signal perceived by each secondary user (SU) is decomposed into its I and Q (IQ) components. First, the signal matrix is split, reassembled, and IQ-decomposed. Then the covariance matrices of the split matrix and of the IQ-decomposed matrix are calculated, and the corresponding eigenvalues are obtained. The features are then assembled into a feature vector. Finally, the algorithm classifies these feature vectors. Simulation experiments with different signal characteristics and different clustering algorithms show that the proposed method can effectively improve spectrum sensing performance.
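
The covariance-eigenvalue feature underlying such random-matrix methods can be sketched as follows (a simplified single feature; the splitting/IQ pipeline above produces a richer feature vector):

```python
import numpy as np

def max_min_eigenvalue_ratio(samples):
    """Classic random-matrix sensing statistic: the ratio of the largest to
    the smallest eigenvalue of the sample covariance matrix, where rows are
    cooperating secondary users and columns are time samples."""
    cov = np.cov(samples)
    eig = np.linalg.eigvalsh(cov)  # real eigenvalues, ascending
    return eig[-1] / eig[0]
```

When a primary-user signal is present, the component common to all receivers inflates the largest eigenvalue, so the ratio separates the two hypotheses; the clustering step then replaces a hand-set decision threshold.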

Shunchao Zhang, Yonghua Wang, Pin Wan, Yongwei Zhang, Xingcheng Li

A Dynamic Network Change Detection Method Using Network Embedding

Dynamic networks are ubiquitous. Detecting dynamic network changes helps to understand network development trends and to discover network anomalies in time; it is currently a research hotspot. Real-world network structure is very complex: current feature learning methods have difficulty capturing the variety of network connectivity patterns, and defining effective network features requires extensive neighborhood knowledge and computational cost. To overcome this limitation, this paper presents a dynamic network change detection method using network embedding, which automates the whole process by treating feature extraction as an embedding problem and detects dynamic network changes by analyzing the spatial distribution of nodes after the embedding step. We apply this method to simulated dynamic networks and real dynamic network datasets to demonstrate its validity.

Tong Sun, Yan Liu

A Genetic Algorithm Based Method of Early Warning Rule Mining for Student Performance Prediction

Predicting student failure in course learning has become a very difficult issue due to the large number of factors that can affect students' low performance, and classical statistical methods are hard to use because their results are usually very difficult for end-users to understand. In this study, a genetic algorithm approach is proposed to deal with these problems using a dataset of 576 higher-education students' course learning information. First, a chromosome encoding mechanism is designed to represent each individual, namely a classification rule. Second, a flexible fitness function is proposed to evaluate the quality of each individual, making a trade-off between sensitivity and specificity. Third, a set of genetic operators including selection, crossover, and mutation is constructed to generate offspring from the fittest individuals and select the best solution to our problem, which can easily be used as an early warning rule to predict student failure in course learning. Finally, testing showed consistency between the predicted results and the observed data, indicating that the employed method is promising for identifying at-risk students. The interpretable result is a significant advantage over other classical methods, as it yields a classifier for student performance prediction that is both accurate and comprehensible.
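
The sensitivity/specificity trade-off in the fitness function can be sketched like this (a hypothetical weighted form; the paper's exact function may differ):

```python
def rule_fitness(rule, records, labels, w=0.5):
    """Fitness of a candidate classification rule: a weighted combination of
    sensitivity (failing students caught) and specificity (passing students
    not flagged). `rule` is a predicate over a record; label True = failed."""
    tp = fn = tn = fp = 0
    for rec, failed in zip(records, labels):
        fired = rule(rec)
        if failed and fired:
            tp += 1
        elif failed:
            fn += 1
        elif fired:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return w * sensitivity + (1 - w) * specificity
```

A rule that fires on everything scores only 0.5 here, because perfect sensitivity is cancelled by zero specificity, which is exactly the trade-off the fitness function is meant to enforce.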

Chunqiao Mi, Xiaoning Peng, Zhiping Cai, Qingyou Deng, Changhua Zhao

A Hybrid Resource Scheduling Strategy in Speculative Execution Based on Non-cooperative Game Theory

Hadoop is a well-known parallel computing framework for processing large-scale data, but there is a kind of task in the Hadoop framework, called the "straggling task", that has a serious impact on Hadoop. Speculative execution is an efficient method for handling straggling tasks: it monitors the real-time rate of running tasks and backs up stragglers on other nodes to increase the chance that a backup task completes early. Existing speculative execution strategies have many problems, such as misjudgment of straggling tasks and improper selection of backup nodes, which lead to inefficient speculative execution. This paper proposes a hybrid resource scheduling strategy for speculative execution based on non-cooperative game theory (HRSE), which transforms the resource scheduling of backup tasks in speculative execution into a multi-party non-cooperative game: the backup task group is the set of players, the game strategies are the computing nodes, and the utility function is the overall task execution time of the cluster. When the game reaches a Nash equilibrium, the final resource scheduling scheme is obtained. Finally, we implemented the strategy in Hadoop-2.6.0; experimental results show that the scheduling scheme can guarantee the efficiency of speculative execution and improve the fault-tolerant performance of the computation under high cluster load.
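
The game-theoretic placement idea can be illustrated with a toy best-response iteration (a sketch under an assumed cost model, not HRSE's actual utility function):

```python
from collections import Counter

def best_response_schedule(tasks, nodes, base_load, rounds=20):
    """Toy non-cooperative game for backup-task placement: each backup task
    repeatedly moves to the node that minimizes its completion time given the
    other tasks' current choices; a fixed point is a Nash equilibrium.
    Assumed cost model: node base load plus per-task cost times the number of
    co-located backup tasks."""
    assign = {t: nodes[0] for t in tasks}
    for _ in range(rounds):
        changed = False
        for t, cost in tasks.items():
            others = Counter(assign.values())
            others[assign[t]] -= 1  # exclude this task itself
            best = min(nodes, key=lambda n: base_load[n] + (others[n] + 1) * cost)
            if best != assign[t]:
                assign[t], changed = best, True
        if not changed:
            break  # Nash equilibrium: no task wants to deviate
    return assign
```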

Williams Dannah, Qi Liu, Dandan Jin

A Joint Approach to Data Clustering and Robo-Advisor

A robo-advisor is a type of financial recommendation service that provides investors with financial advice or investment management online. Data clustering and item recommendation are both important and challenging in robo-advisors. These two tasks are often considered independently, and most efforts have been made to tackle them separately. However, users in data clustering and group relationships in item recommendation are inherently related. For example, a large number of financial transactions include not only the user's asset information but also the user's social information. The existence of relations between users and groups motivates us to jointly perform clustering and item recommendation for robo-advisors in this paper. In particular, we provide a principled way to capture the relations between users and groups, and propose a novel framework, CLURE, which fuses data CLUstering and item REcommendation into a coherent model. With experiments on benchmark and real-world datasets, we demonstrate that the proposed framework achieves superior performance on both tasks compared to state-of-the-art methods.

Jingming Xue, En Zhu, Qiang Liu, Chuanli Wang, Jianping Yin

A Measurement Allocation for Block Image Compressive Sensing

In this paper, we propose a measurement allocation scheme to reduce the blocking artifacts present in Block Compressive Sensing (BCS) of images. We compute the error between each image block and its adjacent blocks to evaluate the structural complexity of each block. According to this error energy, each block is adaptively measured and reconstructed. Experimental results show that the proposed method improves the quality of reconstructed images from both subjective and objective points of view compared with standard BCS.
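
A minimal version of error-energy-proportional allocation might look like this (illustrative only; the paper's exact allocation rule may differ):

```python
def allocate_measurements(energies, total, base=1):
    """Adaptive measurement allocation: every block gets `base` measurements,
    and the remaining budget is split in proportion to each block's error
    energy (a proxy for its structural complexity)."""
    rest = total - base * len(energies)
    s = sum(energies)
    alloc = [base + int(rest * e / s) for e in energies]
    # hand any rounding leftover to the most complex block
    alloc[max(range(len(energies)), key=energies.__getitem__)] += total - sum(alloc)
    return alloc
```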

Xiaomeng Duan, Xu Li, Ran Li

A Method of Small Sample Reliability Assessment Based on Bayesian Theory

In this paper, to improve the accuracy of Bayesian small-sample reliability evaluation, we improve both the preprocessing of pre-test information and the Bayesian reliability evaluation itself. In the preprocessing stage, the conversion of pre-test information is studied. For the problem of information conversion between similar systems, this paper presents a novel method that uses the correlation coefficient to determine the relationship between similar systems and D-S evidence theory to integrate the conversion results. In addition, an improved HS-algorithm-based method is proposed to improve the accuracy of information partitioning and matching, and thereby the conversion efficiency. In the Bayesian reliability evaluation stage, the distribution of the pre-test information is determined using conjugate prior distributions, and a mixed prior model is proposed to address both the problem of pre-test information "submerging" the small-sample field test information and the problem of weighting multi-source prior distributions; the evaluation results for the reliability parameters are then obtained effectively.
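
The conjugate-prior step can be illustrated with the standard Beta-Binomial pair for pass/fail reliability data (a textbook sketch; the paper's mixed prior model is more elaborate):

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Conjugate Bayesian update: a Beta(alpha, beta) prior on reliability,
    combined with Binomial pass/fail field data, yields a Beta posterior in
    closed form; the posterior mean estimates the reliability parameter."""
    a, b = alpha + successes, beta + failures
    return a, b, a / (a + b)
```

Downweighting the prior counts (alpha, beta) before the update is one simple way to keep abundant pre-test information from submerging a small field sample.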

Hongbin Wang, Nianbin Wang, Lianke Zhou, Zhenbei Gu, Ruowen Dang

A Modified Dai-Yuan Conjugate Gradient Algorithm for Large-Scale Optimization Problems

It is well known that the DY conjugate gradient method is one of the most efficient optimization algorithms, as it fully utilizes the current information of the search direction and gradient. It is regrettable that the DY conjugate gradient algorithm fails to address large-scale optimization models, and few scholars have paid much attention to modifying it. Thus, to solve large-scale unconstrained optimization problems, a modified DY conjugate gradient algorithm under the Yuan-Wei-Lu line search is proposed. The proposed algorithm not only has a descent character but also a trust-region property. At the same time, the algorithm meets the demands of global convergence, and the corresponding numerical tests show it is more effective than similar optimization algorithms.
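
For reference, the DY update the paper modifies sets beta_k = ||g_{k+1}||^2 / (d_k^T (g_{k+1} - g_k)). A minimal sketch on a quadratic objective with an exact line search (not the Yuan-Wei-Lu line search of the paper):

```python
import numpy as np

def dy_cg(A, b, x0, iters=50, tol=1e-10):
    """Conjugate gradient with the Dai-Yuan beta, applied to the quadratic
    f(x) = 0.5 x'Ax - b'x, whose gradient is g = Ax - b; the exact line
    search step for a quadratic is alpha = -(g'd) / (d'Ad)."""
    x = np.asarray(x0, float)
    g = A @ x - b
    d = -g
    for _ in range(iters):
        if np.linalg.norm(g) < tol:
            break
        alpha = -(g @ d) / (d @ A @ d)
        x = x + alpha * d
        g_new = A @ x - b
        beta = (g_new @ g_new) / (d @ (g_new - g))  # Dai-Yuan formula
        d = -g_new + beta * d
        g = g_new
    return x
```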

Gonglin Yuan, Tingting Li

A Modified Wei-Yao-Liu Conjugate Gradient Algorithm for Two Type Minimization Optimization Models

This paper presents a modified Wei-Yao-Liu conjugate gradient method that automatically has not only the sufficient descent property but also the trust-region property, without carrying out any line search technique. Global convergence for unconstrained optimization problems is established under the weak Wolfe-Powell (WWP) line search. Meanwhile, the presented method can be extended to solve nonlinear equation problems, for which global convergence is established under some mild conditions with a line search method and a projection technique. Some preliminary numerical tests are presented, and the numerical results show the method's effectiveness.

Xiaoliang Wang, Wujie Hu, Gonglin Yuan

A New Fully Homomorphic Encryption Scheme on Batch Technique

In 2011, Naehrig et al. proposed an RLWE-based homomorphic encryption scheme. In this paper, we design a new scheme that combines it with the batch technique. Concretely, the technique packs multiple "plaintext slots" into one ciphertext using the Chinese Remainder Theorem and then performs homomorphic operations on it. Considering the exponential growth of the noise with each multiplication, we use the key switching and modulus switching techniques to reduce the noise in the ciphertext, ensuring correct decryption and further homomorphic computation. In particular, we can encrypt O(nλ) plaintexts in the encryption process, improving efficiency by a factor of λ compared to the original scheme. Finally, we analyze the security and parameters of the scheme and prove that it is CPA secure.
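
The CRT packing idea can be sketched over plain integers (no encryption; the slot moduli are assumed pairwise coprime):

```python
from math import prod

def crt_pack(slots, moduli):
    """Pack one value per 'plaintext slot' into a single integer with the
    Chinese Remainder Theorem: the result is congruent to slots[i] modulo
    moduli[i] for every slot."""
    M = prod(moduli)
    x = 0
    for r, m in zip(slots, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # pow(Mi, -1, m): modular inverse
    return x % M

def crt_unpack(x, moduli):
    return [x % m for m in moduli]
```

Adding two packed integers modulo the product of the moduli adds the slots component-wise, which is what makes batched homomorphic operations act on all slots at once.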

Mengtian Li, Bin Hu

A Novel Convolution Neural Network for Background Segmentation Recognition

The convolutional neural network for image classification is an application of deep learning to image processing. Convolutional neural networks have the advantage of convolving directly with image pixels and extracting image features from them, an approach closer to the processing of the human visual system. However, to date, no convolutional neural network model achieves 100% classification accuracy. At the same time, we found that the picture background can affect the network's recognition of a picture, and that after removing the background, the same picture can be recognized correctly. Therefore, we design a model based on the convolutional neural network to remove the image background for certain specific images (such as selfies, animals, and flowers), after which the processed image is identified and classified. Experiments show that the proposed method maintains a high level of integrity of the target to be detected after removing the background (measured by Intersection over Union, IoU); moreover, multiple classification models verify that the classification accuracy on some background-removed pictures is significantly higher than on pictures without any treatment.

Wei Fang, Yewen Ding, Feihong Zhang

A Parallel Pre-schedule Max-Min Ant System

The parameter sensitivity of the MMAS algorithm is analyzed in this paper. We then propose a multi-ant-colony parallel optimization algorithm based on a dynamic parameter adaptation strategy, aimed at the performance shortcomings of the traditional ACO algorithm. The algorithm makes use of cloud computing parallelism to design and analyze the MMAS system. The convergence comparison results show that this method has certain advantages.

Ying Zheng, Qianlong Yang, Longhai Jin, Lili He

A Reliable Method of Icing Detection for Transmission Lines

Current ice-covering image processing methods have poor fault tolerance, and the classic algorithm that uses Hough-transform linear fitting to detect conductor inclination has low accuracy, despite its ability to suppress interference and noise. In this paper, a new improved method is proposed, based on the analysis and study of domestic and international methods for measuring line ice thickness with image processing. The Hough transform is combined with the least-squares method to fit the edges of the ice-covered wire, and fault-tolerance technology based on recovery blocks is used to prevent system failure when data deviate from expectations. Experimental results show that, compared with traditional processing methods, the two proposed algorithms improve the system's fault tolerance, ensuring high reliability and accuracy of the final measurement. This is of great significance for the study of image-based ice coating thickness detection algorithms.
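
The least-squares refinement step can be sketched as an ordinary linear fit over the edge pixels selected by the Hough stage (generic closed-form formulas, not the paper's combined estimator):

```python
def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b to (x, y) edge pixels,
    refining a coarse line estimate such as one from a Hough transform."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b
```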

Zhao Guodong, Li Pengfei, Fang Fan, Liu Xiaoyu, Zhang Yuewei

A Research About Trustworthiness Metric Method of SaaS Services Based on AHP

Cloud computing, the Internet of Things (IoT), and big data are the driving forces behind the transformation of the whole economy, and cloud computing and cloud services are also the foundation and infrastructure of the other two technologies. As the development of these technologies has brought great changes to our lives, challenges have come as well. Among cloud services with similar functions, it is a big problem to find the right services to meet users' needs for security, reliability, ease of use, and so on. To solve this problem, a method for measuring the trustworthiness of SaaS services based on AHP is proposed, building on research into SaaS services and trusted computing, and a metric model for trustworthiness is built. Finally, a case study is presented that demonstrates the feasibility and usability of the method and model.
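
The core AHP computation is the principal eigenvector of a pairwise-comparison matrix; a minimal sketch (the trustworthiness criteria and judgment values are hypothetical):

```python
import numpy as np

def ahp_weights(pairwise):
    """AHP priority weights: the normalized principal eigenvector of a
    pairwise-comparison matrix, where entry [i, j] states how much more
    important (here: trustworthy) criterion i is than criterion j."""
    vals, vecs = np.linalg.eig(np.asarray(pairwise, float))
    k = np.argmax(vals.real)          # principal eigenvalue
    w = np.abs(vecs[:, k].real)
    return w / w.sum()
```

For example, the judgment "security is twice as important as reliability" gives weights of 2/3 and 1/3; in practice a consistency check on the principal eigenvalue is added before the weights are used.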

Tilei Gao, Tong Li, Rong Jiang, Ren Duan, Rui Zhu, Ming Yang

A Spectrum Sensing Algorithm Based on Information Geometry and K-medoids Clustering

To improve the performance of existing spectrum sensing methods in cognitive radio and to avoid the complex problem of decision threshold calculation, this paper uses information geometry theory combined with the unsupervised K-medoids clustering method to realize spectrum sensing. First, using information geometry theory, the statistical characteristics of the wireless spectrum signals received by secondary users are analyzed and transformed into geometric characteristics on a statistical manifold. Each sampled signal of a secondary user corresponds to a point on the statistical manifold, and the distance feature between different points is obtained using a metric on the manifold. Finally, the K-medoids clustering algorithm is used to classify the distance features and determine whether the primary user signal exists, achieving the purpose of spectrum sensing. Simulation results show that the proposed method outperforms traditional spectrum sensing algorithms.

Yonghua Wang, Qiang Chen, Jiangfan Li, Pin Wan, Shuiling Pang

A Spectrum Sensing Method Based on Null Space Pursuit Algorithm and FCM Clustering Algorithm

To improve the performance of spectrum sensing systems in complex environments, this paper proposes a spectrum sensing method based on the Null Space Pursuit (NSP) algorithm and the fuzzy c-means (FCM) clustering algorithm. The signal sensed by the spectrum system is first pre-processed with the Null Space Pursuit algorithm, decomposing it into sub-signal components with more distinct features. To further improve the accuracy of feature estimation, the IQ decomposition method is used to process the signal. The eigenvalues of the signals are then extracted to form a two-dimensional feature vector. Finally, these feature vectors and the FCM clustering algorithm yield a classifier that is used to determine the state of the unknown spectrum. In the experimental part, we verify the method in different environments. Experimental results show that the method can effectively improve sensing performance compared to traditional spectrum sensing methods.

Yongwei Zhang, Yonghua Wang, Pin Wan, Shunchao Zhang, Nan Li

A Survey of Machine Learning-Based Resource Scheduling Algorithms in Cloud Computing Environment

As a new type of computing resource, cloud computing attracts more and more users because of its convenient and quick service. A cloud server is used by a large number of users, which raises the problem of how to reasonably schedule resources to ensure load balance in the cloud environment. As research has developed, scholars have found that simple job scheduling of physical resources cannot achieve good resource utilization. Connecting the characteristics of resource scheduling in the cloud environment with machine learning, researchers gradually abstract the resource scheduling problem into a mathematical problem and then combine machine learning with swarm algorithms to propose intelligent algorithms that can optimize the resource structure and improve resource utilization. In this survey, we discuss several algorithms that use machine learning to solve resource scheduling problems in a cloud environment. Experiments show that machine learning can help the cloud environment achieve load balancing.

Qi Liu, YingHang Jiang

A Trusted Computing Base for Information System Classified Protection

The 21st century is the age of information, in which information has become an important strategic resource, and information security has turned into one of the biggest issues facing computer technology today. Our computer systems face the risk of being plagued by powerful, feature-rich malware. Current malware exploits the vulnerabilities that are endemic to the huge computing base that must be trusted to secure our private information. This summary presents the trusted computing base (TCB) and the Trusted Computing Group (TCG). The TCB is the totality of a computer's protection mechanisms and determines the security of the system; the TCG is an international industry standards group. There are extensive theories about information security and associated technologies, providing techniques and methods that can prevent a system from being attacked by malware and controlled by unauthorized persons. Finally, we introduce efficient TCB reduction.

Hui Lu, Xiang Cui, Le Wang, Yu Jiang, Ronglai Jia

Address Allocation Scheme Based on Local MAC Address

Virtualization and scale expansion in cloud data centers result in sharp MAC address consumption, which also exposes some weaknesses of the MAC address. The concept of the local MAC address was proposed by the IEEE standards organization to overcome the limitations of the traditional MAC address. To realize automatic allocation of local MAC addresses, we put forward two address allocation schemes, appropriate for infrastructure networks and ad-hoc networks respectively. The first is a centralized management scheme that requires an address allocation device, while the other is a distributed allocation scheme more suitable for wireless networks without infrastructure. We simulate the two schemes in OPNET and analyze them. The results show that both schemes are feasible in their respective environments.

Xinran Fan, Ting Ao, Zhengyou Xia

An Approach Based on Value Revision to Activity Recognition

In recent years, activity recognition has attracted extensive attention and been applied in many areas, including smart homes, healthcare, and energy saving, and a number of approaches for activity recognition have been proposed. However, the dispersion of values of triggered sensors has a negative effect on activity recognition. In this paper, an approach based on value revision is proposed to recognize the activities of a resident. First, the time and total duration that each sensor is triggered are calculated as sensor attributes in an activity record. Second, the start time, end time, and duration of an activity record are extracted from a set of activity records as activity attributes. Third, the information gain of each attribute is calculated in order to filter out attributes with low information gain. Fourth, attribute values of test activity records are replaced with the most similar attribute values of activity records in the training dataset. Finally, classifiers are employed to recognize daily activities. This paper validates the recognition of daily activities on two datasets by comparing the proposed approach with a previous approach. The results demonstrate that the proposed approach clearly outperforms the previous approach.
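
The information-gain filter in the third step can be sketched with the standard entropy-based gain over discretized attribute values:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    """Information gain of a discrete attribute: entropy of the labels minus
    the weighted entropy after partitioning by attribute value. Attributes
    with low gain can be filtered out before classification."""
    n = len(labels)
    by_value = {}
    for v, y in zip(values, labels):
        by_value.setdefault(v, []).append(y)
    return entropy(labels) - sum(len(ys) / n * entropy(ys)
                                 for ys in by_value.values())
```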

Zhengguo Zhai, Yaqing Liu, Xiangxin Wang, Yu Jiang

An Encryption Traffic Analysis Countermeasure Model Based on Game Theory

With the development of network technologies, the proportion of encrypted traffic in cyberspace is increasing. This phenomenon makes the management and control of network traffic increasingly challenging, and research on encrypted traffic analysis and monitoring has become an important direction. Based on game theory, this paper proposes a countermeasure model for the detection of encrypted traffic and expounds the key elements of the model. Finally, we present a detailed analysis of the payoffs and benefits for the two sides of the game.

Xiangsong Gao, Hui Lu, Xiang Cui, Le Wang

An Image Retrieval Technology Based on Morphology in Cloud Computing

In recent years, with the rapid development of computer network technology, the amount of multimedia data, including image data, has increased rapidly. How to efficiently and accurately retrieve the image information users need from large amounts of image data has become a key issue in the field of information retrieval. Improvements to traditional image retrieval algorithms struggle to solve a series of problems such as massive data storage, computation, and transmission. As a new computing model, cloud computing plays a very important role in advancing image retrieval. In this paper, image edge information is extracted by a morphological method and used as the image shape feature for retrieval, solving the problem of poor edge detection by traditional methods. These features are extracted on the Hadoop cloud computing platform, which can effectively improve the performance of image retrieval.
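
A simple morphological edge extractor is the morphological gradient, dilation minus erosion; a NumPy sketch with a 3x3 square structuring element (an assumption, since the paper does not specify its operator):

```python
import numpy as np

def morph_gradient(img):
    """Morphological gradient with a 3x3 square structuring element:
    grayscale dilation (neighborhood max) minus erosion (neighborhood min),
    which is nonzero only where intensity changes, i.e. at edges."""
    h, w = img.shape
    p = np.pad(img, 1, mode="edge")
    # stack the nine 3x3-neighborhood shifts of the image
    shifts = np.stack([p[i:i + h, j:j + w] for i in range(3) for j in range(3)])
    return shifts.max(axis=0) - shifts.min(axis=0)
```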

Gui Liu, Jianhua Yao, Zhonghai Zhou

An Improved ICS Honeypot Based on SNAP7 and IMUNES

The honeypot, as an active defense technology, can make up for the low efficiency of detection systems against unknown threats and is of great significance for the safety of industrial control networks. At present, industrial control system (ICS) honeypots have many defects; in particular, they cannot support large-scale deployment while offering high deceptiveness and a certain degree of interaction. To compensate for these defects, an improved honeypot scheme based on SNAP7 and IMUNES is proposed. The proposed honeypot can be deployed rapidly; by using IMUNES and SNAP7 to rapidly construct an industrial control network "shadow" system, it is lightweight, highly deceptive, and offers a certain degree of interaction. Being scalable, it can easily be integrated with industrial control honeynets and computer networks.

Chenpeng Ding, Jiangtao Zhai, Yuewei Dai

Analysis of Dynamic Change Regulation of Water and Salt in Saline-Alkali Land Based on Big Data

With the rapid development of "Internet Plus", the monitoring of saline-alkali land has undergone profound changes: intelligent technologies such as big data, cloud computing, and data mining are gradually being applied to analyze the changing trends of saline-alkali lands. To study the dynamic change rules of saline-alkali land, a synchronous Internet of Things monitoring system was used to continuously monitor soil moisture, salt content, and other relevant data. Taking the soil of Yong'an town in Kenli County as the research object, the change trends and correlations of key factors such as soil temperature, salt, water, and pH were studied. The results show that soil water content and salt content exhibit relatively obvious seasonal changes. The soil is alkaline, and its pH value does not change obviously with the seasons. Correlation analysis shows a significant positive correlation between water content and salt content in most months, while the correlations between these two factors and the other soil factors differ. The study of the trends and correlations of key factors such as soil water and salt reveals the factors that affect the change of water and salt content in the area, providing a reference for further improvement of soil salinization.

Rui Zhao, Pingzeng Liu, He Li, Xueru Yu, Xue Wang

Analysis of LSTM-RNN Based on Attack Type of KDD-99 Dataset

Machine learning methods and models have been applied in many industrial fields, and employing RNNs to detect and recognize network events and intrusions is extensively studied. This paper divides the KDD-99 dataset into 4 subsets according to each data item’s ‘attack type’ field. An LSTM-RNN is then trained and verified on each subset in order to optimize the model parameters. Experiments show that this training strategy boosts model accuracy.
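The per-attack-type partitioning described above can be sketched as follows. This is a minimal illustration, not the paper’s exact preprocessing: the label-to-category mapping and the record layout are illustrative assumptions.

```python
# Sketch: partition a KDD-99-style dataset into subsets by attack category,
# so a separate LSTM-RNN can be trained per subset.

# Map individual attack labels to the four classic KDD-99 categories
# (illustrative subset of the full label list).
ATTACK_CATEGORY = {
    "normal": "normal",
    "smurf": "dos", "neptune": "dos",
    "ipsweep": "probe", "portsweep": "probe",
    "guess_passwd": "r2l",
    "buffer_overflow": "u2r",
}

def split_by_attack_type(records):
    """Group (features, label) records into one subset per attack category."""
    subsets = {}
    for features, label in records:
        category = ATTACK_CATEGORY.get(label, "unknown")
        subsets.setdefault(category, []).append((features, label))
    return subsets

records = [
    ([0.1, 0.2], "smurf"),
    ([0.3, 0.1], "neptune"),
    ([0.0, 0.9], "normal"),
    ([0.5, 0.5], "portsweep"),
]
subsets = split_by_attack_type(records)
print(sorted(subsets))      # ['dos', 'normal', 'probe']
print(len(subsets["dos"]))  # 2
```

Each resulting subset would then feed its own training run, so the model parameters can be tuned per attack category.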

Chaochao Luo, Le Wang, Hui Lu

Analysis of Price Fluctuation Characteristics and Influencing Factors of Garlic Based on HP Filter Method

In order to extract the fluctuation patterns of garlic prices from a large amount of price data, this paper makes an in-depth analysis of garlic price data using the HP filtering method. First, the seasonal effect on garlic price volatility is analyzed using the Census X12 seasonal adjustment method; second, the monthly garlic price data of the Shandong garlic wholesale market from January 2010 to December 2017 are decomposed by HP filtering, yielding the seasonally adjusted original sequence, the long-term trend sequence, and the cyclic variation sequence. Since January 2010, the fluctuation of monthly garlic prices can be divided into 5 cycles. After sorting out the patterns of garlic prices and consulting extensive data, it is concluded that the price of garlic is affected by planting area and natural conditions, and is prone to market speculation and blind herd behavior.

Guojing Wu, Pingzeng Liu, Weijie Chen, Wei Han

Application of Extensible Mind Mapping in Retirement Paradox Solution

With the development of society, the problem of population ageing is becoming increasingly serious. There is, however, a contradiction: population ageing puts much pressure on the economy, while delaying the retirement age makes employment pressure more severe. To address this problem, we put forward a new innovative idea and perform innovative thinking in combination with extensible mind mapping, giving some suggestions for the population ageing problem.

Wenjun Hong, Rui Fan, Bifeng Guo, Yongzhao Feng, Fuyu Ma, Shunliang Ye

Big Data Equi-Join Optimization Algorithms on Spark Cloud Computing Platform

On the Spark cloud computing platform, conventional big data equi-join algorithms cannot meet performance requirements well and are very time-consuming, so the efficiency of big data equi-joins is a pressing challenge. To overcome this, we propose the Compressed Bloom Filter Join algorithm, an efficient algorithm that filters out most invalid connections that cannot satisfy the join criteria to reduce network overhead, and constructs a static one-dimensional bit array to improve join performance. Moreover, the Compressed Bloom Filter Join Extension algorithm, an extended optimization based on the Compressed Bloom Filter Join algorithm, produces a dynamic two-dimensional bit array to filter out invalid records, and can further accelerate the join process when the data size is unknown. Experimental results show that the two optimization algorithms reduce both time consumption and the data size of the Shuffle stage, outperforming Hash Join and Broadcast Join on the Spark cloud computing platform.
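The underlying Bloom-filter pre-join idea can be sketched as follows. This is a minimal single-machine sketch, not the paper’s Spark implementation; the hash construction and bit-array size are illustrative choices.

```python
# Sketch of Bloom-filter pre-join: build a bit array from the join keys of the
# smaller relation, then drop tuples of the larger relation whose keys cannot
# possibly match, so only survivors reach the (expensive) join/shuffle step.
import hashlib

class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = bytearray(size)

    def _positions(self, key):
        # Derive several bit positions from salted SHA-256 digests.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key):
        # False positives possible, false negatives impossible.
        return all(self.bits[p] for p in self._positions(key))

small = [(1, "a"), (2, "b")]
large = [(1, "x"), (3, "y"), (2, "z"), (99, "w")]

bf = BloomFilter()
for k, _ in small:
    bf.add(k)

# Pre-filter the larger relation, then do the actual equi-join on survivors.
candidates = [(k, v) for k, v in large if bf.might_contain(k)]
joined = [(k, v1, v2) for k, v1 in small for k2, v2 in candidates if k == k2]
print(joined)  # [(1, 'a', 'x'), (2, 'b', 'z')]
```

In the distributed setting the bit array is what gets broadcast, which is why compressing it (as the paper’s algorithms do) directly cuts network overhead.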

Sihui Li, Wei Xu

Blocking Time-Based MPTCP Scheduler for Heterogeneous Networks

In order to solve the problem of buffer congestion caused by multipath transmission, we present Blocking Time-based MPTCP (MPTCP-BT) for heterogeneous networks. The proposed algorithm introduces a new metric, path blocking delay (PBD), and compares it with the receiver buffer to determine whether using a path would cause blocking; it then selects a specific set of subflows to transfer data. Our evaluation shows that MPTCP-BT reduces the average completion time by 9.45% and the number of out-of-order (OFO) packets by 58.5% compared with default MPTCP and other schedulers. The experimental results show that the MPTCP-BT algorithm reduces OFO packets at the receiver and improves throughput.

Chen Ling, Wensheng Tang, Pingping Dong, Wenjun Yang, Xiaoping Lou, Hangjun Zhou

Cloud Computing Data Security Protection Strategy

Cloud computing is a computing model that has emerged in recent years and is of great significance for large-scale information processing. We store large amounts of data in the cloud and use algorithms to perform huge computations on large-scale data centers across the Internet. Big data cloud computing resources hold a huge amount of information, providing convenient and fast computing as well as a new form of storage. Because cloud computing must process large amounts of data with many different information processing methods, customer privacy data is difficult to protect effectively in the cloud computing environment, and the protection of information security in this environment therefore becomes all the more important. Based on an analysis of the data security problems in the big data cloud computing environment, this paper analyzes the security protection of big data cloud computing data, which is of great significance for promoting the establishment of a complete data security system and can also further promote the adoption of cloud computing.

Yu Wei, Yongsheng Zhang

Clustering Model Based on RBM Encoding in Big Data

In this paper, a clustering model based on deep-learning RBM encoding is proposed for further data mining of massive, complex and high-dimensional data. This model consists of two major parts: pre-training, and fine-tuning and optimization. In the pre-training part, proper parameters are adopted for RBM encoding to reduce the high-dimensional, large-scale data, and pre-clustering is then done with k-means and other algorithms. The fine-tuning and optimization part is developed from the deep structure of pre-training to form a deep fine-tuning network, which is initialized with the parameters generated in pre-training; the initial clustering centers generated in the pre-training process are then further clustered and optimized. At the same time, the encoding features are optimized and the final clustering centers and membership matrix are obtained. To validate this model, data selected from the UCI dataset are used for clustering comparison. The analysis indicates that this RBM-encoding-based clustering model hardly degrades the clustering quality while executing more efficiently.

Lina Yuan, Xinfeng Xiao, FuFang Li, Ningning Deng

Criteria Interdependence in Fuzzy Multi-criteria Decision Making: A Survey

In this paper, we investigate how the Bonferroni mean (BM) operator models criteria interdependence in fuzzy multi-criteria decision making problems. We first study the definitions of different types of fuzzy sets proposed from the 1960s to the 2010s; we then introduce definitions of aggregation functions and the Bonferroni mean operator; we finally survey the work on modeling criteria interdependence using BM and its extensions.
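As background, the classical (crisp) Bonferroni mean that the surveyed fuzzy extensions generalize can be implemented directly from its definition, $$BM^{p,q}(a_1,\dots,a_n) = \left(\frac{1}{n(n-1)}\sum_{i \ne j} a_i^p a_j^q\right)^{1/(p+q)}$$; the product term over pairs $$i \ne j$$ is what captures pairwise criteria interdependence:

```python
# Classical Bonferroni mean: aggregates criteria values while modeling
# pairwise interdependence through the cross terms a_i^p * a_j^q.
def bonferroni_mean(values, p=1.0, q=1.0):
    n = len(values)
    if n < 2:
        raise ValueError("BM needs at least two criteria values")
    total = sum(values[i] ** p * values[j] ** q
                for i in range(n) for j in range(n) if i != j)
    return (total / (n * (n - 1))) ** (1.0 / (p + q))

# With p = q = 1 and identical inputs, BM reduces to that common value.
print(round(bonferroni_mean([0.5, 0.5, 0.5]), 6))  # 0.5
print(round(bonferroni_mean([0.2, 0.8]), 6))       # 0.4
```

For non-identical inputs BM lies below the arithmetic mean when p = q = 1, reflecting the conjunctive interaction between criteria.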

Le Sun, Jinyuan He

DBHUB: A Lightweight Middleware for Accessing Heterogeneous Database Systems

A traditional relational database management system (RDBMS) has full transaction-processing capability but introduces unnecessary overhead when dealing with unstructured data in several big data scenarios. In contrast, NoSQL systems can query unstructured data with higher space-time efficiency, but most of them lack transaction processing. To bridge the gap, we propose DBHUB, a lightweight middleware that combines the advantages of both sides. DBHUB provides compatible APIs to upper applications and inspects the received queries to extract unstructured data automatically. In general, DBHUB handles write queries with the RDBMS’s storage engine and serves read queries via the NoSQL routine. We implement DBHUB in a practical system that includes InnoDB in MySQL and MongoDB. The experimental results show that DBHUB can effectively accelerate read queries on unstructured data compared with a standalone RDBMS, while write queries incur mild overhead due to write amplification across the heterogeneous databases.

Dingding Li, Wande Chen, Mingming Pan, He Li, Hai Liu, Yong Tang

Design a New Dual Polarized Antenna Using Metallic Loop and Annular-Ring Slot

The dual polarized antenna is the key component of a fully polarized electronic system, which can sense the polarization information of the electromagnetic wave. This paper presents a low-cost dual polarized antenna composed of a metallic loop antenna and an annular-ring slot antenna. The metallic loop antenna is an electric current radiator, and the annular-ring slot antenna can be regarded as a magnetic current radiator. A dual linearly polarized antenna is realized by using the complementary loop antenna. The compact antenna structure together with simple coaxial probe feeding substantially decreases the antenna cost. Simulated results indicate that the proposed antenna has an impedance bandwidth of 4.61%, an antenna gain higher than 6 dB, and a broad beamwidth.

Qingyuan Fang, Zhiwei Gao, Shugang Jiang

Design and Implementation of Web Crawler Based on Coroutine Model

Web crawlers are widely used in Chinese information processing: by crawling data from domains relevant to the problem at hand, they provide the basis for subsequent processing. The traditional multi-threaded model has obvious limitations and deficiencies when dealing with high concurrency and a large number of blocking I/O operations. To solve these problems, this paper proposes a solution based on the coroutine model. The basic principles and implementation methods of coroutines are discussed in detail, and a complete implementation of a coroutine-based web crawler is given. Experimental results show that our scheme can effectively reduce system load and improve crawling efficiency.
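The coroutine model’s advantage over blocking I/O can be sketched with Python’s asyncio. The simulated fetch below stands in for real HTTP requests (a real crawler would use an async HTTP client); the URLs and delay are illustrative:

```python
# Sketch: many fetches in flight concurrently on one thread, so I/O waits
# overlap instead of each blocking a worker as in the multi-threaded model.
import asyncio
import time

async def fetch(url, delay=0.05):
    await asyncio.sleep(delay)          # stands in for network I/O
    return (url, f"<html>{url}</html>")

async def crawl(urls):
    # gather() schedules all fetch coroutines concurrently.
    return await asyncio.gather(*(fetch(u) for u in urls))

urls = [f"https://example.com/page{i}" for i in range(10)]
start = time.perf_counter()
pages = asyncio.run(crawl(urls))
elapsed = time.perf_counter() - start

print(len(pages))     # 10
# Ten 50 ms "fetches" overlap: total time is well below the 0.5 s a
# sequential crawler would need.
print(elapsed < 0.4)
```

Because coroutines yield at each `await`, one thread can keep thousands of fetches pending, which is the source of the reduced system load the abstract reports.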

Renshuang Ding, Meihua Wang

Design and Research on B2B Trading Platform Based on Consortium Blockchains

With the development of economy and technology, blockchain has become the preferred technology for solving the credit problems between transacting parties. Given the sensitivity of transaction data, the use of blockchain technology in a B2B trading system is very appropriate. This article solves the credit problem of the B2B trading platform by introducing consortium blockchain technology, and a container-based cloud platform built on Docker and Kubernetes is used to improve performance in the consortium blockchain application. First, it solves the node performance bottleneck caused by sudden increases in nodes, high-frequency transactions, and overclocked transactions. Second, it supports rapid iterative upgrades of node system versions as consortium blockchain technology develops, and securely stores the consortium blockchain data and the enterprise’s own sensitive data. The platform, to a certain extent, solves the credit problems of transactions between enterprises while reducing companies’ time and economic costs.

Xiaolan Xie, Qiangqing Zheng, Zhihong Guo, Qi Wang, Xinrong Li

Design and Simulation of a New Stacked Printed Antenna

The key component of a fully polarized electronic system is the dual polarized antenna, which can sense the polarization information of the electromagnetic wave. In this paper a dual polarized antenna design is proposed that combines an electric radiation source and a magnetic radiation source. The feeding ports are mounted at the bottom of the antenna, which is helpful for array applications. A metal reflector ground is introduced to achieve a unidirectional pattern. The radiation fields emitted by the electric current source and the magnetic current source are approximately orthogonal to each other over a large spatial range, which suits the application case. A full-wave electromagnetic simulation and optimization of the proposed antenna were carried out, and the designed antenna was fabricated. The measured port isolation of the designed antenna is above 20 dB, validating the design through experiment.

Zhiwei Gao, Weidong Liu, Qingyuan Fang, Shugang Jiang

Digital Continuity Guarantee of Electronic Record Based on Data Usability in Big Data Environment

At present some developed countries have put forward digital continuity action plans, and digital continuity has become a hot spot in research on electronic records. However, technologies and measures for protecting digital continuity are still lacking. In this paper, we first point out the necessity of ensuring the availability and completeness of electronic records, and propose, for the first time, a technology framework for the availability and completeness of electronic records. We then use coding theory to ensure the availability of electronic records. In addition, a technique for ensuring the completeness of electronic records is constructed based on functional dependency theory, and a method for evaluating the completeness of electronic records using that theory is presented.

Jiang Xu, Jian Zhang, Yongjun Ren, Hye-Jin Kim

Distributed Monitoring System for Microservices-Based IoT Middleware System

A microservices-based architecture is a promising middleware architecture for the Internet of Things thanks to its agility and scalability. However, compared with native service-oriented architecture (SOA), the widespread nature, both logical and physical, of this lightweight middleware system makes its organization, tracing and monitoring much harder, which can further compromise effectiveness and performance. To this end, we design, implement and evaluate a new distributed monitoring system for microservices-based Internet of Things middleware, designed as a cloud native system. The system supports Kubernetes orchestration, instruments Java and the Spring Cloud framework, and can obtain performance metrics from all hosts and containers in an efficient way. Furthermore, it can collect the trace generated by a call from the application frontend through each layer of microservices, even fetching logs, and finally store them in a big data system for stream processing or map/reduce. An evaluation based on a real implementation demonstrates the effectiveness of this system design.

Rui Kang, Zhenyu Zhou, Jiahua Liu, Zhongran Zhou, Shunwang Xu

Efficient Processing of Top-K Dominating Queries on Incomplete Data Using MapReduce

Top-k dominating queries, which return the k best items under a comprehensive “goodness” criterion based on dominance, have attracted considerable attention recently due to their important role in many data mining applications, including multi-criteria decision making. In the Big Data era, data storage and processing are becoming distributed, and data is commonly incomplete in some real applications. Existing research focuses on centralized datasets or on complete data in distributed environments, and does not address incomplete data in distributed environments. In this work, we present the first study of processing top-k dominating queries on incomplete data in distributed environments. We show, through detailed analysis, that even though the dominance relation on incomplete data objects is non-transitive in general, it is transitive for some incomplete data objects with different bitmaps. We then propose a novel MapReduce-based algorithm, TKDI-MR, for processing TKD queries on incomplete data in distributed environments that exploits this property. Extensive experiments with both real-world and large-scale synthetic datasets demonstrate that our approach achieves good efficiency and stability.
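The bitmap-based dominance test on incomplete data can be sketched as follows. This is a minimal centralized sketch, not the paper’s TKDI-MR, which distributes the computation with MapReduce; here `None` plays the role of a cleared bitmap bit, and smaller values are assumed better:

```python
# Dominance on incomplete data: decided only on the commonly observed
# dimensions of the two objects.
def dominates(a, b):
    """a, b: value lists where None marks a missing dimension."""
    common = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not common:
        return False
    return all(x <= y for x, y in common) and any(x < y for x, y in common)

def top_k_dominating(objects, k):
    """Score each object by how many others it dominates; return the top k."""
    scores = [(sum(dominates(o, other) for other in objects if other is not o), o)
              for o in objects]
    scores.sort(key=lambda s: -s[0])
    return [o for _, o in scores[:k]]

data = [[1, 2, None], [2, 3, 5], [None, 4, 6], [1, 1, 1]]
print(top_k_dominating(data, 1))  # [[1, 1, 1]]
```

Note that this relation is non-transitive in general (different pairs may compare on different dimension sets), which is exactly the difficulty the paper’s analysis addresses.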

Xiangwu Ding, Chao Yan, Yuan Zhao, Zewei Yang

Energy-Efficient Cloud Task Scheduling Research Based on Immunity-Ant Colony Algorithm

The increasing power consumption of data centers has become a constraint on the development of cloud computing. Building on the traditional immunity algorithm and ant colony algorithm, this paper presents a new multi-objective scheduling algorithm that combines the two. The new algorithm considers the dynamics of the cloud environment and takes energy efficiency and reduced execution time as optimization targets, assigning jobs to resources according to job length and resource capacities. The paper then compares this algorithm with other well-known scheduling algorithms in the CloudSim simulation tool. The simulation results show that the new algorithm performs better.

Jianhong Zhai, Xini Liu, Hongli Zhang

Enhancing Location Privacy for Geolocation Service Through Perturbation

Third-party geolocation services are widely used in a variety of location-dependent scenarios, such as Internet of Things (IoT) search and location-based services (LBSs). While privacy preservation when using geolocation has been widely discussed in the last decades, the equally severe issue of privacy preservation when obtaining geolocation has received much less attention from researchers. In this paper, we propose a location perturbation scheme to protect location privacy in third-party geolocation services. Based on the fundamentals of positioning technologies, we design a perturbation method that blurs the real location by adjusting the underlying signal-space fingerprint information. A differential privacy mechanism is then applied to the perturbation process to further strengthen the privacy level. Evaluation results are presented to show the practicality of our approach.

Yuhang Wang, Hongli Zhang, Shen Su

Facebook5k: A Novel Evaluation Resource Dataset for Cross-Media Search

The selection of semantic concepts for model construction and data collection is an open research question: it is highly demanding to choose good multimedia concepts with small semantic gaps to facilitate the work of cross-media system developers. Since such work is scarce, this paper contributes a new real-world web image dataset, created by NGN Tsinghua Laboratory students, for cross-media search. Previous datasets such as Flickr30k, Wikipedia and NUS have a high semantic gap, leading to inconsistency with real-time applications. To overcome these drawbacks, the proposed Facebook5k dataset includes: (1) 5130 images crawled from Facebook through users’ feelings; (2) images categorized according to users’ feelings; (3) independence from tags and language, using feelings rather than tags for search. Based on the proposed dataset, we point out key features of social website images and identify some research problems in image annotation and retrieval. The benchmark results show the effectiveness of the proposed dataset for simplifying and improving general image retrieval.

Sadaqat ur Rehman, Yongfeng Huang, Shanshan Tu, Obaid ur Rehman

Greedy Embedding Strategy for Dynamic Graphs Based on Spanning Tree

In dynamic graphs, node additions and node/link failures cause coordinate updates and routing failures for greedy geometric routing. To avoid packet local minima, the whole topology would have to be re-embedded, which causes high overhead. In this paper, a sufficient condition under which a spanning tree can be greedily embedded into a metric space is found, which helps avoid re-embedding. Based on this condition, a dimensional expanding strategy (DES) for online greedy embedding in high-dimensional metric spaces is proposed, which avoids re-embedding while reducing the overhead.

Yanbin Sun, Mohan Li, Le Wang, Hui Lu

Heterogeneous Cloud Resources Management: Truthful Mechanism Design in Shared Multi-minded Users

We address the problem of dynamic virtual machine provisioning and allocation of heterogeneous cloud resources. Existing works assume each user requests a single bundle (single-minded), but a user may request multiple bundles (multi-minded). Our objective is therefore to provision and allocate multiple VMs efficiently in the multi-minded setting so as to maximize social welfare. We formulate this problem in an auction-based setting and design optimal and approximation mechanisms. In addition, we show that the approximation ratio is $$\frac{a_{max}}{a_{min}}\sqrt{R\frac{c_{max}}{c_{min}}}+2$$ , where $$c_{max}$$ / $$c_{min}$$ is the maximum/minimum available resources, and $$a_{max}$$ / $$a_{min}$$ is the maximum/minimum requested resources. Furthermore, we show that our proposed mechanisms are truthful: they drive the system into an equilibrium in which no user has an incentive to increase her own profit by reporting untruthful values. Experimental results demonstrate that our proposed approximation mechanism obtains a near-optimal allocation within a reasonable time while giving users incentives to report their true declarations.

Xi Liu, Jing Zhang, Xiaolu Zhang, Xuejie Zhang

Hyper-graph Regularized Multi-view Matrix Factorization for Vehicle Identification

Recent vehicle identification systems based on radio frequency identification (RFID) often suffer from challenges including the long-distance limitation and the risk of malevolent tampering. A natural idea is to integrate multiple visual features with RFID information to improve identification performance. In this paper, we propose an improved visual feature representation method, called hyper-graph regularized multi-view matrix factorization (HMMF), for vehicle identification. The proposed HMMF pushes cross-view clusters towards a common embedding while simultaneously maintaining the high-order within-view structure. We further propose semi-supervised HMMF (SemiHMMF) to incorporate the partial labels of RFID data. An iterative optimization algorithm is developed based on multiplicative update rules. Experiments on two real-world datasets demonstrate the effectiveness of the proposed methods for vehicle identification.

Bin Qian, Xiaobo Shen, Zhenqiu Shu, Xiguang Gu, Jin Huang, Jiabin Hu

Inconsistent Selection of Optimal Frame Length in WMSN

In traditional wired networks and wireless networks (e.g. WLAN, cellular mobile networks), the optimal frame length is calculated according to the maximum data throughput and adopted by every node in the network. But this is not suitable for a Wireless Multimedia Sensor Network (WMSN), which mostly concerns effective utilization of limited energy rather than data throughput, and whose nodes may experience very different channel environments. By adopting a new benefit model, this paper explores the issue of optimal frame length and presents an algorithm named OFLA to achieve an approximately optimal value. One contribution of this article is a new benefit model of frame length in WMSN. Another is that, with the new model and the algorithm, each node in a WMSN can derive its optimal frame length independently to compose packets, maximizing the lifespan of the WMSN.

Yanli Wang, Yuanyuan Hong, Ziyi Qiao, Baili Zhang

Influence Maximization Algorithm in Social Networks Based on Three Degrees of Influence Rule

Influence maximization algorithms in social networks aim at mining the most influential TOP-K nodes in a social network; taking those nodes as the initial active nodes and spreading from them under a specific diffusion model yields the fastest spread of information and the widest scope of influence. Influence maximization algorithms for large-scale social networks are required to have both low time complexity and high accuracy, which are very hard to meet at the same time. The traditional Degree Centrality algorithm, despite its simple structure and low complexity, has unsatisfactory accuracy. The Closeness Centrality and Betweenness Centrality algorithms are comparatively accurate, since they take global metrics into consideration, but their time complexity is higher. Hence, a new algorithm based on the Three Degrees of Influence Rule, namely the Linear-Decrescence Degree Centrality algorithm, is proposed in this paper to meet both requirements. As a tradeoff between the low-accuracy degree algorithm and the high-time-complexity algorithms, it achieves high accuracy and low time complexity at the same time.
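A degree-discount-style seed selection in the spirit of the Three Degrees of Influence Rule can be sketched as follows. The linear discount weights and the toy graph are illustrative assumptions, not the paper’s exact Linear-Decrescence scheme:

```python
# Sketch: pick TOP-K seeds by effective degree, then linearly discount the
# effective degree of nodes within three hops of each chosen seed (closer
# nodes are discounted more).
from collections import deque

def nodes_within_hops(graph, source, max_hops=3):
    """BFS returning {node: hop distance} for nodes within max_hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def select_seeds(graph, k):
    effective = {u: float(len(vs)) for u, vs in graph.items()}
    seeds = []
    for _ in range(k):
        u = max((n for n in effective if n not in seeds), key=effective.get)
        seeds.append(u)
        for v, hops in nodes_within_hops(graph, u).items():
            if v != u:  # linear decay: 3/4 at 1 hop, 1/2 at 2, 1/4 at 3
                effective[v] -= effective[v] * (4 - hops) / 4.0
    return seeds

graph = {
    "a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"],
    "d": ["a", "e"], "e": ["d", "f"], "f": ["e"],
}
print(select_seeds(graph, 2))  # ['a', 'e']
```

The discount keeps the second seed away from the first seed’s three-hop influence zone, which is the intuition the rule contributes over plain degree centrality.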

Hongbin Wang, Guisheng Yin, Lianke Zhou, Xiaolong Chen, Dongjia Zhang

Influencing Factors Analysis of Desert Precipitation Based on Big Data

To analyse the long-term effects and disturbances of other meteorological factors on precipitation, the historical data of the Ewenki Banner from 1991 to 2010, a total of 20 years, are taken as an example to construct the dynamic relationship between precipitation and temperature, humidity, wind speed and sunshine hours. The results show that each variable is first-order integrated at the 1% significance level and has a stable long-term equilibrium relationship at the 5% significance level. The vector autoregressive (VAR) model also passes the stability test, satisfying the preconditions for estimation. According to the impulse response and variance decomposition analysis, apart from the 43.29% of precipitation variance attributable to precipitation itself, humidity has the greatest influence: its contribution reaches 43.85%, exceeding the impact of precipitation itself. The impact of sunshine hours on precipitation has a certain lag period, with relatively large fluctuations in the later period, while the impacts of wind speed and temperature are not obvious. The model simulates the impact of the various influencing factors on precipitation well, proving its feasibility for analyzing the influencing factors of desert precipitation, which has practical significance.

Xue Wang, Pingzeng Liu, Xueru Yu

Interactive Construction of Criterion Relations for Multi-criteria Decision Making

Multi-criteria decision making (MCDM) is a category of techniques for solving decision making problems based on the performance of multiple criteria. One shortcoming of existing MCDM techniques is that they rarely consider the relations among decision criteria, even though different types of criterion relations significantly impact the results of a decision making problem. In this paper, we address this by establishing and measuring different types of relations among decision criteria. We propose an MCDM framework, named InterDM, to rank a set of alternatives based on the utilities of both singleton criteria and criterion coalitions, in which we design an interactive Interpretive Structural Modeling technique to construct consistent criterion relations. We use a case study of ranking cloud services to demonstrate the efficiency of InterDM.

Le Sun, Jinyuan He

Iteratively Modeling Based Cleansing Interactively Samples of Big Data

Taking advantage of big data means analyzing it and building prediction models on it. However, data obtained in practice often contains dirty records due to various factors. One approach is to clean the whole dataset first and then train a predictive model on the cleaned data, but existing cleaning approaches often need a lot of completely clean data to guide error fixing, which is impractical to obtain. Another approach is to train the predictive model on the raw data directly, which yields an inaccurate model. We therefore explore an iterative model-updating process and propose an updating algorithm that combines data cleaning with conjugate gradient descent. In this paper, we incrementally update an initial model trained on raw data towards the optimum by cleaning samples, rather than the whole dataset, at each iteration, with the updating direction established from the gradient of the data. After multiple iterations, we obtain an optimal model that still works well on incoming data without further cleaning. We also present a cluster descent sampling algorithm to accelerate model convergence. Our evaluation on real datasets shows that the approach significantly improves model accuracy compared with training directly on raw data.

Xiangwu Ding, Shengnan Qin

Label Noise Detection Based on Tri-training

In machine learning, noise contained in the training dataset can be divided into attribute noise and label noise, and many works show that label noise is the more harmful of the two. A number of noise filtering algorithms have been proposed to identify and remove noise prior to learning. However, almost all existing works solve this problem in a purely supervised way: noise identification is based only on the information in the labeled data. In fact, unlabeled data are available in many applications, and the amount of unlabeled data is usually much larger than that of labeled data. Therefore, in this paper, we make use of unlabeled data to improve the performance of noise filtering. Tri-training, a powerful semi-supervised learning algorithm, is adopted in this work because it does not require multiple views of the data. Finally, a set of experiments is conducted to prove the effectiveness of the proposed method.
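The voting idea behind tri-training-based noise detection can be sketched as follows. The three threshold-stump “classifiers” and the tiny dataset are illustrative stand-ins for the real base learners, which in tri-training would also refine each other on unlabeled data:

```python
# Sketch: flag a labeled example as noisy when the majority vote of three
# diverse classifiers disagrees with its given label.
def make_stump(feature, threshold):
    """Trivial base classifier: predict 1 iff the feature exceeds threshold."""
    return lambda x: 1 if x[feature] > threshold else 0

classifiers = [
    make_stump(0, 0.5),
    make_stump(1, 0.5),
    lambda x: 1 if x[0] + x[1] > 1.0 else 0,
]

def detect_noise(dataset, classifiers):
    """Return indices of examples whose majority-vote prediction != label."""
    noisy = []
    for idx, (x, label) in enumerate(dataset):
        votes = sum(c(x) for c in classifiers)
        prediction = 1 if votes >= 2 else 0
        if prediction != label:
            noisy.append(idx)
    return noisy

dataset = [
    ([0.9, 0.9], 1),
    ([0.8, 0.7], 1),
    ([0.1, 0.2], 0),
    ([0.2, 0.1], 0),
    ([0.9, 0.8], 0),   # mislabeled: the features clearly indicate class 1
]
print(detect_noise(dataset, classifiers))  # [4]
```

Tri-training's contribution over this sketch is that the three classifiers are bootstrapped and then mutually improved using unlabeled data, so the vote becomes more reliable than any purely supervised filter.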

Hongbin Zhu, Jiahua Liu, Ming Wan

Long Short Term Memory Model for Analysis and Forecast of PM2.5

Atmospheric PM2.5 is a pollutant with a major impact on the atmospheric environment and human health. Based on LSTM, we construct two prediction models, Stacked LSTM and Encoder-Decoder, and evaluate their prediction performance by training and testing on four years of meteorological data from Nanjing, Beijing, and Sanya. In the experiments, the normalized meteorological factors, contaminant factors, seasonal factors, and PM2.5 values are used as input to predict the daily average PM2.5 concentration 1 to 3 days ahead. Experimental results show that the LSTM model performs better than Random Forest and Encoder-Decoder. Taking Nanjing as an example, comparing the forecasts with the data released by the environmental authorities shows that the PM2.5 concentrations predicted by the LSTM model are very close to the values monitored by Nanjing’s environmental authorities. In predicting PM2.5 for three consecutive days, the Root Mean Square Error (RMSE) of the LSTM model is only 18.96. The LSTM model’s predictions for all three cities are better than those of the other models, which shows that it adapts well to predicting PM2.5 concentration.
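The RMSE used to score the forecasts is the standard one, $$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}$$; the sample values below are illustrative, not the paper’s data:

```python
# Root Mean Square Error between predicted and observed PM2.5 series.
import math

def rmse(predicted, observed):
    assert len(predicted) == len(observed) and predicted
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed))
                     / len(predicted))

# Illustrative 3-day forecast vs. monitored values (µg/m³).
print(round(rmse([52.0, 48.0, 55.0], [50.0, 50.0, 51.0]), 3))  # 2.828
```

A reported RMSE of 18.96 therefore means the model’s daily predictions deviate from the monitored concentrations by roughly 19 µg/m³ on root-mean-square average.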

Leiming Yan, Min Zhou, Yaowen Wu, Luqi Yan

Matching Algorithm of Composite Service Based on Indexing Mechanism in BPM

Service matching technology has promoted the development of Business Process Management (BPM). It is important to know how to retrieve a qualified service from a large number of candidate services in a library, and how to match and compose services appropriately. This paper first introduces the definition of the service model and the rules of service matching. Next, to accelerate service retrieval, we propose an indexing mechanism for service parameters, and then introduce a matching algorithm for composite services based on this indexing mechanism. The time and space complexity of the algorithm are also analyzed. Through several simulation experiments we verify the feasibility of the algorithm, comparing it with the Back-Front algorithm, the Front-Back algorithm, and a service composition algorithm based on AND/OR graphs in terms of matching time and the number of retrieved composite services. The results show that the index-based matching algorithm greatly improves matching speed and obtains more composite services without compromising the quality of the service composition.
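A parameter inverted index of the kind described can be sketched as follows. The service names and parameters are hypothetical, and the composition step (chaining candidates) is omitted:

```python
# Sketch: map each output parameter to the services producing it, so
# candidate services for a request are found by index lookup rather than
# scanning the whole service library.
from collections import defaultdict

class ServiceIndex:
    def __init__(self):
        self.by_output = defaultdict(set)  # output parameter -> service names
        self.inputs = {}                   # service name -> required inputs

    def register(self, name, inputs, outputs):
        self.inputs[name] = set(inputs)
        for p in outputs:
            self.by_output[p].add(name)

    def candidates(self, available, wanted):
        """Services producing a wanted parameter whose inputs are available."""
        available = set(available)
        names = set()
        for p in wanted:                   # index lookup, no full scan
            names |= self.by_output.get(p, set())
        return sorted(n for n in names if self.inputs[n] <= available)

idx = ServiceIndex()
idx.register("GeoCode", ["address"], ["lat", "lon"])
idx.register("Weather", ["lat", "lon"], ["forecast"])
idx.register("Route", ["lat", "lon", "dest"], ["route"])

print(idx.candidates(["address", "lat", "lon"], ["forecast", "route"]))  # ['Weather']
```

A composition algorithm would iterate this lookup, adding each matched service's outputs to the available parameter set until the requested outputs are covered.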

Qiubo Huang, Yuxiao Qian, Guohua Liu, Keyuan Jiang
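
The parameter-indexing idea above can be sketched as an inverted index from parameters to services; the service names and parameters below are hypothetical, and the paper's actual index structure may differ:

```python
from collections import defaultdict

# Hypothetical service library: service name -> set of required input parameters.
services = {
    "BookFlight": {"city", "date"},
    "BookHotel":  {"city", "date", "nights"},
    "GetWeather": {"city"},
}

# Inverted index: parameter -> services that consume it.
index = defaultdict(set)
for name, params in services.items():
    for p in params:
        index[p].add(name)

def candidates(available_params):
    """Services whose every required parameter is already available."""
    hits = set()
    for p in available_params:
        hits |= index.get(p, set())
    return {s for s in hits if services[s] <= set(available_params)}

print(candidates({"city", "date"}))  # services invocable given city and date
```

The index lets the matcher touch only services sharing at least one parameter with the request, instead of scanning the whole library.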

MFI-5 Based Similarity Measurement of Business Process Models

With the increasing use of business process model management techniques, a large number of business process models are being developed in industry, so enterprises and organizations usually need to maintain large sets of business processes. An approach based on the Meta-model for process model registration (MFI-5) is presented to accurately measure the similarity of process models. First, based on MFI-5, the Process Model Description Framework (PMDF) is constructed, and according to PMDF, a similarity feature set of the process model (SFS) is defined. Second, the Business Process Modeling Notation (BPMN) is used to describe the corresponding business processes, and the resulting BPMN models are identified and quantified using SFS to obtain model vectors. Finally, a Tanimoto-coefficient-based algorithm calculates the similarity between any two vectors, from which the similarity measure matrix of the BPMN models is extracted. We illustrate the approach in the context of measuring the similarity of online sales service processes, and the results show that the proposed approach can facilitate business process recommendation.

Zhao Li, Jun Wu, Shuangmei Peng, Peng Chen, Jingsha He, Yiwang Huang, Keqing He
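
The Tanimoto coefficient named above is a standard vector similarity measure; a minimal sketch (the feature vectors are hypothetical, not derived from actual BPMN models):

```python
def tanimoto(a, b):
    """Tanimoto coefficient T(a, b) = a.b / (|a|^2 + |b|^2 - a.b)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sum(x * x for x in a) + sum(y * y for y in b) - dot)

print(tanimoto([1, 1, 0, 2], [1, 1, 0, 2]))  # identical vectors -> 1.0
print(tanimoto([1, 0, 1], [0, 1, 0]))        # disjoint features -> 0.0
```

Applying this to every pair of model vectors fills in the similarity matrix the abstract describes.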

MLS-Join: An Efficient MapReduce-Based Algorithm for String Similarity Self-joins with Edit Distance Constraint

String similarity join is an essential operation in data integration, and the era of big data calls for scalable algorithms to support it at scale. In this paper, we study scalable string similarity self-joins with an edit distance constraint and propose a MapReduce-based algorithm, called MLS-Join, to support similarity self-joins. The proposed self-join algorithm is a filter-verify method. In the filter stage, the existing multi-match-aware substring selection scheme is improved to generate fewer signatures and to eliminate redundant string pairs, including self-to-self pairs and duplicate pairs. In the verify stage, the dataset is read only once by using positive/reversed pairs and a combined key. Experimental results on real-world datasets show that our algorithm significantly outperforms state-of-the-art approaches.

Decai Sun, Xiaoxia Wang
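
The verify stage ultimately checks each surviving candidate pair against the edit distance threshold; a minimal sketch of that check (the MapReduce filter stage itself is omitted, and the threshold value is illustrative):

```python
def edit_distance(s, t):
    """Levenshtein distance via dynamic programming, O(|s|*|t|) time."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (cs != ct)))   # substitution
        prev = curr
    return prev[-1]

# A candidate pair joins the result only if its distance is within tau:
tau = 2
print(edit_distance("kitten", "sitting") <= tau)  # distance is 3 -> False
```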

Multi-dimensional Regression for Colour Prediction in Pad Dyeing

This paper aims to predict fabric colours by analyzing the relationship between multiple process parameters and the colours of dyed fabrics in pad dyeing. The task is approached as a multi-dimensional regression problem. Within a machine learning framework designed for colour prediction, two models are implemented: a back-propagation neural network (BPNN) and a multi-dimensional support vector regressor (M-SVR). The process parameters are fed to these multi-dimensional regression models to predict the fabric colours measured in CIELAB values. The raw data used in our study are provided directly by a dyeing and printing manufacturer. As our experiments show, BPNN outperforms M-SVR on a relatively large data set, while M-SVR is more accurate than BPNN on a relatively small data set.

Zhao Chen, Chengzhi Zhou, Yijun Zhou, Lingyun Zhu, Ting Lu, Guohua Liu

Multi-situation Analytic Hierarchy Process Based on Bayesian for Mobile Service Recommendation

Mobile service recommendation becomes inaccurate when the candidate recommendation schemes are similar to one another. To address this, this paper proposes a multi-situation Analytic Hierarchy Process based on Bayesian inference (MSAHPB). First, the three-layer AHP model is constructed, and multiple situation elements are introduced into its criteria layer. To determine the mutual influence between scenarios, each situation is used as a criterion to estimate its impact on the recommendation target. A relational judgment matrix is then established for each pair of adjacent layers; instead of assigning matrix values from artificial experience, the weight of each situation is derived with the Bayes formula, taking the prior probabilities of events as a benchmark. To ensure that each matrix satisfies the consistency criterion, the paper tests the consistency of each judgment matrix on the 1–9 scale and calculates the matching degree between each scheme and the target. Finally, taking food service recommendation as an example, the experimental results show that the proposed method is effective.

Weihong Wang, Fuxiang Zhou, Yuhui Cao, Dawei Zhang, Jieli Sun
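
The consistency test mentioned above follows standard AHP practice: compute the principal eigenvalue of the judgment matrix and accept it when the consistency ratio is below 0.1. A minimal pure-Python sketch (the 3x3 judgment matrix is hypothetical; the random-index values are Saaty's standard constants):

```python
def lambda_max(A, iters=100):
    """Approximate the principal eigenvalue of a positive matrix by power iteration."""
    n = len(A)
    w = [1.0] * n
    for _ in range(iters):
        w = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        s = sum(w)
        w = [x / s for x in w]
    Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    return sum(Aw[i] / w[i] for i in range(n)) / n

RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}  # Saaty's random indices

def consistency_ratio(A):
    n = len(A)
    ci = (lambda_max(A) - n) / (n - 1)  # consistency index
    return ci / RI[n] if RI[n] else 0.0

# A perfectly consistent 3x3 judgment matrix (a_ij = w_i / w_j):
A = [[1, 2, 4], [0.5, 1, 2], [0.25, 0.5, 1]]
print(consistency_ratio(A) < 0.1)  # True: the matrix passes the consistency test
```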

Multi-source Enterprise Innovation Data Fusion Method Based on Hierarchy

As the main actors in innovation, enterprises operate many application systems, which accumulate a large amount of data resources. It is therefore necessary to integrate enterprise innovation behavior data sets to provide users with a unified data view. This paper proposes a hierarchical multi-source data fusion method that describes the information fusion model at three levels: the data level, the feature level, and the decision level. The experimental results show that the fusion method can meet the needs of enterprise innovation behavior analysis in data fusion and decision analysis.

Jinying Xu, Yuehua Lv, Jieren Cheng

Network Public Opinion Emotion Classification Based on Joint Deep Neural Network

Analyzing the emotional tendency of public opinion data on the network helps to grasp the dynamics of public opinion in a timely and accurate way and to extract trends in public opinion. Neural network models have been shown to perform well in sentiment classification. Therefore, according to the characteristics of public opinion information, this paper proposes a joint deep neural network that extracts high-dimensional word-level features through a convolutional neural network (CNN) and then feeds them into a long short-term memory network (LSTM) to learn sequence characteristics. The model was used to classify the sentiment of Weibo comments on the “Yulin maternal jumping event” and obtained high classification accuracy.

Xiaoling Xia, Wenjie Wang, Guohua Yang

Optimizing Cuckoo Feature Selection Algorithm with the New Initialization Strategy and Fitness Function

In machine learning and data mining tasks, feature selection is an important data preprocessing step. Recent studies have shown that the Binary Cuckoo Search Algorithm for Feature Selection (BCS [1]) performs well in classification and dimension reduction. However, analyzing the BCS algorithm, we notice that the randomness of its initialization and the defects of its fitness function severely weaken both classification performance and dimension reduction. We therefore propose a new feature selection algorithm, FS_CSO, which adopts the chaotic properties of the Chebyshev map as a new initialization strategy to obtain better initial populations (solutions), and combines information gain with the L1-norm in a new fitness function to accelerate convergence. We validate FS_CSO on small, medium, and large datasets from the UCI repository, using the KNN, J48, and SVM classifiers to guide the learning process. The experimental results show that FS_CSO significantly improves classification performance and dimension reduction, and compared with efficient feature selection algorithms proposed in recent years, it is highly competitive in terms of accuracy and dimension reduction.

Yingying Wang, Zhanshan Li, Haihong Yu, Lei Deng
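
The Chebyshev-map initialization described above can be sketched as follows; the map order, seed, and 0.5 binarization threshold are illustrative choices, and the paper's actual parameters may differ:

```python
import math

def chebyshev_population(pop_size, dim, order=4, x0=0.7):
    """Initialize a binary population with the Chebyshev chaotic map
    x_{k+1} = cos(order * arccos(x_k)), which stays in [-1, 1]."""
    x = x0
    population = []
    for _ in range(pop_size):
        individual = []
        for _ in range(dim):
            x = math.cos(order * math.acos(x))
            individual.append(1 if (x + 1) / 2 > 0.5 else 0)  # 1 = feature kept
        population.append(individual)
    return population

pop = chebyshev_population(pop_size=5, dim=8)
print(pop[0])  # one chaotic binary solution over 8 features
```

Chaotic maps cover the search space more evenly than uniform random draws, which is the motivation for replacing BCS's random initialization.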

Prediction of Garlic Price Based on ARIMA Model

In recent years, garlic prices have fluctuated drastically, and garlic price prediction has attracted wide attention. To study these fluctuations and predict prices, this article applies the most commonly used time series forecasting method, the autoregressive integrated moving average (ARIMA) model. Using monthly average price data for 2010–2017 from Shandong, China, a region representative of the world garlic market, and the powerful data analysis capabilities of the R language, we forecast the monthly average garlic price in Shandong province for the first half of 2018. The experimental results show that the ARIMA model predicts short-term garlic price fluctuations well, and that the price trend in the first half of 2018 is to rise first and then fall. Finally, based on the major factors affecting garlic price fluctuation, suggestions such as building a garlic growth model, improving forecast methods, and strengthening market supervision are proposed.

Baojia Wang, Pingzeng Liu, Chao Zhang, Junmei Wang, Liu Peng
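
The "I" in ARIMA stands for the differencing step that makes a trending price series stationary before the AR and MA terms are fitted; a minimal sketch of that step and its inverse (the prices are hypothetical, not the Shandong data):

```python
def difference(series, d=1):
    """Apply d rounds of first-order differencing (the 'I' in ARIMA)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

def undifference(last_observed, diffs):
    """Invert first-order differencing to map forecasts back to the price scale."""
    out, level = [], last_observed
    for d in diffs:
        level += d
        out.append(level)
    return out

prices = [5.2, 5.6, 6.1, 6.0, 6.4]   # hypothetical monthly garlic prices
print(difference(prices))            # month-over-month changes
print(undifference(prices[-1], [0.2, -0.3]))  # forecast deltas back to prices
```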

