
2016 | Book

Knowledge Science, Engineering and Management

9th International Conference, KSEM 2016, Passau, Germany, October 5-7, 2016, Proceedings


About this book

This book constitutes the refereed proceedings of the 9th International Conference on Knowledge Science, Engineering and Management, KSEM 2016, held in Passau, Germany, in October 2016.

The 49 revised full papers presented together with 2 keynotes were carefully selected and reviewed from 116 submissions. The papers are organized in topical sections on Clustering and Classification; Text Mining and Lexical Analysis; Content and Document Analysis; Enterprise Knowledge; Formal Semantics and Fuzzy Logic; Knowledge Engineering; Knowledge Enrichment and Visualization; Knowledge Management; Knowledge Retrieval; Knowledge Systems and Security; Neural Networks and Artificial Intelligence; Ontologies; and Recommendation Algorithms and Systems.

Table of Contents

Frontmatter

Clustering and Classification

Frontmatter
BOWL: Bag of Word Clusters Text Representation Using Word Embeddings

Text representation is fundamental to text mining and information retrieval. The Bag Of Words (BOW) model and its variants (e.g. TF-IDF) are very basic text representation methods. Although BOW and TF-IDF are simple and perform well in tasks like classification and clustering, their representation efficiency is extremely low. Besides, word-level semantic similarity is not captured, which in many situations results in a failure to capture text-level similarity. In this paper, we propose a straightforward Bag Of Word cLusters (BOWL) representation for texts in a higher-level, much lower-dimensional space. We exploit word embeddings to group semantically close words and consider them as a whole. The word embeddings are trained on a large corpus and incorporate extensive knowledge. We demonstrate on three benchmark datasets and two tasks that the BOWL representation shows significant advantages in terms of representation accuracy and efficiency.

Weikang Rui, Kai Xing, Yawei Jia
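The bag-of-word-clusters idea described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes a word-to-cluster assignment has already been obtained (e.g. by running k-means over pre-trained embeddings), and the `word2cluster` mapping and function name are hypothetical.

```python
from collections import Counter

def bowl_vector(tokens, word2cluster, n_clusters):
    """Map a token list to a bag-of-word-clusters count vector.

    Words with no cluster assignment are simply skipped, so the
    vector length equals the number of clusters, not the vocabulary size.
    """
    counts = Counter(word2cluster[w] for w in tokens if w in word2cluster)
    return [counts.get(c, 0) for c in range(n_clusters)]

# Hypothetical cluster assignment derived from word embeddings,
# e.g. by k-means over pre-trained vectors (clustering step not shown).
word2cluster = {"cat": 0, "dog": 0, "car": 1, "truck": 1}
print(bowl_vector(["cat", "dog", "dog", "car"], word2cluster, 2))  # [3, 1]
```

The key property is dimensionality: a document is represented by a vector of cluster counts (here length 2) rather than a vocabulary-sized BOW vector.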
Clustering Categorical Sequences with Variable-Length Tuples Representation

Clustering categorical sequences is currently a difficult problem due to the lack of an efficient representation model for sequences. Unlike existing models, which mainly focus on fixed-length tuple representations, this paper proposes a new representation model based on variable-length tuples. The variable-length tuples are obtained with a pruning method that deletes redundant tuples from a suffix tree, created for fixed-length tuples with a large memory length, using an entropy-based measure that evaluates tuple redundancy. A partitioning algorithm for clustering categorical sequences is then defined based on the normalized representation using tuples collected from the pruned tree. Experimental studies on six real-world sequence sets show the effectiveness and suitability of the proposed method for subsequence-based clustering.

Liang Yuan, Zhiling Hong, Lifei Chen, Qiang Cai
Cellular Automata Based on Occlusion Relationship for Saliency Detection

Unlike traditional images, 4D light field images contain scene structure information and have been shown to allow better saliency estimation. Instead of estimating depth or using the unique refocusing capability, we propose to obtain the occlusion relationship from the raw image for saliency detection. The occlusion relationship is calculated using the Epipolar Plane Image (EPI) extracted from the raw light field image, which can distinguish whether a region is most likely foreground or background. By analyzing the occlusion relationship in the scene, true object edges can be separated from surface textures, which helps to segment objects completely. Moreover, we assume that non-occluded objects are more likely to be foreground, while objects occluded by many others are background. The occlusion relationship is then integrated into a modified saliency detection framework to obtain the salient regions. Experimental results demonstrate that the occlusion relationship helps to improve saliency detection accuracy, and that the proposed method achieves significantly higher accuracy and robustness in comparison with state-of-the-art light field saliency detection methods.

Hao Sheng, Weichao Feng, Shuo Zhang

Text Mining and Lexical Analysis

Frontmatter
A Practical Method of Identifying Chinese Metaphor Phrases from Corpus

Research on linguistic metaphors is an important branch of natural language processing. Applications such as semantic understanding, machine translation, and information retrieval suffer if metaphors cannot be identified appropriately. This paper presents a three-phase method for recognizing Chinese metaphor phrases in a large-scale corpus. First, we acquire the context of every candidate phrase. Then hierarchical clustering is used to cluster the phrases based on their contextual information. Finally, heuristic rules are applied to the clustering result to determine whether a candidate phrase is a metaphor phrase. Experimental results show that the method achieves satisfactory performance.

Jianhui Fu, Shi Wang, Ya Wang, Cungen Cao
Knowledge Extraction from Chinese Records of Cyber Attacks Based on a Semantic Grammar

Knowledge acquisition from text is an important research area in artificial intelligence. In this paper, we present a method for acquiring knowledge from Chinese records of cyber-attack events based on a semantic grammar. To parse the sentences in the records, the method first identifies Chinese noun phrases, and then uses the semantic grammar of the cyber-attack domain to parse the records. Finally, knowledge is extracted from the parse trees. Experimental results show that our method for noun phrase identification performs well, and the precision of knowledge acquisition reaches a high level of 90 %.

Fang Fang, Ya Wang, Luchen Zhang, Cungen Cao
Increasing Topic Coherence by Aggregating Topic Models

In this paper, we introduce a novel method for aggregating multiple topic models to produce an aggregate model that contains topics with greater coherence than the individual models. When generating a topic model, a number of parameters must be specified; depending on the parameters chosen, the resulting topics can be very general or very specific. In this paper the process of aggregating multiple topic models generated with different parameters is investigated, the hypothesis being that combining general and specific topics can increase topic coherence. The aggregate model is created using cosine similarity and Jensen-Shannon divergence to combine topics that are above a similarity threshold. The model is evaluated by comparing the coherence of topics in the base models against that of the aggregated model. The results presented in this paper show that the aggregated model outperforms standard topic models in terms of topic coherence at a statistically significant level when evaluated against an external corpus.

Stuart J. Blair, Yaxin Bi, Maurice D. Mulvenna
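The similarity test at the heart of the aggregation above can be illustrated with a small sketch. The threshold value and the merge-by-averaging step below are assumptions for illustration; the abstract specifies only that cosine similarity and Jensen-Shannon divergence are used to combine topics above a similarity threshold.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence in bits (base-2 log), skipping zero terms."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence between two topic-word distributions.

    With base-2 logs it is symmetric and bounded in [0, 1].
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def merge_if_similar(p, q, threshold=0.5):
    """Average two topic distributions when their JS divergence is below
    the (hypothetical) threshold; otherwise keep both topics separate."""
    if js_divergence(p, q) < threshold:
        return [[(pi + qi) / 2 for pi, qi in zip(p, q)]]
    return [p, q]

identical = js_divergence([0.5, 0.5], [0.5, 0.5])  # 0.0
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])   # 1.0 (maximal)
```

Two near-identical topics (one general, one specific variant of the same theme) would fall below the threshold and be merged into a single averaged distribution.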
Learning Chinese-Japanese Bilingual Word Embedding by Using Common Characters

Bilingual word embedding, which maps the word embeddings of two languages into one vector space, has been widely applied in domains such as machine translation and word sense disambiguation. However, no model has been universally accepted for learning bilingual word embeddings. In this work, we propose a novel model named CJ-BOC to learn Chinese-Japanese word embeddings. Given that Chinese and Japanese share a large portion of common characters, we exploit them in our training process. We demonstrate the effectiveness of this exploitation through both theoretical and experimental study. To evaluate the performance of CJ-BOC, we conducted a comprehensive experiment, which reveals both its speed advantage and the high quality of the acquired word embeddings.

Jilei Wang, Shiying Luo, Yanning Li, Shu-Tao Xia

Content and Document Analysis

Frontmatter
Analyzing Topic-Sentiment and Topic Evolution over Time from Social Media

Most online news websites enable users to annotate their sentiments while reading the news. Unlike traditional user feedback such as reviews or ratings, these annotations express users' sentiments more directly. Topic models have proved effective for analyzing text information; however, most existing topic models either extract static topic sentiment or track topics over time while ignoring sentiment analysis. In this paper, we propose a joint topic-sentiment over time model (JTSoT) to detect topic-sentiment shifts and track topic evolution over time. The critical challenge is how to balance the relationship among topic, sentiment, and time. A topic is represented as a Beta distribution over time and a Dirichlet distribution with respect to sentiment. We evaluate our method on a real-world news dataset. The experimental results show high correlation between topic and sentiment, better interpretable topic evolution, and improved document sentiment classification and perplexity.

Yan Hu, Xiaofei Xu, Li Li
An Unsupervised Framework Towards Sci-Tech Compound Entity Recognition

Classifying sci-tech compound named entities, such as the names of patents and projects, plays an important role in enhancing many high-level applications. However, there is very little work on this novel and hard problem. Traditional sequence labeling strategies cannot be applied to sci-tech compound entities due to the heavy cost of human annotation and low data redundancy. This paper identifies three intrinsic characteristics of sci-tech compound entities, and further proposes a generic and unsupervised framework named SCSegVal to address the problem. SCSegVal consists of two components: text splitting and segment validating. We reduce finding the best split of a text to the problem of maximizing the stickiness sum of its segments, and reduce the construction of the indicative words used in segment validating to the classical minimum set cover problem. Experimental results on classifying real-world science-technology entities show that SCSegVal achieves a sharp improvement over a classical supervised HMM-based approach.

Yang Yan, Tingwen Liu, Li Guo, Jiapeng Zhao, Jinqiao Shi
Identifying Helpful Online Reviews with Word Embedding Features

The advent of Web 2.0 has enabled users to share their opinions via various social media websites. People's decision-making processes are strongly influenced by online reviews, so predicting the helpfulness of reviews can save time and surface helpful suggestions. However, most previous work focused on exploring new features drawn from external data sources, such as user profiles, semantic dictionaries, etc. In this paper, we maintain that the helpfulness of an online review can be predicted from word embedding information alone. Word embedding information is a kind of word semantic representation computed from word context. We hypothesize that word embedding information allows us to accurately predict the helpfulness of an online review. Experiments were conducted to test this hypothesis, and the results showed a substantial improvement over baselines built on previously used features.

Jie Chen, Chunxia Zhang, Zhendong Niu
Facial Texture Analysis for Recognition of Human Gender

The study of similarities and dissimilarities between human faces has been a very active research topic for decades. The human face is composed of several textural representations carrying discriminating information, mainly due to variations in age, gender, and facial expression. Based on the study and analysis of different facial features, the development of an accurate and efficient gender recognition system is required for applications such as surveillance, the design of entrance and exit protocols at shopping malls, and gender-specific advertisement on the internet. Gender recognition is essentially a binary classification process in which the gender of the person in an input image is predicted. In this paper, we propose a knowledge-based gender recognition system using histogram of oriented gradients (HOG) and local binary descriptor (namely LBP, BRISK and FREAK) features, which can be used in robotics and similar fields. To the best of our knowledge, this is the first time BRISK- and FREAK-based features have been used for the gender recognition problem. To evaluate the proposed system, we use the standard IMM gender database (of 240 images) to extract features and train (and test) a K-nearest-neighbors (KNN) classifier. Our results show that the proposed system outperforms existing methods on this database, with accuracy as high as 90 % for FREAK features and 94.16 % for HOG features, an improvement of 6.5 %.

Zahra Noor, Muhammad Usman Akram, Mahmood Akhtar, Muhammad Saad

Enterprise Knowledge

Frontmatter
Quantitative Analysis Academic Evaluation Based on Attenuation-Mechanism

Citation-based measures are known to be unpredictable; nevertheless, they are used quite often when a quantitative evaluation of academic impact is required. With the development of social networks, it is natural to ask: is there a trustworthy model able to quantitatively analyze academic impact from a huge amount of relevant information, instead of relying on peer review alone as before the prevalence of social media? Many efforts have been devoted to providing standard academic evaluation indicators, but they are either inadequate to be fully qualified or unable to become universally applicable measures. In this paper, we propose a systematic approach, named the Attenuation Mechanism, to quantitatively analyze academic impact based on four estimated factors. It brings new insights into how academic impact arises and the influence it has, whether short term or long term. Extensive experiments on real academic search datasets show that the proposed model performs significantly better than the baseline models across different areas and disciplines.

Fan Li, WenLi Yu, JinJing Zhang, Li Li
When IT Leveraging Competence Meets Uncertainty and Complexity with Social Capital in New Product Development

We examine how the aspects of IT leveraging competence [i.e., the effective use of project and resource management systems (PRMS), organizational memory systems (OMS), and cooperative work systems (CWS)] and social capital (SOCI) influence performance [i.e., product effectiveness (PDT) and process efficiency (PCS)] through coordination capability (COOR) and absorptive capacity (ACAP) under uncertainty and complexity in new product development (NPD). We find that IT leveraging competence positively affects COOR and ACAP; the SOCI-COOR, SOCI-ACAP, COOR-PCS, and ACAP-PDT links are positive; neither uncertainty nor complexity moderates the COOR-PCS link; uncertainty negatively moderates the ACAP-PDT link; and complexity has no moderating effect on this link. Our findings reveal why NPD teams may have difficulty achieving high levels of performance and why teams vary in their ability to create value from their COOR and ACAP.

Shiuann-Shuoh Chen, Pei-Yi Chen, Min Yu
Adoption Factors for Crowdsourcing Based Medical Information Platforms

Platforms increasingly utilize crowdsourcing to offer knowledge-intensive services such as medical diagnostics while retaining low costs. This increased interest has elevated the adoption of established theories of user acceptance, risk avoidance, and motivational factors. We propose a combination of elements from existing concepts, extended by newly identified factors, to postulate a novel theoretical model. Our analysis of a survey with 349 respondents reveals new constructs based on users' perceptions of risks and features. We found that risks and features are significantly interrelated and both influence perceived usefulness, the technology acceptance model's most important construct. Usefulness is diminished by perceived risks and increased by crowdsourcing features. Additionally, external motivation yields important influencing factors. The revealed interrelations are discussed and should be accounted for in future research and implementations.

Till Blesik, Markus Bick
Crowd Label Aggregation Under a Belief Function Framework

Crowdsourcing has emerged as an efficient human-powered approach to labeling complex tasks that computer programs still cannot solve. Amazon's Mechanical Turk is one of the most popular platforms for gathering labels from human workers. These labels are then aggregated in order to estimate the true labels. Considering that not all labelers are experts, their answers may be imperfect and consequently unreliable. In this paper, we propose a novel label aggregation method based on belief function theory. The proposed method provides a strong framework that not only allows imperfect labels to be aggregated reliably but also integrates labeler expertise for more accurate results. To demonstrate the effectiveness of the proposed method, experiments are conducted on real datasets. The results show that our method is a promising solution in the crowd labeling domain.

Lina Abassi, Imen Boukhris
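As a point of reference for the belief-function setting above, the standard combination operator of the theory is Dempster's rule. The sketch below is illustrative only: the paper's actual aggregation method may differ, and the mass values assigned to the two labelers are hypothetical.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule of combination.

    Each mass function maps focal elements (frozensets of labels) to
    masses summing to 1. Conflicting mass (empty intersections) is
    discarded and the remainder renormalized.
    """
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    norm = 1.0 - conflict
    return {k: v / norm for k, v in combined.items()}

# Two labelers over labels {'pos', 'neg'}; mass on the full set
# {'pos', 'neg'} represents each labeler's partial ignorance.
m1 = {frozenset({"pos"}): 0.8, frozenset({"pos", "neg"}): 0.2}
m2 = {frozenset({"neg"}): 0.6, frozenset({"pos", "neg"}): 0.4}
fused = dempster_combine(m1, m2)
```

After combination, the label with the highest fused mass (here "pos") would be taken as the estimated true label, with the residual mass on the full set quantifying remaining uncertainty.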

Formal Semantics and Fuzzy Logic

Frontmatter
Weighted Node Importance Contribution Correlation Matrix for Identifying China’s Core Metro Technologies with Patent Network Analysis

The purpose of this study is to identify the core technologies in the metro domain by analyzing its patent network, which is beneficial for grasping technological trends and advancing the metro domain in China. Metro patent data (1986–2016) published in China were collected from the State Intellectual Property Office of the People's Republic of China. We then built a patent network from the co-occurrence of International Patent Classification codes, and extended the node importance contribution correlation matrix method to a weighted version in order to calculate the importance of each node. Nodes with high importance scores play more crucial roles in the efficiency and stability of the network and are viewed as the core metro technologies. The results can be useful for companies' technology R&D planning and government policymaking.

Mei Long, Tieju Ma
Situation-Aware Rating Prediction Using Fuzzy Rules

Context-Aware Recommendation Systems (CARS) extend traditional recommendation systems by adapting their output to users' specific contextual situations. Researchers have tackled rating prediction in CARS in an attempt to recommend appropriate items to users. However, three challenges in rating prediction remain: (i) the weight of each context dimension; (ii) the correlation between context dimensions; and (iii) situation inference. A major shortcoming of classical methods is that there is no defined way to study the dependencies and interactions among context dimensions; context-aware algorithms make the strong assumption that context-dimension weights are equal or initialize them with random values. To address these issues, we propose a novel approach that weights context dimensions and studies the correlations between them to infer the current situation, then predicts the rating based on the inferred situation. Through a detailed experimental evaluation, we demonstrate that the proposed approach improves prediction accuracy.

Rim Dridi, Saloua Zammali, Khedija Arour
Adaptive Conceding Strategies for Negotiating Agents Based on Interval Type-2 Fuzzy Logic

In human-agent automated negotiations, one of the crucial problems is how a negotiating agent updates its conceding strategies in the light of new information obtained during the course of a negotiation. To this end, this paper proposes a novel model of a seller negotiating agent that can be used in human-agent negotiations. More specifically, the agent can dynamically change its conceding strategies according to the remaining time and the opponent's degree of cooperativeness. We use interval type-2 fuzzy rules to determine such changes because rules of this kind can well reflect the uncertain information in human-computer negotiations. Finally, our agent is evaluated in both agent-agent and human-agent experiments.

Jieyu Zhan, Xudong Luo
Semiotic Rules Generation and Inferences Reasoning for Movie Documents

With the rapid but still incomplete maturation of information retrieval research on multimedia documents, progress toward a new solution requires the extraction of semantic information from the content. However, this information should not only be extracted from the content but should also present the different semiotic meanings conveyed in it. In our work, we concentrate on movie documents. Knowledge extracted in isolation can obscure the global view of the sequence of events or the story conveyed in a film. In this context, this paper aims to generate relationships either between sub-parts of the same movie or between movies. We propose inference reasoning to build these relationships in order to reveal the hidden knowledge semantics of resources. These relationships are based on the semiotic description, which poses a major challenge. A case study substantiating the performance of our proposed process is presented.

Manel Fourati, Anis Jedidi, Faiez Gargouri

Knowledge Engineering

Frontmatter
Generic Model for Adaptable Caching in the Knowledge-Oriented Web Engineering

Web applications are a commonly used type of technology in many domains. They are applied for various knowledge-intensive purposes, ranging from external marketing activities to internal content presentation. The speed with which they can be used is therefore a significant variable in their evaluation. Based on domain knowledge expressed in the form of a generic model, this paper presents a possibility for adaptable cache utilisation. The model is based on existing technologies, and its aim is to significantly reduce the data load and time demands associated with the use of web pages. The model is implemented in practice, with various code fragments provided in the text, and the implementation is tested on the three most common types of web pages. The acquired results prove a significant reduction of data load in comparison with the traditional request-response model. The paper also outlines directions for further research in this area.

Jiří Štěpánek, Vladimír Bureš
Automatic Construction of Generalization Hierarchies for Publishing Anonymized Data

Concept hierarchies are widely used in multiple fields to carry out data analysis. In data privacy, they are known as Value Generalization Hierarchies (VGHs) and are used by generalization algorithms to dictate the data anonymization. Their proper specification is thus critical for obtaining anonymized data of good quality. The creation and evaluation of VGHs require expert knowledge and a significant amount of manual effort, making these tasks highly error-prone and time-consuming. In this paper we present AIKA, a knowledge-based framework to automatically construct and evaluate VGHs for the anonymization of categorical data. AIKA integrates ontologies to objectively create and evaluate VGHs. It also implements a multi-dimensional reward function to tailor the VGH evaluation to different use cases. Our experiments show that AIKA improved VGH creation by generating VGHs of good quality in less time than manual construction. The results also show how the reward function properly captures the desired VGH properties.

Vanessa Ayala-Rivera, Liam Murphy, Christina Thorpe
Schema-Based Query Rewriting in SPARQL

SPARQL querying on the semantic web has drawn considerable attention from the OWL and RDF communities. In this paper, we present SPARQL-S, a new system for the SPARQL entailment regime focused on fast and efficient querying over meaningful ontology schemas and large volumes of RDF data. The basic idea of SPARQL-S differs from previous SPARQL entailment implementations, which are mostly achieved by deriving additional facts. SPARQL-S instead rewrites the basic graph pattern (BGP) in a SPARQL query to include further entailed BGP information absorbed from the ontology schema through concept entailment, and then evaluates the rewritten query in a distributed graph computation framework such as GraphX.

Lili Jiang, Jie Luo

Knowledge Enrichment and Visualization

Frontmatter
Finding the Optimal Users to Mention in the Appropriate Time on Twitter

Nowadays, Twitter has become an important platform for spreading information and advertisements. Mention is a feature on Twitter: by mentioning users in a tweet, they receive notifications, and their possible retweets may help to initiate a large cascade diffusion of the tweet. To maximize the cascade diffusion, two important factors need to be considered: (1) the mentioned users should be interested in the tweet; (2) the mentioned users should be online. This paper mainly studies the second factor: if we mention users when they are online, they receive notifications immediately, and their possible retweets can help to maximize the cascade diffusion as quickly as possible. We formulate an unbalanced assignment problem to ensure that the optimal users are mentioned at the appropriate time, with constraints modeled to overcome overload problems on Twitter. The unbalanced assignment problem is then converted to a balanced assignment problem, which is solved with the Hungarian algorithm. Experiments were conducted on a real dataset from Twitter containing about 2 thousand users and 5 million tweets in a target community, and the results showed that our method is consistently better than mentioning users at random.

Dayong Shen, Zhaoyun Ding, Fengcai Qiao, Jiajun Cheng, Hui Wang
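The conversion of an unbalanced assignment problem to a balanced one, as described in the abstract above, is typically done by padding the cost matrix with zero-cost dummy rows or columns. The sketch below illustrates this reduction with a hypothetical cost matrix; for brevity it finds the optimal assignment by brute force over permutations rather than implementing the Hungarian algorithm, which gives the same optimum on small instances.

```python
from itertools import permutations

def pad_to_square(costs):
    """Balance an unbalanced cost matrix by adding zero-cost dummy
    rows/columns until it is square."""
    n = max(len(costs), max(len(r) for r in costs))
    padded = [list(r) + [0.0] * (n - len(r)) for r in costs]
    padded += [[0.0] * n for _ in range(n - len(padded))]
    return padded

def best_assignment(costs):
    """Minimum-cost assignment on the balanced matrix, by brute force.

    For realistic sizes one would use the Hungarian algorithm
    (e.g. scipy.optimize.linear_sum_assignment) instead.
    """
    square = pad_to_square(costs)
    n = len(square)
    return min(
        (sum(square[i][p[i]] for i in range(n)), p)
        for p in permutations(range(n))
    )

# Hypothetical example: 2 tweets (rows) to assign among 3 candidate
# users (columns); a cost could encode how unlikely a user is to be
# online and retweet at that time.
cost, perm = best_assignment([[4.0, 1.0, 3.0],
                              [2.0, 0.0, 5.0]])
```

The dummy row added by `pad_to_square` absorbs the unmatched user at zero cost, so the optimum of the balanced problem coincides with that of the original unbalanced one.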
Extracting Knowledge from Web Tables Based on DOM Tree Similarity

Extraction of structured (semi-structured) knowledge from Web tables is an important way to obtain high-quality knowledge. Unlike most extraction methods, which need to understand the tables with external knowledge bases, our method uses the inherent similarities of tables to determine their semantic structure. Based on a comprehensive analysis of table structures of various forms, we provide a novel way to calculate the DOM tree similarity between web tables based on DTW, and to cluster the tables accordingly. Using 5000 randomly extracted Wikipedia tables as the corpus, experiments show that the result of table clustering is close to that of classification based on empirical approaches, and that, without the use of external knowledge bases, the quality of knowledge extracted from the tables is satisfactory.

Xiaolong Wu, Cungen Cao, Ya Wang, Jianhui Fu, Shi Wang
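The DTW comparison mentioned in the abstract above can be sketched on flattened tag sequences. This is a minimal illustration under assumptions of our own: the paper compares DOM trees, while here each tree is assumed to have been flattened into a sequence of tag names, compared with a simple 0/1 substitution cost.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two tag sequences,
    with unit cost for mismatched tags and zero for matches."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else 1.0
            # Warping allows stretching either sequence to align them.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# Hypothetical tag sequences from flattening two table DOM subtrees:
# the second table has one header cell fewer, yet DTW aligns the
# repeated "th" at zero cost.
t1 = ["table", "tr", "th", "th", "tr", "td", "td"]
t2 = ["table", "tr", "th", "tr", "td", "td"]
print(dtw_distance(t1, t2))  # 0.0
```

Because DTW tolerates such local stretching, tables with the same semantic layout but different column counts end up close together, which is what makes the subsequent clustering work.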
Single Image Dehazing Using Hölder Coefficient

Restoring the true scene appearance from a hazy image is a challenging task and one of the most necessary parts of an image processing system. A clear-day image has higher contrast than its hazy counterpart, and our main idea is to enhance the hazy image based on this observation to achieve the dehazing objective. First, we use a simple but powerful new metric, the Hölder coefficient, to roughly estimate the haze density. To make the estimated density map more reasonable, we refine it with a proposed energy function. Based on the refined map, we propose a new method to estimate the atmospheric light. Second, three new terms used to enhance the image are modeled into an energy function; solving this energy function yields the transmission map. Finally, we obtain the haze-free image using the transmission map. Experimental results demonstrate that our algorithm has similar or better performance compared to state-of-the-art algorithms.

Dehao Shang, Tingting Wang, Faming Fang
Transfer Learning Based on A+ for Image Super-Resolution

Example learning-based super-resolution (SR) methods are effective at generating a high-resolution (HR) image from a single low-resolution (LR) input, and have shown great potential for many practical applications. Unfortunately, most popular example learning-based approaches extract features from a limited set of training images, which is insufficient for the super-resolution task. Our work transfers supplemental information from other domains. In this paper, a new algorithm, Transfer Learning based on A+ (TLA), is proposed for the image super-resolution task. First, we transfer supplemental information from other datasets to construct a new dictionary. Then, in sample selection, more training samples are added to the basic training samples. In experiments, we explore which types of images can provide the most appropriate information for the super-resolution task. Experimental results indicate that our approach is superior to A+ when transferring images whose content is similar to the original data.

Mei Su, Sheng-hua Zhong, Jian-min Jiang
Intuitive Knowledge Connectivity: Design and Prototyping of Cross-Platform Knowledge Networks

Individual users are overwhelmed with a flood of data, yet current big-data strategies focus mainly on organizational uses of data analytics. To address this gap, we focus on personal data management (PDM) in the era of big data and cloud computing. We are developing and testing PDM software that enables individuals to construct a cross-platform knowledge network by semi-automatically connecting new relevant data to an existing network of interlinked digital objects. Because the cloud-based services that support our knowledge work are currently fragmented, we suggest an integrated federated platform for editing and searching the personal-knowledge context as a network. This forms a directed edge-labeled property multigraph that spans all of the cloud-based data silos. We present a design and a proof-of-concept implementation of a PDM tool that allows the creation of a personal-knowledge network incorporating digital objects from different cloud services.

Michael Kaufmann, Andreas Waldis, Patrick Siegfried, Gwendolin Wilke, Edy Portmann, Matthias Hemmje

Knowledge Management

Frontmatter
Intellectual Capital and Boundary-Crossing Management Knowledge

Exploring how intellectual capital (i.e., the capabilities of system, coordination, and socialization, and human capital) influences boundary-crossing management knowledge (i.e., syntactic transfer, semantic translation, and pragmatic transformation), this study identifies differing effects for the three dimensions of boundary-crossing management knowledge. The results indicate that coordination capability primarily enhances a team's syntactic transfer, semantic translation, and pragmatic transformation, while socialization capability primarily improves a team's semantic translation and pragmatic transformation. Our findings reveal why teams may have difficulty managing syntactic transfer, semantic translation, and pragmatic transformation, and why they vary in their ability to create value from their boundary capability.

Shiuann-Shuoh Chen, Min Yu, Pei-Yi Chen
KPD: An Investigation into the Usability of Knowledge Portal in DMAIC Knowledge Management

Knowledge is a resource that plays an important role in the success of the Six Sigma DMAIC methodology. However, knowledge resides in the minds of individuals and exists in various forms and in different places, which raises the problem of how to collect and share DMAIC knowledge anywhere at any time. In this paper, we introduce a Knowledge Portal named KPD, designed as a tool to manage DMAIC knowledge. Through the deployment of the Knowledge Portal, this paper investigates its impact on DMAIC execution based on experiments and the assessments of experts working in the areas of quality management and information technology. The results of the survey reveal that KPD benefits DMAIC deployment and impacts positively on the success of DMAIC through its knowledge management process.

Thanh-Dat Nguyen, Sergiu Nicolaescu, Claudiu Vasile Kifor
Knowledge Management and Intellectual Capital in the Logistics Service Industry

The changing business scenario in the logistics service market is affecting the development of relationships with customers and requires continuous adaptation of the service offering. In this context, knowledge management and intellectual capital (IC) are potentially successful assets for developing and improving the competitive capabilities of logistics service companies. In order to supply more complex and knowledge-intensive services, it is necessary to evaluate existing IC assets to identify future needs in this area. The main aim of this paper is to investigate how to assess intellectual capital in order to improve the management of knowledge in third-party logistics service providers. The paper reviews the main methods for assessing intellectual capital assets in the logistics service industry and suggests the non-monetary methods as the most appropriate ones.

Vincenzo Del Giudice, Pietro Evangelista, Pierfrancesco De Paola, Fabiana Forte

Knowledge Retrieval

Frontmatter
Digitalizing Seismograms Using a Neighborhood Backtracking Method

In this paper, we present a new algorithm for tracing waves on seismograms and digitalising them into vectors in the form of time series data. The algorithm consists of two main components that handle the smooth and the complicated cases, respectively. Its underlying feature is a novel searching process that examines the context of pixels to ascertain how the tracing moves forward. The algorithm has been evaluated on a limited number of samples, and the results demonstrate its competence. The work presented can be regarded as an effort to develop a uniform earthquake archive covering both the historical and the digital device periods for future reassessment of seismic hazard across the world. The archive will serve as an effective means for discovering precursors of earthquakes by characterising spectral seismograms and the source parameters of less active sources, thereby permitting comparative studies on earthquakes and the development of prediction models.
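The pixel-context search described above can be sketched as follows. This is a minimal illustration of neighbourhood-based tracing, not the authors' actual algorithm; the function name, the ranking heuristic, and the toy image are ours.

```python
def next_pixel(img, cur, prev):
    """Pick the next trace pixel by examining the 8-neighbourhood of the
    current pixel, preferring candidates that keep moving away from the
    previous pixel (i.e. forward along the wave)."""
    cy, cx = cur
    candidates = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            y, x = cy + dy, cx + dx
            inside = 0 <= y < len(img) and 0 <= x < len(img[0])
            if inside and img[y][x] and (y, x) != prev:
                # rank candidates by squared distance from the previous
                # pixel, so the most "forward" neighbour wins
                fwd = (y - prev[0]) ** 2 + (x - prev[1]) ** 2
                candidates.append((-fwd, (y, x)))
    return min(candidates)[1] if candidates else None

# A tiny binary image containing a diagonal stroke.
trace = [[1, 0, 0],
         [0, 1, 0],
         [0, 0, 1]]
print(next_pixel(trace, (1, 1), (0, 0)))  # (2, 2)
```

A real implementation would add backtracking when the trace dead-ends, which is where the paper's "neighborhood backtracking" presumably comes in.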

Yaxin Bi, Shichen Feng, Guoze Zhao, Bing Han
A Document Modeling Method Based on Deep Generative Model and Spectral Hashing

One of the most critical challenges in document modeling is the efficient extraction of high-level representations. In this paper, a document modeling method based on a deep generative model and spectral hashing is proposed. First, dense, low-dimensional features are learned by a deep generative model with word-count vectors as its input. These features are then used to train a spectral hashing model that compresses a new document into a compact binary code, such that the Hamming distances between codewords correlate with semantic similarity. Retrieving similar neighbors is then done simply by retrieving all items whose codewords lie within a small Hamming distance of the query's codeword, which is exceedingly fast, shows superior performance compared with conventional methods, and scales to large datasets.
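The Hamming-radius retrieval step can be sketched as follows; this is a toy illustration of the retrieval idea with made-up 8-bit codewords, not the paper's hashing model.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codewords."""
    return bin(a ^ b).count("1")

def retrieve(query: int, codebook: dict, radius: int = 2):
    """Return the ids of all documents whose codewords lie within the
    given Hamming radius of the query codeword."""
    return [doc_id for doc_id, code in codebook.items()
            if hamming(query, code) <= radius]

# Toy 8-bit codewords standing in for hashed documents.
codebook = {"d1": 0b10110010, "d2": 0b10110000, "d3": 0b01001101}
print(retrieve(0b10110011, codebook))  # ['d1', 'd2']
```

Because the check is a single XOR plus a popcount per item, this lookup stays cheap even over very large collections, which is what makes the binary codes attractive.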

Hong Chen, Jungang Xu, Qi Wang, Ben He
Best Guided Backtracking Search Algorithm for Numerical Optimization Problems

The backtracking search algorithm is a promising stochastic search technique that uses historical population information to guide the evolution of the population. Using historical information improves the exploration capability but slows convergence, especially in the later stage of the iteration. In this paper, a best-guided backtracking search algorithm, termed BGBSA, is proposed to enhance convergence. BGBSA employs historical information in the beginning stage of the iteration and the best individual obtained so far in the later stage. Experiments are carried out on 28 benchmark functions, and the results show the improvement in efficiency and effectiveness achieved by BGBSA.
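The stage-dependent mutation that the abstract describes can be sketched as below. This is an assumption-laden simplification: the stage threshold of 0.5 and the fixed scale factor F are ours (the original BSA draws F randomly, e.g. from a scaled normal distribution), and only the mutation step is shown, not the full algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def bsa_mutation(pop, hist_pop, best, iter_frac, F=1.0):
    """One mutation step: pull toward the historical population early in
    the run, toward the best individual so far late in the run (the
    BGBSA idea). `iter_frac` is the fraction of iterations completed."""
    if iter_frac < 0.5:                        # beginning stage
        return pop + F * (hist_pop - pop)
    return pop + F * (best - pop)              # later stage

pop = rng.standard_normal((5, 2))
hist = rng.permutation(pop)                    # shuffled historical population
best = pop[np.argmin((pop ** 2).sum(axis=1))]  # best on a sphere objective
early = bsa_mutation(pop, hist, best, iter_frac=0.1)
late = bsa_mutation(pop, hist, best, iter_frac=0.9)
```

With F = 1 the late-stage trial population collapses exactly onto the best individual, which makes the exploitation bias of the later stage easy to see; a random F would scatter trials around it instead.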

Wenting Zhao, Lijin Wang, Bingqing Wang, Yilong Yin
Domain Specific Cross-Lingual Knowledge Linking Based on Similarity Flooding

Global knowledge sharing makes large-scale multi-lingual knowledge bases an extremely valuable resource in the Big Data era. However, current mainstream multi-lingual ontologies based on online wikis still suffer from limited coverage of cross-lingual knowledge links. Linking the knowledge entries distributed across different online wikis would immensely enrich the information in online knowledge bases and benefit many applications. In this paper, we propose an unsupervised framework for cross-lingual knowledge linking. Unlike traditional methods, we target the cross-lingual knowledge linking task on specific domains. We evaluate the proposed method on two knowledge linking tasks that find English-Chinese knowledge links. Experiments on English Wikipedia and Baidu Baike show that the method improves the precision of cross-lingual link prediction by up to 6.12 % over state-of-the-art methods.

Liangming Pan, Zhigang Wang, Juanzi Li, Jie Tang
LSSL-SSD: Social Spammer Detection with Laplacian Score and Semi-supervised Learning

The rapid development of social networks makes it easy for people to communicate online. However, social networks usually suffer from social spammers due to their openness. Spammers deliver information for economic purposes, and they pose threats to the security of social networks. To maintain the long-term running of online social networks, many detection methods have been proposed. However, current methods normally use high-dimensional features with supervised learning algorithms to find spammers, resulting in low detection performance. To solve this problem, we first apply the Laplacian score method, an unsupervised feature selection method, to obtain useful features. Based on the selected features, semi-supervised ensemble learning is then used to train the detection model. Experimental results on a Twitter dataset show the efficiency of our approach after feature selection. Moreover, the proposed method maintains high detection performance even with limited labeled data.
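The Laplacian score feature-selection step can be sketched as follows, assuming He et al.'s standard formulation (lower score = the feature better preserves the local graph structure); the chain graph and toy features are ours, not from the paper.

```python
import numpy as np

def laplacian_scores(X, W):
    """Laplacian score of each column of X on the graph with affinity
    matrix W; lower scores indicate features that better respect the
    local structure of the data."""
    D = np.diag(W.sum(axis=1))
    L = D - W                                   # graph Laplacian
    ones = np.ones(X.shape[0])
    scores = []
    for r in range(X.shape[1]):
        f = X[:, r].astype(float)
        f = f - (f @ D @ ones) / (ones @ D @ ones)  # weighted mean removal
        scores.append((f @ L @ f) / (f @ D @ f))
    return np.array(scores)

# Chain graph over five samples; feature 0 varies smoothly along the
# chain while feature 1 oscillates, so feature 0 should score lower.
W = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
X = np.array([[0, 1], [1, -1], [2, 1], [3, -1], [4, 1]], dtype=float)
scores = laplacian_scores(X, W)
print(scores[0] < scores[1])  # True
```

Keeping only the lowest-scoring features is what shrinks the high-dimensional spammer features before the semi-supervised ensemble is trained.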

Wentao Li, Min Gao, Wenge Rong, Junhao Wen, Qingyu Xiong, Bin Ling

Knowledge Systems and Security

Frontmatter
i-Shield: A System to Protect the Security of Your Smartphone

Losing a smartphone is troublesome, as smartphones play an important role in our daily lives. As smartwatches become popular, we argue that they can play a role in smartphone antitheft design. In this paper, we propose i-Shield, a real-time antitheft system that leverages the accelerometers and gyroscopes of smartphones and smartwatches to prevent smartphones from being stolen. As opposed to existing solutions based on Bluetooth, NFC, or GPS tracking, i-Shield takes a practical approach to real-time antitheft for smartphones. i-Shield recognizes taken-out events of smartphones using a supervised classifier and applies a dynamic time warping (DTW) scheme to recognize whether the events are caused by the users themselves. We conduct a series of experiments on an iPhone 6 and an iPhone 4s, and the evaluation results show that our system achieves a 97.4 % true positive rate in recognizing taken-out actions and classifies them with a misclassification rate of 1.12 %.
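The DTW matching idea can be sketched with the classic dynamic-programming recurrence; the toy "owner" and "thief" motion traces below are invented for illustration and are not the paper's sensor data.

```python
def dtw(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping moves
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]

# The same gesture performed at different speeds warps to distance 0,
# while a different gesture stays far away.
owner = [0, 1, 3, 3, 2, 0]
slow_owner = [0, 0, 1, 3, 3, 3, 2, 0]
thief = [0, 4, 0, 4, 0, 4]
print(dtw(owner, slow_owner) < dtw(owner, thief))  # True
```

This speed-invariance is exactly why DTW suits comparing a fresh taken-out motion against a user's enrolled templates.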

Zhuolong Yu, Liusheng Huang, Hansong Guo, Hongli Xu
Rule Management in Clinical Decision Support Expert System for the Alzheimer Disease

The explosion of medical knowledge and the uncertainty of some patient information in several diseases cause many clinical errors in medical decision support systems. Therefore, we aim to reduce these deficiencies using a new decision support approach and new improvements. In this paper, we present a new way to manage rules in expert systems. First, we propose a knowledge specialization process that uses three types of rule bases. Then we explain how rules can be managed across these three types of rule bases. Finally, we present an implementation of a rule management system for Alzheimer's disease. We believe that this improvement can enhance the performance of clinical decision support expert systems.

Firas Zekri, Rafik Bouaziz
Stability Analysis of Switched Systems

A switched system is a hybrid system composed of a family of continuous-time and discrete-time subsystems, with a specific rule orchestrating the switching among them. In this paper, switched systems are grouped into two broad categories based on the switching strategy: time-driven switched systems and event-driven switched systems. Sufficient conditions for the exponential asymptotic stability of these two classes of switched systems are discussed. We then simulate both the state responses and the phase planes of these systems in MATLAB. The stability of the subsystems, and of the switched system as a whole, is further analyzed based on the simulation results.
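For context, one standard sufficient condition of this kind, for a switched linear system under arbitrary switching, is the existence of a common quadratic Lyapunov function; the paper's exact conditions for time-driven and event-driven switching may differ, so this is the textbook version rather than the authors' result.

```latex
% Switched linear system: \dot{x} = A_{\sigma(t)} x, with switching signal
% \sigma(t) \in \{1, \dots, m\}. If there exist P \succ 0 and \lambda > 0
% such that
A_i^{\top} P + P A_i \preceq -2\lambda P \quad \text{for all } i = 1, \dots, m,
% then V(x) = x^{\top} P x satisfies \dot{V} \le -2\lambda V along every
% trajectory, and hence
\|x(t)\| \le \sqrt{\kappa(P)}\, e^{-\lambda t} \|x(0)\|,
% where \kappa(P) is the condition number of P: the switched system is
% exponentially stable under arbitrary switching.
```

When no common P exists, stability can still be recovered under restricted switching, e.g. via dwell-time conditions, which is the kind of trade-off the simulations here explore.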

Jinjing Zhang, Fan Li, Xiaobin Yang, Li Li
Implicit and Explicit Trust in Collaborative Filtering

Recommender systems based on collaborative filtering can provide users with accurate recommendations. However, due to data sparsity and the cold start of the input ratings matrix, this method may fail to find similar users accurately. In the past, researchers used an implicit trust weight instead of the similarity weight to find similar users and improve the quality of recommendation [17], but they often ignored the role of explicit trust in the process of finding similar users. Therefore, in this paper, we explore the calculation of both implicit and explicit trust. According to their roles in the recommender system, we then propose a method that combines trust and similarity to obtain better recommendations. Finally, experiments on the FilmTrust [5] dataset, which includes an explicit trust matrix, show that the proposed method significantly improves the quality of recommendation, and that both implicit and explicit trust have a positive effect on the quality of the recommendation results.
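One simple way to combine trust and similarity, sketched below, is a convex blend of the two weights inside a standard neighbourhood prediction; the blending parameter alpha and all the toy numbers are our assumptions, not the paper's actual formula.

```python
def combined_weight(sim, trust, alpha=0.5):
    """Blend rating similarity with (implicit or explicit) trust."""
    return alpha * sim + (1 - alpha) * trust

def predict(ratings, sims, trusts, alpha=0.5):
    """Weighted average of neighbours' ratings using the blended weight."""
    num = den = 0.0
    for user, r in ratings.items():
        w = combined_weight(sims[user], trusts[user], alpha)
        num += w * r
        den += abs(w)
    return num / den if den else 0.0

# u1 is both similar to and trusted by the target user, so u1's rating
# dominates the prediction.
neighbours = {"u1": 4.0, "u2": 2.0}
sims = {"u1": 0.9, "u2": 0.1}
trusts = {"u1": 0.8, "u2": 0.2}
print(predict(neighbours, sims, trusts))  # 3.7
```

The trust term is what keeps the prediction usable when the ratings overlap is too sparse for the similarity alone to be reliable.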

Yuanxin Ouyang, Jingshuai Zhang, Weizhu Xie, Wenge Rong, Zhang Xiong

Neural Networks and Artificial Intelligence

Frontmatter
A Subset Space Perspective on Agents Cooperating for Knowledge

In this paper, we propose an additional application area for the subset space semantics of modal logic, in terms of cooperating agents. While the original conception reflects both the knowledge acquisition process and the accompanying topological effect for a single agent, we show how a slight extension of that system can be utilized for modeling agents which, in a strict sense, cooperate for knowledge. The agents come in by means of so-called effort functions, which represent those of the agents' actions that are targeted at increasing the knowledge of the whole group. Our investigations result in a particular multi-agent version of the well-known logic of subset spaces, which allows us to reason about qualitative aspects of cooperation, such as the dominance of a joint commitment over any individual effort. On the technical side, a soundness and completeness theorem is proved for one of the logics arising in this way.

Bernhard Heinemann
Context-Aware Tree-Based Convolutional Neural Networks for Natural Language Inference

Natural language inference (NLI) aims to judge the relation between a premise sentence and a hypothesis sentence. In this paper, we propose a context-aware tree-based convolutional neural network (TBCNN) to improve the performance of NLI. In our method, we utilize tree-based convolutional neural networks, proposed in our previous work, to capture the premise's and the hypothesis's information. To enhance our previous model, we summarize the premise's information at both the word level and the convolution level by dynamic pooling and feed this information to the convolutional layer when modeling the hypothesis. In this way, the tree-based convolutional sentence model becomes context-aware. We then match the sentence vectors by heuristics, including vector concatenation and element-wise difference/product, so as to keep the computational complexity low. Experiments show that our context-aware variant achieves better performance than individual TBCNNs.
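The matching heuristics named above can be sketched in a few lines; the function name and the toy two-dimensional sentence vectors are ours, standing in for the TBCNN sentence embeddings.

```python
import numpy as np

def match_features(premise_vec, hypothesis_vec):
    """Heuristic sentence matching: concatenation, element-wise
    difference, and element-wise product, stacked into a single feature
    vector for the downstream classifier."""
    return np.concatenate([premise_vec,
                           hypothesis_vec,
                           premise_vec - hypothesis_vec,
                           premise_vec * hypothesis_vec])

p = np.array([1.0, 2.0])
h = np.array([0.5, -1.0])
print(match_features(p, h))  # 4 blocks of dimension 2 -> length 8
```

These heuristics cost only O(d) per sentence pair, versus the quadratic cost of full word-by-word attention, which is the "low computational complexity" the abstract refers to.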

Zhao Meng, Lili Mou, Ge Li, Zhi Jin
Learning Embeddings of API Tokens to Facilitate Deep Learning Based Program Processing

Deep learning has been applied to program processing in recent years and has gained extensive attention in the academic and industrial communities. Analogous to processing natural language data based on word embeddings, embeddings of tokens (e.g. classes, variables, methods, etc.) provide an important basis for processing programs with deep learning. Many real-world programs rely on API libraries for their implementation and contain numerous API tokens (e.g. API-related classes, interfaces, methods, etc.), which carry notable semantics about the programs. However, learning embeddings of API tokens has not yet been explored. In this paper, we propose a neural model to learn embeddings of API tokens. Our model combines a recurrent neural network with a convolutional neural network and uses API documents as the training corpus. The model is trained on the documents of five popular API libraries and evaluated on a description selection task. To the best of our knowledge, this paper is the first to learn embeddings of API tokens, and it takes a meaningful step towards facilitating deep learning based program processing.

Yangyang Lu, Ge Li, Rui Miao, Zhi Jin
Closed-Loop Product Lifecycle Management Based on a Multi-agent System for Decision Making in Collaborative Design

In a collaborative design environment, there is an increasing demand for information exchange and sharing in order to reduce lead time and improve product quality and value. Software and communication technologies can be a relevant approach in this context, for instance through PLM (Product Lifecycle Management) systems. Each phase of product lifecycle development generates knowledge, and managing this knowledge can be organized as a closed loop. In this paper, we present research in progress that exposes a collaborative architecture based on a multi-agent system aimed at supporting the knowledge management process in this closed loop. This is a new strategic approach for managing product lifecycle information efficiently in a distributed environment. The purpose of this paper is to illustrate the use of the DOCK (Design based on Organization, Competence and Knowledge) methodology for the design of our multi-agent system and to demonstrate how knowledge is handled intelligently via a case study.

Fatima Zahra Berriche, Besma Zeddini, Hubert Kadima, Alain Riviere

Ontologies

Frontmatter
A Benchmark for Ontologies Merging Assessment

In recent years, ontology modeling has become popular, and thousands of ontologies covering multiple fields of application are now available. As multiple ontologies may cover the same or related domains, there is an urgent need for tools to compare, match, merge, and assess ontologies. Ontology matching, which consists in aligning ontologies, has been widely studied, and benchmarks exist to evaluate the different matching methods. Somewhat surprisingly, however, there is no significant benchmark for merging ontologies that provides both the input ontologies and the resulting merged ontology. To fill this gap, we propose a benchmark for ontology merging that contains different types of ontologies, for instance taxonomies, lightweight ontologies, heavyweight ontologies, and multilingual ontologies. We also show how the GROM tool (Graph Rewriting for Ontology Merging) can address the merging process, and we evaluate it using coverage, redundancy, and coherence metrics. Our experiments show that the tool obtains good results in terms of redundancy and coherence.

Mariem Mahfoudh, Germain Forestier, Michel Hassenforder
TRSO: A Tourism Recommender System Based on Ontology

In the era of information explosion, the Internet has become one of the most important tools for users to obtain information. Most tourists, if not all, use search engines to obtain useful travel information online, which makes tourism recommender systems valuable. However, given the huge amount of online information, it remains challenging to develop an effective tourism recommender system. To tackle this challenge, we propose TRSO, an ontology-based tourism recommender system that incorporates several techniques. First, we use association rules to identify associated users among a large number of users; users in the database are thereby divided into two categories, related and unrelated users. Second, for related users, we propose a collaborative filtering algorithm that incorporates time and evaluation factors; for unrelated users, we use a different collaborative filtering algorithm that integrates the time factor and tourism attraction ontology information. Third, we further filter out useless information according to context information. Finally, we enrich the tourism attractions with other tourism information, such as shopping, eating, and traveling, based on a tourism ontology. Experimental results on a standard benchmark show that the proposed tourism recommendation algorithm achieves satisfactory and comprehensive recommendation performance.

Yan Chu, Hongbin Wang, Liying Zheng, Zhengkui Wang, Kian-Lee Tan
Generic Ontology Design Patterns: Qualitatively Graded Configuration

For semantic modelling, ontologies are a good compromise between formality and accessibility to the layman, but they lack sufficient methodological support and development tools. In particular, Generic Ontology Design Patterns are suggested as a domain-independent methodological tool. Qualitatively graded relations combine semantic relations with qualitative valuations. Qualitatively graded configuration is illustrated in two application domains: mobility assistants for users with a variety of age-related impairments, and allowed foods for users with diet restrictions.

Bernd Krieg-Brückner

Recommendation Algorithms and Systems

Frontmatter
CUT: A Combined Approach for Tag Recommendation in Software Information Sites

Software information sites such as Stack Overflow and Ask Ubuntu allow programmers to post questions and share knowledge online. Usually, tags that describe the key content of a question are required by the website. These tags play an important role in organizing and indexing user posts efficiently and provide accurate abstracts of complicated technical problems. Users attach tags to questions according to their experience and knowledge; due to differences in expression and incomplete knowledge of the software, choosing accurate tags is not easy. In this paper, we propose CUT, an automatic tag recommendation approach that recommends appropriate tags after users post their questions. The approach incorporates code fragments, text content, users' tag preferences, and tag relations into the recommendation process. We evaluated CUT with comparative experiments on the Stack Overflow dataset. The results show that CUT achieves 69.9 % recall@5 and 81.6 % recall@10, outperforming the latest relevant approach.
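The recall@k metric reported above is standard and can be computed as follows; the tag lists are invented for illustration.

```python
def recall_at_k(recommended, actual, k):
    """Fraction of the true tags recovered in the top-k recommendations."""
    hits = len(set(recommended[:k]) & set(actual))
    return hits / len(actual)

recommended = ["python", "pandas", "regex", "numpy", "flask"]
actual = ["python", "numpy"]
print(recall_at_k(recommended, actual, 5))  # 1.0
print(recall_at_k(recommended, actual, 2))  # 0.5
```

So recall@5 of 69.9 % means that, on average, about 70 % of a question's true tags appear among the top five recommended tags.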

Yong Yang, Ying Li, Yang Yue, Zhonghai Wu, Wenlong Shao
CoSoLoRec: Joint Factor Model with Content, Social, Location for Heterogeneous Point-of-Interest Recommendation

The pervasive use of location-based social networks calls for more precise point-of-interest (POI) recommendation. The probability of a user's visit to a target place is influenced by multiple factors, and although several fusion models exist in this field, they do not consider heterogeneous information comprehensively. To this end, we propose a novel probabilistic latent factor model that jointly considers social correlation, geographical influence, and users' preferences. Specifically, a variant of Latent Dirichlet Allocation is leveraged to extract the topics of both users and POIs from reviews, which we denote as explicit interest. A probabilistic latent factor model is then introduced to depict implicit interest. Moreover, kernel density estimation and friend-based collaborative filtering are leveraged to model a user's geographical distribution and social correlation, respectively. On this basis, we propose CoSoLoRec, a fusion framework, to improve the recommendation. Experiments on two real-world datasets show the superiority of our approach over state-of-the-art methods.
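The kernel-density step for geographical influence can be sketched as a plain 2-D Gaussian KDE over past check-in coordinates; the bandwidth, coordinates, and function name below are our assumptions for illustration.

```python
import math

def gaussian_kde(point, checkins, bandwidth=1.0):
    """Kernel density estimate of a user's visit probability at `point`,
    built from past check-in coordinates with a 2-D Gaussian kernel."""
    x, y = point
    total = 0.0
    for cx, cy in checkins:
        d2 = (x - cx) ** 2 + (y - cy) ** 2
        total += math.exp(-d2 / (2 * bandwidth ** 2))
    # normalize so the estimate integrates to 1 over the plane
    norm = len(checkins) * 2 * math.pi * bandwidth ** 2
    return total / norm

checkins = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0)]
near = gaussian_kde((0.0, 0.05), checkins)
far = gaussian_kde((10.0, 10.0), checkins)
print(near > far)  # True
```

A candidate POI close to the user's historical activity centers thus gets a higher geographical score, which the fusion model then combines with the interest and social terms.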

Hao Guo, Xin Li, Ming He, Xiangyu Zhao, Guiquan Liu, Guandong Xu
Evidential Item-Based Collaborative Filtering

Recommender systems (RSs), in particular collaborative filtering approaches, have reached a high level of popularity. These approaches are designed to predict a user's future interest in unrated items. However, the provided predictions should be taken with caution because of the uncertainty pervading real-world problems. Indeed, failing to consider such uncertainty may lead to unrepresentative results, which can deeply affect prediction accuracy as well as the user's confidence in the RS. In order to tackle this issue, we propose a new evidential item-based collaborative filtering approach. Our approach involves the tools of belief function theory as well as the Evidential K-Nearest Neighbors (EKNN) classifier to deal with the uncertain aspect of item recommendation ignored by classical methods. The performance of our new recommendation approach is demonstrated through a comparative evaluation against several traditional collaborative filtering recommenders.

Raoua Abdelkhalek, Imen Boukhris, Zied Elouedi
Backmatter
Metadata
Title
Knowledge Science, Engineering and Management
Edited by
Franz Lehner
Nora Fteimi
Copyright Year
2016
Electronic ISBN
978-3-319-47650-6
Print ISBN
978-3-319-47649-0
DOI
https://doi.org/10.1007/978-3-319-47650-6
