
2015 | Book

Machine Learning and Knowledge Discovery in Databases

European Conference, ECML PKDD 2015, Porto, Portugal, September 7-11, 2015, Proceedings, Part II

Edited by: Annalisa Appice, Pedro Pereira Rodrigues, Vítor Santos Costa, João Gama, Alípio Jorge, Carlos Soares

Publisher: Springer International Publishing

Book series: Lecture Notes in Computer Science


About this book

The three-volume set LNAI 9284, 9285, and 9286 constitutes the refereed proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2015, held in Porto, Portugal, in September 2015. The 131 papers presented in these proceedings were carefully reviewed and selected from a total of 483 submissions. These include 89 research papers, 11 industrial papers, 14 nectar papers, and 17 demo papers. They were organized in topical sections named: classification, regression and supervised learning; clustering and unsupervised learning; data preprocessing; data streams and online learning; deep learning; distance and metric learning; large scale learning and big data; matrix and tensor analysis; pattern and sequence mining; preference learning and label ranking; probabilistic, statistical, and graphical approaches; rich data; and social and graphs. Part III is structured in industrial track, nectar track, and demo track.

Table of Contents

Frontmatter

Matrix and Tensor Analysis

Frontmatter
BoostMF: Boosted Matrix Factorisation for Collaborative Ranking

Personalised recommender systems are widely used for information filtering in information retrieval, where matrix factorisation (MF) has become popular as a model-based approach to personalised recommendation. Classical MF methods, which directly approximate low-rank factor matrices by minimising some rating prediction criterion, do not achieve satisfactory performance on the task of top-N recommendation. In this paper, we propose a novel MF method, namely BoostMF, that formulates factorisation as a learning problem and integrates boosting into factorisation. Rather than using boosting as a wrapper, BoostMF directly learns latent factors that are optimised toward top-N recommendation. The proposed method is evaluated against a set of state-of-the-art methods on three popular public benchmark datasets. The experimental results demonstrate that the proposed method achieves significant improvement over these baseline methods on the task of top-N recommendation.

Nipa Chowdhury, Xiongcai Cai, Cheng Luo
Convex Factorization Machines

Factorization machines are a generic framework that makes it possible to mimic many factorization models simply through feature engineering. In this way, they combine the high predictive accuracy of factorization models with the flexibility of feature engineering. Unfortunately, factorization machines involve a non-convex optimization problem and are thus subject to bad local minima. In this paper, we propose a convex formulation of factorization machines based on the nuclear norm. Our formulation imposes fewer restrictions on the learned model and is thus more general than the original formulation. To solve the corresponding optimization problem, we present an efficient globally-convergent two-block coordinate descent algorithm. Empirically, we demonstrate that our approach achieves comparable or better predictive accuracy than the original factorization machines on four recommendation tasks and scales to datasets with 10 million samples.

Mathieu Blondel, Akinori Fujino, Naonori Ueda
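
The paper convexifies the classical factorization machine, which is non-convex in its low-rank parameterization. As a point of reference, here is a minimal sketch of the original (non-convex) FM prediction function, using Rendle's O(nk) identity for the pairwise interactions; all values below are toy data:

import numpy as np

def fm_predict(x, w0, w, V):
    # y(x) = w0 + <w, x> + sum_{i<j} <V_i, V_j> x_i x_j, computed via the
    # identity sum_{i<j} = 0.5 * sum_f ((V^T x)_f^2 - ((V^2)^T x^2)_f).
    linear = w0 + x @ w
    interactions = 0.5 * np.sum((x @ V) ** 2 - (x ** 2) @ (V ** 2))
    return linear + interactions

rng = np.random.default_rng(0)
x = rng.random(5)                                   # 5 input features
w0, w, V = 0.1, rng.random(5), rng.random((5, 3))   # rank-3 factors
print(fm_predict(x, w0, w, V))
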
Generalized Matrix Factorizations as a Unifying Framework for Pattern Set Mining: Complexity Beyond Blocks

Matrix factorizations are a popular tool to mine regularities from data. There are many ways to interpret the factorizations, but one particularly suited for data mining utilizes the fact that a matrix product can be interpreted as a sum of rank-1 matrices. Then the factorization of a matrix becomes the task of finding a small number of rank-1 matrices, the sum of which is a good representation of the original matrix. Seen this way, it becomes obvious that many problems in data mining can be expressed as matrix factorizations given correct definitions of what a rank-1 matrix and a sum of rank-1 matrices mean. This paper develops a unified theory, based on generalized outer product operators, that encompasses many pattern set mining tasks. The focus is on the computational aspects of the theory: we study the computational complexity and approximability of many problems related to generalized matrix factorizations. The results apply immediately to a large number of data mining problems, and will hopefully allow future results and algorithms to be generalized as well.

Pauli Miettinen
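
To make the "sum of rank-1 matrices" view concrete, here is a small self-contained demonstration with toy matrices, using Boolean algebra as one instantiation: a Boolean matrix product is exactly the elementwise OR of rank-1 outer products, one per pattern:

import numpy as np

B = np.array([[1, 0], [1, 1], [0, 1], [0, 0]], dtype=bool)    # 4 rows, 2 patterns
C = np.array([[1, 1, 0, 0, 0], [0, 0, 1, 1, 0]], dtype=bool)  # 2 patterns, 5 cols

rank1_sum = np.zeros((4, 5), dtype=bool)
for k in range(B.shape[1]):
    rank1_sum |= np.outer(B[:, k], C[k, :])   # one rank-1 Boolean matrix

# Agrees with the Boolean matrix product (OR of ANDs).
assert np.array_equal(rank1_sum, (B.astype(int) @ C.astype(int)) > 0)
print(rank1_sum.astype(int))
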
Scalable Bayesian Non-negative Tensor Factorization for Massive Count Data

We present a Bayesian non-negative tensor factorization model for count-valued tensor data, and develop scalable inference algorithms (both batch and online) for dealing with massive tensors. Our generative model can handle overdispersed counts as well as infer the rank of the decomposition. Moreover, leveraging a reparameterization of the Poisson distribution as a multinomial facilitates conjugacy in the model and enables simple and efficient Gibbs sampling and variational Bayes (VB) inference updates, with a computational cost that only depends on the number of nonzeros in the tensor. The model also provides a nice interpretability for the factors; in our model, each factor corresponds to a “topic”. We develop a set of online inference algorithms that allow further scaling up the model to massive tensors, for which batch inference methods may be infeasible. We apply our framework on diverse real-world applications, such as multiway topic modeling on a scientific publications database, analyzing a political science data set, and analyzing a massive household transactions data set.

Changwei Hu, Piyush Rai, Changyou Chen, Matthew Harding, Lawrence Carin
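
The Poisson-multinomial reparameterization mentioned in the abstract rests on a standard thinning property: a Poisson total split multinomially yields marginally Poisson parts. A quick numerical check of that property (toy rates only, not the paper's inference code):

import numpy as np

rng = np.random.default_rng(42)
lam = np.array([2.0, 5.0, 3.0])          # hypothetical per-topic rates

# Draw the total count, then split it multinomially by lam / lam.sum().
totals = rng.poisson(lam.sum(), size=50_000)
parts = np.array([rng.multinomial(n, lam / lam.sum()) for n in totals])

# Each column behaves like a Poisson(lam_k) sample:
print(parts.mean(axis=0))   # ~ [2, 5, 3]
print(parts.var(axis=0))    # ~ [2, 5, 3], since mean == variance for Poisson
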
A Practical Approach to Reduce the Learning Bias Under Covariate Shift

Covariate shift is a specific class of selection bias that arises when the marginal distributions of the input features X differ in the source and target domains while the conditional distributions of the target Y given X are the same. A common technique to deal with this problem, called importance weighting, amounts to reweighting the training instances in order to make them resemble the test distribution. However, this usually comes at the expense of a reduction in the effective sample size. In this paper, we show analytically that, while the unweighted model is globally more biased than the weighted one, it may locally be less biased on low-importance instances. In view of this result, we then discuss a way to optimally combine the weighted and the unweighted models in order to improve predictive performance in the target domain. We conduct a series of experiments on synthetic and real-world data to demonstrate the effectiveness of this approach.

Van-Tinh Tran, Alex Aussem
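
A minimal sketch of the setting, assuming importance weights have already been estimated (e.g. by density-ratio estimation) and using scikit-learn's logistic regression; the hard switch between the two models below only illustrates the motivation and is not the paper's optimal combination:

import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_both(X, y, w):
    # w[i] ~ p_test(x_i) / p_train(x_i): hypothetical importance weights.
    unweighted = LogisticRegression().fit(X, y)
    weighted = LogisticRegression().fit(X, y, sample_weight=w)
    return unweighted, weighted

def combine(unweighted, weighted, X, w_x, threshold=1.0):
    # Trust the unweighted model on low-importance points, where it can
    # be locally less biased, and the weighted model elsewhere.
    p_u = unweighted.predict_proba(X)[:, 1]
    p_w = weighted.predict_proba(X)[:, 1]
    return np.where(w_x < threshold, p_u, p_w)
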
Hyperparameter Optimization with Factorized Multilayer Perceptrons

In machine learning, hyperparameter optimization is a challenging task that is usually approached by experienced practitioners or in a computationally expensive brute-force manner such as grid search. Therefore, recent research proposes to use observed hyperparameter performance on already solved problems (i.e. data sets) in order to speed up the search for promising hyperparameter configurations in the sequential model-based optimization framework.

In this paper, we propose multilayer perceptrons as surrogate models as they are able to model highly nonlinear hyperparameter response surfaces. However, since interactions of hyperparameters, data sets and metafeatures are only implicitly learned in the subsequent layers, we improve the performance of multilayer perceptrons by means of an explicit factorization of the interaction weights and call the resulting model a factorized multilayer perceptron. Additionally, we evaluate different ways of obtaining predictive uncertainty, which is a key ingredient for a decent tradeoff between exploration and exploitation. Our experimental results on two public meta data sets demonstrate the efficiency of our approach compared to a variety of published baselines. For reproduction purposes, we make our data sets and all the program code publicly available on our supplementary webpage.

Nicolas Schilling, Martin Wistuba, Lucas Drumond, Lars Schmidt-Thieme
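
For readers unfamiliar with the framework, a generic SMBO loop looks roughly as follows; the fit/predict surrogate interface and the candidate grid are assumptions for illustration, with the paper's factorized multilayer perceptron being one possible surrogate:

import numpy as np
from scipy.stats import norm

def smbo_maximize(evaluate, candidates, surrogate, n_iter=30, seed=0):
    # Fit the surrogate on all observations so far, then evaluate the
    # candidate configuration with the highest expected improvement.
    rng = np.random.default_rng(seed)
    X = [candidates[rng.integers(len(candidates))]]
    y = [evaluate(X[0])]
    for _ in range(n_iter):
        surrogate.fit(np.array(X), np.array(y))
        mean, std = surrogate.predict(np.array(candidates))  # assumed API
        best = max(y)
        z = (mean - best) / np.maximum(std, 1e-9)
        ei = (mean - best) * norm.cdf(z) + std * norm.pdf(z)
        x_next = candidates[int(np.argmax(ei))]
        X.append(x_next)
        y.append(evaluate(x_next))
    return X[int(np.argmax(y))]
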
Hyperparameter Search Space Pruning – A New Component for Sequential Model-Based Hyperparameter Optimization

The optimization of hyperparameters is often done manually or exhaustively, but recent work has shown that automatic methods can optimize hyperparameters faster and even achieve better final performance. Sequential model-based optimization (SMBO) is the current state-of-the-art framework for automatic hyperparameter optimization. Currently, it consists of three components: a surrogate model, an acquisition function and an initialization technique. We propose to add a fourth component: a way of pruning the hyperparameter search space, which is a common means of accelerating the search in many domains but has not yet been applied to hyperparameter optimization. We propose to discard regions of the search space that are unlikely to contain better hyperparameter configurations by transferring knowledge from past experiments on other data sets as well as taking into account the evaluations already made on the current data set.

Pruning as a new component for SMBO is an orthogonal contribution, but we nevertheless compare it to surrogate models that learn across data sets and extensively investigate the impact of pruning, with and without initialization, for various state-of-the-art surrogate models. The experiments are conducted on two newly created meta-data sets, which we make publicly available. One of these meta-data sets is created on 59 data sets using 19 different classifiers, resulting in a total of about 1.3 million experiments. This is more than four times larger than the set of results collaboratively collected by OpenML.

Martin Wistuba, Nicolas Schilling, Lars Schmidt-Thieme
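
A sketch of what a pruning component might look like in code, under the simplifying assumption that a surrogate supplies predictive means and uncertainties for the candidate configurations; the paper's actual criterion additionally transfers evidence from past data sets:

import numpy as np

def prune(candidates, mean, std, best_so_far, kappa=2.0):
    # Discard configurations whose optimistic estimate mean + kappa*std
    # does not even reach the incumbent on the current data set.
    upper = mean + kappa * std
    return [c for c, u in zip(candidates, upper) if u >= best_so_far]
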
Multi-Task Learning with Group-Specific Feature Space Sharing

When faced with learning a set of inter-related tasks from a limited amount of usable data, learning each task independently may lead to poor generalization performance. Multi-task learning (MTL) exploits the latent relations between tasks and overcomes data scarcity limitations by co-learning all these tasks simultaneously to offer improved performance. We propose a novel Multi-Task Multiple Kernel Learning framework based on Support Vector Machines for binary classification tasks. By considering pair-wise task affinity in terms of similarity between a pair’s respective feature spaces, the new framework, compared to other similar MTL approaches, offers a high degree of flexibility in determining how similar feature spaces should be, as well as which pairs of tasks should share a common feature space in order to benefit overall performance. The associated optimization problem is solved via block coordinate descent, which employs a consensus-form Alternating Direction Method of Multipliers algorithm to optimize the Multiple Kernel Learning weights and, hence, to determine task affinities. Empirical evaluation on seven data sets exhibits a statistically significant improvement of our framework’s results compared to those of several other Clustered Multi-Task Learning methods.

Niloofar Yousefi, Michael Georgiopoulos, Georgios C. Anagnostopoulos
Opening the Black Box: Revealing Interpretable Sequence Motifs in Kernel-Based Learning Algorithms

This work is in the context of kernel-based learning algorithms for sequence data. We present a probabilistic approach to automatically extract, from the output of such string-kernel-based learning algorithms, the subsequences—or motifs—truly underlying the machine’s predictions. The proposed framework views motifs as free parameters in a probabilistic model, which is solved through a global optimization approach. In contrast to prevalent approaches, the proposed method can discover even difficult, long motifs, and can be combined with any kernel-based learning algorithm that is based on an adequate sequence kernel. We show that, by using a discriminative kernel machine such as a support vector machine, the approach can reveal discriminative motifs underlying the kernel predictor. We demonstrate the efficacy of our approach through a series of experiments on synthetic and real data, including problems from handwritten digit recognition and a large-scale human splice site data set from the domain of computational biology.

Marina M.-C. Vidovic, Nico Görnitz, Klaus-Robert Müller, Gunnar Rätsch, Marius Kloft

Pattern and Sequence Mining

Frontmatter
Fast Generation of Best Interval Patterns for Nonmonotonic Constraints

In pattern mining, the main challenge is the exponential explosion of the set of patterns. Typically, to solve this problem, a constraint for pattern selection is introduced. One of the first constraints proposed in pattern mining is support (frequency) of a pattern in a dataset. Frequency is an anti-monotonic function, i.e., given an infrequent pattern, all its superpatterns are not frequent. However, many other constraints for pattern selection are not (anti-)monotonic, which makes it difficult to generate patterns satisfying these constraints. In this paper we introduce the notion of projection-antimonotonicity and the $$\vartheta $$-$$\Sigma o\phi \iota \alpha $$ algorithm, which allows efficient generation of the best patterns for some nonmonotonic constraints. We consider stability and the $$\Delta $$-measure, which are nonmonotonic constraints, and apply them to interval tuple datasets. In the experiments, we compute the best interval tuple patterns w.r.t. these measures and show the advantage of our approach over postfiltering approaches.

Aleksey Buzmakov, Sergei O. Kuznetsov, Amedeo Napoli
Non-parametric Jensen-Shannon Divergence

Quantifying the difference between two distributions is a common problem in many machine learning and data mining tasks. What is also common in many tasks is that we only have empirical data. That is, we do not know the true distributions nor their form, and hence, before we can measure their divergence we first need to assume a distribution or perform estimation. For exploratory purposes this is unsatisfactory, as we want to explore the data, not our expectations. In this paper we study how to non-parametrically measure the divergence between two distributions. In particular, we formalise the well-known Jensen-Shannon divergence using cumulative distribution functions. This allows us to calculate divergences directly and efficiently from data without the need for estimation. Moreover, empirical evaluation shows that our method performs very well in detecting differences between distributions, outperforming the state of the art in both statistical power and efficiency for a wide range of tasks.

Hoang-Vu Nguyen, Jilles Vreeken
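
A minimal sketch of the underlying idea, comparing empirical cumulative distribution functions so that no density estimation is needed; the paper's exact estimator and normalization differ in the details:

import numpy as np

def ecdf(samples, grid):
    # Empirical CDF of `samples`, evaluated at the points in `grid`.
    return np.searchsorted(np.sort(samples), grid, side="right") / len(samples)

def cdf_js(x, y, n_grid=512):
    # Jensen-Shannon construction applied to CDFs instead of densities.
    lo, hi = min(x.min(), y.min()), max(x.max(), y.max())
    grid = np.linspace(lo, hi, n_grid)
    dx = grid[1] - grid[0]
    P, Q, eps = ecdf(x, grid), ecdf(y, grid), 1e-12
    M = 0.5 * (P + Q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps))) * dx
    return 0.5 * kl(P, M) + 0.5 * kl(Q, M)

rng = np.random.default_rng(1)
print(cdf_js(rng.normal(0, 1, 1000), rng.normal(2, 1, 1000)))  # clearly > 0
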
Swap Randomization of Bases of Sequences for Mining Satellite Image Times Series

Swap randomization has been shown to be an effective technique for assessing the significance of data mining results such as Boolean matrices, frequent itemsets, correlations or clusterings. Basically, instead of applying statistical tests on selected attributes, the global structure of the actual dataset is taken into account by checking whether obtained results are likely or not to occur in randomized datasets whose column and row margins are equal to the ones of the actual dataset. In this paper, a swap randomization approach for bases of sequences is proposed with the aim of assessing sequential patterns extracted from Satellite Image Time Series (SITS). This assessment relies on the spatiotemporal locations of the extracted patterns. Using an entropy-based measure, the locations obtained on the actual dataset and a single swap randomized dataset are compared. The potential and generality of the proposed approach are evidenced by experiments on both optical and radar SITS.

Nicolas Méger, Christophe Rigotti, Catherine Pothier
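
For intuition, here is the classical swap randomization step on a 0/1 matrix, which preserves all row and column margins; the paper adapts this idea to bases of sequences, which this sketch does not cover:

import numpy as np

def swap_randomize(M, n_attempts=100_000, seed=0):
    # Repeatedly look for a 2x2 'checkerboard' [[1,0],[0,1]] and flip it
    # to [[0,1],[1,0]]; both margins stay unchanged by construction.
    M = M.copy()
    rng = np.random.default_rng(seed)
    n, m = M.shape
    for _ in range(n_attempts):
        i, j = rng.integers(n, size=2)
        k, l = rng.integers(m, size=2)
        if M[i, k] == 1 and M[j, l] == 1 and M[i, l] == 0 and M[j, k] == 0:
            M[i, k], M[j, l], M[i, l], M[j, k] = 0, 0, 1, 1
    return M

M = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1]])
R = swap_randomize(M)
assert (R.sum(0) == M.sum(0)).all() and (R.sum(1) == M.sum(1)).all()
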
The Difference and the Norm — Characterising Similarities and Differences Between Databases

Suppose we are given a set of databases, such as sales records over different branches. How can we characterise the differences and the norm between these datasets? That is, what are the patterns that characterise the general distribution, and what are those that are important to describe the individual datasets? We study how to discover these pattern sets simultaneously and without redundancy – automatically identifying the patterns that help describe the overall distribution, as well as those that are characteristic of specific databases. We define the problem in terms of the Minimum Description Length principle, and propose the DiffNorm algorithm to approximate the MDL-optimal summary directly from data. Empirical evaluation on synthetic and real-world data shows that DiffNorm efficiently discovers descriptions that accurately characterise the difference and the norm in easily understandable terms.

Kailash Budhathoki, Jilles Vreeken

Preference Learning and Label Ranking

Frontmatter
Dyad Ranking Using A Bilinear Plackett-Luce Model

Label ranking is a specific type of preference learning problem, namely the problem of learning a model that maps instances to rankings over a finite set of predefined alternatives. These alternatives are identified by their name or label, while not being characterized in terms of any properties or features that could be potentially useful for learning. In this paper, we consider a generalization of the label ranking problem that we call dyad ranking. In dyad ranking, not only the instances but also the alternatives are represented in terms of attributes. For learning in the setting of dyad ranking, we propose an extension of an existing label ranking method based on the Plackett-Luce model, a statistical model for rank data. Moreover, we present first experimental results confirming the usefulness of the additional information provided by the feature description of alternatives.

Dirk Schäfer, Eyke Hüllermeier
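
To illustrate the model class, a small sketch of a Plackett-Luce log-likelihood whose skill parameters come from a bilinear score combining instance and alternative features; shapes and names are illustrative, not the paper's notation:

import numpy as np

def dyad_pl_log_likelihood(x, Z, W, ranking):
    # Skill of alternative j for instance x: v_j = exp(x^T W z_j).
    v = np.exp(Z @ (W.T @ x))
    # Plackett-Luce: product of stagewise choice probabilities,
    # best-ranked alternative first.
    ll, remaining = 0.0, list(ranking)
    for j in ranking:
        ll += np.log(v[j]) - np.log(v[remaining].sum())
        remaining.remove(j)
    return ll

rng = np.random.default_rng(0)
x = rng.random(2)            # instance features
Z = rng.random((4, 3))       # features of 4 alternatives
W = rng.random((2, 3))       # bilinear weight matrix
print(dyad_pl_log_likelihood(x, Z, W, ranking=[2, 0, 3, 1]))
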
Fast Training of Support Vector Machines for Survival Analysis

Survival analysis is a commonly used technique to identify important predictors of adverse events and to develop guidelines for patient treatment in medical research. When applied to large amounts of patient data, efficient optimization routines become a necessity. We propose efficient training algorithms for three kinds of linear survival support vector machines: 1) ranking-based, 2) regression-based, and 3) combined ranking and regression. We perform optimization in the primal using truncated Newton optimization and use order-statistic trees to lower the computational costs of training. We employ the same optimization technique and extend it to non-linear models as well. Our results demonstrate the superiority of the proposed optimization scheme over existing training algorithms, which fail due to their inherently high time and space complexities when applied to large datasets. We validate the proposed survival models on six real-world datasets, and show that pure ranking-based approaches outperform regression and hybrid models.

Sebastian Pölsterl, Nassir Navab, Amin Katouzian
Superset Learning Based on Generalized Loss Minimization

In standard supervised learning, each training instance is associated with an outcome from a corresponding output space (e.g., a class label in classification or a real number in regression). In the superset learning problem, the outcome is only characterized in terms of a superset—a subset of candidates that covers the true outcome but may also contain additional ones. Thus, superset learning can be seen as a specific type of weakly supervised learning, in which training examples are ambiguous. In this paper, we introduce a generic approach to superset learning, which is motivated by the idea of performing model identification and “data disambiguation” simultaneously. This idea is realized by means of a generalized risk minimization approach, using an extended loss function that compares precise predictions with set-valued observations. As an illustration, we instantiate our meta learning technique for the problem of label ranking, in which the output space consists of all permutations of a fixed set of items. The label ranking method thus obtained is compared to existing approaches tackling the same problem.

Eyke Hüllermeier, Weiwei Cheng
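
The core of the generalized loss is easy to state in code: a precise prediction is charged only for the best-matching candidate in the set-valued observation. A minimal sketch of that idea:

def superset_loss(prediction, candidates, loss):
    # Optimistic comparison of a precise prediction with a superset label.
    return min(loss(prediction, y) for y in candidates)

# e.g. an imprecise regression label known only to lie in {2.0, 3.5}
print(superset_loss(2.4, [2.0, 3.5], lambda p, y: (p - y) ** 2))  # 0.16
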

Probabilistic, Statistical, and Graphical Approaches

Frontmatter
Bayesian Modelling of the Temporal Aspects of Smart Home Activity with Circular Statistics

Typically, when analysing patterns of activity in a smart home environment, the daily patterns of activity are either ignored completely or summarised into a high-level “hour-of-day” feature that is then combined with sensor activities. However, when summarising the temporal nature of an activity into a coarse feature such as this, not only is information lost after discretisation, but also the strength of the periodicity of the action is ignored. We propose to model the temporal nature of activities using circular statistics, and in particular by performing Bayesian inference with Wrapped Normal ($$\mathcal {WN}$$) and $$\mathcal {WN}$$ Mixture ($$\mathcal {WNM}$$) models. We firstly demonstrate the accuracy of inference on toy data using both Gibbs sampling and Expectation Propagation (EP), and then show the results of the inference on publicly available smart-home data. Such models can be useful for analysis or prediction in their own right, or can readily be combined with larger models incorporating multiple modalities of sensor activity.

Tom Diethe, Niall Twomey, Peter Flach
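
As background, the Wrapped Normal density is the ordinary normal density summed over integer wrappings of the circle; a small self-contained sketch (the time-of-day encoding is a hypothetical example):

import numpy as np

def wrapped_normal_pdf(theta, mu, sigma, n_wraps=10):
    # Truncate the infinite sum over wrappings at +/- n_wraps terms.
    k = np.arange(-n_wraps, n_wraps + 1)
    x = theta + 2 * np.pi * k
    return np.sum(np.exp(-0.5 * ((x - mu) / sigma) ** 2)
                  / (sigma * np.sqrt(2 * np.pi)))

# an activity observed at 8 am, with a mode around 7 am, as circle angles
theta = 2 * np.pi * 8 / 24
print(wrapped_normal_pdf(theta, mu=2 * np.pi * 7 / 24, sigma=0.5))
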
Message Scheduling Methods for Belief Propagation

Approximate inference in large and densely connected graphical models is a challenging but highly relevant problem. Belief propagation, as a method for performing approximate inference in loopy graphs, has shown empirical success in many applications. However, convergence of belief propagation can only be guaranteed for simple graphs. Whether belief propagation converges depends strongly on the applied message update scheme, and specialized schemes can be highly beneficial. Yet, residual belief propagation is the only established method utilizing this fact to improve convergence properties. In experiments, we observe that residual belief propagation fails to converge if local oscillations occur and the same sequence of messages is repeatedly updated. To overcome this issue, we propose two novel message update schemes. In the first scheme we add noise to oscillating messages. In the second scheme we apply weight decay to gradually reduce the influence of these messages and consequently enforce convergence. Furthermore, in contrast to previous work, we consider the correctness of the obtained marginals and observe significant performance improvements when applying the proposed message update schemes to various Ising models with binary random variables.

Christian Knoll, Michael Rath, Sebastian Tschiatschek, Franz Pernkopf
Output-Sensitive Adaptive Metropolis-Hastings for Probabilistic Programs

We introduce an adaptive output-sensitive Metropolis-Hastings algorithm for probabilistic models expressed as programs, Adaptive Lightweight Metropolis-Hastings (AdLMH). This algorithm extends Lightweight Metropolis-Hastings (LMH) by adjusting the probabilities of proposing random variables for modification to improve convergence of the program output. We show that AdLMH converges to the correct equilibrium distribution and compare convergence of AdLMH to that of LMH on several test problems to highlight different aspects of the adaptation scheme. We observe consistent improvement in convergence on the test problems.

David Tolpin, Jan-Willem van de Meent, Brooks Paige, Frank Wood
Planning in Discrete and Continuous Markov Decision Processes by Probabilistic Programming

Real-world planning problems frequently involve mixtures of continuous and discrete state variables and actions, and are formulated in environments with an unknown number of objects. In recent years, probabilistic programming has emerged as a natural approach to capture and characterize such complex probability distributions with general-purpose inference methods. While it is known that a probabilistic programming language can be easily extended to represent Markov Decision Processes (MDPs) for planning tasks, solving such tasks is challenging. Building on related efforts in reinforcement learning, we introduce a conceptually simple but powerful planning algorithm for MDPs realized as a probabilistic program. This planner constructs approximations to the optimal policy by importance sampling, while exploiting the knowledge of the MDP model. In our empirical evaluations, we show that this approach has wide applicability on domains ranging from strictly discrete to strictly continuous to hybrid ones, handles intricacies such as unknown objects, and is argued to be competitive given its generality.

Davide Nitti, Vaishak Belle, Luc De Raedt
Simplifying, Regularizing and Strengthening Sum-Product Network Structure Learning

The need for feasible inference in Probabilistic Graphical Models (PGMs) has led to tractable models like Sum-Product Networks (SPNs). Their highly expressive power and their ability to provide exact and tractable inference make them very attractive for several real-world applications, from computer vision to NLP. Recently, great attention around SPNs has focused on structure learning, leading to different algorithms that are able to learn both the network and its parameters from data. Here, we enhance one of the best structure learners, LearnSPN, aiming to improve both the structural quality of the learned networks and their achieved likelihoods. Our algorithmic variations are able to learn simpler, deeper and more robust networks. These results are obtained by exploiting some insights into the building process of LearnSPN, by hybridizing the network through tree-structured models as leaves, and by blending bagging estimations into mixture creation. We support our claims by empirically evaluating the learned SPNs on several benchmark datasets against other competitive SPN and PGM structure learners.

Antonio Vergari, Nicola Di Mauro, Floriana Esposito
Sparse Bayesian Recurrent Neural Networks

Recurrent neural networks (RNNs) have recently gained renewed attention from the machine learning community as effective methods for modeling variable-length sequences. Language modeling, handwriting recognition, and speech recognition are only a few of the application domains where RNN-based models have achieved the state-of-the-art performance currently reported in the literature. Typically, RNN architectures utilize simple linear, logistic, or softmax output layers to perform data modeling and prediction generation. In this work, for the first time in the literature, we consider using a sparse Bayesian regression or classification model as the output layer of RNNs, inspired by the automatic relevance determination (ARD) technique. The notion of ARD is to continually create new components while detecting when a component starts to overfit, where overfitting manifests itself as a precision hyperparameter posterior tending to infinity. This way, our method manages to train sparse RNN models, where the number of effective (“active”) recurrently connected hidden units is selected in a data-driven fashion as part of the model inference procedure. We develop efficient and scalable training algorithms for our model under the stochastic variational inference paradigm, and derive elegant predictive density expressions with computational costs comparable to conventional RNN formulations. We evaluate our approach on challenging tasks dealing with both regression and classification problems, and exhibit its favorable performance over the state-of-the-art.

Sotirios P. Chatzis
Structured Prediction of Sequences and Trees Using Infinite Contexts

Linguistic structures exhibit a rich array of global phenomena; however, commonly used Markov models are unable to adequately describe these phenomena due to their strong locality assumptions. We propose a novel hierarchical model for structured prediction over sequences and trees that exploits global context by conditioning each generation decision on an unbounded context of prior decisions. This builds on the success of Markov models but without imposing a fixed bound, in order to better represent global phenomena. To facilitate learning of this large and unbounded model, we use a hierarchical Pitman-Yor process prior, which provides a recursive form of smoothing. We propose prediction algorithms based on A* and Markov Chain Monte Carlo sampling. Empirical results demonstrate the potential of our model compared to baseline finite-context Markov models on three tasks: morphological parsing, syntactic parsing and part-of-speech tagging.

Ehsan Shareghi, Gholamreza Haffari, Trevor Cohn, Ann Nicholson
Temporally Coherent Role-Topic Models (TCRTM): Deinterlacing Overlapping Activity Patterns

The Temporally Coherent Role-Topic Model (TCRTM) is a probabilistic graphical model for analyzing overlapping, loosely temporally structured activities in heterogeneous populations. Such structure appears in many domains where activities have temporal coherence, but no strong ordering. For instance, editing a PowerPoint presentation may involve opening files, typing text, and downloading images. These events occur together in time, but without fixed ordering or duration. Further, several different activities may overlap – the user might check email while editing the presentation. Finally, the user population has subgroups; for example, managers, salespeople and engineers have different activity distributions. TCRTM automatically infers an appropriate set of roles and activity types, and segments users’ event streams into high-level activity instance descriptions. On two real-world datasets involving computer user monitoring and debit card transactions we show that TCRTM extracts semantically meaningful structure and improves hold-out perplexity score by a factor of five compared to standard models.

Evgeniy Bart, Bob Price, John Hanley
The Blind Leading the Blind: Network-Based Location Estimation Under Uncertainty

We propose a probabilistic method for inferring the geographical locations of linked objects, such as users in a social network. Unlike existing methods, our model does not assume that the exact locations of any subset of the linked objects, such as neighbors in a social network, are known. The method efficiently leverages prior knowledge on the locations, resulting in high geolocation accuracies even if none of the locations are initially known. Experiments are conducted for three scenarios: geolocating users of a location-based social network, geotagging historical church records, and geotagging Flickr photos. In each experiment, the proposed method outperforms two state-of-the-art network-based methods. Furthermore, the last experiment shows that the method can be applied not only to network-based but also to content-based location estimation.

Eric Malmi, Arno Solin, Aristides Gionis
Weighted Rank Correlation: A Flexible Approach Based on Fuzzy Order Relations

Measures of rank correlation are commonly used in statistics to capture the degree of concordance between two orderings of the same set of items. Standard measures like Kendall’s tau and Spearman’s rho coefficient put equal emphasis on each position of a ranking. Yet, motivated by applications in which some of the positions (typically those at the top) are more important than others, a few weighted variants of these measures have been proposed. Most of these generalizations fail to meet desirable formal properties, however. Besides, they are often quite inflexible in the sense of committing to a fixed weighting scheme. In this paper, we propose a weighted rank correlation measure on the basis of fuzzy order relations. Our measure, called scaled gamma, is related to Goodman and Kruskal’s gamma rank correlation. It is parametrized by a fuzzy equivalence relation on the rank positions, which in turn is specified conveniently by a so-called scaling function. This approach combines soundness with flexibility: it has a sound formal foundation and allows for weighting rank positions in a flexible way. The usefulness of our class of weighted rank correlation measures is shown by means of experimental studies using both synthetic and real-world ranking data.

Sascha Henzgen, Eyke Hüllermeier
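
A sketch of the general idea of weighting a gamma-style rank correlation, with a direct per-pair weight standing in for the paper's fuzzy-equivalence-based construction; the weighting scheme below is hypothetical:

import numpy as np
from itertools import combinations

def weighted_gamma(r1, r2, weight):
    # r1, r2: rank positions of the same items under two rankings.
    con = dis = 0.0
    for i, j in combinations(range(len(r1)), 2):
        s = np.sign((r1[i] - r1[j]) * (r2[i] - r2[j]))
        w = weight(min(r1[i], r1[j]), min(r2[i], r2[j]))
        if s > 0:
            con += w      # concordant pair
        elif s < 0:
            dis += w      # discordant pair
    return (con - dis) / (con + dis)

# emphasize pairs that involve top rank positions
top_weight = lambda a, b: 1.0 / (1 + min(a, b))
print(weighted_gamma([0, 1, 2, 3], [1, 0, 2, 3], top_weight))
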

Rich Data

Frontmatter
Concurrent Inference of Topic Models and Distributed Vector Representations

Topic modeling techniques have been widely used to uncover dominant themes hidden inside an unstructured document collection. Though these techniques first originated in the probabilistic analysis of word distributions, many deep learning approaches have been adopted recently. In this paper, we propose a novel neural network based architecture that produces distributed representations of topics to capture topical themes in a dataset. Unlike many state-of-the-art techniques for generating distributed representations of words and documents that directly use neighboring words for training, we leverage the outcome of a sophisticated deep neural network to estimate the topic labels of each document. The networks, for topic modeling and for generating distributed representations, are trained concurrently in a cascaded style, achieving better runtime without sacrificing the quality of the topics. Empirical studies reported in the paper show that the distributed representations of topics capture intuitive themes using fewer dimensions than conventional topic modeling approaches.

Debakar Shamanta, Sheikh Motahar Naim, Parang Saraf, Naren Ramakrishnan, M. Shahriar Hossain
Differentially Private Analysis of Outliers

This paper presents an investigation of differentially private analysis of distance-based outliers. Outlier detection aims to identify instances that are apparently distant from other instances. Meanwhile, the objective of differential privacy is to conceal the presence (or absence) of any particular instance. Outlier detection and privacy protection are therefore intrinsically conflicting tasks. In this paper, we present differentially private queries for counting outliers that appear in a given subspace, instead of reporting the outliers detected. Our analysis of the global sensitivity of outlier counts reveals that regular global sensitivity-based methods can make the outputs too noisy, particularly when the dimensionality of the given subspace is high. Noting that the counts of outliers are typically expected to be small compared to the size of the data, we introduce a mechanism based on the smooth upper bound of the local sensitivity. This study is the first attempt to ensure differential privacy for distance-based outlier analysis. The experimentally obtained results show that our method achieves better utility than global sensitivity-based methods do.

Rina Okada, Kazuto Fukuchi, Jun Sakuma
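
The release step itself is the familiar Laplace mechanism; a minimal sketch with hypothetical numbers (the paper's contribution lies in replacing the large global sensitivity with a smooth upper bound on local sensitivity when calibrating the noise):

import numpy as np

def private_outlier_count(true_count, sensitivity, epsilon, seed=0):
    # Output perturbation: count + Laplace(sensitivity / epsilon) noise.
    rng = np.random.default_rng(seed)
    return true_count + rng.laplace(scale=sensitivity / epsilon)

print(private_outlier_count(true_count=12, sensitivity=1.0, epsilon=0.5))
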
Inferring Unusual Crowd Events from Mobile Phone Call Detail Records

The pervasiveness and availability of mobile phone data offer the opportunity of discovering usable knowledge about crowd behavior in urban environments. Cities can leverage such knowledge to provide better services (e.g., public transport planning, optimized resource allocation) and a safer environment. Call Detail Record (CDR) data represents a practical data source for detecting and monitoring unusual events, given the high level of mobile phone penetration compared with GPS-equipped and open devices. In this paper, we propose a methodology that is able to detect unusual events from CDR data, which typically has low accuracy in terms of space and time resolution. Moreover, we introduce a concept of unusual event that involves a large number of people who exhibit unusual mobility behavior. Our careful consideration of the issues that come with coarse-grained CDR data ultimately leads to a completely general framework that can detect unusual crowd events from CDR data effectively and efficiently. Through extensive experiments on real-world CDR data for a large city in Africa, we demonstrate that our method can detect unusual events with 16% higher recall and over 10$$\times $$ higher precision, compared to state-of-the-art methods. We implement a visual analytics prototype system to help end users analyze detected unusual crowd events to best suit different application scenarios. To the best of our knowledge, this is the first work on the detection of unusual events from CDR data that considers its temporal and spatial sparseness and distinguishes between unusual user activities and daily routines.

Yuxiao Dong, Fabio Pinelli, Yiannis Gkoufas, Zubair Nabi, Francesco Calabrese, Nitesh V. Chawla
Learning Pretopological Spaces for Lexical Taxonomy Acquisition

In this paper, we propose a new methodology for semi-supervised acquisition of lexical taxonomies. Our approach is based on the theory of pretopology, which offers a powerful formalism to model semantic relations and transform a list of terms into a structured term space by combining different discriminant criteria. In order to learn a parameterized pretopological space, we define the Learning Pretopological Spaces strategy based on genetic algorithms. In particular, rare but accurate pieces of knowledge are used to parameterize the different criteria defining the pretopological term space. Then, a structuring algorithm is used to transform the pretopological space into a lexical taxonomy. Results on three standard datasets show improved performance against state-of-the-art associative and pattern-based approaches.

Guillaume Cleuziou, Gaël Dias
Multidimensional Prediction Models When the Resolution Context Changes

Multidimensional data is systematically analysed at multiple granularities by applying aggregate and disaggregate operators (e.g., by the use of OLAP tools). For instance, in a supermarket we may want to predict sales of tomatoes for next week, but we may also be interested in predicting sales for all vegetables (higher up in the product hierarchy) for next Friday (lower down in the time dimension). While the domain and data are the same, the operating context is different. We explore several approaches for multidimensional data when predictions have to be made at different levels (or contexts) of aggregation. One method relies on the same resolution, another approach aggregates predictions bottom-up, a third approach disaggregates predictions top-down and a final technique corrects predictions using the relation between levels. We show how these strategies behave when the resolution context changes, using several machine learning techniques in four application domains.

Adolfo Martínez-Usó, José Hernández-Orallo
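
Two of the four strategies are one-liners; a toy numeric illustration with hypothetical supermarket forecasts:

import numpy as np

# Per-product weekly forecasts: tomatoes, carrots, onions (hypothetical).
item_preds = np.array([120.0, 80.0, 40.0])

# Bottom-up: aggregate item-level predictions into the category level.
category_pred_bottom_up = item_preds.sum()          # 240.0 for "vegetables"

# Top-down: disaggregate a category-level prediction by historical shares.
category_pred = 250.0
historical_shares = np.array([0.5, 0.3, 0.2])
item_preds_top_down = category_pred * historical_shares

print(category_pred_bottom_up, item_preds_top_down)
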
Semi-supervised Subspace Co-Projection for Multi-class Heterogeneous Domain Adaptation

Heterogeneous domain adaptation aims to exploit labeled training data from a source domain for learning prediction models in a target domain under the condition that the two domains have different input feature representation spaces. In this paper, we propose a novel semi-supervised subspace co-projection method to address multi-class heterogeneous domain adaptation. The proposed method projects the instances of the two domains into a co-located latent subspace to bridge the feature divergence gap across domains, while simultaneously training prediction models in the co-projected representation space with labeled training instances from both domains. It also exploits the unlabeled data to promote the consistency of co-projected subspaces from the two domains based on a maximum mean discrepancy criterion. Moreover, to increase the stability and discriminative informativeness of the subspace co-projection, we further exploit the error-correcting output code schemes to incorporate more binary prediction tasks shared across domains into the learning process. We formulate this semi-supervised learning process as a non-convex joint minimization problem and develop an alternating optimization algorithm to solve it. To investigate the empirical performance of the proposed approach, we conduct experiments on cross-lingual text classification and cross-domain digit image classification tasks with heterogeneous feature spaces. The experimental results demonstrate the efficacy of the proposed method on these heterogeneous domain adaptation problems.

Min Xiao, Yuhong Guo
Towards Computation of Novel Ideas from Corpora of Scientific Text

In this work we present a method for the computation of novel ‘ideas’ from corpora of scientific text. The system functions by first detecting concept noun-phrases within the titles and abstracts of publications using Part-Of-Speech tagging, before classifying these into sets of problem and solution phrases via a target-word matching approach. By defining an idea as a co-occurring $$\langle $$problem, solution$$\rangle $$ pair, known-idea triples can be constructed through the additional assignment of a relevance value (computed via either phrase co-occurrence or an ‘idea frequency-inverse document frequency’ score). The resulting triples are then fed into a collaborative filtering algorithm, where problem-phrases are considered as users and solution-phrases as the items to be recommended. The final output is a ranked list of novel idea candidates, which hold potential for researchers to integrate into their hypothesis generation processes. This approach is evaluated using a subset of publications from the journal Science, with precision, recall and F-Measure results for a variety of model parametrizations indicating that the system is capable of generating useful novel ideas in an automated fashion.

Haixia Liu, James Goulding, Tim Brailsford

Social and Graphs

Frontmatter
Discovering Audience Groups and Group-Specific Influencers

Recently, user influence in social networks has been studied extensively. Many applications related to social influence depend on quantifying influence and finding the most influential users of a social network. Most existing work studies the global influence of users, i.e. the aggregated influence that a user has on the entire network. It is often overlooked that users may be significantly more influential to some audience groups than to others. In this paper, we propose AudClus, a method to detect audience groups and identify group-specific influencers simultaneously. With extensive experiments on real data, we show that AudClus is effective both in the task of detecting audience groups and in the task of identifying influencers of audience groups. We further show that AudClus enables insightful observations on the relation between audience groups and influencers. The proposed method leads to various applications in areas such as viral marketing, expert finding, and data visualization.

Shuyang Lin, Qingbo Hu, Jingyuan Zhang, Philip S. Yu
Estimating Potential Customers Anywhere and Anytime Based on Location-Based Social Networks

Acquiring knowledge about the volume of customers for places and times of interest has several benefits, such as determining the locations of new retail stores and planning advertising strategies. This paper aims to estimate the number of potential customers of arbitrary query locations at any time of interest in modern urban areas. Our idea is to consider existing established stores as sensors, because the nearby human activities of the retail stores characterize the geographical properties, mobility patterns, and social behaviors of the target customers. To tackle the task based on store sensors, we develop a method called Potential Customer Estimator (PCE), which models the spatial and temporal correlation between existing stores and query locations using geographical, mobility, and social features from location-based social networks. Experiments conducted on NYC Foursquare and Gowalla data, with three popular retail stores, Starbucks, McDonald’s, and Dunkin’ Donuts, exhibit superior results over state-of-the-art approaches.

Hsun-Ping Hsieh, Cheng-Te Li, Shou-De Lin
Exact Hybrid Covariance Thresholding for Joint Graphical Lasso

This paper studies precision matrix estimation for multiple related Gaussian graphical models from a dataset consisting of different classes, based upon the formulation of this problem as group graphical lasso. In particular, this paper proposes a novel hybrid covariance thresholding algorithm that can effectively identify zero entries in the precision matrices and split a large joint graphical lasso problem into many small subproblems. Our hybrid covariance thresholding method is superior to existing uniform thresholding methods in that our method can split the precision matrix of each individual class using different partition schemes and thus, split group graphical lasso into much smaller subproblems, each of which can be solved very fast. This paper also establishes necessary and sufficient conditions for our hybrid covariance thresholding algorithm. Experimental results on both synthetic and real data validate the superior performance of our thresholding method over the others.

Qingming Tang, Chao Yang, Jian Peng, Jinbo Xu
Fast Inbound Top-K Query for Random Walk with Restart

Random walk with restart (RWR) is widely recognized as one of the most important node proximity measures for graphs, as it captures the holistic graph structure and is robust to noise in the graph. In this paper, we study a novel query based on the RWR measure, called the inbound top-k (Ink) query. Given a query node q and a number k, the Ink query aims at retrieving the k nodes in the graph that have the largest weighted RWR scores to q. Ink queries can be highly useful for various applications such as traffic scheduling, disease treatment, and targeted advertising. Nevertheless, none of the existing RWR computation techniques can accurately and efficiently process the Ink query in large graphs. We propose two algorithms, namely Squeeze and Ripple, both of which can accurately answer the Ink query in a fast and incremental manner. To identify the top-k nodes, Squeeze iteratively performs matrix-vector multiplication and estimates the lower and upper bounds for all the nodes in the graph. Ripple employs a more aggressive strategy by only estimating the RWR scores for the nodes falling in the vicinity of q; the nodes outside the vicinity do not need to be evaluated because their RWR scores are propagated from the boundary of the vicinity and are thus upper bounded. Ripple incrementally expands the vicinity until the top-k result set can be obtained. Our extensive experiments on real-life graph data sets show that Ink queries can retrieve interesting results, and that the proposed algorithms are orders of magnitude faster than the state-of-the-art method.

Chao Zhang, Shan Jiang, Yucheng Chen, Yidan Sun, Jiawei Han
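
As a naive exact baseline (not the paper's Squeeze or Ripple), inbound RWR scores can be obtained by dense power iteration; the restart probability c = 0.15 is a conventional choice:

import numpy as np

def inbound_rwr_scores(A, q, c=0.15, n_iter=100):
    # Column s of R converges to the RWR vector of seed s, via
    # R = (1 - c) * P^T R + c * I, so row q holds each node's score to q.
    P = A / A.sum(axis=1, keepdims=True)      # row-stochastic transitions
    R = np.eye(len(A))
    for _ in range(n_iter):
        R = (1 - c) * P.T @ R + c * np.eye(len(A))
    return R[q]                               # index s -> RWR score of s to q

A = np.array([[0, 1, 1, 0], [1, 0, 1, 0],
              [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
scores = inbound_rwr_scores(A, q=3)
print(np.argsort(-scores)[:2])                # the top-2 answer to an Ink query
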
Finding Community Topics and Membership in Graphs

Community detection in networks is a broad problem with many proposed solutions. Existing methods frequently make use of edge density and node attributes; however, the methods ultimately have different definitions of community and build strong assumptions about community features into their models. We propose a new method for community detection, which estimates both per-community feature distributions (topics) and per-node community membership. Communities are modeled as connected subgraphs with nodes sharing similar attributes. Nodes may join multiple communities and share common attributes with each. Communities have an associated probability distribution over attributes and node attributes are modeled as draws from a mixture distribution. We make two basic assumptions about community structure: communities are densely connected and have a small network diameter. These assumptions inform the estimation of community topics and membership assignments without being too prescriptive. We present competitive results against state-of-the-art methods for finding communities in networks constructed from NSF awards, the DBLP repository, and the Scratch online community.

Matt Revelle, Carlotta Domeniconi, Mack Sweeney, Aditya Johri
Finding Dense Subgraphs in Relational Graphs

This paper considers the problem of finding large dense subgraphs in relational graphs, i.e., a set of graphs which share a common vertex set. We present an approximation algorithm for finding the densest common subgraph in a relational graph set based on an extension of Charikar’s method for finding the densest subgraph in a single graph. We also present a simple greedy heuristic which can be implemented efficiently for analysis of larger graphs. We give graph-dependent bounds on the quality of the solutions returned by our methods. Lastly, we show by empirical evaluation on several benchmark datasets that our method outperforms existing approaches.

Vinay Jethava, Niko Beerenwinkel
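
For reference, here is Charikar's greedy peeling on a single graph, which the paper extends to sets of graphs sharing a vertex set; it is a 2-approximation for density = |edges| / |vertices|:

def densest_subgraph(adj):
    # adj: vertex -> set of neighbors.  Repeatedly delete a minimum-degree
    # vertex and remember the densest intermediate vertex set.
    adj = {v: set(nb) for v, nb in adj.items()}
    n_edges = sum(len(nb) for nb in adj.values()) // 2
    best_set, best_density = set(adj), n_edges / len(adj)
    while len(adj) > 1:
        v = min(adj, key=lambda u: len(adj[u]))
        n_edges -= len(adj[v])
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
        if n_edges / len(adj) > best_density:
            best_set, best_density = set(adj), n_edges / len(adj)
    return best_set, best_density

# K4 plus a pendant vertex: the densest subgraph is the K4, density 1.5
graph = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4}, 4: {3}}
print(densest_subgraph(graph))
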
Generalized Modularity for Community Detection

Detecting the underlying community structure of networks is an important problem in complex network analysis. Modularity is a well-known quality function, introduced by Newman, that measures how vertices in a community share more edges than would be expected in a randomized network. However, this limited view on vertex similarity leads to limits in what can be resolved by modularity. To overcome these limitations, we propose a generalized modularity measure called GM which has a more sophisticated interpretation of vertex similarity. In particular, GM also takes into account the number of longer paths between vertices, compared to what would be expected in a randomized network. We also introduce a unified version of GM which detects communities of unipartite and (near-)bipartite networks without knowing the structure type in advance. Experiments on different synthetic and real data sets demonstrate that GM performs strongly in comparison to several existing approaches, particularly for small-world networks.

Mohadeseh Ganji, Abbas Seifi, Hosein Alizadeh, James Bailey, Peter J. Stuckey
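
For orientation, standard Newman modularity, which GM generalizes by also crediting longer paths between vertices, can be computed directly from its definition; a toy example:

import numpy as np

def modularity(A, communities):
    # Q = (1/2m) * sum_ij (A_ij - k_i k_j / 2m) * [c_i == c_j]
    k = A.sum(axis=1)
    two_m = k.sum()
    c = np.asarray(communities)
    same = c[:, None] == c[None, :]
    return ((A - np.outer(k, k) / two_m) * same).sum() / two_m

A = np.array([[0, 1, 1, 0], [1, 0, 1, 0],
              [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
print(modularity(A, [0, 0, 0, 1]))   # triangle vs. pendant vertex
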
Handling Oversampling in Dynamic Networks Using Link Prediction

Oversampling is a common characteristic of data representing dynamic networks. It introduces noise into representations of dynamic networks, but there has been little work so far to compensate for it. Oversampling can affect the quality of many important algorithmic problems on dynamic networks, including link prediction. Link prediction seeks to predict edges that will be added to the network given previous snapshots. We show that not only does oversampling affect the quality of link prediction, but that we can use link prediction to recover from the effects of oversampling. We also introduce a novel generative model of noise in dynamic networks that represents oversampling. We demonstrate the results of our approach on both synthetic and real-world data.

Benjamin Fish, Rajmonda S. Caceres
Hierarchical Sparse Dictionary Learning

Sparse coding plays a key role in high-dimensional data analysis. One critical challenge of sparse coding is to design a dictionary that is both adaptive to the training data and generalizable to unseen data of the same type. In this paper, we propose a novel dictionary learning method to build an adaptive dictionary regularized by an a-priori over-complete dictionary. This leads to a sparse structure of the learned dictionary over the a-priori dictionary, and a sparse structure of the data over the learned dictionary. We apply the hierarchical sparse dictionary learning approach to both synthetic data and real-world high-dimensional time series data. The experimental results demonstrate that the hierarchical sparse dictionary learning approach reduces overfitting and enhances the generalizability of the learned dictionary. Moreover, the learned dictionary is optimized to adapt to the given data and results in a more compact dictionary and a more robust sparse representation. The experimental results on real datasets demonstrate that the proposed approach can successfully characterize the heterogeneity of the given data and leads to a better and more robust dictionary.

Xiao Bian, Xia Ning, Geoff Jiang
Latent Factors Meet Homophily in Diffusion Modelling

Diffusion is an important dynamic that helps spread information within an online social network. While there are already numerous models for single-item diffusion, few have studied the diffusion of multiple items, especially when items can interact with one another due to their inter-similarity. Moreover, the well-known homophily effect is rarely considered explicitly in the existing diffusion models. This work therefore fills this gap by proposing a novel model called Topic level Interaction Homophily Aware Diffusion (TIHAD) to include both latent factor level interaction among items and the homophily factor in diffusion. The model determines item interaction based on latent factors and edge strengths based on the homophily factor in the computation of social influence. An algorithm for training the TIHAD model is also proposed. Our experiments on synthetic and real datasets show that: (a) homophily increases diffusion significantly, and (b) item interaction at the topic level boosts diffusion among similar items. A case study on hashtag diffusion in Twitter also shows that TIHAD outperforms the baseline model in the hashtag adoption prediction task.

Minh-Duc Luu, Ee-Peng Lim
Maintaining Sliding-Window Neighborhood Profiles in Interaction Networks

Large networks are being generated by applications that keep track of relationships between different data entities. Examples include online social networks recording interactions between individuals, sensor networks logging information exchanges between sensors, and more. There is a large body of literature on computing exact or approximate properties on large networks, although most methods assume static networks. On the other hand, in most modern real-world applications, networks are highly dynamic and continuous interactions along existing connections are generated. Furthermore, it is desirable to consider that old edges become less important, and their contribution to the current view of the network diminishes over time.

We study the problem of maintaining the neighborhood profile of each node in an interaction network. Maintaining such a profile has applications in modeling network evolution and monitoring the importance of the nodes of the network over time. We present an online streaming algorithm to maintain neighborhood profiles in the sliding-window model. The algorithm is highly scalable as it permits parallel processing and the computation is node-centric; hence it scales easily to very large networks on a distributed system, like Apache Giraph. We present results from both serial and parallel implementations of the algorithm for different social networks. The summary of the graph is maintained such that queries of any window length can be performed.

Rohit Kumar, Toon Calders, Aristides Gionis, Nikolaj Tatti
Response-Guided Community Detection: Application to Climate Index Discovery

Discovering climate indices – time series that summarize spatiotemporal climate patterns – is a key task in the climate science domain. In this work, we approach this task as a problem of response-guided community detection; that is, identifying communities in a graph associated with a response variable of interest. To this end, we propose a general strategy for response-guided community detection that explicitly incorporates information about the response variable during the community detection process, and introduce a graph representation of spatiotemporal data that leverages information from multiple variables.

We apply our proposed methodology to the discovery of climate indices associated with seasonal rainfall variability. Our results suggest that our methodology is able to capture the underlying patterns known to be associated with the response variable of interest and to improve its predictability compared to existing methodologies for data-driven climate index discovery and official forecasts.

Gonzalo A. Bello, Michael Angus, Navya Pedemane, Jitendra K. Harlalka, Fredrick H. M. Semazzi, Vipin Kumar, Nagiza F. Samatova
Robust Classification of Information Networks by Consistent Graph Learning

Graph regularization-based methods have achieved great success for network classification by making the label-link consistency assumption, i.e., if two nodes are linked together, they are likely to belong to the same class. However, in a real-world network, there exist links that connect nodes of different classes. These inconsistent links raise a big challenge for graph regularization and deteriorate the classification performance significantly. To address this problem, we propose a novel algorithm, namely Consistent Graph Learning, which is robust to the inconsistent links of a network. In particular, given a network and a small number of labeled nodes, we aim at learning a consistent network with more consistent and fewer inconsistent links than the original network. Since the link information of a network is naturally represented by a set of relation matrices, the learning of a consistent network is reduced to learning consistent relation matrices under some constraints. More specifically, we achieve it by joint graph regularization on the nuclear norm minimization of consistent relation matrices together with $$\ell _1$$-norm minimization on the difference matrices between the original relation matrices and the learned consistent ones, subject to certain constraints. Experiments on both homogeneous and heterogeneous network datasets show that the proposed method outperforms the state-of-the-art methods.

Shi Zhi, Jiawei Han, Quanquan Gu
Backmatter
Metadata
Title
Machine Learning and Knowledge Discovery in Databases
Edited by
Annalisa Appice
Pedro Pereira Rodrigues
Vítor Santos Costa
João Gama
Alípio Jorge
Carlos Soares
Copyright year
2015
Electronic ISBN
978-3-319-23525-7
Print ISBN
978-3-319-23524-0
DOI
https://doi.org/10.1007/978-3-319-23525-7
