
About This Book

This book constitutes the proceedings of the 8th International Conference on Analysis of Images, Social Networks and Texts, AIST 2019, held in Kazan, Russia, in July 2019.

The 24 full papers and 10 short papers were carefully reviewed and selected from 134 submissions (21 of which were rejected without review). The papers are organized in topical sections on general topics of data analysis; natural language processing; social network analysis; analysis of images and video; optimization problems on graphs and network structures; analysis of dynamic behaviour through event data.



General Topics of Data Analysis


Optimizing Q-Learning with K-FAC Algorithm

In this work, we present intermediate results of applying the Kronecker-Factored Approximate Curvature (K-FAC) algorithm to the Q-learning problem. Although more expensive to compute than plain stochastic gradient descent, K-FAC allows the agent to converge somewhat faster in terms of epochs than Adam on simple reinforcement learning tasks, and it tends to be more stable and less sensitive to hyperparameter selection. Based on our latest results, we show that DDQN with K-FAC learns more quickly than with other optimizers and improves steadily, in contrast to similar agents trained with Adam or RMSProp.
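The Q-learning target that these optimizers compete to fit can be illustrated in a deliberately simplified tabular setting (the paper itself works with deep Q-networks and DDQN; the states, actions, and learning rate below are invented for illustration):

```python
# Toy tabular Q-learning update: move Q(s, a) toward the Bellman target
# r + gamma * max_a' Q(s', a'). The deep variants in the paper replace the
# table with a neural network trained by K-FAC, Adam, or RMSProp.
def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]

Q = {0: {"left": 0.0, "right": 0.0}, 1: {"left": 0.0, "right": 1.0}}
q_update(Q, 0, "right", 0.0, 1)
print(round(Q[0]["right"], 3))
```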

Roman Beltiukov

Evolutionary Algorithms for Constructing an Ensemble of Decision Trees

Most decision tree induction algorithms are based on a greedy top-down recursive partitioning strategy for tree growth. In this paper, we propose several methods for the induction of decision trees and their ensembles based on evolutionary algorithms. The main distinguishing feature of our approach is the use of a real-valued vector representation of the decision tree, which makes it possible to apply a large number of different optimization algorithms and to optimize the whole tree or ensemble in order to avoid local optima. Differential evolution and evolution strategies were chosen as the optimization algorithms, as they have shown good results in reinforcement learning problems. We test the predictive performance of these methods on several public UCI data sets, and the proposed methods show better quality than classical methods.
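Differential evolution over a real-valued vector, as named in the abstract, can be sketched roughly as follows; the sphere function stands in for the actual tree/ensemble loss, and the population size, mutation factor, and crossover rate are illustrative, not the paper's settings:

```python
import random

def differential_evolution(f, dim, pop_size=20, F=0.8, CR=0.9, iters=200, seed=0):
    """Minimize f over real vectors with classic DE/rand/1/bin."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(iters):
        for i in range(pop_size):
            # Pick three distinct individuals other than i for mutation.
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            trial = [a[d] + F * (b[d] - c[d]) if rng.random() < CR else pop[i][d]
                     for d in range(dim)]
            if f(trial) < f(pop[i]):  # greedy selection
                pop[i] = trial
    return min(pop, key=f)

sphere = lambda x: sum(v * v for v in x)  # stand-in for a tree's loss
best = differential_evolution(sphere, dim=3)
print(sphere(best) < 0.1)
```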

Evgeny Dolotov, Nikolai Zolotykh

An Algorithm for Constructing a Topological Skeleton for Semi-structured Spatial Data Based on Persistent Homology

The article considers the construction and use of a topological skeleton for processing semi-structured information based on persistent homology methods. In this work, the main topological feature used for object analysis is the hole. We show how the developed algorithm can be applied to an actual problem of geoinformatics: matching spatial objects across different map scales. A comparison of topological skeletons at different tree depths is also demonstrated.

Sergey Eremeev, Semyon Romanov

Comparison of Machine Learning Algorithms in Restaurant Revenue Prediction

In this paper, we address several aspects of applying classical machine learning algorithms to a regression problem. To validate our approach, we compare predictive power on data about the revenue of a large Russian restaurant chain. We pay special attention to two problems: data heterogeneity and a high number of correlated features. We describe methods for handling heterogeneity: observation weighting and estimating models on subsamples. We define a weighting function via the Mahalanobis distance in the feature space and show its predictive properties for the following methods: ordinary least squares regression, elastic net, support vector regression, and random forest.
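A Mahalanobis-distance weighting function of the kind described can be sketched as follows; the Gaussian kernel, the bandwidth, and the toy points are illustrative assumptions, not the paper's exact formulation:

```python
import math

def mahalanobis_2d(x, mu, cov):
    """Mahalanobis distance in 2-D, inverting the covariance analytically."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # d^2 = dx^T * inv(cov) * dx
    d2 = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return math.sqrt(d2)

def weights(points, query, cov, bandwidth=1.0):
    """Weight observations by Mahalanobis proximity to a query point
    (Gaussian kernel chosen for illustration), normalized to sum to 1."""
    w = [math.exp(-(mahalanobis_2d(p, query, cov) / bandwidth) ** 2) for p in points]
    s = sum(w)
    return [v / s for v in w]

pts = [(0.0, 0.0), (1.0, 1.0), (3.0, 0.0)]
w = weights(pts, (0.0, 0.0), [[1.0, 0.0], [0.0, 1.0]])
print([round(v, 3) for v in w])  # closer points receive larger weights
```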

Stepan Gogolev, Evgeniy M. Ozhegov

Online Augmentation for Quality Improvement of Neural Networks for Classification of Single-Channel Electrocardiograms

Currently, there are mobile devices on the market that are capable of reading a person's single-lead electrocardiogram (ECG). These ECGs can be used to solve problems of detecting various diseases, and neural networks are one of the approaches to solving such problems. In this paper, we propose using online augmentation during the training of neural networks to improve the quality of ECG classification. The possibility of using various types of online augmentation was explored, and the most promising methods were highlighted. Experimental studies showed that classification quality improved for various tasks and various neural network architectures.
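Online augmentation of a 1-D signal during training might look roughly like this; the specific transformations (amplitude scaling, circular time shift, additive noise) and their ranges are assumptions for illustration, not the paper's exact augmentation set:

```python
import random

def augment(signal, rng, noise_std=0.01, scale_range=(0.9, 1.1), max_shift=5):
    """Apply a random amplitude scale, circular time shift, and Gaussian
    noise to a 1-D signal. In online augmentation this runs anew on every
    training batch rather than once over the dataset."""
    scale = rng.uniform(*scale_range)
    shift = rng.randrange(-max_shift, max_shift + 1)
    shifted = signal[-shift:] + signal[:-shift] if shift else list(signal)
    return [scale * v + rng.gauss(0.0, noise_std) for v in shifted]

rng = random.Random(42)
ecg = [0.0, 0.1, 1.0, 0.2, 0.0, -0.1, 0.0, 0.05]  # toy heartbeat segment
aug = augment(ecg, rng)
print(len(aug) == len(ecg))
```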

Valeriia Guryanova

Guided Layer-Wise Learning for Deep Models Using Side Information

Training deep models for classification tasks is hindered by local minima and vanishing gradients, while unsupervised layer-wise pretraining does not exploit information from class labels. Here, we propose a new regularization technique, called diversifying regularization (DR), which applies a penalty to hidden units at any layer if they produce similar features for different types of data. For generative models, DR is defined as a divergence over the variational posterior distributions and is included in the maximum likelihood estimation as a prior. Thus, DR incorporates class label information into the greedy pretraining of deep belief networks, which results in a better weight initialization for fine-tuning methods. For discriminative training of deep neural networks, on the other hand, DR is defined as a distance over the features and is included in the learning objective. Our experimental tests show that DR can help backpropagation cope with vanishing gradient problems and provide faster convergence and smaller generalization errors.

Pavel Sulimov, Elena Sukmanova, Roman Chereshnev, Attila Kertész-Farkas

Multi-instance Learning for Structure-Activity Modeling for Molecular Properties

In this paper, the multi-instance learning approach is used for modeling the biological properties of molecules. We propose two approaches to the implementation of multi-instance learning. Both are based on the idea of representing the features describing a molecule as a single vector, produced from different representations (instances) of the molecule. Models based on multi-instance learning were compared with classical modeling methods. We also show that in some cases the multi-instance learning approach makes it possible to achieve greater accuracy in predicting the properties of molecules.

Dmitry V. Zankov, Maxim D. Shevelev, Alexandra V. Nikonenko, Pavel G. Polishchuk, Asima I. Rakhimbekova, Timur I. Madzhidov

Cross-Efficiency of International Sanctions: Application of Data Envelopment Analysis and Network Methodology

In this paper we provide a methodology for evaluating the effectiveness of international sanctions using Data Envelopment Analysis (DEA), which we use for generating the network matrix for further analysis. DEA is a non-parametric technique used to compare the performance of similar units, such as departments or organizations. DEA has wide applications across industries and has been successfully used to compare the performance of hospitals, banks, universities, etc. The most important advantage of this technique is that it can handle multiple input and output variables, even those not generally comparable to each other. We use the "Threat and Imposition of Sanctions (TIES)" Data 4.0 for the analysis. This database contains the largest number of cases of international sanctions (1,412 from the years 1945–2005) imposed by some countries on others, takes into account simultaneous sanction imposition, and also estimates the cost of all sanctions, both for those who receive them and those who impose them. As input variables for the DEA model we use the impact of sender commitment, anticipated target and sender economic costs, and actual target and sender economic costs. As the output variable, we use the outcome of sanctions for senders. We describe how to use DEA cross-efficiency outputs to build the network of sanction episodes. Our proposed combination of DEA and network methodology allows us to cluster sanction episodes depending on their outcomes and provides explanations for the higher efficiency of some groups of sanction episodes over others.

Dmitry Zaytsev, Valentina Kuskova, Polina Lushnikova, Gregory Khvatsky

Natural Language Processing


Using Semantic Information for Coreference Resolution with Neural Networks in Russian

This paper describes an experiment aimed at improving the quality of coreference resolution for Russian by combining one of the most recent developments in the field, the employment of neural networks, with the benefits of using semantic information. The task of coreference resolution has been the target of intensive research, and interest in using neural networks, successfully tested in other natural language processing tasks, has been gradually growing. The role that semantic information plays in coreference resolution has been recognized by researchers, but the impact of semantic features on the performance of neural networks has not yet been described in detail. Here we describe the process of integrating features derived from open-source semantic information into a coreference resolution model based on a neural network, and evaluate its performance in comparison with the base model. The obtained results demonstrate quality on par with state-of-the-art systems, which serves to re-establish the importance of semantic features in coreference resolution, as well as the applicability of neural networks to the task.

Ilya Azerkovich

A Method of Semantic Change Detection Using Diachronic Corpora Data

The article proposes a method for detecting semantic change using diachronic corpora data. The method is based on the distributional hypothesis. The analysis is performed using frequencies of syntactic bigrams from the English and Russian sub-corpora of Google Books Ngram. To obtain the co-occurrence profile of a word in its new meaning, the syntactic bigrams that contributed most to the change in the word's distribution are selected and their time series are clustered. The method is tested on a group of English and Russian words which gained new meanings in the 20th century. The obtained results show that the proposed method allows one to detect semantic changes, as well as to determine the time of these changes.
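The core idea, comparing a word's co-occurrence profile across time periods, can be sketched as a cosine similarity between bigram frequency vectors; the bigram counts below are toy values, not Google Books Ngram data:

```python
def profile_similarity(freq_a, freq_b):
    """Cosine similarity between two co-occurrence profiles of the same word
    in different time periods; a sharp drop in similarity suggests a
    semantic change between the periods."""
    keys = set(freq_a) | set(freq_b)
    dot = sum(freq_a.get(k, 0) * freq_b.get(k, 0) for k in keys)
    na = sum(v * v for v in freq_a.values()) ** 0.5
    nb = sum(v * v for v in freq_b.values()) ** 0.5
    return dot / (na * nb)

# "mouse" before and after the computing sense appeared (invented counts).
decade_1900 = {"field mouse": 30, "mouse trap": 25, "small mouse": 10}
decade_2000 = {"computer mouse": 40, "mouse click": 35, "mouse trap": 5}
print(round(profile_similarity(decade_1900, decade_2000), 3))
```

The low similarity between the two profiles is exactly the signal the method clusters over time to date the change.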

Vladimir Bochkarev, Anna Shevlyakova, Valery Solovyev

Automated Approach to Rhythm Figures Search in English Text

Text rhythm is recognized as one of the most important subject areas of modern linguistic studies. There is a considerable amount of literature on the analysis of rhythm in poetry and literary prose. However, few researchers have addressed the problem of using automated tools for rhythm analysis, whereas automated methods can be of great benefit to this cause, especially when the research is conducted on large text corpora. This paper presents a new automated approach to the integrated search for rhythm figures in fiction, including anaphora, epiphora, anadiplosis, symploce and simple repetition, implemented in an original lexical tool designed within the framework of this research. Ad hoc experiments have shown this approach to be reliable and informative.
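A much-simplified stand-in for one of these rhythm figures, anaphora as a run of consecutive sentences sharing the same first word, can be sketched as:

```python
def find_anaphora(sentences, min_run=2):
    """Return (start_index, word, run_length) for runs of consecutive
    sentences sharing the same first word. A toy detector, far simpler
    than the paper's lexical tool."""
    firsts = [s.split()[0].lower() for s in sentences]
    runs, i = [], 0
    while i < len(firsts):
        j = i
        while j + 1 < len(firsts) and firsts[j + 1] == firsts[i]:
            j += 1
        if j - i + 1 >= min_run:
            runs.append((i, firsts[i], j - i + 1))
        i = j + 1
    return runs

text = ["We shall fight on the beaches.",
        "We shall fight on the landing grounds.",
        "We shall never surrender.",
        "They will not prevail."]
print(find_anaphora(text))  # one run of three "we"-initial sentences
```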

Elena Boychuk, Inna Vorontsova, Elena Shliakhtina, Ksenia Lagutina, Olga Belyaeva

Adapting the Graph2Vec Approach to Dependency Trees for NLP Tasks

In recent work on learning representations for graph structures, methods have been proposed both for representing nodes and edges of large graphs and for representing graphs as a whole. This paper considers the popular graph2vec approach, which shows quite good results for ordinary graphs. In natural language processing, however, a graph structure called a dependency tree is often used to express the connections between words in a sentence. We show that the graph2vec approach applied to dependency trees is unsatisfactory, which is due to the Weisfeiler-Lehman (WL) kernel. In this paper, we propose an adaptation of this kernel for dependency trees, as well as three other types of kernels that take into account the specific features of dependency trees. This new vector representation can be used in NLP tasks where it is important to model syntax (e.g. authorship attribution, intention labeling, targeted sentiment analysis, etc.). Universal Dependencies treebanks were clustered to show the consistency and validity of the proposed tree representation methods.
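The 1-dimensional WL relabeling step that graph2vec builds on can be sketched as follows; the tiny graph and part-of-speech labels are invented for illustration, not a real dependency tree:

```python
def wl_relabel(adj, labels, iterations=2):
    """One-dimensional Weisfeiler-Lehman relabeling: in each round, a node's
    new label compresses the pair (old label, sorted neighbour labels).
    The multiset of labels across rounds is what the WL kernel compares."""
    for _ in range(iterations):
        compress, new = {}, {}
        for v in adj:
            sig = (labels[v], tuple(sorted(str(labels[u]) for u in adj[v])))
            if sig not in compress:
                compress[sig] = len(compress)
            new[v] = compress[sig]
        labels = new
    return labels

# Tiny tree-shaped graph: node 0 is the root, nodes 1-3 hang below it.
adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
labels = {0: "VERB", 1: "NOUN", 2: "NOUN", 3: "ADJ"}
result = wl_relabel(adj, labels)
print(result)
```

Note that the two NOUN nodes end up with different labels because their neighbourhoods differ, which is precisely the structural sensitivity the paper adapts for dependency trees.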

Oleg Durandin, Alexey Malafeev

Recognition of Parts of Speech Using the Vector of Bigram Frequencies

This paper describes how to automatically recognize parts of speech and other grammatical categories of a word, such as gender and number. Unlike some previous works, the vector of syntactic bigram frequencies (including the considered word) is used as the source data for recognizing parts of speech and grammatical categories. Data on the frequencies of syntactic bigrams were obtained from the Russian subcorpus of Google Books Ngram. We used part-of-speech tags available in Google Books Ngram, as well as data on parts of speech and grammatical categories of words obtained from the electronic dictionary OpenCorpora. To train the model, we selected words from the list of the 100,000 most frequent words that do not have homonyms and are found in both Google Books Ngram and OpenCorpora. A multilayer perceptron with a softmax output layer was used as the recognizer. The input to the network was the vector of frequencies of syntactic bigrams formed by the test word and one of the 10,000 most frequent words. The neural network was trained using the minimum cross-entropy criterion. When recognizing parts of speech on the test sample, the average recognition accuracy was 99.1%. Nouns and verbs were recognized best of all (with accuracies of 99.77% and 99.62%, respectively). The recognition accuracy for word number was 99.61%. The accuracy for word gender was substantially lower, at 91.9%.

Stanislav Khristoforov, Vladimir Bochkarev, Anna Shevlyakova

Formalization of Medical Records Using an Ontology: Patient Complaints

Medical records contain textual descriptions of such important information as patients' complaints, disease progress and therapy. Extracting this information would make it possible to process the information stored in medical databases. In this article we give a short description of a medical ontology storing information on patients' complaints. We also describe an algorithm that uses this ontology to extract complaints from the texts of medical records. The algorithm combines syntactic properties and peculiarities of a text with connections between disease properties and their values, and corrects syntactic mistakes using the hierarchical information from the ontology. The resulting algorithm was validated on 3000 clinical records of the Department of Neurosurgery of FEFU.

Eduard Klyshinsky, Valeriya V. Gribova, Carina Shakhgeldyan, Elena A. Shalfeeva, Dmitry B. Okun, Boris I. Geltser, Tatiana A. Gorbach, Olesia D. Karpik

A Deep Learning Method Study of User Interest Classification

In this paper, we conduct a study of deep learning methods for a new multiclass text classification problem: identifying user interests from text messages. We used an original dataset of almost 90 thousand forum text messages, labelled for ten interests. We experimented with different modern neural network architectures: recurrent and convolutional networks, as well as simpler feedforward networks. Classification accuracy was evaluated for different architectures, text representations and sets of miscellaneous parameters.

Alexey Malafeev, Kirill Nikolaev

Morpheme Segmentation for Russian: Evaluation of Convolutional Neural Network Models

This paper is aimed at evaluating the performance of existing models of morphemic analysis for Russian based on convolutional neural networks. The models were trained on a relatively small amount of annotated training data (38,368 words). We tuned the hyperparameters to accommodate the harder task setting, which helped improve the accuracy of the model. In addition to testing 15 different configurations on the available test set, a new sample of 800 words containing roots that are missing in the training sample (e.g. neologisms and recent loan words) was manually created and annotated for morphemic structure (the new dataset is made available to the community). The effectiveness of the models was evaluated on this sample, and it turned out that the performance of the CNN models was much worse on this set (an almost 30% drop in word accuracy). We performed a classification of errors made by the best model both on the standard test set and the new one.

Lyudmila Maltina, Alexey Malafeev

Using Pre-trained Deeply Contextual Model BERT for Russian Named Entity Recognition

Named entity recognition is an important part of the information extraction process (extracting structured data from unstructured or semi-structured computer-readable documents). Many approaches are used to highlight people, organizations, geographical locations, etc., in text. Although the well-known bidirectional LSTM neural networks show good results, there is room for improvement. Usually, word embedding vectors are used as the input layer, but the main disadvantage of recent vector models (word2vec, GloVe, FastText) is that they do not consider the context of documents. In this paper we present an effective neural network based on the deeply pre-trained bidirectional BERT model, introduced in the fall of 2018, for the task of named entity recognition in Russian. The BERT model, pre-trained for a long time on large unannotated text corpora, was used in our work in two modes: feature extraction and fine-tuning for the NER task. The results were evaluated on the FactRuEval dataset, with a BiLSTM network (FastText + CNN + extra features) taken as the baseline. Our model, built on a fine-tuned deep contextual BERT model, shows good results.

Eugeny Mukhin

Expert Assessment of Synonymic Rows in RuWordNet

This article explores the principles of synsets in the RuWordNet thesaurus and synonyms in classical dictionaries of Russian synonyms (N = 10) to identify discrepancies and improve the principles of organising synsets in RuWordNet. The relevance of the study is determined by the demand for WordNet resources in natural language processing tasks. The authors selected 102 RuWordNet thesaurus synsets, including nouns (N = 34), adjectives (N = 34) and verbs (N = 34). The meanings of the lexemes were correlated according to the data given in Russian language thesauri (N = 2). The comparative method and an independent expert assessment of RuWordNet revealed a number of discrepancies and inaccuracies in the representation of synsets concerning polysemy, hypo-hyperonymic relationships, lexical meanings of words and parts-of-speech synonymy. On the basis of this study, the authors recommend the elimination of individual shortcomings in the construction of the RuWordNet synsets, in particular the polysemy and parts-of-speech synonymy.

Valery Solovyev, Gulnara Gimaletdinova, Liliia Khalitova, Liliya Usmanova

Text Mining for Evaluation of Candidates Based on Their CVs

The problem of CV (or resume) text mining is increasingly relevant nowadays, since it can simplify the evaluation of future employees and their suitability for the post for which they apply. The paper proposes a procedure for automatic information extraction from text documents, namely from candidates' CVs. The described algorithm is based on natural language processing methods and makes it possible to transform text information into categorical features or classes. These features may further be used as inputs for a machine learning model to predict the suitability of a candidate for a position. Besides the general method, we describe experiments in which the algorithm was used for clustering future employees according to their previous positions and the job spheres they worked in. The obtained classes were used to predict the probability of a candidate's turnover in the first six months; adding these features raised the model score.

Maria Tikhonova

Vec2graph: A Python Library for Visualizing Word Embeddings as Graphs

Visualization as a means of easily conveying ideas plays a key role in communicating linguistic theory through its applications. User-friendly NLP visualization tools allow researchers to get important insights for building, challenging, proving or rejecting their hypotheses. At the same time, visualizations provide the general public with some understanding of what computational linguists investigate. In this paper, we present vec2graph: a ready-to-use Python 3 library for visualizing vector representations (for example, word embeddings) as dynamic and interactive graphs. It is aimed at users with beginner-level knowledge of software development and can be used to easily produce visualizations suitable for the Web. We describe the key ideas behind vec2graph, its hyperparameters, and its integration into existing word embedding frameworks.
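The basic idea of turning word embeddings into a graph, connecting words whose vectors exceed a cosine-similarity threshold, can be sketched as follows; the threshold, the toy vectors, and the edge-list representation are illustrative choices, and the actual vec2graph API may differ:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def embeddings_to_graph(vectors, threshold=0.7):
    """Build a weighted edge list connecting words whose embeddings are
    more similar than the threshold; such a graph is what a vec2graph-style
    tool would then render interactively."""
    words = list(vectors)
    edges = []
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            sim = cosine(vectors[w1], vectors[w2])
            if sim >= threshold:
                edges.append((w1, w2, round(sim, 3)))
    return edges

vecs = {"cat": [1.0, 0.2, 0.0], "dog": [0.9, 0.3, 0.1], "car": [0.0, 0.1, 1.0]}
edges = embeddings_to_graph(vecs)
print(edges)  # only the semantically close pair is connected
```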

Nadezda Katricheva, Alyaxey Yaskevich, Anastasiya Lisitsina, Tamara Zhordaniya, Andrey Kutuzov, Elizaveta Kuzmenko

Social Network Analysis


How to Prevent Harmful Information Spreading in Social Networks Using Simulation Tools

The paper discusses the problem of preventing the spread of harmful information in social networks. Social networks are widespread nowadays and are used not only by managers and marketers for advertising and promoting goods, but also by attackers to spread harmful information. Thus, there is a need to counter these attackers. This paper presents simulation tools, and several of their features, that support modeling social networks and examining different strategies for preventing the spread of rumors and harmful information. The authors give an example of a simulation model for identifying intruders in a social network, the software tools involved, and the results of simulation experiments.
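A generic sketch of how harmful-information spread can be simulated on a social graph follows; the independent-cascade model and the toy graph are assumptions for illustration, not the authors' simulation tool:

```python
import random

def simulate_spread(adj, seeds, p=0.3, rng=None):
    """Independent-cascade model: each newly 'infected' node gets one
    chance to pass the harmful message to each neighbour with probability
    p. Returns the set of nodes the message reached."""
    rng = rng or random.Random(0)
    infected = set(seeds)
    frontier = list(seeds)
    while frontier:
        nxt = []
        for v in frontier:
            for u in adj[v]:
                if u not in infected and rng.random() < p:
                    infected.add(u)
                    nxt.append(u)
        frontier = nxt
    return infected

# Toy social graph as adjacency lists.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}
reached = simulate_spread(adj, seeds=[0], p=1.0)
print(sorted(reached))
```

Counter-strategies (e.g. blocking a node or edge) can then be compared by re-running the simulation on the modified graph and measuring how much the reached set shrinks.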

Ivan Dmitriev, Elena Zamyatina

Effect of Social Graph Structure on the Utilization Rate in a Flat Organization

The goal of our research is to investigate how the communication structure of an organization affects its performance. In the paper, we study a simulation model of a self-organizing team conducting scientific research. The key parameter of the model is the social graph of the organization, which defines the team creation process. For this model, we formally define the average utilization rate of the group. Under some natural condition, the utilization rate is a function of the social graph. Lower and upper bounds of this characteristic are established. The obtained result has evident practical meaning and policy implications for organization management.

Rostislav Yavorskiy, Tamara Voznesenskaya, Ilya Samonenko

Analysis of Images and Video


Discernment of Smoke in Image Frames

Smoke is a challenging object to detect because of its changing texture, color and shape. These features can be extracted with the help of learning algorithms such as regression, SVM and decision trees, but these do not provide optimal results on large datasets, whereas the accuracy of deep learning algorithms increases with more data: as the number of input samples grows, more of the dominant features of the desired object are extracted, so that similar objects can be both defined and detected. For smoke detection, a convolutional neural network that takes an image as input is used. Transfer learning is applied with a VGG-19 network; since this alone does not provide satisfying results, the network is modified by introducing batch normalization layers, which increase its convergence rate. The accuracy increases by 3% when the number of epochs increases from 10 to 50.

Gurgeet Singh Bhogal, Anil Kumar Rawat

Face Recognition Using DELF Feature Descriptors on RGB-D Data

Face recognition is a challenging and important task in many areas. Such algorithms are used in security systems, person authorization, person re-identification, etc. In face recognition, both speed and quality are important, but in this research we concentrate on quality improvement. In this paper we propose a new solution for face recognition using RGB-D data obtained with a Kinect sensor. The proposed solution is based on a new type of feature descriptor, DEep Local Features (DELF), which showed high classification results on the Google Landmarks dataset. We compared DELF descriptors with HOG, MSER and SURF descriptors using two classification schemes: error-correcting output codes (ECOC) and decision trees. The conducted experiments showed an improvement in classification quality, measured by the F1 score, when face recognition is based on DELF descriptors and an ECOC classifier.

Andrey Kuznetsov

Efficient Information Support of the Automatic Process and Production Control System

The paper describes a method of efficient, specialized information support for an automatic process control system that stores only information about the key elements of an image with a complex structure. In these studies, the authors introduce the concept of an image with a complex structure, comprising an object under study and a set of elements affecting its uniformity. To describe the image, the authors introduce a structural unit of information: for the object under study, a brightness class and brightness histogram, an initial point and a description of the border; for elements inside the object, the initial point of each element's area and a description of the element's border. To implement this structural unit of information, the authors developed a functional diagram of the software, consisting of two blocks: a block forming a mathematical description of the image to be stored in the corporate data warehouse, and a block using the mathematical description for expert evaluation, subject to the requirements for recovering the initial image.

Oksana Logunova, Ivan Bagaev, Daria Arefeva, Evgeniy Garbar

An Ensemble of Learning Machine Models for Plant Recognition

Plant recognition is an important problem; it can be performed manually by specialists, but this requires a lot of time and is therefore inefficient. Thus, automatic plant recognition is an important area of research. In this paper, we propose an ensemble of models to increase the performance of plant recognition. The ensemble is a composite model with a two-level architecture. At the first level of the stacking model, convolutional neural networks are used, which demonstrate high performance in object recognition problems. At the second level, gradient boosting methods are used. The model is trained on an open database of plant images containing 12 different species. Experiments on this plant dataset show that the proposed model is significantly better than other classification methods. The high classification accuracy makes the model very useful for supporting plant recognition systems in real-world conditions.

Vladimir Mokeev

Hand Gestures Detection, Tracking and Classification Using Convolutional Neural Network

The article describes a software pipeline for detecting, tracking and classifying static hand gestures of the Russian Sign Language in a video stream using computer vision and deep learning techniques. The dataset used for this task is original; it includes 10 classes and consists of more than 2000 unique images. The solution includes a hand detection module that uses a color mask, a gesture tracking module, a module for classifying static gestures in the detected image region based on a convolutional neural network, as well as auxiliary image preprocessing and dataset augmentation modules.

Oleg Potkin, Andrey Philippovich

Synthesizing Data Using Variational Autoencoders for Handling Class Imbalanced Deep Learning

This paper addresses the complex problem of learning from imbalanced datasets, on which traditional algorithms may perform poorly: classification algorithms tend to favor the larger, less important classes. In this work, to handle the imbalanced-data problem, we synthesize data using variational autoencoders (VAE) trained on the raw training samples and then use various input sources (raw data, and combinations of raw and synthetic data) to train different models. We evaluate our method using multiple criteria on the SVHN dataset, which consists of complex images, perform a comprehensive comparative analysis of popular CNN architectures on balanced and imbalanced data, and determine which performs best under class imbalance. We found that data synthesis via VAE is reliable and robust, and can help to classify real data with higher accuracy than training on the original (imbalanced) data alone. Our results demonstrate the strength of using VAEs to solve the class imbalance problem.
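As a lightweight stand-in for VAE-based synthesis, the following sketch shows how adding synthetic minority samples rebalances a training set; here the samples are produced by simple interpolation between minority pairs (SMOTE-style) rather than by sampling a trained VAE decoder as in the paper:

```python
import random

def oversample_minority(X, y, minority_label, rng=None):
    """Add synthetic minority samples, interpolated between random minority
    pairs, until the minority class matches the majority count. The paper
    instead draws synthetic samples from a VAE trained on the minority class."""
    rng = rng or random.Random(0)
    minority = [x for x, lbl in zip(X, y) if lbl == minority_label]
    majority_n = sum(1 for lbl in y if lbl != minority_label)
    X2, y2 = list(X), list(y)
    while sum(1 for lbl in y2 if lbl == minority_label) < majority_n:
        a, b = rng.choice(minority), rng.choice(minority)
        t = rng.random()
        X2.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
        y2.append(minority_label)
    return X2, y2

# Toy 2-D dataset: class 1 is the minority (2 vs. 4 samples).
X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 0.8]]
y = [1, 1, 0, 0, 0, 0]
X2, y2 = oversample_minority(X, y, minority_label=1)
print(y2.count(1) == y2.count(0))
```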

Taimoor Shakeel Sheikh, Adil Khan, Muhammad Fahim, Muhammad Ahmad

New Approach for Fast Residual Strain Estimation Through Rational 2D Diffraction Pattern Processing

Highly brilliant synchrotron X-ray beams enable fast non-destructive evaluation of residual strains, even in large and thick engineering objects under operando conditions. Large datasets of 2D patterns are acquired at a high rate, meaning that near-real-time evaluation demands fast data processing and mapping of residual strain. The somewhat time-consuming traditional data processing algorithms involve the conversion of each 2D X-ray diffraction pattern within a substantial stack into 1D intensity plots by radial-azimuth sectoral binning ("caking"), followed by peak fitting to determine peak center positions. A new approach was realized and implemented as open-source code to perform residual strain evaluation by direct 'polar transformation' of 2D X-ray diffraction patterns. We compare residual strain calculations by 'polar transformation' and 'caking' for an aluminum alloy bar containing a friction stir weld and subjected to ex-situ three-point bending. The 'polar transformation' method shows good agreement with the traditional 'caking' technique while also exhibiting excellent speed and robustness.

Eugene S. Statnik, Fatih Uzun, Alexei I. Salimon, Alexander M. Korsunsky

Fast Identification of Fingerprint

The paper discusses a method for fast identification of fingerprints based on templates. The method relies on the properties of the local neighborhoods of minutiae and consists of two stages. At the first stage, the neighborhoods of minutiae from the query and reference templates are compared independently, and the similarity of each pair of minutiae is estimated. To improve performance, rating classes are introduced which filter out unlikely pairs of minutiae. The best ratings for each pair of minutiae from the query and reference templates are accumulated in a histogram; as a result of the first stage, a histogram is constructed for the entire database, and the most similar pairs of templates are selected from it. At the second stage, the selected pairs of templates are compared further, taking into account the consolidation of minutiae. A significant increase in identification performance, in contrast to other publications, is achieved by eliminating false candidates at the first stage.

Vladimir Gudkov

Training Effective Model for Real-Time Detection of NSFW Photos and Drawings

Convolutional neural networks (CNN) show state-of-the-art results on a variety of tasks. The paper presents a scheme for preparing a highly accurate (97% on the test set) and fast CNN for detecting not-suitable-or-safe-for-work (NSFW) images. The present research focuses on identifying NSFW pictures containing nudity with neural networks. One of the main features of this work is that the NSFW class of images covers not only natural human nudity but also cartoons and other drawn pictures containing obscene depictions of primary sexual characteristics. Another important issue considered is collecting a representative dataset for the problem. The research includes a review of existing nudity detection methods, both traditional machine learning techniques and newer approaches based on neural networks. In addition, several important problems in NSFW picture filtering are considered in the study.

Dmitry Zhelonkin, Nikolay Karpov
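A CNN for this task would normally be trained in a deep learning framework; as a minimal self-contained illustration of the building blocks involved (convolution, ReLU, pooling, a logistic head), here is a toy NumPy forward pass. The architecture and all names are illustrative, not the paper's model.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid single-channel 2-D convolution, the core CNN operation."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0)

def global_avg_pool(x):
    return x.mean()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nsfw_score(img, kernel, w, b):
    """Tiny conv -> ReLU -> pool -> logistic head; returns a score in
    (0, 1) read as the probability the image is NSFW. The kernel and
    head weights (w, b) would be learned during training."""
    feat = global_avg_pool(relu(conv2d(img, kernel)))
    return sigmoid(w * feat + b)
```

A real detector stacks many such convolutional layers and is trained on the labeled photo-and-drawing dataset the abstract describes; the forward computation, however, is exactly this pattern repeated.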

Optimization Problems on Graphs and Network Structures


Prediction of New Itinerary Markets for Airlines via Network Embedding

Network planning is one of the most significant problems that airlines solve every day. Currently, airlines rely on traveller decision choice modelling, which has certain drawbacks: it analyses each market independently and therefore does not take into account the entire airline network with its dynamic structure shaped by competition factors. In the paper, we show that airline network structure enables accurate prediction both for the current network and for future itineraries. We compare several approaches to airline network link prediction via structural network embeddings, interpreting predicted links as estimates of new itinerary markets.

Dmitrii Kiselev, Ilya Makarov
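A minimal sketch of link prediction via structural embeddings: embed each airport as a vector and score each unserved city pair by the similarity of its embeddings. The truncated-SVD spectral embedding below is a simple stand-in for learned methods such as node2vec; the abstract does not specify which embedding algorithms were compared.

```python
import numpy as np

def embed_nodes(adj, dim=2):
    """Spectral embedding via truncated SVD of the adjacency matrix,
    a simple stand-in for a learned structural network embedding."""
    u, s, _ = np.linalg.svd(adj)
    return u[:, :dim] * s[:dim]

def link_score(emb, i, j):
    """Score a candidate itinerary market (i, j): higher similarity
    of the two airport embeddings suggests a more promising link."""
    return float(emb[i] @ emb[j])
```

Ranking all non-existing links by this score yields a candidate list of new itinerary markets for the planner to evaluate.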

Developing a Network-Based Method for Building Associative Series of Hashtags

Nowadays hashtags are an important mechanism of semantic navigation in social media. In the current research we consider the problem of building associative series of hashtags for Instagram. These series should meet two criteria: they should be relatively short and should not have wide semantic gaps between consecutive hashtags. A generator of such series could be used to recommend additional hashtags for messages. We gathered approximately 15 million messages from Instagram and built a co-occurrence network for the hashtags found in them. To build associative series, we formalized the semantic path building problem as a multicriteria optimization problem on the co-occurrence network and developed a combined optimization function covering both criteria. For measuring semantic similarity between hashtags, we use a metric based on vector embeddings of hashtags produced by the word2vec algorithm. Using empirical paths built by different algorithms, we calibrated the parameters of the combined optimization function. The resulting function can be used in Dijkstra's algorithm and in a specialized greedy algorithm to build semantic paths that serve as appropriate associative series.

Sergey Makrushin, Nikita Blokhin
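The combined criterion and the Dijkstra-based search can be sketched as follows. The edge cost below (a fixed per-hop penalty `alpha`, which keeps paths short, plus one minus the cosine similarity of word2vec-style embeddings, which penalizes semantic gaps) is an assumed form of the paper's combined optimization function, not its exact definition.

```python
import heapq
import numpy as np

def edge_cost(u, v, emb, alpha=1.0):
    """Combined criterion: per-hop penalty plus semantic gap
    (1 - cosine similarity of the hashtags' embedding vectors)."""
    cos = emb[u] @ emb[v] / (np.linalg.norm(emb[u]) * np.linalg.norm(emb[v]))
    return alpha + (1.0 - cos)

def semantic_path(graph, emb, start, goal, alpha=1.0):
    """Dijkstra's algorithm over the hashtag co-occurrence network,
    minimizing the combined cost along the path."""
    dist = {start: 0.0}
    prev = {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v in graph.get(u, []):
            nd = d + edge_cost(u, v, emb, alpha)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(pq, (nd, v))
    path, node = [], goal
    while node != start:  # walk predecessors back to the start
        path.append(node)
        node = prev[node]
    return [start] + path[::-1]
```

Raising `alpha` trades semantic smoothness for shorter series, which is the kind of parameter the empirical path calibration would fix.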

Analysis of Dynamic Behavior Through Event Data


Method of Determining User Preferences for the Personalized Recommender Systems for Public Transport Passengers

The question of creating a navigation recommender system based on user preferences arose with the development of recommender systems. The paper presents the theoretical and algorithmic aspects of building a personalized recommender system (a mobile service) for public transport users. The main focus is on identifying and formalizing the concept of “user preferences”, on which modern personalized recommender systems are based. Informal (verbal) and formal (mathematical) statements of the corresponding problems of determining “user preferences” in a specific spatio-temporal context are presented: defining the preferred stops and defining the preferred “transport correspondences”. The first task can be stated as a classification problem and thus addressed with well-known pattern recognition and machine learning methods; in this paper, we use an approach based on the estimation algorithm proposed by Yu. I. Zhuravlev and nonparametric Parzen probability density estimation. The second task is to find estimates for a series of conditional distributions. The experiments were conducted on data from the mobile application “Pribyvalka-63”. The application is part of the service currently used to inform Samara residents about public transport movement.

Aleksandr A. Borodinov, Vladislav V. Myasnikov
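The Parzen component of the first task can be illustrated with a one-dimensional toy: estimate a class-conditional density with a Gaussian window and assign a point to the class with the highest density. The bandwidth `h` and the reduction of the spatio-temporal context to a single scalar are simplifying assumptions of this sketch.

```python
import numpy as np

def parzen_density(x, samples, h=1.0):
    """Nonparametric Parzen estimate of the density at x, using a
    Gaussian window of bandwidth h over the observed samples."""
    diffs = (x - np.asarray(samples)) / h
    return np.mean(np.exp(-0.5 * diffs ** 2) / (h * np.sqrt(2 * np.pi)))

def preferred_class(x, class_samples, h=1.0):
    """Assign the point x (e.g. a stop in some feature space) to the
    class with the highest estimated density, such as 'preferred'
    vs. 'non-preferred' stops."""
    return max(class_samples, key=lambda c: parzen_density(x, class_samples[c], h))
```

In the paper's setting the samples would be a user's observed trips, so the density estimate encodes which stops that user tends to choose in a given spatio-temporal context.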

