main-content

## Über dieses Buch

This book constitutes refereed proceedings of the 19th International Conference on Mathematical Optimization Theory and Operations Research, MOTOR 2020, held in July 2020. Due to the COVID-19 pandemic the conference was held online.
The 25 full papers and 8 short papers presented in this volume were carefully reviewed and selected from a total of 102 submissions. The papers in the volume are organised according to the following topical headings: ​combinatorial optimization; mathematical programming; global optimization; game theory and mathematical economics; heuristics and metaheuristics; machine learning and data analysis.

## Inhaltsverzeichnis

### A 0.3622-Approximation Algorithm for the Maximum k-Edge-Colored Clustering Problem

In the Max k-Edge-Colored Clustering problem (abbreviated as MAX-k-EC) we are given an undirected graph and k colors. Each edge of the graph has a color and a nonnegative weight. The goal is to color the vertices so as to maximize the total weight of the edges whose colors coincide with the colors of their endpoints. The problem was introduced by Angel et al. [3]. In this paper we give a polynomial-time algorithm for MAX-k-EC with an approximation factor $$\frac{4225}{11664}\approx 0.3622$$ which significantly improves the best previously known approximation bound $$\frac{49}{144}\approx 0.3402$$ established by Alhamdan and Kononov [2].

Alexander Ageev, Alexander Kononov

### On the Proximity of the Optimal Values of the Multi-dimensional Knapsack Problem with and Without the Cardinality Constraint

We study the proximity of the optimal value of the m-dimensional knapsack problem to the optimal value of that problem with the additional restriction that only one type of items is allowed to include in the solution. We derive exact and asymptotic formulas for the precision of such approximation, i.e. for the infinum of the ratio of the optimal value for the objective functions of the problem with the cardinality constraint and without it. In particular, we prove that the precision tends to $$0.59136\dots /m$$ if $$n\rightarrow \infty$$ and m is fixed. Also, we give the class of the worst multi-dimensional knapsack problems for which the bound is attained. Previously, similar results were known only for the case $$m=1$$ .

Aleksandr Yu. Chirkov, Dmitry V. Gribanov, Nikolai Yu. Zolotykh

### An Approximation Algorithm for a Semi-supervised Graph Clustering Problem

Clustering problems form an important section of data analysis. In machine learning clustering problems are usually classified as unsupervised learning. Semi-supervised clustering problems are also considered. In these problems relatively few objects are labeled (i.e., are assigned to clusters), whereas a large number of objects are unlabeled.We consider the most visual formalization of a version of semi-supervised clustering. In this problem one has to partition a given set of n objects into k clusters ( $$k < n$$ ). A collection of k pairwise disjoint nonempty subsets of objects is fixed. No two objects from different subsets of this collection may belong to the same cluster and all objects from any subset must belong to the same cluster. Similarity of objects is determined by an undirected graph. Vertices of this graph are in one-to-one correspondence with objects, and edges connect similar objects. One has to partition the vertices of the graph into pairwise disjoint groups (clusters) minimizing the number of edges between clusters and the number of missing edges inside clusters.The problem is NP-hard for any fixed $$k \ge 2$$ . For $$k = 2$$ we present a polynomial time approximation algorithm and prove a performance guarantee of this algorithm.

Victor Il’ev, Svetlana Il’eva, Alexander Morshinin

### Exact Algorithm for the One-Dimensional Quadratic Euclidean Cardinality-Weighted 2-Clustering with Given Center Problem

We consider a strongly NP-hard problem of clustering a finite set of points in Euclidean space into two clusters. In this problem, we find a partition of the input set minimizing the sum over both clusters of the weighted intracluster sums of the squared distances between the elements of the clusters and their centers. The weight factors are the cardinalities of the corresponding clusters and the centers are defined as follows. The center of the first cluster is unknown and determined as the centroid, while the center of the other one is given as input (is the origin without loss of generality). In this paper, we present a polynomial-time exact algorithm for the one-dimensional case of the problem.

### Bilevel Models for Investment Policy in Resource-Rich Regions

This article continues the research of the authors into cooperation between public and private investors in the natural resource sector. This work aims to analyze the partnership mechanisms in terms of efficiency, using the game-theoretical Stackelberg model. Such mechanisms determine the investment policy of the state and play an important role in addressing a whole range of issues related to the strategic management of the natural resource sector in Russia. For bilevel mathematical programming problems, the computational complexity will be evaluated and effective solution algorithms based on metaheuristics and allowing solving large-dimensional problems will be developed. This opens up the possibility of a practical study on the real data of the properties of Stackelberg equilibrium, which determines the design of the mechanism for forming investment policies. The simulation results will allow not only to assess the impact of various factors on the effectiveness of the generated subsoil development program but also to formulate the basic principles that should guide the state in the management process.

Sergey Lavlinskii, Artem Panin, Alexander Plyasunov

### One Problem of the Sum of Weighted Convolution Differences Minimization, Induced by the Quasiperiodic Sequence Recognition Problem

We consider an unexplored discrete optimization problem of summing the elements of two numerical sequences. One of them belongs to the given set (alphabet) of sequences, while another one is given. We have to minimize the sum of M terms (M is unknown), each of them being the difference between the unweighted auto-convolution of the first sequence stretched to some length and the weighted convolution of this stretched sequence with the subsequence of the second one. We show that this problem is equivalent to the problem of recognizing a quasiperiodic sequence as a sequence induced by some sequence U from the given alphabet.We have constructed the algorithm which finds the exact solution to this problem in polynomial time. The numerical simulation demonstrates that this algorithm can be used to solve modeled applied problems of noise-proof processing of quasiperiodic signals.

Sergey Khamidullin, Liudmila Mikhailova

### Stability Analysis for Pricing

This article is dedicated to finding stable solutions to change input data on the example of pricing problems. In other words, we investigate stability analysis problems based on pricing problems.Initial pricing problems can be described as the following Stackelberg game. There are a company and its potential clients. First, the company sets prices at own facilities for a homogeneous product. After that, each client chooses the facility in which the minimum of his costs is achieved. The cost consists of purchase and transportation prices. At the same time, clients can make a purchase only if their budget allows it. The goal is to establish prices at which the maximum profit of the company is achieved. In the generalized problem of competitive pricing, two companies compete with each other for the client demand. They set prices sequentially. Clients are also the last to decide.For the pricing of one company, we discuss the computational complexity and algorithm solution of the stability analysis problem for three different pricing strategies. We also look at the competitive pricing problem with uniform pricing when the same price is set at all facilities. In conclusion, we discuss the relationship between the computational complexity of stability analysis problems and initial problems.

Artem A. Panin, Alexander V. Plyasunov

### Easy NP-hardness Proofs of Some Subset Choice Problems

We consider the following subset choice problems: given a family of Euclidean vectors, find a subset having the largest a) norm of the sum of its elements; b) square of the norm of the sum of its elements divided by the cardinality of the subset. The NP-hardness of these problems was proved in two papers about ten years ago by reduction of 3-SAT problem. However, that proofs were very tedious and hard to read. In the current paper much easier and natural proofs are presented.

Artem V. Pyatkin

### Sensitive Instances of the Cutting Stock Problem

We consider the well-known cutting stock problem (CSP). The gap of a CSP instance is the difference between its optimal function value and optimal value of its continuous relaxation. For most instances of CSP the gap is less than 1 and the maximal known gap $$6/5=1.2$$ was found by Rietz and Dempe [11]. Their method is based on constructing instances with large gaps from so-called sensitive instances with some additional constraints, which are hard to fulfill. We adapt our method presented in [15] to search for sensitive instances with required properties and construct a CSP instance with gap $$77/64=1.203125$$ . We also present several instances with large gaps much smaller than previously known.

Artem V. Ripatti, Vadim M. Kartak

### Some Estimates on the Discretization of Geometric Center-Based Problems in High Dimensions

We consider the following concept. A set C in multidimensional real space is said to be a $$(1+\varepsilon )$$ -collection for a set X if C contains a $$(1+\varepsilon )$$ -approximation of every point of space with respect to the Euclidean distances to all the elements of X. A $$(1+\varepsilon )$$ -collection allows to find approximate solutions of any geometric center-based problem where it is required to choose points in space (centers) minimizing a continuity-type function which depends on the distances from the input points to the centers. In fact, it gives a universal reduction of such problems to their discrete versions where all the centers must belong to a prescribed set of points. As was shown recently, for every fixed $$\varepsilon >0$$ and any finite set in high-dimensional space, there exists a $$(1+\varepsilon )$$ -collection which consists of a polynomial number of points and can be constructed by a polynomial-time algorithm. We slightly improve this algorithm and supplement it with a lower bound for the cardinality of $$(1+\varepsilon )$$ -collections in the worst case. Also, we show the non-existence of polynomial $$(1+\varepsilon )$$ -collections for some sets of points in the case of $$\ell _\infty$$ distances.

### Mathematical Programming

#### Frontmatter

In the paper, we generalize the approach Gasnikov et al. 2017, which allows to solve (stochastic) convex optimization problems with an inexact gradient-free oracle, to the convex-concave saddle-point problem. The proposed approach works, at least, like the best existing approaches. But for a special set-up (simplex type constraints and closeness of Lipschitz constants in 1 and 2 norms) our approach reduces times the required number of oracle calls (function calculations). Our method uses a stochastic approximation of the gradient via finite differences. In this case, the function must be specified not only on the optimization set itself, but in a certain neighbourhood of it. In the second part of the paper, we analyze the case when such an assumption cannot be made, we propose a general approach on how to modernize the method to solve this problem, and also we apply this approach to particular cases ofsomeclassical sets.

Aleksandr Beznosikov, Abdurakhmon Sadiev, Alexander Gasnikov

### On Multiple Coverings of Fixed Size Containers with Non-Euclidean Metric by Circles of Two Types

The paper is devoted to the multiple covering problem by circles of two types. The number of circles of each class is given as well as a ratio radii. The circle covering problem is usually studied in the case when the distance between points is Euclidean. We assume that the distance is determined using some particular metric arising in logistics, which, generally speaking, is not Euclidean. The numerical algorithm is suggested and implemented. It based on an optical-geometric approach, which is developed by the authors in recent years and previously used only for circles of an equal radius. The results of a computational experiment are presented and discussed.

Alexander Kazakov, Anna Lempert, Quang Mung Le

### Analogues of Switching Subgradient Schemes for Relatively Lipschitz-Continuous Convex Programming Problems

Recently some specific classes of non-smooth and non-Lipsch-itz convex optimization problems were considered by Yu. Nesterov and H. Lu. We consider convex programming problems with similar smoothness conditions for the objective function and functional constraints. We introduce a new concept of an inexact model and propose some analogues of switching subgradient schemes for convex programming problems for the relatively Lipschitz-continuous objective function and functional constraints. Some class of online convex optimization problems is considered. The proposed methods are optimal in the class of optimization problems with relatively Lipschitz-continuous objective and functional constraints.

Alexander A. Titov, Fedor S. Stonyakin, Mohammad S. Alkousa, Seydamet S. Ablaev, Alexander V. Gasnikov

### Constructing Mixed Algorithms on the Basis of Some Bundle Method

In the paper, a method is proposed for minimizing a nondifferentiable convex function. This method belongs to a class of bundle methods. In the developed method it is possible to periodically produce discarding all previously constructed cutting planes that form the model of the objective function. These discards are applied when approximation of the epigraph of the objective function is sufficiently good in the a neighborhood of the current iteration point, and the quality of this approximation is estimated by using the model of the objective function. It is proposed an approach for constructing mixed minimization algorithms on the basis of the developed bundle method with involving any relaxation methods. The opportunity to mix the developed bundle method with other methods is provided as follows. In the proposed method during discarding the cutting planes the main iteration points are fixed with the relaxation condition. Any relaxation minimization method can be used to build these points. Moreover, the convergence of all such mixed algorithms will be guaranteed by the convergence of the developed bundle method. It is important to note that the procedures for updating cutting planes introduced in the bundle method will be transferred to mixed algorithms. The convergence of the proposed method is investigated, its properties are discussed, an estimate of the accuracy of the solution and estimation of the complexity of finding an approximate solution are obtained.

Rashid Yarullin

### Near-Optimal Hyperfast Second-Order Method for Convex Optimization

In this paper, we present a new Hyperfast Second-Order Method with convergence rate $$O(N^{-5})$$ up to a logarithmic factor for the convex function with Lipshitz 3rd derivative. This method based on two ideas. The first comes from the superfast second-order scheme of Yu. Nesterov (CORE Discussion Paper 2020/07, 2020). It allows implementing the third-order scheme by solving subproblem using only the second-order oracle. This method converges with rate $$O(N^{-4})$$ . The second idea comes from the work of Kamzolov et al. (arXiv:2002.01004). It is the inexact near-optimal third-order method. In this work, we improve its convergence and merge it with the scheme of solving subproblem using only the second-order oracle. As a result, we get convergence rate $$O(N^{-5})$$ up to a logarithmic factor. This convergence rate is near-optimal and the best known up to this moment.

Dmitry Kamzolov

### On a Solving Bilevel D.C.-Convex Optimization Problems

This work addresses the optimistic statement of a bilevel optimization problem with a general d.c. optimization problem at the upper level and a convex optimization problem at the lower level. First, we use the reduction of the bilevel problem to a nonconvex mathematical optimization problem using the well-known Karush-Kuhn-Tucker approach. Then we employ the novel Global Search Theory and Exact Penalty Theory to solve the resulting nonconvex optimization problem. Following this theory, the special method of local search in this problem is constructed. This method takes into account the structure of the problem in question.

Andrei V. Orlov

### Strongly Convex Optimization for the Dual Formulation of Optimal Transport

In this paper we experimentally check a hypothesis, that dual problem to discrete entropy regularized optimal transport problem possesses strong convexity on a certain compact set. We present a numerical estimation technique of parameter of strong convexity and show that such an estimate increases the performance of an accelerated alternating minimization algorithm for strongly convex functions applied to the considered problem.

Nazarii Tupitsa, Alexander Gasnikov, Pavel Dvurechensky, Sergey Guminov

### Nonlinear Models of Convergence

A significant issue in studies of economic development is whether economies (countries, regions of a country, etc.) converge to one another in terms of per capita income. In this paper, nonlinear asymptotically subsiding trends of the income gap in a pair of economies model the convergence process. A few specific forms of such trends are proposed: log-exponential trend, exponential trend, and fractional trend. A pair of economies is deemed converging if time series of their income gap is stationary about any of these trends. To test for stationarity, standard unit root tests are applied with non-standard test statistics that are estimated for each kind of trends.

Konstantin Gluschenko

### A Game-Theoretic Approach to Team Formation in The Voice Show

This paper considers a game-theoretic model of a competition in which experts (players) seek to enroll two contestants into their own teams. The quality of the contestants is characterized by two random parameters, the first corresponding to the vocal talent of a contestant and the second to his appearance. The first quality parameter is known to the players, whereas the second is hidden from them. Each expert chooses an appropriate contestant based on the value of the known quality parameter only. The winner is the player whose team includes a contestant with the maximum sum of both quality parameters. This game is described by the best-choice model with incomplete information. The optimal strategies and payoffs of the players for different situations (subgames) of the game are found. The results of numerical simulation are presented.

Anna Ivashko, Vladimir Mazalov, Alexander Mazurov

### Weak Berge Equilibrium

Various concepts of solutions can be employed in the non-cooperative game theory. The Berge equilibrium is one of such solutions. The Berge equilibrium is an altruistic concept of equilibrium. In this concept, the players act on the principle “One for all and all for one!” The Berge equilibrium solves such well known paradoxes in the game theory as the “Prisoner’s Dilemma”, “Battle of the sexes” and many others. At the same time, the Berge equilibrium rarely exist in pure strategies. Moreover, in finite games, the Berge equilibrium may not exist in the class of mixed strategies. The paper proposes the concept of a weak Berge equilibrium. Unlike the Berge equilibrium, the moral basis of this equilibrium is the Hippocratic Oath “First do no harm”. On the other hand, all Berge equilibria are some weak Berge equilibria. The properties of the weak Berge equilibrium have been investigated. The existence of the weak Berge equilibrium in mixed strategies has been established for finite games. A numerical weak Berge equilibrium approximate search method, based on 3LP-algorithm, is proposed. The weak Berge equilibria for finite 3-person non-cooperative games are computed.

Konstantin Kudryavtsev, Ustav Malkov, Vladislav Zhukovskiy

### On Contractual Approach in Competitive Economies with Constrained Coalitional Structures

We establish a theorem that equilibria in an exchange economy can be described as allocations that are stable under the possibilities: (i) agents can partially and asymmetrically break current contracts, after that (ii) a new mutually beneficial contract can be concluded in a coalition of a size not more than 1 plus the maximum number of products that are not indifferent to the coalition members.The presented result generalizes previous ones on a Pareto improvement in an exchange economy with l commodities that requires the active participation of no more than $$l+1$$ traders. This concerned with Pareto optimal allocations, but we also describe equilibria. Thus according to the contractual approach to arrive at equilibrium only coalitions of constrained size can be applied that essentially raise the confidence of contractual modeling.

Valeriy Marakulin

### Pontryagin’s Maximum Principle for Non-cooperative Differential Games with Continuous Updating

The paper is devoted to the optimality conditions as determined by Pontryagin’s maximum principle for a non-cooperative differential game with continuous updating. Here it is assumed that at each time instant players have or use information about the game structure defined for the closed time interval with a fixed duration. The major difficulty in such a setting is how to define players’ behavior as the time evolves. Current time continuously evolves with an updating interval. As a solution for a non-cooperative game model, we adopt an open-loop Nash equilibrium within a setting of continuous updating. Theoretical results are demonstrated on an advertising game model, both initial and continuous updating versions are considered. A comparison of non-cooperative strategies and trajectories for both cases are presented.

Ovanes Petrosian, Anna Tur, Jiangjing Zhou

### Feasible Set Properties of an Optimal Exploitation Problem for a Binary Nonlinear Ecosystem Model with Reducible Step Operator

Previously, the authors proposed a formalization of renewable resources rational use problem based on the representation of controlled system as a discrete dynamical system. In the particular case of structured ecosystem described by Leslie’s binary model, despite its non-linearity, it turned out that all optimal controls preserving this system belong to the certain hyperplane. This paper explores the conditions under which the positive boundary of a feasible set of problem with so-called quasi-preserving controls also contain a part of some hyperplane. In the process, we used a generalization of classical concept of map irreducibility—the concept of local irreducibility. Statements about the influence of the irreducibility property of discrete dynamical system step operator on the properties of an feasible set positive boundary are proved.

Alexander I. Smirnov, Vladimir D. Mazurov

### Monopolistic Competition Model with Retailing

We study the two-level interaction “producer - retailer - consumer” in the monopolistic competition frame. The industry is organized by Dixit-Stiglitz type, the retailer is the only monopolist. The utility function is quadratic. The case of the retailer’s leadership is considered. Two types of retailer behavior are studied, namely: with/without free entry conditions. It turned out that the government needs to stimulate the retailer by paying him subsidies to increase social welfare. A similar result was obtained with respect to consumer surpluses.

### Mutation Rate Control in the Evolutionary Algorithm with a Self-adjusting Lower Bound

We consider the 2-rate $$(1+\lambda )$$ Evolutionary Algorithm, a heuristic that evaluates $$\lambda$$ search points per each iteration and keeps in the memory only a best-so-far solution. The algorithm uses a dynamic probability distribution from which the radius at which the $$\lambda$$ “offspring” are sampled. It has previously been observed that the performance of the 2-rate $$(1+\lambda )$$ Evolutionary Algorithm crucially depends on the threshold at which the mutation rate is capped to prevent it from converging to zero. This effect is an issue already when focusing on the simple-structured OneMax problem, the problem of minimizing the Hamming distance to an unknown bit string. Here, a small lower bound is preferable when $$\lambda$$ is small, whereas a larger lower bound is better for large $$\lambda$$ .We introduce a secondary parameter control scheme, which adjusts the lower bound during the run. We demonstrate, by extensive experimental means, that our algorithm performs decently on all OneMax problems, independently of the offspring population size. It therefore appropriately removes the dependency on the lower bound. We also evaluate our algorithm on several other benchmark problems, and show that it works fine provided the number of offspring, $$\lambda$$ , is not too large.

Kirill Antonov, Arina Buzdalova, Carola Doerr

### An Experimental Study of Operator Choices in the Genetic Algorithm

We study the influence of the particular choice of operators on the running time of the recently proposed $$(1+(\lambda ,\lambda ))$$ genetic algorithm. In particular, we consider three choices for the mutation operator, six choices of the crossover operator, three strategies for sampling population sizes based on non-integer parameter values, and four choices of what to do when the best mutant is better than the parent.We test all these 216 configurations on four optimization problems and in three adjustment flavours: the fixed $$\lambda$$ , the unlimited self-adjustment of $$\lambda$$ and the logarithmically capped one. For each of these configurations, we consider both the default values of the hyperparameters and the ones produced by the irace parameter tuning tool.The result of our experimental analysis showed that there exists a configuration that is robust on linear functions and is roughly two times faster compared to the initially proposed one and 12% faster on the OneMax problem compared to one of the similar previous studies. An even more robust configuration exists, which has a slightly worse performance on OneMax but is better on satisfiability problems. Such configurations can be the default choices for the practical evaluation of the $$(1+(\lambda ,\lambda ))$$ GA.

Anton Bassin, Maxim Buzdalov

### Optimal Investment in the Development of Oil and Gas Field

Let an oil and gas field consists of clusters in each of which an investor can launch at most one project. During the implementation of a particular project, all characteristics are known, including annual production volumes, necessary investment volumes, and profit. The total amount of investments that the investor spends on developing the field during the entire planning period we know. It is required to determine which projects to implement in each cluster so that, within the total amount of investments, the profit for the entire planning period is maximum.The problem under consideration is NP-hard. However, it is solved by dynamic programming with pseudopolynomial time complexity. Nevertheless, in practice, there are additional constraints that do not allow solving the problem with acceptable accuracy at a reasonable time. Such restrictions, in particular, are annual production volumes. In this paper, we considered only the upper constraints that are dictated by the pipeline capacity. For the investment optimization problem with such additional restrictions, we obtain qualitative results, propose an approximate algorithm, and investigate its properties. Based on the results of a numerical experiment, we conclude that the developed algorithm builds a solution close (in terms of the objective function) to the optimal one.

Adil Erzin, Roman Plotnikov, Alexei Korobkin, Gregory Melidi, Stepan Nazarenko

### Genetic Algorithms with the Crossover-Like Mutation Operator for the k-Means Problem

Progress in the development of automatic grouping (clustering) methods, based on solving the p-median and similar problems, is mainly aimed at increasing the computational efficiency of the algorithms, their applicability to larger problems, accuracy, and stability of their results. The researchers’ efforts are focused on the development of compromise heuristic algorithms that provide a fairly quick solution with minimal error. The Genetic Algorithms (GAs) with greedy agglomerative crossover procedure and other special GAs for the considered problems demonstrate the best values of the objective function (sum of squared distances) for many practically important problems. Usually, such algorithms do not use any mutation operator, which is common for other GAs.We propose new GAs for the k-means problem, which use the same procedures as both the crossover and mutation operators. We compared a simple GA for the k-means problem with one-point crossover and its modifications with the uniform random mutation and our new crossover-like mutation. In addition, we compared the GAs with greedy heuristic crossover procedures to their modifications which include the crossover-like mutation. The comparison results show that the idea of our new mutation operator is able to improve significantly the results of the simplest GA as well as the genetic algorithms with greedy agglomerative crossover operator.

Lev Kazakovtsev, Guzel Shkaberina, Ivan Rozhnov, Rui Li, Vladimir Kazakovtsev

### Using Merging Variables-Based Local Search to Solve Special Variants of MaxSAT Problem

In this paper we study the inversion of discrete functions associated with some hard combinatorial problems. Inversion of such a function is considered in the form of a special variant of the well-known MaxSAT problem. To solve the latter we apply the previously developed local search method based on the Merging Variables Principle (MVP). The main novelty is that we combine MVP with evolutionary strategies to leave local extrema generated by Merging Variables Hill Climbing algorithm. The results of computational experiments show the effectiveness of the proposed technique in application to inversion of several cryptographic hash functions and to one problem of combinatorial optimization, which is a variant of the Facility Location Problem.

Ilya V. Otpuschennikov, Alexander A. Semenov

### Fast Heuristic for Vehicle Routing Problem on Trees

In this paper we propose an efficient heuristic for the Vehicle Routing Problem on Trees (TVRP). An initial solution is constructed with a greedy algorithm based on the Depth-First Search (DFS) approach. To optimize initial solutions obtained by our DFS heuristic, Ruin-and-Recreate (RR) method is then applied. For diversification purposes a randomization mechanism is added to the construction of initial solutions and DFS+RR algorithm is executed multiple times until the best found solution stops changing. The results of our approach are compared with the solutions obtained by the exact model of Chandran & Raghavan (2008). The computational experiments show that the suggested heuristic is fast and finds solutions which differ from optimal ones less than by 1% in average.

Irina Utkina, Olga Bukanova, Mikhail V. Batsyn

### Network Distances for Weighted Digraphs

The interpretation of the biological mechanisms through the systems biology approach involves the representation of the molecular components in an integrated system, namely a network, where the interactions among them are much more informative than the single components. The definition of the dissimilarity between complex biological networks is fundamental to understand differences between conditions, states, and treatments. It is, therefore, challenging to identify the most suitable distance measures for this kind of analysis. In this work, we aim at testing several measures to define the distance among sample- and condition-specific metabolic networks. The networks are represented as directed, weighted graphs, due to the nature of the metabolic reactions. We used four different case studies and exploited Support Vector Machine classification to define the performance of each measure.

Ilaria Granata, Mario Rosario Guarracino, Lucia Maddalena, Ichcha Manipur

### Decomposition/Aggregation K-means for Big Data

Well-known and widely applied k-means clustering heuristic is used for solving Minimum Sum-of-Square Clustering problem. In solving large size problems, there are two major drawbacks of this technique: (i) since it has to process the large input dataset, it has heavy computational costs and (ii) it has a tendency to converge to one of the local minima of poor quality. In order to reduce the computational complexity, we propose a clustering technique that works on subsets of the entire dataset in a stream like fashion. Using different heuristics the algorithm transforms the Big Data into Small Data, clusters it and uses obtained centroids to initialize the original Big Data. It is especially sensitive for Big Data as the better initialization gives the faster convergence. This approach allows effective parallelization. The proposed technique evaluates dynamically parameters of clusters from sequential data portions (windows) by aggregating corresponding criteria estimates. With fixed clustering time our approach makes progress through a number of partial solutions and aggregates them in a better one. This is done in comparing to a single solution which can be obtained by regular k-means-type clustering on the whole dataset in the same time limits. Promising results are reported on instances from the literature and synthetically generated data with several millions of entities.

### On the Optimization Models for Automatic Grouping of Industrial Products by Homogeneous Production Batches

We propose an optimization model of automatic grouping (clustering) based on the k-means model with the Mahalanobis distance measure. This model uses training (parameterization) procedure for the Mahalanobis distance measure by calculating the averaged estimation of the covariance matrix for a training sample. In this work, we investigate the application of the k-means algorithm for the problem of automatic grouping of devices, each of which is described by a large number of measured parameters, with various distance measures: Euclidean, Manhattan, Mahalanobis. If we have a sample with the composition known in advance, we use it as a training (parameterizing) sample from which we can calculate the averaged estimation of the covariance matrix of homogeneous production batches using the Mahalanobis distance. We propose a new clustering model based on the k-means algorithm with the Mahalanobis distance with the averaged (weighted average) estimation of the covariance matrix. We used various optimization models based on the k-means model in our computational experiments for the automatic grouping (clustering) of electronic radio components based on data from their non-destructive testing results. As a result, our new model of automatic grouping allows us to reach the highest accuracy by the Rand index.

Guzel Sh. Shkaberina, Viktor I. Orlov, Elena M. Tovbis, Lev A. Kazakovtsev

### Backmatter

Weitere Informationen