Skip to main content
Top
Published in: Data Mining and Knowledge Discovery 5-6/2014

01-09-2014

Discovering bands from graphs

Author: Nikolaj Tatti

Published in: Data Mining and Knowledge Discovery | Issue 5-6/2014

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Discovering the underlying structure of a given graph is one of the fundamental goals in graph mining. Given a graph, we can often order vertices in a way that neighboring vertices have a higher probability of being connected to each other. This implies that the edges form a band around the diagonal in the adjacency matrix. Such structure may rise for example if the graph was created over time: each vertex had an active time interval during which the vertex was connected with other active vertices. The goal of this paper is to model this phenomenon. To this end, we formulate an optimization problem: given a graph and an integer \(K\), we want to order graph vertices and partition the ordered adjacency matrix into \(K\) bands such that bands closer to the diagonal are more dense. We measure the goodness of a segmentation using the log-likelihood of a log-linear model, a flexible family of distributions containing many standard distributions. We divide the problem into two subproblems: finding the order and finding the bands. We show that discovering bands can be done in polynomial time with isotonic regression, and we also introduce a heuristic iterative approach. For discovering the order we use Fiedler order accompanied with a simple combinatorial refinement. We demonstrate empirically that our heuristic works well in practice.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Appendix
Available only for authorised users
Footnotes
1
Here, we sorted the data using the Fiedler order, see Sect. 5.
 
2
We can easily show that this choice is unique.
 
5
The full version of the Mammals dataset is available for research purposes from the Societas Europaea Mammalogica at http://​www.​european-mammals.​org.
 
6
NOW public release 030717 available from Fortelius (2005).
 
Literature
go back to reference Alvarez-Hamelin JI, Barrat A, Vespignani A (2006) Large scale networks fingerprinting and visualization using the k-core decomposition. In: NIPS, pp 41–50 Alvarez-Hamelin JI, Barrat A, Vespignani A (2006) Large scale networks fingerprinting and visualization using the k-core decomposition. In: NIPS, pp 41–50
go back to reference Atkins JE, Boman EG, Hendrickson B (1999) A spectral algorithm for seriation and the consecutive ones problem. SIAM J Comput 28(1):297–310CrossRefMathSciNet Atkins JE, Boman EG, Hendrickson B (1999) A spectral algorithm for seriation and the consecutive ones problem. SIAM J Comput 28(1):297–310CrossRefMathSciNet
go back to reference Ayer M, Brunk H, Ewing G, Reid W (1955) An empirical distribution function for sampling with incomplete information. Ann Math Stat 26(4):641–647 Ayer M, Brunk H, Ewing G, Reid W (1955) An empirical distribution function for sampling with incomplete information. Ann Math Stat 26(4):641–647
go back to reference Bellman R (1961) On the approximation of curves by line segments using dynamic programming. Commun ACM 4(6):284 Bellman R (1961) On the approximation of curves by line segments using dynamic programming. Commun ACM 4(6):284
go back to reference Bernaola-Galván P, Román-Roldán R, Oliver JL (1996) Compositional segmentation and long-range fractal correlations in DNA sequences. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Top 53(5):5181–5189 Bernaola-Galván P, Román-Roldán R, Oliver JL (1996) Compositional segmentation and long-range fractal correlations in DNA sequences. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Top 53(5):5181–5189
go back to reference Calders T, Dexters N, Goethals B (2007) Mining frequent itemsets in a stream. In: ICDM, pp 83–92 Calders T, Dexters N, Goethals B (2007) Mining frequent itemsets in a stream. In: ICDM, pp 83–92
go back to reference Douglas D, Peucker T (1973) Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Can Cartogr 10(2):112–122CrossRef Douglas D, Peucker T (1973) Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Can Cartogr 10(2):112–122CrossRef
go back to reference Erdos P, Renyi A (1960) On the evolution of random graphs. Publ Math Inst Hung Acad Sci 5:17–61MathSciNet Erdos P, Renyi A (1960) On the evolution of random graphs. Publ Math Inst Hung Acad Sci 5:17–61MathSciNet
go back to reference Fiedler M (1975) A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory. Czechoslov Math J 25(4):619–633MathSciNet Fiedler M (1975) A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory. Czechoslov Math J 25(4):619–633MathSciNet
go back to reference Gionis A, Mannila H, Seppänen JK (2004) Geometric and combinatorial tiles in 0-1 data. In: ECML PKDD, pp 173–184 Gionis A, Mannila H, Seppänen JK (2004) Geometric and combinatorial tiles in 0-1 data. In: ECML PKDD, pp 173–184
go back to reference Hahsler M, Hornik K, Buchta C (2008) Getting things in order: an introduction to the R package seriation. J Stat Softw 25(3):1–34 Hahsler M, Hornik K, Buchta C (2008) Getting things in order: an introduction to the R package seriation. J Stat Softw 25(3):1–34
go back to reference Haiminen N, Gionis A (2004) Unimodal segmentation of sequences. In: ICDM, pp 106–113 Haiminen N, Gionis A (2004) Unimodal segmentation of sequences. In: ICDM, pp 106–113
go back to reference Himberg J, Korpiaho K, Mannila H, Tikanmäki J, Toivonen H (2001) Time series segmentation for context recognition in mobile devices. In: ICDM, pp 203–210 Himberg J, Korpiaho K, Mannila H, Tikanmäki J, Toivonen H (2001) Time series segmentation for context recognition in mobile devices. In: ICDM, pp 203–210
go back to reference Johnson DS, Krishnan S, Chhugani J, Kumar S, Venkatasubramanian S (2004) Compressing large boolean matrices using reordering techniques. In: VLDB, pp 13–23 Johnson DS, Krishnan S, Chhugani J, Kumar S, Venkatasubramanian S (2004) Compressing large boolean matrices using reordering techniques. In: VLDB, pp 13–23
go back to reference Lavrenko V, Schmill M, Lawrie D, Ogilvie P, Jensen D, Allan J (2000) Mining of concurrent text and time series. In: KDD workshop on text mining, pp 37–44 Lavrenko V, Schmill M, Lawrie D, Ogilvie P, Jensen D, Allan J (2000) Mining of concurrent text and time series. In: KDD workshop on text mining, pp 37–44
go back to reference Mannila H, Terzi E (2007) Nestedness and segmented nestedness. In: KDD, pp 480–489 Mannila H, Terzi E (2007) Nestedness and segmented nestedness. In: KDD, pp 480–489
go back to reference Mitchell-Jones A, Amori G, Bogdanowicz W, Krystufek B, Reijnders PH, Spitzenberger F, Stubbe M, Thissen J, Vohralik V, Zima J (1999) The atlas of European mammals. Academic Press, New York Mitchell-Jones A, Amori G, Bogdanowicz W, Krystufek B, Reijnders PH, Spitzenberger F, Stubbe M, Thissen J, Vohralik V, Zima J (1999) The atlas of European mammals. Academic Press, New York
go back to reference Palpanas T, Vlachos M, Keogh EJ, Gunopulos D, Truppel W (2004) Online amnesic approximation of streaming time series. In: ICDE, pp 339–349 Palpanas T, Vlachos M, Keogh EJ, Gunopulos D, Truppel W (2004) Online amnesic approximation of streaming time series. In: ICDE, pp 339–349
go back to reference Shatkay H, Zdonik SB (1996) Approximate queries and representations for large data sequences. In: ICDE, pp 536–545 Shatkay H, Zdonik SB (1996) Approximate queries and representations for large data sequences. In: ICDE, pp 536–545
go back to reference Spouge J, Wan H, Wilbur WJ (June 2003) Least squares isotonic regression in two dimensions. J Optim Theory Appl 117:585–605 Spouge J, Wan H, Wilbur WJ (June 2003) Least squares isotonic regression in two dimensions. J Optim Theory Appl 117:585–605
go back to reference Tatti N, Gionis A (2013) Discovering nested communities. In: ECML PKDD, pp 32–47 Tatti N, Gionis A (2013) Discovering nested communities. In: ECML PKDD, pp 32–47
go back to reference Tatti N, Vreeken J (2012) Discovering descriptive tile trees—by mining optimal geometric subtiles. In: ECML PKDD, pp 9–24 Tatti N, Vreeken J (2012) Discovering descriptive tile trees—by mining optimal geometric subtiles. In: ECML PKDD, pp 9–24
go back to reference Terzi E, Tsaparas P (2006) Efficient algorithms for sequence segmentation. In: SDM Terzi E, Tsaparas P (2006) Efficient algorithms for sequence segmentation. In: SDM
Metadata
Title
Discovering bands from graphs
Author
Nikolaj Tatti
Publication date
01-09-2014
Publisher
Springer US
Published in
Data Mining and Knowledge Discovery / Issue 5-6/2014
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-014-0359-9

Other articles of this Issue 5-6/2014

Data Mining and Knowledge Discovery 5-6/2014 Go to the issue

Premium Partner