Skip to main content
Erschienen in: Data Mining and Knowledge Discovery 5/2015

01.09.2015

Beyond rankings: comparing directed acyclic graphs

verfasst von: Eric Malmi, Nikolaj Tatti, Aristides Gionis

Erschienen in: Data Mining and Knowledge Discovery | Ausgabe 5/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Defining appropriate distance measures among rankings is a classic area of study which has led to many useful applications. In this paper, we propose a more general abstraction of preference data, namely directed acyclic graphs (DAGs), and introduce a measure for comparing DAGs, given that a vertex correspondence between the DAGs is known. We study the properties of this measure and use it to aggregate and cluster a set of DAGs. We show that these problems are \(\mathbf {NP}\)-hard and present efficient methods to obtain solutions with approximation guarantees. In addition to preference data, these methods turn out to have other interesting applications, such as the analysis of a collection of information cascades in a network. We test the methods on synthetic and real-world datasets, showing that the methods can be used to, e.g., find a set of influential individuals related to a set of topics in a network or to discover meaningful and occasionally surprising clustering structure.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
Most often the Kendall-tau distance is defined to be a value between 0 and 1 by normalizing with the total number of vertex pairs \({{|V|} \atopwithdelims ()2}\).
 
Literatur
Zurück zum Zitat Ailon N, Charikar M, Newman A (2008) Aggregating inconsistent information: ranking and clustering. J ACM 55(5):23MathSciNetCrossRef Ailon N, Charikar M, Newman A (2008) Aggregating inconsistent information: ranking and clustering. J ACM 55(5):23MathSciNetCrossRef
Zurück zum Zitat Anagnostopoulos A, Kumar R, Mahdian M (2008) Influence and correlation in social networks. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining. pp 7–15 Anagnostopoulos A, Kumar R, Mahdian M (2008) Influence and correlation in social networks. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining. pp 7–15
Zurück zum Zitat Barbieri N, Bonchi F, Manco G (2013) Cascade-based community detection. In: Proceedings of the sixth ACM international conference on Web search and data mining. pp 33–42 Barbieri N, Bonchi F, Manco G (2013) Cascade-based community detection. In: Proceedings of the sixth ACM international conference on Web search and data mining. pp 33–42
Zurück zum Zitat Bender MA, Fineman JT, Gilbert S, Tarjan RE (2011) A new approach to incremental cycle detection and related problems. arXiv:1112.0784 Bender MA, Fineman JT, Gilbert S, Tarjan RE (2011) A new approach to incremental cycle detection and related problems. arXiv:​1112.​0784
Zurück zum Zitat Borda J (1781) Mémoire sur les élections au scrutin. Histoire de l’Académie Royale des Sciences Borda J (1781) Mémoire sur les élections au scrutin. Histoire de l’Académie Royale des Sciences
Zurück zum Zitat Brandenburg F, Gleißner A, Hofmeier A (2012) Comparing and aggregating partial orders with Kendall tau distances. In: WALCOM: algorithms and computation. Lecture notes in computer science, vol 7157. Springer Berlin Heidelberg, pp 88–99 Brandenburg F, Gleißner A, Hofmeier A (2012) Comparing and aggregating partial orders with Kendall tau distances. In: WALCOM: algorithms and computation. Lecture notes in computer science, vol 7157. Springer Berlin Heidelberg, pp 88–99
Zurück zum Zitat Brandenburg F, Gleißner A, Hofmeier A (2013) The nearest neighbor Spearman footrule distance for bucket, interval, and partial orders. J Comb Optim 26(2):310–332MathSciNetCrossRefMATH Brandenburg F, Gleißner A, Hofmeier A (2013) The nearest neighbor Spearman footrule distance for bucket, interval, and partial orders. J Comb Optim 26(2):310–332MathSciNetCrossRefMATH
Zurück zum Zitat Bunke H, Shearer K (1998) A graph distance metric based on the maximal common subgraph. Pattern Recognit Lett 19(3):255–259CrossRefMATH Bunke H, Shearer K (1998) A graph distance metric based on the maximal common subgraph. Pattern Recognit Lett 19(3):255–259CrossRefMATH
Zurück zum Zitat Dwork C, Kumar R, Naor M, Sivakumar D (2001) Rank aggregation methods for the web. In: Proceedings of the 10th international conference on World Wide Web. pp 613–622 Dwork C, Kumar R, Naor M, Sivakumar D (2001) Rank aggregation methods for the web. In: Proceedings of the 10th international conference on World Wide Web. pp 613–622
Zurück zum Zitat Even G, Naor J, Schieber B, Sudan M (1995) Approximating minimum feedback sets and multi-cuts in directed graphs. In: Proceedings of the 4th international conference on integer programming and combinatorial optimization. pp 14–28 Even G, Naor J, Schieber B, Sudan M (1995) Approximating minimum feedback sets and multi-cuts in directed graphs. In: Proceedings of the 4th international conference on integer programming and combinatorial optimization. pp 14–28
Zurück zum Zitat Friedman JH, Rafsky LC (1979) Multivariate generalizations of the Wald-Wolfowitz and Smirnov two-sample tests. Ann Stat 7(4):697–717MathSciNetCrossRefMATH Friedman JH, Rafsky LC (1979) Multivariate generalizations of the Wald-Wolfowitz and Smirnov two-sample tests. Ann Stat 7(4):697–717MathSciNetCrossRefMATH
Zurück zum Zitat Gomez-Rodriguez M, Balduzzi D, Schölkopf B (2011) Uncovering the temporal dynamics of diffusion networks. In: Proceedings of the 28th international conference on machine learning. pp 561–568 Gomez-Rodriguez M, Balduzzi D, Schölkopf B (2011) Uncovering the temporal dynamics of diffusion networks. In: Proceedings of the 28th international conference on machine learning. pp 561–568
Zurück zum Zitat Gomez-Rodriguez M, Leskovec J, Krause A (2012) Inferring networks of diffusion and influence. ACM Trans Knowl Discov Data 5(4):21CrossRef Gomez-Rodriguez M, Leskovec J, Krause A (2012) Inferring networks of diffusion and influence. ACM Trans Knowl Discov Data 5(4):21CrossRef
Zurück zum Zitat Goodman LA, Kruskal WH (1972) Measures of association for cross classifications, iv: simplification of asymptotic variances. J Am Stat Assoc 67(338):415–421CrossRefMATH Goodman LA, Kruskal WH (1972) Measures of association for cross classifications, iv: simplification of asymptotic variances. J Am Stat Assoc 67(338):415–421CrossRefMATH
Zurück zum Zitat Goyal A, Bonchi F, Lakshmanan LVS (2008) Discovering leaders from community actions. In: Proceedings of the 17th ACM conference on information and knowledge management. pp 499–508 Goyal A, Bonchi F, Lakshmanan LVS (2008) Discovering leaders from community actions. In: Proceedings of the 17th ACM conference on information and knowledge management. pp 499–508
Zurück zum Zitat Goyal A, Bonchi F, Lakshmanan LVS (2010) Learning influence probabilities in social networks. In: Proceedings of the third ACM international conference on Web search and data mining. pp 241–250 Goyal A, Bonchi F, Lakshmanan LVS (2010) Learning influence probabilities in social networks. In: Proceedings of the third ACM international conference on Web search and data mining. pp 241–250
Zurück zum Zitat Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218CrossRef Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218CrossRef
Zurück zum Zitat Jiang X, Munger A, Bunke H (2001) An median graphs: properties, algorithms, and applications. IEEE Trans Pattern Anal Mach Intell 23(10):1144–1151CrossRef Jiang X, Munger A, Bunke H (2001) An median graphs: properties, algorithms, and applications. IEEE Trans Pattern Anal Mach Intell 23(10):1144–1151CrossRef
Zurück zum Zitat Kann V (1992) On the approximability of np-complete optimization problems. Ph.D. thesis, KTH Kann V (1992) On the approximability of np-complete optimization problems. Ph.D. thesis, KTH
Zurück zum Zitat Karp RM (1972) Reducibility among combinatorial problems. In: Complexity of computer computations. Springer, New York Karp RM (1972) Reducibility among combinatorial problems. In: Complexity of computer computations. Springer, New York
Zurück zum Zitat Kempe D, Kleinberg J, Tardos É (2003) Maximizing the spread of influence through a social network. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining. pp 137–146 Kempe D, Kleinberg J, Tardos É (2003) Maximizing the spread of influence through a social network. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining. pp 137–146
Zurück zum Zitat Kendall M (1976) Rank correlation methods, 4th edn. Hodder Arnold, London Kendall M (1976) Rank correlation methods, 4th edn. Hodder Arnold, London
Zurück zum Zitat Kenyon-Mathieu C, Schudy W (2007) How to rank with few errors. In: Proceedings of the 39th annual ACM symposium on theory of computing. pp 95–103 Kenyon-Mathieu C, Schudy W (2007) How to rank with few errors. In: Proceedings of the 39th annual ACM symposium on theory of computing. pp 95–103
Zurück zum Zitat Laming D (2003) Human judgment: the eye of the beholder. Cengage Learning EMEA Laming D (2003) Human judgment: the eye of the beholder. Cengage Learning EMEA
Zurück zum Zitat Macchia L, Bonchi F, Gullo F, Chiarandini L (2013) Mining summaries of propagations. In: Proceedings of the 13th IEEE international conference on data mining. pp 498–507 Macchia L, Bonchi F, Gullo F, Chiarandini L (2013) Mining summaries of propagations. In: Proceedings of the 13th IEEE international conference on data mining. pp 498–507
Zurück zum Zitat Madden JI (1995) Analyzing and modeling rank data. Chapman & Hall, London Madden JI (1995) Analyzing and modeling rank data. Chapman & Hall, London
Zurück zum Zitat Saito K, Nakano R, Kimura M (2008) Prediction of information diffusion probabilities for independent cascade model. In: Knowledge-based intelligent information and engineering systems. pp 67–75 Saito K, Nakano R, Kimura M (2008) Prediction of information diffusion probabilities for independent cascade model. In: Knowledge-based intelligent information and engineering systems. pp 67–75
Zurück zum Zitat Su H, Gionis A, Rousu J (2014) Structured prediction of network response. In: Proceedings of the 31st international conference on machine learning. pp 442–450 Su H, Gionis A, Rousu J (2014) Structured prediction of network response. In: Proceedings of the 31st international conference on machine learning. pp 442–450
Metadaten
Titel
Beyond rankings: comparing directed acyclic graphs
verfasst von
Eric Malmi
Nikolaj Tatti
Aristides Gionis
Publikationsdatum
01.09.2015
Verlag
Springer US
Erschienen in
Data Mining and Knowledge Discovery / Ausgabe 5/2015
Print ISSN: 1384-5810
Elektronische ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-015-0406-1

Weitere Artikel der Ausgabe 5/2015

Data Mining and Knowledge Discovery 5/2015 Zur Ausgabe