01.12.2016 | Regular article | Issue 1/2016 | Open Access

EPJ Data Science 1/2016

Ground truth? Concept-based communities versus the external classification of physics manuscripts

Vasyl Palchykov, Valerio Gemmetto, Alexey Boyarsky, Diego Garlaschelli
Important notes

Electronic Supplementary Material

The online version of this article (doi:10.1140/epjds/s13688-016-0090-4) contains supplementary material.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

All authors discussed and designed the experiments as well as contributed to the writing of the paper. VP and VG implemented and conducted the experiments. All authors read and approved the final manuscript.


Community detection techniques are widely used to infer hidden structures within interconnected systems. Despite their high accuracy on benchmarks, they reproduce the external classification of many real-world systems only with a significant level of discrepancy. A widely accepted explanation for this outcome is the unavoidable loss of non-topological information (such as node attributes) that occurs when the original complex system is converted to a network. In this article we systematically show that the observed discrepancies may also have a different cause: the external classification itself. To this end we use scientific publication data which (i) exhibit a well-defined modular structure and (ii) carry an expert-made classification of research articles. Having represented the articles and the extracted scientific concepts both as a bipartite network and as its unipartite projection, we applied modularity optimization to uncover the inner thematic structure. The resulting clusters are shown to partly reflect the author-made classification, although some significant discrepancies are observed. A detailed analysis of these discrepancies shows that they may carry essential information about the system, mainly related to the use of similar techniques and methods across different (sub)disciplines, that is otherwise omitted when only the external classification is considered.
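The pipeline outlined in the abstract (bipartite article-concept data, a unipartite projection onto articles, modularity optimization, comparison with the external classification) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy articles, concepts and category labels are hypothetical, the projection weights links by the plain number of shared concepts, and the greedy local-moving heuristic is a simple stand-in rather than the optimization procedure used in the paper.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy data: article id -> set of extracted concepts.
ARTICLES = {
    "a1": {"graphene", "band structure", "monte carlo"},
    "a2": {"graphene", "band structure"},
    "a3": {"graphene", "band structure", "phonon"},
    "a4": {"dark matter", "galaxy", "monte carlo"},
    "a5": {"dark matter", "galaxy"},
    "a6": {"dark matter", "galaxy", "halo"},
}
# Hypothetical external (author-assigned) categories.
EXTERNAL = {"a1": "cond-mat", "a2": "cond-mat", "a3": "cond-mat",
            "a4": "hep-ph", "a5": "astro-ph", "a6": "astro-ph"}


def project(articles):
    """Unipartite projection: link weight = number of shared concepts."""
    weights = defaultdict(dict)
    for u, v in combinations(sorted(articles), 2):
        shared = len(articles[u] & articles[v])
        if shared:
            weights[u][v] = shared
            weights[v][u] = shared
    return weights


def modularity(weights, labels):
    """Newman-Girvan modularity Q of a partition of a weighted graph."""
    two_m = sum(sum(nbrs.values()) for nbrs in weights.values())
    internal = defaultdict(float)  # twice the weight inside each group
    total = defaultdict(float)     # summed node strength of each group
    for u, group in labels.items():
        total[group] += sum(weights[u].values())
        for v, weight in weights[u].items():
            if labels[v] == group:
                internal[group] += weight
    return sum(internal[g] / two_m - (total[g] / two_m) ** 2 for g in total)


def local_moving(weights, nodes):
    """Greedy heuristic: move single nodes while Q strictly increases."""
    labels = {u: u for u in nodes}  # start from singleton groups
    improved = True
    while improved:
        improved = False
        for u in nodes:
            best_label, best_q = labels[u], modularity(weights, labels)
            for candidate in sorted({labels[v] for v in weights[u]}):
                trial = dict(labels)
                trial[u] = candidate
                q = modularity(weights, trial)
                if q > best_q + 1e-12:
                    best_label, best_q = candidate, q
            if best_label != labels[u]:
                labels[u] = best_label
                improved = True
    return labels


weights = project(ARTICLES)
labels = local_moving(weights, sorted(ARTICLES))

# Inner composition of each detected cluster in terms of external categories.
clusters = defaultdict(list)
for article, label in labels.items():
    clusters[label].append(article)
for members in clusters.values():
    print(sorted(members), dict(Counter(EXTERNAL[a] for a in members)))

print("Q(concept clusters) =", round(modularity(weights, labels), 4))
print("Q(external classes) =", round(modularity(weights, EXTERNAL), 4))
```

On this toy input the heuristic groups the three graphene articles together and the three dark-matter articles together, so the second cluster mixes two external categories. That is the kind of discrepancy the article examines, here produced purely by construction of the example data.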

Data sets. The data file contains the metadata used, together with the lists of extracted concept identifiers for each manuscript under investigation. (zip)
Community detection results for articles submitted during the year 2014. The figure shows the inner composition of the idf and bp partitions of the arxivPhys2014 dataset and the corresponding category co-occurrence matrix. These results to a large extent reproduce the results obtained for the year 2013, thus supporting the conclusions drawn. (pdf)