The online version of this article (doi:10.1140/epjds/s13688-016-0090-4) contains supplementary material.
The authors declare that they have no competing interests.
All authors discussed and designed the experiments and contributed to the writing of the paper. VP and VG implemented and conducted the experiments. All authors read and approved the final manuscript.
Community detection techniques are widely used to infer hidden structures within interconnected systems. Despite demonstrating high accuracy on benchmarks, they reproduce the external classification of many real-world systems only with significant discrepancies. A widely accepted explanation for this outcome is the unavoidable loss of non-topological information (such as node attributes) that occurs when the original complex system is converted to a network. In this article we systematically show that the observed discrepancies may also have a different cause: the external classification itself. To this end we use scientific publication data which (i) exhibit a well-defined modular structure and (ii) carry an expert-made classification of research articles. Having represented the articles and the extracted scientific concepts both as a bipartite network and as its unipartite projection, we applied modularity optimization to uncover the inner thematic structure. The resulting clusters are shown to partly reflect the author-made classification, although some significant discrepancies are observed. A detailed analysis of these discrepancies shows that they may carry essential information about the system, mainly related to the use of similar techniques and methods across different (sub)disciplines, that is otherwise omitted when only the external classification is considered.
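The pipeline outlined in the abstract (bipartite article-concept data, unipartite projection, modularity-based clustering) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy articles and concepts are invented, the function names `project`, `modularity` and `best_bipartition` are hypothetical, the projection simply weights each article pair by its number of shared concepts, and an exhaustive search over two-community splits stands in for a real modularity optimization heuristic. It is not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: article id -> set of extracted concept ids.
articles = {
    "a1": {"spin", "lattice", "ising"},
    "a2": {"spin", "ising", "phase_transition"},
    "a3": {"lattice", "ising", "phase_transition"},
    "a4": {"galaxy", "redshift", "dark_matter"},
    "a5": {"galaxy", "dark_matter", "halo"},
    "a6": {"redshift", "halo", "phase_transition"},
}


def project(bipartite):
    """Unipartite projection onto articles.

    Two articles are linked with a weight equal to the number of
    concepts they share (an assumed, simplest-possible weighting).
    """
    weights = {}
    for u, v in combinations(sorted(bipartite), 2):
        shared = len(bipartite[u] & bipartite[v])
        if shared:
            weights[(u, v)] = shared
    return weights


def modularity(weights, partition):
    """Newman-Girvan modularity Q of a weighted undirected graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 m))^2,
    where m is the total edge weight, L_c the weight inside c and
    d_c the summed strength of the nodes in c.
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    internal = defaultdict(float)
    strength = defaultdict(float)
    for (u, v), w in weights.items():
        strength[partition[u]] += w
        strength[partition[v]] += w
        if partition[u] == partition[v]:
            internal[partition[u]] += w
    return sum(
        internal[c] / total - (strength[c] / (2 * total)) ** 2
        for c in strength
    )


def best_bipartition(weights, nodes):
    """Exhaustive search over all splits into at most two communities.

    Feasible only for tiny graphs; real studies rely on heuristics.
    """
    nodes = sorted(nodes)
    best_q, best_partition = float("-inf"), None
    for mask in range(2 ** (len(nodes) - 1)):
        partition = {n: (mask >> i) & 1 for i, n in enumerate(nodes)}
        q = modularity(weights, partition)
        if q > best_q:
            best_q, best_partition = q, partition
    return best_q, best_partition


if __name__ == "__main__":
    projected = project(articles)
    q, partition = best_bipartition(projected, articles)
    print(f"Q = {q:.4f}")
    for label in sorted(set(partition.values())):
        print(label, [n for n in sorted(partition) if partition[n] == label])
```

In this toy example the article "a6" shares one concept with the other thematic group, which mimics, on a very small scale, the kind of cross-discipline overlap the abstract attributes to shared techniques and methods.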
Data sets. The data file contains the metadata used, together with the lists of extracted concept identifiers for each manuscript under investigation. (zip) 13688_2016_90_MOESM1_ESM.zip
Community detection results for articles submitted during the year 2014. The figure shows the inner composition of the idf and bp partitions of the arxivPhys2014 dataset and the corresponding category co-occurrence matrix. These results largely reproduce those obtained for the year 2013, thus confirming the conclusions drawn. (pdf) 13688_2016_90_MOESM2_ESM.pdf
- Ground truth? Concept-based communities versus the external classification of physics manuscripts
- Springer Berlin Heidelberg