Abstract.
Complex network theory is used to investigate the structure of meaningful concepts in written texts of individual authors. Networks are constructed after a two-phase filtering: words with little semantic content are eliminated, and all remaining words are reduced to their canonical form, without any number, gender or tense inflection. Each sentence in the text is then added to the network as a clique. A large number of written texts have been scrutinised, and it is found that the resulting networks have small-world as well as scale-free structures. The growth process of these networks has also been investigated, and a universal evolution of the network quantifiers has been found across the set of texts written by distinct authors. Further analyses, based on shuffling procedures applied either to the texts or to the constructed networks, provide hints on the role played by the word frequency and sentence length distributions in shaping the network structure.
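The construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word set `STOPWORDS` and the lemma table `LEMMAS` are tiny hypothetical stand-ins for the two filtering phases, and the sentence splitting is a simple assumption based on punctuation.

```python
import re
from itertools import combinations

# Hypothetical stand-ins for the two filtering phases (assumptions, not the
# authors' word lists): phase 1 drops low-content words, phase 2 maps each
# remaining word to a canonical form.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "are", "in", "to", "on"}
LEMMAS = {"networks": "network", "words": "word", "texts": "text",
          "links": "link", "linked": "link"}


def concepts(sentence):
    """Return the set of canonical concept words found in one sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    kept = [t for t in tokens if t not in STOPWORDS]
    return {LEMMAS.get(t, t) for t in kept}


def build_network(text):
    """Add each sentence to the network as a clique over its concepts."""
    adjacency = {}
    for sentence in re.split(r"[.!?]+", text):
        nodes = concepts(sentence)
        for node in nodes:
            adjacency.setdefault(node, set())
        # A clique: every pair of concepts in the sentence is linked.
        for u, v in combinations(sorted(nodes), 2):
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


sample = "Words link texts. The networks are linked to words."
network = build_network(sample)
degrees = {node: len(neighbours) for node, neighbours in network.items()}
```

In this toy input the two sentences share the concepts "word" and "link", so the two cliques overlap on those nodes, which is how sentences become connected into a single network. Quantifiers such as the degree distribution or clustering coefficient would then be computed on `network`.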
Caldeira, S., Petit Lobão, T., Andrade, R. et al. The network of concepts in written texts. Eur. Phys. J. B 49, 523–529 (2006). https://doi.org/10.1140/epjb/e2006-00091-3
DOI: https://doi.org/10.1140/epjb/e2006-00091-3