Published in: Evolutionary Intelligence 2/2011

01-06-2011 | Special Issue

Collective classification of textual documents by guided self-organization in T-Cell cross-regulation dynamics

Authors: Alaa Abi-Haidar, Luis M. Rocha

Abstract

We present and study an agent-based model of T-cell cross-regulation in the adaptive immune system, which we apply to binary classification. Our method expands an existing analytical model of T-cell cross-regulation (Carneiro et al. in Immunol Rev 216(1):48–68, 2007) that was used to study the self-organizing dynamics of a single population of T-cells in interaction with an idealized antigen presenting cell capable of presenting a single antigen. With agent-based modeling we are able to study the self-organizing dynamics of multiple populations of distinct T-cells, which interact via antigen presenting cells that present hundreds of distinct antigens. Moreover, we show that such self-organizing dynamics can be guided to produce an effective binary classification of antigens, which is competitive with existing machine learning methods when applied to biomedical text classification. More specifically, here we test our model on a dataset of publicly available full-text biomedical articles provided by the BioCreative challenge (Krallinger in The biocreative ii. 5 challenge overview, p 19, 2009). We study the robustness of our model’s parameter configurations, and show that it leads to encouraging results comparable to state-of-the-art classifiers. Our results help us understand both T-cell cross-regulation as a general principle of guided self-organization and its applicability to document classification. We thus show that our bio-inspired algorithm is a promising novel method for biomedical article classification and for binary document classification in general.

Footnotes
1
We use the terminology of self/nonself discrimination, though perhaps a more accurate description is classification of harmless vs. harmful substances; harmless can also include antigens from bacteria that are necessary for vertebrate bodies, and harmful can also include the body’s own tumor cells.
 
2
A good, though already a bit dated, overview of the vertebrate immune system for the artificial life community is Hofmeyr’s [12].
 
3
The simplification of proliferation to mere duplication adopted in the canonical CRM model is maintained in our agent-based model to minimize the number of parameters (excluding proliferation rates) and the parameter search space.
 
4
Every \(E_f\) or \(R_f\) has equal probability of binding to the APC that presents feature \(f\).
 
5
The list of common (stop) words includes 33 of the most common English words, from which we manually excluded the word “with”, as we know it to be of importance to PPI.
 
6
For feature extraction we used the training data of both BioCreative 2.5 and BioCreative 2, as described in [11]; all classifiers used the exact same feature set.
 
7
TF.IDF is a common text weighting measure to evaluate the importance of a feature/word in a document in a corpus. TF stands for term frequency in a document and IDF for inverse document frequency in the corpus.
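As an illustration of the weighting described in this footnote, here is a minimal sketch in Python. It assumes one common variant (raw term count multiplied by the natural logarithm of the number of documents divided by the document frequency); the footnote does not say which variant the authors used, and the function name `tf_idf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: weight} dict per tokenized document in `corpus`.

    Assumed variant (not confirmed by the source):
    weight = raw term count * ln(N / document frequency).
    """
    n_docs = len(corpus)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        term_freq = Counter(doc)  # raw count of each term in this document
        weights.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return weights


# Toy corpus, invented for illustration only.
docs = [
    ["protein", "binds", "protein"],
    ["protein", "expression"],
    ["cell", "binds", "receptor"],
]
scores = tf_idf(docs)
```

Under this variant, a term that occurs in every document receives a weight of zero, while a term concentrated in few documents is weighted up.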
 
8
Notice that this parameter search on the provided labeled training data uses only the information available to the teams participating in the BioCreative 2.5 challenge, and none of the test data whose labels were revealed post-challenge.
 
9
\(\hbox{F-score} = {\frac{2 \cdot \hbox{Precision} \cdot \hbox{Recall}}{\hbox{Precision} + \hbox{Recall}}}\), where \(\hbox{Precision} = {\frac{\hbox{TP}}{\hbox{TP} + \hbox{FP}}}\) and \(\hbox{Recall} = {\frac{\hbox{TP}}{\hbox{TP} + \hbox{FN}}}\). True Positives (TP) and False Positives (FP) are the classifier’s correct and incorrect predictions of the relevant class, while True Negatives (TN) and False Negatives (FN) are its correct and incorrect predictions of the irrelevant class.
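The measures in this footnote can be computed directly from the confusion counts. The sketch below is a plain Python rendering of those formulas; the function name `precision_recall_f`, the choice to return 0.0 when a denominator is zero, and the example counts are assumptions for illustration and are not results from the paper.

```python
def precision_recall_f(tp, fp, fn):
    """Compute precision, recall and F-score from confusion counts.

    Zero denominators yield 0.0; that convention is an assumption here,
    not something stated in the source.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Invented counts: 40 relevant documents found, 10 false alarms, 20 missed.
p, r, f = precision_recall_f(tp=40, fp=10, fn=20)
```

True Negatives do not appear in any of the three formulas, so the F-score is insensitive to how many irrelevant documents are correctly rejected.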
 
Literature
1.
Carneiro J, Leon K, Caramalho I, van den Dool C, Gardner R, Oliveira V, Bergman ML, Sepúlveda N, Paixão T, Faro J, Demengeot J (2007) When three is not a crowd: a crossregulation model of the dynamics and repertoire selection of regulatory CD4 T cells. Immunol Rev 216(1):48–68
2.
Krallinger M (2009) The biocreative ii. 5 challenge overview, p 19
3.
Hunter L, Cohen KB (2006) Biomedical language processing: what’s beyond pubmed? Mol Cell 21(5):589–594
4.
Jensen L, Saric J, Bork P (2006) Literature mining for the biologist: from information retrieval to biological discovery. Nat Rev Genet 7(2):119–129. doi:10.1038/nrg1768
5.
Shatkay H, Feldman R (2003) Mining the biomedical literature in the genomic era: an overview. J Comput Biol 10(6):821–856
6.
Hersh W, Bhupatiraju RT, Corley S (2004) Enhancing access to the bibliome: the trec genomics track. Medinfo 11(Pt 2):773–777
7.
Hirschman L, Yeh A, Blaschke C, Valencia A (2005) Overview of biocreative: critical assessment of information extraction for biology. BMC Bioinform 6(Suppl 1):S1
8.
Krallinger M, Valencia A (2007) Evaluating the detection and ranking of protein interaction relevant articles: the biocreative challenge interaction article sub-task (ias). In: Proceedings of the 2nd biocreative challenge evaluation workshop
9.
Feldman R, Sanger J (2006) The text mining handbook: advanced approaches in analyzing unstructured data. Cambridge University Press, Cambridge
10.
Abi-Haidar A, Kaur J, Maguitman A, Radivojac P, Retchsteiner A, Verspoor K, Wang Z, Rocha LM (2008) Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks. p 9(Suppl 2):S11
12.
Hofmeyr SA (2001) An interpretative introduction to the immune system. Design principles for the immune system and other distributed autonomous systems
13.
Segel LA, Cohen I (2001) Design principles for the immune system and other distributed autonomous systems. Oxford University Press, Oxford
14.
Mitchell M (2006) Complex systems: network thinking. Artif Intell 170(18):1194–1212
15.
Peak D, West JD, Messinger SM, Mott KA (2004) Evidence for complex, collective dynamics and distributed emergent computation in plants. PNAS 101(4):918–922
19.
Crutchfield J, Mitchell M (1995) The evolution of emergent computation. PNAS 92(23)
20.
Rocha LM, Hordijk W (2005) Material representations: from the genetic code to the evolution of cellular automata. Artif Life 11(1–2):189–214
21.
Shalizi C, Haslinger R, Rouquier J-B, Klinkner K, Moore C (2006) Automatic filters for the detection of coherent structure in spatiotemporal systems. Phys Rev E 73
23.
Twycross J, Cayzer S (2002) An immune system approach to document classification. Master’s thesis, COGS, University of Sussex, UK
24.
Dasgupta D, Nino F (2008) Immunological computation: theory and applications. AUERBACH
25.
Garrett SM (2003) A paratope is not an epitope: implications for immune networks and clonal selection. pp 217–228
26.
Abi-Haidar A, Rocha LM (2008) Artificial immune systems (Proc. ICARIS), pp 36–47
27.
Abi-Haidar A, Rocha LM (2008) Artificial life XI: 11th international conference on the simulation and synthesis of living systems. MIT Press, Cambridge, pp 1–9
28.
Tsymbal A (2004) The problem of concept drift: definitions and related work. Comput Sci Dep Trinity Coll Dublin 4(C):200415
29.
Paul WE, Technologies IO (1993) Fundamental immunology. Raven Press, New York
30.
Burnet SFM (1959) The clonal selection theory of acquired immunity. Vanderbilt University Press, Nashville
31.
De Castro LN, Timmis J (2002) Artificial immune systems: a new computational intelligence approach. Springer, Berlin
32.
Sepulveda NH (2009) How is the t-cell repertoire shaped. Ph.D. thesis, Instituto Gulbenkian de Ciencia
33.
Abi-Haidar A, Rocha LM (2010) ICARIS 2010: Proceedings of the 9th international conference on artificial immune systems. In: pp 237–249
34.
Abi-Haidar A, Rocha LM (2010) Artificial life XII: twelfth international conference on the simulation and synthesis of living systems. In: pp 706–713
35.
Metsis V, Androutsopoulos I, Paliouras G (2006) Spam filtering with Naive Bayes–Which Naive Bayes? In: Third Conference on Email and Anti-Spam (CEAS)
36.
Joachims T (2002) Learning to classify text using support vector machines: methods, theory, and algorithms. Kluwer, Dordrecht
37.
Porter MF (1980) An algorithm for suffix stripping. Program 13(3):130–137
38.
Sokolova M, Japkowicz N, Szpakowicz S (2006) Beyond accuracy, f-score and roc: a family of discriminant measures for performance evaluation, pp 1015–1021
Metadata
Title
Collective classification of textual documents by guided self-organization in T-Cell cross-regulation dynamics
Authors
Alaa Abi-Haidar
Luis M. Rocha
Publication date
01-06-2011
Publisher
Springer-Verlag
Published in
Evolutionary Intelligence / Issue 2/2011
Print ISSN: 1864-5909
Electronic ISSN: 1864-5917
DOI
https://doi.org/10.1007/s12065-011-0052-5
