Published in: Evolutionary Intelligence 2/2011

1 June 2011 | Special Issue

Collective classification of textual documents by guided self-organization in T-Cell cross-regulation dynamics

Authors: Alaa Abi-Haidar, Luis M. Rocha

Abstract

We present and study an agent-based model of T-cell cross-regulation in the adaptive immune system, which we apply to binary classification. Our method extends an existing analytical model of T-cell cross-regulation (Carneiro et al. in Immunol Rev 216(1):48–68, 2007) that was used to study the self-organizing dynamics of a single population of T-cells interacting with an idealized antigen-presenting cell capable of presenting a single antigen. With agent-based modeling we can study the self-organizing dynamics of multiple populations of distinct T-cells that interact via antigen-presenting cells presenting hundreds of distinct antigens. Moreover, we show that such self-organizing dynamics can be guided to produce an effective binary classification of antigens, which is competitive with existing machine learning methods when applied to biomedical text classification. Specifically, we test our model on a dataset of publicly available full-text biomedical articles provided by the BioCreative challenge (Krallinger in The BioCreative II.5 challenge overview, p 19, 2009). We study the robustness of our model’s parameter configurations and show that it leads to encouraging results comparable to state-of-the-art classifiers. Our results help us understand both T-cell cross-regulation as a general principle of guided self-organization and its applicability to document classification. We thus show that our bio-inspired algorithm is a promising novel method for biomedical article classification and for binary document classification in general.

Footnotes
1
We use the terminology of self/nonself discrimination, though a more accurate description is perhaps the classification of harmless vs. harmful substances: harmless can also include antigens from bacteria that are necessary for vertebrate bodies, and harmful can also include the body’s own tumor cells.
 
2
A good, though somewhat dated, overview of the vertebrate immune system for the artificial life community is Hofmeyr’s [12].
 
3
The simplification of proliferation to mere duplication adopted in the canonical CRM model is maintained in our agent-based model to minimize the number of parameters (excluding proliferation rates) and the size of the parameter search space.
 
4
Every \(E_f\) or \(R_f\) has equal probability of binding to the APC that presents feature \(f\).
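The equal-probability binding stated in this footnote can be illustrated with a minimal Python sketch of uniform sampling without replacement. This is only an illustration under assumptions: the function name `pick_binders`, the number of binding slots, and the toy cell pool are hypothetical and are not taken from the model described here.

```python
import random

def pick_binders(cells, slots, rng):
    """Choose up to `slots` cells uniformly at random, without replacement.

    `cells` is the pool of effector ("E") and regulatory ("R") cells that are
    specific for the feature presented by one APC. Every cell in the pool has
    the same chance of being chosen, whatever its type.
    """
    return rng.sample(cells, min(slots, len(cells)))

# Hypothetical pool for one feature: three effectors and two regulators.
rng = random.Random(0)  # seeded only to make the sketch repeatable
cells = ["E", "E", "E", "R", "R"]
bound = pick_binders(cells, slots=3, rng=rng)  # slots=3 is an assumed value
```

Because the draw is uniform over individual cells, the more numerous cell type is proportionally more likely to occupy the slots.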
 
5
The list of common (stop) words includes 33 of the most common English words, from which we manually excluded the word “with”, as we know it to be important for protein-protein interaction (PPI).
 
6
For feature extraction we used the training data of both BioCreative 2.5 and BioCreative 2, as described in [11]; all classifiers used the exact same feature set.
 
7
TF.IDF is a common term-weighting measure of the importance of a feature/word in a document relative to a corpus. TF stands for the term frequency in a document and IDF for the inverse document frequency in the corpus.
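As an illustration of this weighting, the following minimal Python sketch computes one common TF.IDF variant: relative term frequency multiplied by the natural logarithm of the inverse document frequency. The footnote does not state which variant was used, so the formula choice, the function name `tf_idf`, and the toy corpus are assumptions made for illustration.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Return one {term: weight} dict per tokenized document in `corpus`.

    Assumed variant: tf = count / document length, idf = ln(N / df),
    where N is the number of documents and df the number of documents
    that contain the term.
    """
    n_docs = len(corpus)
    # Document frequency: count each term once per document it appears in.
    df = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy corpus of three tokenized documents.
docs = [
    ["protein", "binds", "protein"],
    ["protein", "expression"],
    ["gene", "expression"],
]
w = tf_idf(docs)
```

In this toy corpus, "binds" occurs in a single document and so receives a larger inverse document frequency than "protein", which occurs in two of the three documents. A term that occurs in every document receives a weight of zero under this variant.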
 
8
Note that this parameter search on the provided labeled training data uses only the information available to the teams participating in the BioCreative 2.5 challenge, and none of the test data, whose labels were revealed post-challenge.
 
9
\(\hbox{F-score} = \frac{2 \cdot \hbox{Precision} \cdot \hbox{Recall}}{\hbox{Precision} + \hbox{Recall}}\), where \(\hbox{Precision} = \frac{\hbox{TP}}{\hbox{TP} + \hbox{FP}}\) and \(\hbox{Recall} = \frac{\hbox{TP}}{\hbox{TP} + \hbox{FN}}\). True Positives (TP) and False Positives (FP) are the classifier’s correct and incorrect predictions of documents as relevant, while True Negatives (TN) and False Negatives (FN) are its correct and incorrect predictions of documents as irrelevant.
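The definitions in this footnote translate directly into a short Python sketch. The function name `precision_recall_f`, the example counts, and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration; they are not taken from the evaluation reported here.

```python
def precision_recall_f(tp, fp, fn):
    """Compute (precision, recall, F-score) from confusion-matrix counts.

    Assumed convention: a ratio with a zero denominator is reported as 0.0.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Hypothetical counts: 8 relevant documents found, 2 false alarms, 8 missed.
p, r, f = precision_recall_f(tp=8, fp=2, fn=8)
```

With these hypothetical counts, precision is 8/10 and recall is 8/16. Because the F-score is a harmonic mean, it lies closer to the lower of the two values than the arithmetic mean would.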
 
References
1. Carneiro J, Leon K, Caramalho I, van den Dool C, Gardner R, Oliveira V, Bergman ML, Sepúlveda N, Paixão T, Faro J, Demengeot J (2007) When three is not a crowd: a crossregulation model of the dynamics and repertoire selection of regulatory CD4 T cells. Immunol Rev 216(1):48–68
2. Krallinger M (2009) The BioCreative II.5 challenge overview, p 19
3. Hunter L, Cohen KB (2006) Biomedical language processing: what’s beyond PubMed? Mol Cell 21(5):589–594
4. Jensen L, Saric J, Bork P (2006) Literature mining for the biologist: from information retrieval to biological discovery. Nat Rev Genet 7(2):119–129. doi:10.1038/nrg1768
5. Shatkay H, Feldman R (2003) Mining the biomedical literature in the genomic era: an overview. J Comput Biol 10(6):821–856
6. Hersh W, Bhupatiraju RT, Corley S (2004) Enhancing access to the bibliome: the TREC Genomics Track. Medinfo 11(Pt 2):773–777
7. Hirschman L, Yeh A, Blaschke C, Valencia A (2005) Overview of BioCreative: critical assessment of information extraction for biology. BMC Bioinform 6(Suppl 1):S1
8. Krallinger M, Valencia A (2007) Evaluating the detection and ranking of protein interaction relevant articles: the BioCreative challenge interaction article sub-task (IAS). In: Proceedings of the 2nd BioCreative challenge evaluation workshop
9. Feldman R, Sanger J (2006) The text mining handbook: advanced approaches in analyzing unstructured data. Cambridge University Press, Cambridge
10. Abi-Haidar A, Kaur J, Maguitman A, Radivojac P, Rechtsteiner A, Verspoor K, Wang Z, Rocha LM (2008) Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks. p 9(Suppl 2):S11
12. Hofmeyr SA (2001) An interpretative introduction to the immune system. Design principles for the immune system and other distributed autonomous systems
13. Segel LA, Cohen I (2001) Design principles for the immune system and other distributed autonomous systems. Oxford University Press, Oxford
14. Mitchell M (2006) Complex systems: network thinking. Artif Intell 170(18):1194–1212
15. Peak D, West JD, Messinger SM, Mott KA (2004) Evidence for complex, collective dynamics and distributed emergent computation in plants. PNAS 101(4):918–922
19. Crutchfield J, Mitchell M (1995) The evolution of emergent computation. PNAS 92(23)
20. Rocha LM, Hordijk W (2005) Material representations: from the genetic code to the evolution of cellular automata. Artif Life 11(1–2):189–214
21. Shalizi C, Haslinger R, Rouquier J-B, Klinkner K, Moore C (2006) Automatic filters for the detection of coherent structure in spatiotemporal systems. Phys Rev E 73
23. Twycross J, Cayzer S (2002) An immune system approach to document classification. Master’s thesis, COGS, University of Sussex, UK
24. Dasgupta D, Nino F (2008) Immunological computation: theory and applications. Auerbach
25. Garrett SM (2003) A paratope is not an epitope: implications for immune networks and clonal selection. pp 217–228
26. Abi-Haidar A, Rocha LM (2008) Artificial immune systems (Proc. ICARIS), pp 36–47
27. Abi-Haidar A, Rocha LM (2008) Artificial life XI: 11th international conference on the simulation and synthesis of living systems. MIT Press, Cambridge, pp 1–9
28. Tsymbal A (2004) The problem of concept drift: definitions and related work. Comput Sci Dep Trinity Coll Dublin 4(C):200415
29. Paul WE, Technologies IO (1993) Fundamental immunology. Raven Press, New York
30. Burnet SFM (1959) The clonal selection theory of acquired immunity. Vanderbilt University Press, Nashville
31. De Castro LN, Timmis J (2002) Artificial immune systems: a new computational intelligence approach. Springer, Berlin
32. Sepulveda NH (2009) How is the t-cell repertoire shaped. Ph.D. thesis, Instituto Gulbenkian de Ciencia
33. Abi-Haidar A, Rocha LM (2010) ICARIS 2010: Proceedings of the 9th international conference on artificial immune systems. In: pp 237–249
34. Abi-Haidar A, Rocha LM (2010) Artificial life XII: twelfth international conference on the simulation and synthesis of living systems. In: pp 706–713
35. Metsis V, Androutsopoulos I, Paliouras G (2006) Spam filtering with Naive Bayes–Which Naive Bayes? In: Third Conference on Email and Anti-Spam (CEAS)
36. Joachims T (2002) Learning to classify text using support vector machines: methods, theory, and algorithms. Kluwer, Dordrecht
37. Porter MF (1980) An algorithm for suffix stripping. Program 13(3):130–137
38. Sokolova M, Japkowicz N, Szpakowicz S (2006) Beyond accuracy, f-score and roc: a family of discriminant measures for performance evaluation, pp 1015–1021
Metadata
Title
Collective classification of textual documents by guided self-organization in T-Cell cross-regulation dynamics
Authors
Alaa Abi-Haidar
Luis M. Rocha
Publication date
1 June 2011
Publisher
Springer-Verlag
Published in
Evolutionary Intelligence / Issue 2/2011
Print ISSN: 1864-5909
Electronic ISSN: 1864-5917
DOI
https://doi.org/10.1007/s12065-011-0052-5