Published in: Knowledge and Information Systems 3/2019

21-09-2018 | Regular Paper

Information theoretic-PSO-based feature selection: an application in biomedical entity extraction

Authors: Shweta Yadav, Asif Ekbal, Sriparna Saha

Abstract

Named entity recognition is a vital task for many applications in biomedical natural language processing. It aims to extract biomedical entities from text and classify them into predefined categories. The entity types vary with genre and domain, such as gene versus non-gene in a coarse-grained scenario, or protein, DNA, RNA, cell line, and cell type in a fine-grained scenario. In this paper, we present a novel filter-based feature selection technique that uses the search capability of particle swarm optimization (PSO) to determine the best feature combination. The technique yields an optimized feature set that, when used for classifier learning, enhances system performance. The proposed approach is assessed on four popular biomedical corpora, namely GENIA, GENETAG, AIMed, and Biocreative-II Gene Mention Recognition (BC-II). Our proposed model obtains F-scores of \(74.49\%\), \(91.11\%\), \(90.47\%\), and \(88.64\%\) on the GENIA, GENETAG, AIMed, and BC-II datasets, respectively. The efficiency of feature pruning through PSO is evident from significant performance gains, even with a much reduced set of features.
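To make the idea of filter-based feature selection with PSO concrete, here is a minimal sketch. It is not the authors' method: the abstract does not state the fitness function or the PSO variant, so the sketch assumes a standard binary PSO (sigmoid of velocity gives the probability that a bit is set) and an mRMR-style filter score (mean mutual information with the labels minus mean pairwise mutual information among selected features). The function names, parameter values, and toy data are all hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def fitness(mask, features, labels):
    """Assumed filter score: mean relevance minus mean pairwise redundancy."""
    chosen = [f for f, bit in zip(features, mask) if bit]
    if not chosen:
        return float("-inf")  # an empty feature subset is never acceptable
    relevance = sum(mutual_information(f, labels) for f in chosen) / len(chosen)
    pairs = [(a, b) for i, a in enumerate(chosen) for b in chosen[i + 1:]]
    redundancy = (
        sum(mutual_information(a, b) for a, b in pairs) / len(pairs)
        if pairs else 0.0
    )
    return relevance - redundancy


def binary_pso(features, labels, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO over feature masks; returns (best_mask, best_score)."""
    rng = random.Random(seed)
    dim = len(features)
    pos = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, features, labels) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                # Velocity: inertia + pull toward personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-6.0, min(6.0, vel[i][d]))  # clamp
                # Sigmoid of the velocity is the probability that the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i], features, labels)
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Toy data (hypothetical): feature 0 predicts the label, feature 1 duplicates
# feature 0, and feature 2 is constant and carries no information.
labels = [0, 1] * 20
features = [labels[:], labels[:], [0] * 40]
best_mask, best_score = binary_pso(features, labels)
print(best_mask, round(best_score, 3))
```

Because the score is computed from the data alone, no classifier is trained inside the search loop, which is what distinguishes a filter approach from a wrapper approach. On this toy data the assumed score is highest for a mask that keeps exactly one of the two duplicate features and drops the constant one.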

Metadata
Title
Information theoretic-PSO-based feature selection: an application in biomedical entity extraction
Authors
Shweta Yadav
Asif Ekbal
Sriparna Saha
Publication date
21-09-2018
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 3/2019
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-018-1265-z
