Published in: International Journal of Machine Learning and Cybernetics 3/2015

01-06-2015 | Original Article

Bayesian networks for incomplete data analysis in form processing

Authors: Emilie Philippot, K. C. Santosh, Abdel Belaïd, Yolande Belaïd

Abstract

In this paper, we study Bayesian networks (BNs) for form identification based on partially filled fields. The approach uses electronic ink-tracing files and requires no prior information about the form structure. Given a form format, the ink-tracing files are used to build the BN by encoding the possible relationships between the corresponding fields as conditional probabilities, proceeding from individual fields up to the construction of the complete model. To simplify the BN, we subdivide a single form into three areas (header, body and footer) and then integrate them. Within this framework, we study three fundamental BN learning algorithms: Naive, Peter & Clark (PC) and maximum weighted spanning tree (MWST). We validate the framework on a real-world industrial problem, namely electronic note-taking in form processing. The approach provides satisfactory results, attesting to the value of BNs for incomplete form analysis problems in particular.
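To make the idea of identifying a form from partially filled fields concrete, the sketch below implements the simplest of the three structures, the naive BN, in which the form type is the only parent of every field. It is a minimal illustration under stated assumptions, not the authors' implementation: each field is treated as a discrete variable (here the hypothetical values "ink" and "empty"), conditional probabilities are estimated by counting with Laplace smoothing (`alpha` is an assumed parameter), training samples that omit a field are simply skipped for that field (an available-case simplification), and the class `NaiveBayesForms`, the form types and the field names are invented for illustration. Because every field is a leaf child of the form-type node, an unobserved field sums out to 1 and can be left out of the product, which is why incomplete input is easy to handle in this structure.

```python
import math
from collections import defaultdict


class NaiveBayesForms:
    """Hypothetical naive BN: form type is the only parent of every discrete field."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                    # Laplace smoothing constant (assumed)
        self.class_counts = defaultdict(int)  # form type -> number of samples
        self.value_counts = defaultdict(int)  # (type, field, value) -> count
        self.field_counts = defaultdict(int)  # (type, field) -> times field observed
        self.field_values = defaultdict(set)  # field -> values seen in training

    def fit(self, samples):
        # samples: iterable of (form_type, {field: value}); absent keys = unobserved
        for form_type, fields in samples:
            self.class_counts[form_type] += 1
            for field, value in fields.items():
                self.value_counts[(form_type, field, value)] += 1
                self.field_counts[(form_type, field)] += 1
                self.field_values[field].add(value)
        return self

    def log_scores(self, fields):
        # Unnormalised log P(type) + sum of log P(value | type) over observed fields.
        # Fields absent from `fields` are marginalised out: their CPT rows sum to 1.
        total = sum(self.class_counts.values())
        scores = {}
        for form_type, n_type in self.class_counts.items():
            score = math.log(n_type / total)
            for field, value in fields.items():
                if field not in self.field_values:
                    continue  # field never seen in training: no CPT to use
                k = len(self.field_values[field])
                num = self.value_counts[(form_type, field, value)] + self.alpha
                den = self.field_counts[(form_type, field)] + self.alpha * k
                score += math.log(num / den)
            scores[form_type] = score
        return scores

    def posterior(self, fields):
        # Normalise with the log-sum-exp trick for numerical stability.
        scores = self.log_scores(fields)
        top = max(scores.values())
        weights = {t: math.exp(s - top) for t, s in scores.items()}
        z = sum(weights.values())
        return {t: w / z for t, w in weights.items()}

    def predict(self, fields):
        scores = self.log_scores(fields)
        return max(scores, key=scores.get)


# Hypothetical toy data: two form types, three fields.
train = [
    ("invoice", {"date": "ink", "amount": "ink", "signature": "empty"}),
    ("invoice", {"date": "ink", "amount": "ink", "signature": "ink"}),
    ("report", {"date": "ink", "amount": "empty", "signature": "ink"}),
    ("report", {"date": "empty", "amount": "empty", "signature": "ink"}),
]
model = NaiveBayesForms().fit(train)

# Only the "amount" field has been filled so far.
print(model.predict({"amount": "ink"}))    # -> invoice
print(model.posterior({"amount": "ink"}))  # invoice ~0.75, report ~0.25
```

The naive structure fixes the graph in advance; the PC and maximum weighted spanning tree algorithms instead learn dependencies between fields, and inference on such graphs generally calls for a general BN inference routine rather than this simple product of per-field terms.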

Metadata
Title
Bayesian networks for incomplete data analysis in form processing
Authors
Emilie Philippot
K. C. Santosh
Abdel Belaïd
Yolande Belaïd
Publication date
01-06-2015
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 3/2015
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-014-0234-4