Published in: International Journal of Machine Learning and Cybernetics 3/2015

1 June 2015 | Original Article

Bayesian networks for incomplete data analysis in form processing

Authors: Emilie Philippot, K. C. Santosh, Abdel Belaïd, Yolande Belaïd

Abstract

In this paper, we study Bayesian networks (BNs) for form identification based on partially filled fields. The approach uses electronic ink-tracing files and requires no prior information about the form structure. Given a form format, the ink-tracing files are used to build the BN: conditional probabilities capture the possible relationships between corresponding fields, and construction proceeds from individual fields up to the complete model. To simplify the BN, we subdivide each form into three areas (header, body and footer) and then integrate them. Within this setting, we study three fundamental BN learning algorithms: Naive Bayes, Peter & Clark (PC) and maximum weighted spanning tree. We validate the framework on a real-world industrial problem, namely electronic note-taking in form processing. The approach provides satisfactory results, supporting the use of BNs for incomplete form analysis problems in particular.
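
To make the maximum weighted spanning tree idea from the abstract concrete, here is a minimal sketch in the usual Chow-Liu style: every pair of fields is weighted by its empirical mutual information, and a maximum spanning tree is then extracted with Kruskal's algorithm. The binary "field has ink / field is empty" encoding, the toy data, and the function names are assumptions made for illustration; they are not taken from the paper, whose actual features and implementation are not described in the abstract.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(rows, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(rows)
    joint = Counter((r[i], r[j]) for r in rows)
    count_i = Counter(r[i] for r in rows)
    count_j = Counter(r[j] for r in rows)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
    return mi


def max_weight_spanning_tree(rows, n_fields):
    """Kruskal's algorithm on the complete graph weighted by mutual information."""
    edges = sorted(
        ((mutual_information(rows, i, j), i, j)
         for i, j in combinations(range(n_fields), 2)),
        reverse=True,  # heaviest (most dependent) pairs first
    )
    parent = list(range(n_fields))  # union-find forest to detect cycles

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for _, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding this edge does not close a cycle
            parent[root_i] = root_j
            tree.append((i, j))
    return tree


# Hypothetical toy data: one row per form, one column per field,
# 1 if the field contains ink and 0 if it was left empty.
forms = [
    (1, 1, 0, 0),
    (1, 1, 0, 1),
    (0, 0, 1, 0),
    (0, 0, 1, 1),
    (1, 1, 0, 0),
    (0, 0, 1, 1),
]
tree = max_weight_spanning_tree(forms, 4)
print(tree)  # undirected edges between field indices
```

The result is an undirected skeleton with one edge fewer than the number of fields. Turning it into a BN would still require choosing a root to orient the edges and estimating the conditional probability tables, which this sketch leaves out.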

Metadata
Title
Bayesian networks for incomplete data analysis in form processing
Authors
Emilie Philippot
K. C. Santosh
Abdel Belaïd
Yolande Belaïd
Publication date
1 June 2015
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 3/2015
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-014-0234-4
