International Journal of Machine Learning and Cybernetics

01-06-2015 | Original Article

Bayesian networks for incomplete data analysis in form processing

Authors: Emilie Philippot, K. C. Santosh, Abdel Belaïd, Yolande Belaïd

Published in: International Journal of Machine Learning and Cybernetics | Issue 3/2015

Abstract

In this paper, we study Bayesian networks (BNs) for form identification based on partially filled fields. The approach uses electronic ink-tracing files and requires no prior information about form structure. Given a form format, the ink-tracing files are used to build the BN: conditional probabilities capture the possible relationships between corresponding fields, starting from individual fields and working up to the complete model. To simplify the BN, we divide each form into three areas (header, body and footer) and then integrate them. Within this framework, we study three fundamental BN learning algorithms: Naive, Peter & Clark (PC) and maximum weighted spanning tree (MWST). We validate the approach on a real-world industrial problem, namely electronic note-taking in form processing. The results are satisfactory and support the use of BNs for analysing incompletely filled forms.
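As an illustration of the maximum weighted spanning tree idea named in the abstract, the following minimal sketch builds a dependence tree over form fields in the style of Chow and Liu: each pair of fields is weighted by empirical mutual information, and a maximum spanning tree is extracted with Kruskal's procedure. The abstract does not specify the features or edge weights used by the authors, so the binary "field filled / field empty" encoding, the toy `rows` data, and the function names `mutual_information` and `mwst_edges` are all assumptions made for this example.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy data: one row per scanned form, one column per field,
# 1 = field contains ink, 0 = field left empty. Not taken from the paper.
rows = [
    (1, 1, 0, 0),
    (1, 1, 1, 1),
    (0, 0, 0, 0),
    (0, 0, 1, 1),
    (1, 1, 1, 1),
    (0, 0, 0, 0),
]


def mutual_information(rows, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(rows)
    count_i = Counter(r[i] for r in rows)
    count_j = Counter(r[j] for r in rows)
    count_ij = Counter((r[i], r[j]) for r in rows)
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), expressed with raw counts.
        mi += (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
    return mi


def mwst_edges(rows):
    """Return the edges of a maximum weighted spanning tree over the columns."""
    d = len(rows[0])
    # Heaviest candidate edges first.
    candidates = sorted(
        ((mutual_information(rows, i, j), i, j)
         for i, j in combinations(range(d), 2)),
        reverse=True,
    )
    parent = list(range(d))  # union-find forest used to reject cycles

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for _, i, j in candidates:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two separate components
            parent[root_i] = root_j
            tree.append((i, j))
    return tree


tree = mwst_edges(rows)
print(tree)
```

In this toy data, fields 0 and 1 are always filled together, as are fields 2 and 3, so those two pairs carry the highest mutual information and both end up in the tree. The resulting undirected tree becomes a BN structure once a root is chosen and the edges are directed away from it; the conditional probability tables would then be estimated per edge.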

Title: Bayesian networks for incomplete data analysis in form processing
Authors: Emilie Philippot, K. C. Santosh, Abdel Belaïd, Yolande Belaïd
Publication date: 01-06-2015
Publisher: Springer Berlin Heidelberg
Published in: International Journal of Machine Learning and Cybernetics / Issue 3/2015
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI: https://doi.org/10.1007/s13042-014-0234-4
