
2016 | Original Paper | Book Chapter

A Word Prediction Methodology Based on Posgrams

Authors: Carmelo Spiccia, Agnese Augello, Giovanni Pilato

Published in: Knowledge Discovery, Knowledge Engineering and Knowledge Management

Publisher: Springer International Publishing


Abstract

This work introduces a two-step methodology for predicting missing words in incomplete sentences. In the first step, the set of candidate words is restricted to those matching the predicted part of speech; to this end, a novel algorithm based on “posgram” analysis is also proposed. In the second step, a word prediction algorithm is applied to the reduced word set. The work quantifies the advantages, in terms of accuracy and execution time, of predicting a word's part of speech before predicting the word itself. The methodology can be applied to several tasks, such as text autocompletion, speech recognition and optical text recognition.
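
The two-step idea can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes that a posgram is a sequence of part-of-speech tags, that the context words arrive already tagged, and that the second step is a plain bigram count. The toy corpus, the tag names and the functions train, predict_pos and predict_word are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each sentence is a list of (word, pos) pairs.
CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
]


def train(corpus):
    """Collect POS trigram counts, word bigram counts and a POS lexicon."""
    posgrams = Counter()        # (left_pos, pos, right_pos) -> count
    bigrams = Counter()         # (previous_word, word) -> count
    lexicon = defaultdict(set)  # pos -> words observed with that tag
    for sentence in corpus:
        # Pad with boundary markers so first and last words have context.
        padded = [("<s>", "<s>")] + sentence + [("</s>", "</s>")]
        for i in range(1, len(padded) - 1):
            (left_word, left_pos) = padded[i - 1]
            (word, pos) = padded[i]
            (_, right_pos) = padded[i + 1]
            posgrams[(left_pos, pos, right_pos)] += 1
            bigrams[(left_word, word)] += 1
            lexicon[pos].add(word)
    return posgrams, bigrams, lexicon


def predict_pos(posgrams, left_pos, right_pos):
    """Step 1: most frequent middle tag between the two context tags."""
    candidates = Counter()
    for (lp, pos, rp), count in posgrams.items():
        if lp == left_pos and rp == right_pos:
            candidates[pos] += count
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


def predict_word(model, left, right):
    """Step 2: rank only the words that carry the predicted tag."""
    posgrams, bigrams, lexicon = model
    pos = predict_pos(posgrams, left[1], right[1])
    if pos is None:
        # Assumption: fall back to the whole vocabulary if no posgram matches.
        candidates = set().union(*lexicon.values())
    else:
        candidates = lexicon[pos]
    # Sorting makes ties deterministic; bigram count is the toy scoring rule.
    return max(sorted(candidates), key=lambda w: bigrams[(left[0], w)])


model = train(CORPUS)
# Fill the gap in "the ___ barks", with context given as (word, pos) pairs.
guess = predict_word(model, ("the", "DET"), ("barks", "VERB"))
print(guess)
```

In this sketch, restricting the candidates to one tag means the second step scores two nouns rather than the full seven-word vocabulary. This is the kind of reduction the abstract credits with gains in accuracy and execution time.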


Metadata
Title
A Word Prediction Methodology Based on Posgrams
Authors
Carmelo Spiccia
Agnese Augello
Giovanni Pilato
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-52758-1_9
