2017 | OriginalPaper | Book chapter

Distributed Representations for Words on Tables

Authors: Minoru Yoshida, Kazuyuki Matsumoto, Kenji Kita

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer International Publishing

Abstract

We consider the problem of word embedding for tables, that is, obtaining distributed representations for words found in tables. We propose a table word-embedding method that considers both horizontal and vertical relations between cells to estimate appropriate embeddings for the words in tables. We propose objective functions that make use of horizontal and vertical relations, both individually and jointly.
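
As a rough illustration of what horizontal and vertical relations between cells could look like as training data, the sketch below collects co-occurring cell pairs from one small table. It is a minimal sketch under assumptions, not the authors' method: the function name `context_pairs`, the list-of-rows table layout, and the choice to pair every two cells in the same row or column are all hypothetical, since this page does not give the paper's exact context definitions.

```python
from itertools import combinations


def context_pairs(table):
    """Return (horizontal, vertical) lists of co-occurring cell pairs.

    `table` is a list of rows, each a list of cell strings (assumed layout).
    Horizontal pairs share a row; vertical pairs share a column.
    """
    horizontal = []
    for row in table:
        # Every unordered pair of cells within the same row.
        horizontal.extend(combinations(row, 2))

    vertical = []
    # zip(*table) iterates over columns; it truncates to the shortest row.
    for column in zip(*table):
        # Every unordered pair of cells within the same column.
        vertical.extend(combinations(column, 2))

    return horizontal, vertical


# Toy table whose first row holds the attribute names (illustrative data only).
table = [
    ["Name", "Country"],
    ["Tokyo", "Japan"],
    ["Paris", "France"],
]
horizontal, vertical = context_pairs(table)
```

Pairs such as `("Tokyo", "Japan")` come from a shared row, while pairs such as `("Tokyo", "Paris")` or `("Name", "Tokyo")` come from a shared column. Keeping the two lists separate is what would allow an objective to use either relation on its own or both jointly.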

Footnotes
1
The total number of tables found in the corpus was 255,039.
 
2
In our data set, 266 (93.7%) out of 284 randomly sampled tables were row-wise.
 
3
In this research, we ignore tables that have no attribute names. Although this strategy can cause noise in the set of attribute vectors, the effects of such noise are small, because values are of many types and their frequency is relatively lower than that of the attributes.
 
4
The original word2vec paper derived this objective by maximizing the probability of a word appearing in the given contexts. Here we set that derivation aside and treat the following objectives merely as score functions for obtaining word-embedding vectors.
 
5
In addition, note that only two of these four terms are used for each (w, z) pair, which makes the SGD implementation for this model nearly the same as that of word2vec.
 
6
Although two terms (the first and the second) are used for the word w and its vertical context word c, each term can be differentiated independently because no vector appears in both terms. We can therefore use an iteration method similar to that of word2vec.
 
7
Note that, as a result, the numbers of queries for the similarity and analogy tasks were reduced to 445 and 5,124, respectively.
 
References
1.
Bollegala, D., Alsuhaibani, M., Maehara, T., Kawarabayashi, K.I.: Joint word representation learning using a corpus and a semantic lexicon. In: Proceedings of AAAI 2016, pp. 2690–2696 (2016)
2.
Bollegala, D., Maehara, T., Yoshida, Y., Kawarabayashi, K.I.: Learning word representations from relational graphs. In: Proceedings of AAAI 2015, pp. 2146–2152 (2015)
3.
Cafarella, M.J., Halevy, A.Y., Wang, D.Z., Wu, E., Zhang, Y.: Webtables: exploring the power of tables on the web. Proc. VLDB Endowment 1(1), 538–549 (2008)
4.
Chakrabarti, S.: Mining the Web: Discovering Knowledge from Hypertext Data. Morgan-Kaufmann Publishers, Burlington (2002)
5.
Embley, D., Hurst, M., Lopresti, D., Nagy, G.: Table-processing paradigms: a research survey. Int. J. Doc. Anal. Recogn. 8(2), 66–86 (2006)
6.
Ji, S., Satish, N., Li, S., Dubey, P.: Parallelizing word2vec in shared and distributed memory. CoRR abs/1604.04661 (2016)
7.
Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. Proc. VLDB Endowment 3(1), 1338–1347 (2010)
8.
Lin, Y., Liu, Z., Sun, M., Liu, Y., Zhu, X.: Learning entity and relation embeddings for knowledge graph completion. In: Proceedings of AAAI 2015, pp. 2181–2187 (2015)
9.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Proceedings of NIPS 2013, pp. 3111–3119 (2013)
10.
Munoz, E., Hogan, A., Mileo, A.: Triplifying Wikipedia’s tables. In: Proceedings of the ISWC 2013 Workshop on Linked Data for Information Extraction (2013)
11.
Neelakantan, A., Roth, B., McCallum, A.: Compositional vector space models for knowledge base completion. In: Proceedings of ACL 2015, pp. 156–166 (2015)
12.
Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of EMNLP 2014, pp. 1532–1543 (2014)
13.
Pimplikar, R., Sarawagi, S.: Answering table queries on the web using column keywords. Proc. VLDB Endowment 5(10), 908–919 (2012)
14.
Recht, B., Re, C., Wright, S.J., Niu, F.: Hogwild: a lock-free approach to parallelizing stochastic gradient descent. In: Proceedings of NIPS 2011, pp. 693–701 (2011)
15.
Toutanova, K., Chen, D., Pantel, P., Poon, H., Choudhury, P., Gamon, M.: Representing text for joint embedding of text and knowledge bases. In: Proceedings of EMNLP 2015, pp. 1499–1509 (2015)
16.
Wang, Z., Zhang, J., Feng, J., Chen, Z.: Knowledge graph embedding by translating on hyperplanes. In: Proceedings of AAAI 2014, pp. 1112–1119 (2014)
17.
Yin, P., Lu, Z., Li, H., Kao, B.: Neural enquirer: learning to query tables in natural language. In: Proceedings of IJCAI 2016, pp. 2308–2314 (2016)
18.
Zanibbi, R., Blostein, D., Cordy, J.R.: A survey of table recognition. Int. J. Doc. Anal. Recogn. 7(1), 1–16 (2004)
Metadata
Title
Distributed Representations for Words on Tables
Authors
Minoru Yoshida
Kazuyuki Matsumoto
Kenji Kita
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-57454-7_11