2016 | OriginalPaper | Book chapter

Searching Web 2.0 Data Through Entity-Based Aggregation

Written by: Ekaterini Ioannou, Yannis Velegrakis

Published in: Transactions on Computational Collective Intelligence XXI

Publisher: Springer Berlin Heidelberg

Abstract

Entity-based searching has been introduced as a way of allowing users and applications to retrieve information about a specific real-world object such as a person, an event, or a location. Recent advances in crawling, information extraction, and data exchange technologies have brought a new era in data management, commonly referred to as Web 2.0. Entity searching over Web 2.0 data facilitates the retrieval of relevant information from the plethora of data available in semantic and social web applications.
Effective entity searching over a variety of sources requires integrating the different pieces of information that refer to the same real-world entity. Entity-based aggregation of Web 2.0 data is an effective mechanism for this. By adopting the suggestions of the Linked Data movement, aggregators can efficiently match and merge the data that refer to the same real-world object.
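
To make the match-and-merge idea concrete, here is a minimal Python sketch. It is not the chapter's algorithm, which the abstract does not describe. It assumes that records carry a `name`, a `source`, and a flat `attributes` dictionary, that two records refer to the same entity when their normalized names are similar enough, and that merging means taking the union of attribute values. The record layout, the `similar` and `aggregate` helpers, the 0.85 threshold, and the sample data are all hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Return True when two names are close enough to count as one entity.

    The threshold is an illustrative assumption, not a tuned value.
    """
    ratio = SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()
    return ratio >= threshold


def aggregate(records: list[dict]) -> list[dict]:
    """Greedily match each record to an existing entity, or start a new one."""
    entities: list[dict] = []
    for record in records:
        for entity in entities:
            # Match step: compare against every name already seen for the entity.
            if any(similar(record["name"], known) for known in entity["names"]):
                # Merge step: keep all names, sources, and attribute values.
                entity["names"].add(record["name"])
                entity["sources"].add(record["source"])
                for key, value in record["attributes"].items():
                    entity["attributes"].setdefault(key, set()).add(value)
                break
        else:
            # No existing entity matched, so this record starts a new one.
            entities.append({
                "names": {record["name"]},
                "sources": {record["source"]},
                "attributes": {k: {v} for k, v in record["attributes"].items()},
            })
    return entities


# Hypothetical records, as if collected from three different Web 2.0 sources.
records = [
    {"source": "blog", "name": "Jane Smith", "attributes": {"city": "Berlin"}},
    {"source": "wiki", "name": "jane smith ", "attributes": {"occupation": "engineer"}},
    {"source": "forum", "name": "John Doe", "attributes": {"city": "Paris"}},
]

entities = aggregate(records)
for entity in entities:
    print(sorted(entity["names"]), sorted(entity["sources"]), entity["attributes"])
```

The greedy pairwise comparison is quadratic in the number of records and is chosen only for readability. An aggregator working at Web scale would need a cheaper candidate-selection step before comparing records.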

Metadata
Title
Searching Web 2.0 Data Through Entity-Based Aggregation
Written by
Ekaterini Ioannou
Yannis Velegrakis
Copyright year
2016
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-662-49521-6_7