Published in: Datenbank-Spektrum 2/2012

01.07.2012 | Focus article

OPEN—Enabling Non-expert Users to Extract, Integrate, and Analyze Open Data

Authors: Katrin Braunschweig, Julian Eberius, Maik Thiele, Wolfgang Lehner

Abstract

Government initiatives for greater transparency and participation have led to a growing amount of structured data on the web in recent years. Many of these datasets have great potential: a situational analysis and meaningful visualization of the data can, for example, help point out social or economic issues and raise public awareness. Unfortunately, the ad-hoc analysis of this so-called Open Data can prove very complex and time-consuming, partly due to a lack of efficient system support.
On the one hand, search functionality is required to identify relevant datasets. Common document retrieval techniques used in web search, however, are not optimized for Open Data and do not address its inherent semantic ambiguity. On the other hand, semantic integration is necessary to perform analysis tasks across multiple datasets. Doing so in an ad-hoc fashion, however, requires more flexibility and easier integration than most data integration systems provide. It is apparent that an optimal management system for Open Data must combine aspects of both classic approaches, document retrieval and data integration.
In this article, we propose OPEN, a novel concept for the management and situational analysis of Open Data within a single system. In our approach, we extend a classic database management system, adding support for the identification and dynamic integration of public datasets. As most web users lack the experience and training required to formulate structured queries in a DBMS, we add support for non-expert users to our system, for example through keyword queries. Furthermore, we address the challenge of indexing Open Data.

Metadata
Title
OPEN—Enabling Non-expert Users to Extract, Integrate, and Analyze Open Data
Authors
Katrin Braunschweig
Julian Eberius
Maik Thiele
Wolfgang Lehner
Publication date
01.07.2012
Publisher
Springer-Verlag
Published in
Datenbank-Spektrum / Issue 2/2012
Print ISSN: 1618-2162
Electronic ISSN: 1610-1995
DOI
https://doi.org/10.1007/s13222-012-0091-9
