
About this Book

This book constitutes the proceedings of the Second Joint International Semantic Technology Conference, JIST 2012, held in Nara, Japan, in December 2012. The 20 full papers and 13 short papers included in this volume were carefully reviewed and selected from 90 submissions. The regular papers deal with ontology and description logics; RDF and SPARQL; learning and discovery; semantic search; knowledge building; and semantic Web applications. The in-use track papers cover social semantic Web and semantic search, and the special track papers address linked data in practice and database integration.

Table of Contents

Frontmatter

Regular Paper Track

Ontology and Description Logics

A Resolution Procedure for Description Logics with Nominal Schemas

We present a polynomial resolution-based decision procedure for the recently introduced description logic $\mathcal{ELHOV}_{n}(\sqcap)$, which features nominal schemas as a new language construct. Our algorithm is based on ordered resolution and positive superposition, together with a lifting lemma. In contrast to previous work on resolution for description logics, we have to overcome the fact that $\mathcal{ELHOV}_{n}(\sqcap)$ does not allow for a normalization resulting in clauses of globally limited size.

Cong Wang, Pascal Hitzler

Get My Pizza Right: Repairing Missing is-a Relations in ${\cal ALC}$ Ontologies

With the increased use of ontologies in semantically-enabled applications, the issue of debugging defects in ontologies has become increasingly important. These defects can lead to wrong or incomplete results for the applications. Debugging consists of a detection phase and a repair phase. In this paper we focus on the repair phase for a particular kind of defect: missing relations in the is-a hierarchy. Previous work has dealt with the case of taxonomies. In this work we extend the scope to ${\cal ALC}$ ontologies that can be represented using acyclic terminologies. We present algorithms and discuss a system.

Patrick Lambrix, Zlatan Dragisic, Valentina Ivanova

Ontological Modeling of Interoperable Abnormal States

Exchanging huge volumes of data is a common concern in various fields. One persistent issue has been the difficulty of cross-domain knowledge sharing due to the highly heterogeneous nature of the knowledge. We constructed an ontological model of abnormal states ranging from the generic to the domain-specific level. We propose a unified form that describes an abnormal state as a "property" and then divides it into an "attribute" and a "value" in qualitative form. This approach promotes interoperability and flexibility across quantitative raw data, qualitative information, and generic/abstract knowledge. By developing an is-a hierarchical tree and combining causal chains of diseases, 17,000 abnormal states from 6,000 diseases can be captured as generic causal relations and reused across 12 medical departments.

Yuki Yamagata, Hiroko Kou, Kouji Kozaki, Riichiro Mizoguchi, Takeshi Imai, Kazuhiko Ohe
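The abstract's decomposition of an abnormal state into an attribute plus a qualitative value can be sketched as follows; the class name and the example terms are our own illustration, not the authors' ontology:

```python
from dataclasses import dataclass

# Illustrative sketch: an abnormal state is described as a "property"
# and decomposed into an "attribute" and a qualitative "value", which
# makes states comparable across medical domains.
@dataclass(frozen=True)
class AbnormalState:
    property_name: str  # e.g. "hyperglycemia"
    attribute: str      # e.g. "blood glucose level"
    value: str          # qualitative value, e.g. "high"

def interoperable(a: AbnormalState, b: AbnormalState) -> bool:
    """Two states from different domains match when they describe the
    same attribute with the same qualitative value."""
    return (a.attribute, a.value) == (b.attribute, b.value)
```

Under this scheme, a domain-specific term such as "hyperglycemia" and a generic description "blood glucose level: high" would be recognized as the same state.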

RDF and SPARQL

SkyPackage: From Finding Items to Finding a Skyline of Packages on the Semantic Web

Enabling complex querying paradigms over the wealth of available Semantic Web data will significantly impact the relevance and adoption of Semantic Web technologies in a broad range of domains. While the current predominant paradigm is to retrieve a list of items, in many cases the actual intent is satisfied by reviewing such lists and assembling compatible items into packages of resources, such that each package collectively satisfies the need; an example is assembling different collections of places to visit during a vacation. Users may place constraints on individual items, while the compatibility of items within a package is governed by global constraints placed on packages, such as the total distance or time to travel between locations in a package. Finding such packages using the traditional item-querying model requires users to review the result lists of possibly multiple queries and to assemble and compare packages manually.

In this paper, we propose three algorithms for supporting such a package query model as a first-class paradigm. Since package constraints may involve multiple criteria, several competing packages are possible. We therefore propose computing a skyline of package results, extending a popular query model for multi-criteria decision making called skyline queries, which to date has focused only on computing item skylines. We formalize the semantics of the logical query operator, Sky-Package, and propose three algorithms for the physical operator implementation. A comparative evaluation of the algorithms over real-world and synthetic benchmark RDF datasets is provided.

Matthew Sessoms, Kemafor Anyanwu
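The skyline (non-dominated set) over package cost vectors at the heart of this query model can be illustrated with a minimal sketch. This is a naive quadratic dominance check over hypothetical data, not the paper's algorithms:

```python
def dominates(p, q):
    """p dominates q if p is no worse in every criterion and strictly
    better in at least one (lower values are better here)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def package_skyline(packages):
    """packages: list of (package_id, cost_vector) pairs.
    Returns the non-dominated packages, i.e. the skyline."""
    return [
        (pid, cost)
        for pid, cost in packages
        if not any(dominates(other, cost) for _, other in packages if other != cost)
    ]
```

For example, with cost vectors (travel distance, price), a package that is both farther and more expensive than another is pruned, while packages that trade off the two criteria all survive in the skyline.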

Accessing Relational Data on the Web with SparqlMap

The vast majority of the structured data of our age is stored in relational databases. In order to link and integrate this data on the Web, it is of paramount importance to make relational data available according to the RDF data model and associated serializations. In this article we present SparqlMap, a SPARQL-to-SQL rewriter based on the specifications of the W3C R2RML working group. The rationale is to enable SPARQL querying on existing relational databases by rewriting a SPARQL query into exactly one corresponding SQL query, based on mapping definitions expressed in R2RML. The SparqlMap process of rewriting a query over a mapping comprises three steps: (1) mapping candidate selection, (2) query translation, and (3) query execution. We showcase our SparqlMap implementation and present benchmark data demonstrating that SparqlMap outperforms the current state of the art.

Jörg Unbehauen, Claus Stadler, Sören Auer
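The three-step rewriting can be sketched with a toy example: an R2RML-style mapping ties a predicate to a table and two columns, and one triple pattern becomes exactly one SQL query. The mapping table and function below are hypothetical illustrations, not SparqlMap's API:

```python
# Toy R2RML-style mapping: predicate -> (table, subject column, object column).
MAPPINGS = {
    "foaf:name": ("person", "id", "name"),
    "foaf:mbox": ("person", "id", "email"),
}

def rewrite(triple_pattern):
    """triple_pattern = (subject_var, predicate, object_var).
    Returns one SQL query string for the pattern."""
    s, p, o = triple_pattern
    table, s_col, o_col = MAPPINGS[p]  # (1) mapping candidate selection
    # (2) query translation; (3) execution would run this on the RDBMS
    return f"SELECT {s_col} AS {s}, {o_col} AS {o} FROM {table}"
```

A real rewriter must additionally join the SQL fragments of multiple triple patterns and translate SPARQL filters, which is where the single-query guarantee becomes non-trivial.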

Protect Your RDF Data!

The explosion of digital content and the heterogeneity of enterprise content sources have pushed existing data integration solutions to their boundaries. Although RDF can be used as a representation format for integrated data, enterprises have been slow to adopt this technology. One of the primary inhibitors to its widespread adoption in industry is the lack of fine-grained access control enforcement mechanisms available for RDF. In this paper, we provide a summary of access control requirements based on our analysis of existing access control models and enforcement mechanisms. We subsequently: (i) propose a set of access control rules that can be used to provide support for these models over RDF data; (ii) detail a framework that enforces access control restrictions over RDF data; and (iii) evaluate our implementation of the framework over real-world enterprise data.

Sabrina Kirrane, Nuno Lopes, Alessandra Mileo, Stefan Decker
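A rule-based filter over RDF triples in the spirit of this abstract can be sketched as follows. The rule shape (user, effect, triple pattern with `None` as wildcard, later rules overriding earlier ones) is our own illustration, not the paper's actual rule language:

```python
def matches(pattern, triple):
    """A pattern component of None acts as a wildcard."""
    return all(p is None or p == t for p, t in zip(pattern, triple))

def visible(triples, rules, user):
    """Return the triples the given user may see. Default is allow;
    each matching rule for the user overrides the current decision."""
    out = []
    for t in triples:
        decision = True
        for rule_user, effect, pattern in rules:
            if rule_user == user and matches(pattern, t):
                decision = (effect == "allow")
        if decision:
            out.append(t)
    return out
```

With a single deny rule on a sensitive predicate, one user sees a filtered graph while others see the full data, which is the fine-grained behaviour the paper argues RDF stores need.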

Learning and Discovery

Active Learning of Domain-Specific Distances for Link Discovery

Discovering cross-knowledge-base links is of central importance for manifold tasks across the Linked Data Web. So far, learning link specifications has been addressed by approaches that rely on standard similarity and distance measures, such as the Levenshtein distance for strings and the Euclidean distance for numeric values. While these approaches have been shown to perform well, the use of standard similarity measures still hampers their accuracy, as several link discovery tasks can only be solved sub-optimally with standard measures. In this paper, we address this drawback by presenting a novel approach for learning string similarity measures concurrently across multiple dimensions directly from labeled data. Our approach is based on linear classifiers that rely on a learned edit distance within an active learning setting. By combining these paradigms, we reduce the labeling burden on the experts at hand while achieving superior results on datasets for which edit distances are useful. We evaluate our approach on three real datasets and show that it can improve the accuracy of classifiers. We also discuss how our approach can be extended to other similarity and distance measures as well as to different classifiers.

Tommaso Soru, Axel-Cyrille Ngonga Ngomo
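The learnable ingredient here is the edit distance itself: its operation costs are parameters rather than constants. A minimal sketch (not the authors' algorithm) shows how costs that could be tuned from labelled link/no-link pairs change the distance:

```python
def weighted_edit_distance(a, b, sub_cost=1.0, ins_cost=1.0, del_cost=1.0):
    """Dynamic-programming edit distance with parameterised operation
    costs; with all costs 1.0 this is the plain Levenshtein distance."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * del_cost
    for j in range(1, n + 1):
        d[0][j] = j * ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + del_cost,                              # deletion
                d[i][j - 1] + ins_cost,                              # insertion
                d[i - 1][j - 1] + (0.0 if a[i - 1] == b[j - 1] else sub_cost),
            )
    return d[m][n]
```

An active learner would adjust such costs (e.g., making domain-typical substitutions cheap) so that a linear classifier over the resulting distances separates links from non-links with fewer labelled examples.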

Interlinking Linked Data Sources Using a Domain-Independent System

Linked data interlinking is the discovery of all owl:sameAs links between given data sources. An owl:sameAs link declares that two instances co-refer to the same real-world object. Traditional methods compare two instances via predefined pairs of RDF predicates and therefore depend on the domain of the data. Recently, researchers have attempted to achieve domain independence by automatically building the linkage rules; however, such approaches still require human-curated labeled data as input for the learning process. In this paper, we present SLINT+, an interlinking system that is training-free and domain-independent. SLINT+ finds the important predicates of each data source and combines them to form predicate alignments. The most useful alignments are then selected based on their confidence. Finally, SLINT+ uses the selected predicate alignments to guide candidate generation and instance matching. Experimental results show that our system is very efficient when interlinking data sources across 119 different domains. Considerable improvements in both precision and recall over recent systems are also reported.

Khai Nguyen, Ryutaro Ichise, Bac Le

Instance Coreference Resolution in Multi-ontology Linked Data Resources

The Web of linked data is one of the main principles for realizing the ideals of the semantic web. In recent years, different data providers have published many data sources in the Linking Open Data (LOD) cloud based on different schemas. Published linked data sources are of limited benefit in isolation for intelligent applications and agents on the semantic web; the potential of linked data cannot be exploited without integrating the various sources. The challenge of integration is not limited to instances; schema heterogeneity also affects discovering instances with the same identity. In this paper we propose a novel approach, SBUEI, for instance coreference resolution between linked data sources, even those with heterogeneous schemas. To this end, SBUEI considers the coreference resolution problem at both the schema and instance levels. The matching process is applied at both levels consecutively to let the system discover identical instances. SBUEI also applies a new approach for consolidating linked data at the instance level: after finding identical instances, SBUEI searches locally around them to find further equal instances. Experiments show that SBUEI obtains promising results with high precision and recall.

Aynaz Taheri, Mehrnoush Shamsfard

Semantic Search

The Dynamic Generation of Refining Categories in Ontology-Based Search

In the era of the information revolution, the amount of digital content is growing explosively with the advent of personal smart devices. Consuming this content makes users depend heavily on search engines to find what they want. Currently, search requires tedious review of results; to alleviate this, predefined, fixed categories are provided for refining results. Since fixed categories never reflect the differences between queries and search results, they often present unhelpful information. This paper proposes a method for the dynamic generation of refining categories in ontology-based semantic search systems. Specifically, it suggests a measure for the dynamic selection of categories and an algorithm for arranging them in an appropriate order. Finally, it demonstrates the validity of the proposed approach using several evaluation measures.

Yongjun Zhu, Dongkyu Jeon, Wooju Kim, June S. Hong, Myungjin Lee, Zhuguang Wen, Yanhua Cai

Keyword-Driven Resource Disambiguation over RDF Knowledge Bases

Keyword search is the most popular way to access information. In this paper we introduce a novel approach for determining the correct resources for user-supplied queries based on a hidden Markov model. In our approach the user-supplied query is modeled as the observed data, and the background knowledge is used for parameter estimation. We leverage the semantic relationships between resources when computing the parameter estimates. In this approach, query segmentation and resource disambiguation are tightly interwoven: first, an initial set of potential segments is obtained by leveraging the underlying knowledge base; then, the final correct set of segments is determined after the most likely resource mapping has been computed. While linguistic analysis (e.g., named entity and multi-word unit recognition, and POS tagging) fails on keyword-based queries, we show that our statistical approach is robust with regard to variance in query expression. Our experiments reveal very promising results.

Saeedeh Shekarpour, Axel-Cyrille Ngonga Ngomo, Sören Auer
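The HMM view described above can be sketched with a minimal Viterbi decoder: query keywords are the observations, candidate resources are the hidden states, and the most likely state sequence gives the resource mapping. All probabilities and state names below are made-up illustrations, not values from the paper:

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the
    observation sequence under the given HMM parameters."""
    # Probability of the best path ending in each state after step 0.
    prob = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    path = {s: [s] for s in states}
    for obs in observations[1:]:
        new_prob, new_path = {}, {}
        for s in states:
            best, prev = max(
                (prob[r] * trans_p[r][s] * emit_p[s][obs], r) for r in states
            )
            new_prob[s], new_path[s] = best, path[prev] + [s]
        prob, path = new_prob, new_path
    return path[max(states, key=lambda s: prob[s])]
```

In the paper's setting, the emission and transition parameters would be estimated from the knowledge base (e.g., from semantic relatedness between resources) rather than fixed by hand as here.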

An Automated Template Selection Framework for Keyword Query over Linked Data

Template-based information access, in which templates are constructed for keywords, is a recent development in linked data information retrieval. However, most such approaches suffer from ineffective template management. Because linked data has a structured representation, we assume that the data's internal statistics can effectively inform template management. In this work, we exploit this for template creation, template ranking, and scaling. Our proposal can effectively be used for automatic linked data information retrieval and can be combined with other techniques, such as ontology inclusion and sophisticated matching, to further improve performance.

Md-Mizanur Rahoman, Ryutaro Ichise

Knowledge Building

Leveraging the Crowdsourcing of Lexical Resources for Bootstrapping a Linguistic Data Cloud

We present a declarative approach, implemented in a comprehensive open-source framework based on DBpedia, to extract lexical-semantic resources – an ontology about language use – from Wiktionary. The data currently includes language, part of speech, senses, definitions, synonyms, translations and taxonomies (hyponyms, hyperonyms, synonyms, antonyms) for each lexical word. The main focus is on flexibility with respect to the loose schema and configurability for the differing language editions of Wiktionary. This is achieved by a declarative mediator/wrapper approach. The goal is to allow the addition of languages purely by configuration, without the need for programming, thus enabling the swift and resource-conserving adaptation of wrappers by domain experts. The extracted data is as fine-grained as the source data in Wiktionary and additionally follows the lemon model. It enables use cases such as disambiguation and machine translation. By offering a linked data service, we hope to extend DBpedia's central role in the LOD infrastructure to the world of Open Linguistics.

Sebastian Hellmann, Jonas Brekle, Sören Auer

Navigation-Induced Knowledge Engineering by Example

A New Paradigm for Knowledge Engineering by the Masses

Knowledge Engineering is a costly, tedious and often time-consuming task, for which light-weight processes are desperately needed. In this paper, we present a new paradigm - Navigation-induced Knowledge Engineering by Example (NKE) - to address this problem by producing structured knowledge as a result of users navigating through an information system. Thereby, NKE aims to reduce the costs associated with knowledge engineering by framing it as navigation. We introduce and define the NKE paradigm and demonstrate it with a proof-of-concept prototype which creates OWL class expressions based on users navigating in a collection of resources. The overall contribution of this paper is twofold: (i) it introduces a novel paradigm for knowledge engineering and (ii) it provides evidence for its technical feasibility.

Sebastian Hellmann, Jens Lehmann, Jörg Unbehauen, Claus Stadler, Thanh Nghia Lam, Markus Strohmaier

FacetOntology: Expressive Descriptions of Facets in the Semantic Web

The formal structure of information on the Semantic Web lends itself to faceted browsing, an information retrieval method in which users filter results based on the values of properties ("facets"). Numerous faceted browsers have been created to browse RDF and Linked Data, but these systems use their own ontologies to define how data is queried to populate their facets. Since the source data has the same format across these systems (specifically, RDF), we can unify the different methods of describing how to query the underlying data, enabling compatibility across systems and providing an extensible base ontology for future systems. To this end, we present FacetOntology, an ontology that defines how to query data to form a faceted browser, together with a number of transformations and filters that can be applied to data before it is shown to users. FacetOntology overcomes limitations in the expressivity of existing work by enabling the full expressivity of SPARQL when selecting data for facets. Applying a FacetOntology definition to data specifies a set of facets, each with queries and filters against source RDF data, which enables faceted browsing systems to be created over that RDF data.

Daniel A. Smith, Nigel R. Shadbolt

Semantic Web Application

An Ontological Framework for Decision Support

In the last few years, ontologies have been successfully exploited by Decision Support Systems (DSSs) to support some phases of the decision-making process. In this paper, we propose to employ an ontological representation for all the content both processed and produced by a DSS in answering requests. This semantic representation supports the DSS throughout the decision-making process, and it is capable of encoding (i) the request, (ii) the data relevant to it, and (iii) the conclusions/suggestions/decisions produced by the DSS. The advantages of an ontology-based representation of the main data structure of a DSS are many: (i) it enables the integration of heterogeneous sources of data available on the web to be processed by the DSS; (ii) it allows tracking, and exposing in a structured form to additional services (e.g., explanation or case reuse services), all the content processed and produced by the DSS for each request; and (iii) it enables exploiting logical reasoning for some of the inference steps of the DSS decision-making process. The proposed approach has been successfully implemented and exploited in a DSS for personalized environmental information, developed in the context of the PESCaDO EU project.

Marco Rospocher, Luciano Serafini

Development of the Method for the Appropriate Selection of the Successor by Applying Metadata to the Standardization Reports and Members

In businesses and organizations, it is difficult to find a successor for various activities while taking a person's knowledge and actual experience into account. In this study, we find the successor to a member of a standardization activity. By assigning metadata to the profiles and annual activity reports of members engaged in standardization activities, the relationships between the profiles and the annual activity reports are described as an RDF graph and visualized with nodes and links. This paper has two objectives. Objective 1 is the development and evaluation of a method to design the best combination of search queries for discovering an appropriate successor. Objective 2 is the proposal and evaluation of an easy-to-understand visualization of the successor search results obtained for Objective 1. In the case study, the proposed procedure nominates successor candidates effectively and the results are visualized.

Isaac Okada, Minoru Saito, Yoshiaki Oida, Hiroyuki Yamato, Kazuo Hiekata, Shinya Miura

A Native Approach to Semantics and Inference in Fine-Grained Documents

This paper proposes a novel approach for enhancing document excerpts with semantic structures that are treated as first-class citizens, i.e., integrated at the system level. Providing support for semantics at the system level is in contrast with existing solutions that implement semantics as an add-on using intermediate descriptors. A framework and a toolset inspired by the Semantic Web Stack have been integrated into the Snippet System, an operating system environment that provides support for fine-grained representation and management of documents and the relationships that exist between arbitrary excerpts. The high granularity, combined with native support for semantics, is leveraged to alleviate some of the existing personal information management problems in terms of content retrieval and document engineering. The resulting framework offers some inherent advantages in terms of refined resource description, content and knowledge reuse along with self-contained documents that retain their metadata without depending on intermediate entities.

Arash Atashpendar, Laurent Kirsch, Jean Botev, Steffen Rothkugel

A German Natural Language Interface for Semantic Search

Semantic data is key to efficient information retrieval. It relies on a well-defined structure and enables automated processing. Therefore, more and more ontologies are being specified, extended and interlinked. So far, only the query language SPARQL provides precise access to semantic data. Since most users are overwhelmed by formulating queries that satisfy the structure of semantic data, more and more search-interface approaches are emerging that aim at good usability and correct answers. We implemented a Natural Language Interface (NLI) that answers questions formulated in German natural language. In order to query the domain ontology, the user query is first translated into SPARQL. Since domain-ontology resources are required to formulate the SPARQL query, this paper introduces an approach for identifying resources in the user query. We show a path-based identification of semantically similar resources and a similarity measure. On 100 test questions, our system achieves a precision and recall of 66%.

Irina Deines, Dirk Krechel

In-Use Track

Social Semantic Web

Topica – Profiling Locations through Social Streams

This paper presents work on interlinking social stream information with geographical spaces through the use of Linked Data technologies. The paper focuses on filtering, enriching, structuring and interlinking microposts of localised (i.e. geo-tagged) social streams (a.k.a. localised forums) to profile geographical areas (e.g., cities, countries). For this purpose, we enriched social streams extracted from Twitter, Facebook and TripAdvisor and structured them using well-known vocabularies and data models, such as SIOC and SKOS. To integrate this information into a location profile we introduce the linkedPOI ontology, which captures and leverages DBpedia categories to derive concepts that profile a geographic space.

We exemplify the use of social stream-based location profiling by means of a travel mashup case study. We introduce the Topica Portal, which allows users to browse geographical spaces by topic, and we highlight the potential impact for the future of semantic travel mashup systems.

A. E. Cano, Aba-Sah Dadzie, Grégoire Burel, Fabio Ciravegna

A Community-Driven Approach to Development of an Ontology-Based Application Management Framework

Although the semantic web standards are established, applications and uses of the data remain relatively limited. This is partly due to the high learning curve and effort demanded in building semantic web and ontology-based applications. In this paper, we describe an ontology application management framework that aims to simplify the creation and adoption of semantic web applications. The framework supports application development in ontology-database mapping, recommendation rule management, and application templates, focusing on semantic search and recommender system applications. We present some case studies that adopted our application framework in their projects. The evolution of the software tool has profited significantly from the semantic web research community in Thailand, which has contributed both to tool development and to adoption support.

Marut Buranarach, Ye Myat Thein, Thepchai Supnithi

When Mommy Blogs Are Semantically Tagged

OWL 2-supported semantic tagging is an optional yet decisive and highly influential component of a multidisciplinary knowledge architecture framework that synergetically combines the Semantic and Social Webs. The facility consists of a semantic tagging layer based on OWL 2 axioms and expressions, enticing social network users, typically mommy bloggers, to annotate their chaos of textual data with natural-language verbalized versions of ontological elements. This paper provides a comprehensive summary of the overall framework along with its backbone metamodel and its parenting analysis and surveillance ontology, ParOnt, with particular emphasis on the semantic expression-based tagging feature. It highlights the resulting gains and improvements in terms of effective results, services and recommendations, all within the scope of public parenting orientation and awareness.

Jinan El-Hachem, Volker Haarslev

Applying Semantic Technologies to Public Sector: A Case Study in Fraud Detection

Fraudulent claims cost both the public and private sectors an enormous amount of money each year. The existence of data silos is considered one of the main barriers to cross-region, cross-department, and cross-domain data analysis that can detect abnormalities not easily seen when focusing on single data sources. An evident advantage of leveraging Linked Data and semantic technologies is the smooth integration of distributed data sets. This paper reports a proof-of-concept study in the area of benefit fraud detection. We believe that the design considerations, study outcomes, and lessons learnt can help in deciding how to adopt semantic technologies in similar contexts.

Bo Hu, Nuno Carvalho, Loredana Laera, Vivian Lee, Takahide Matsutsuka, Roger Menday, Aisha Naseer

Semantic Search

Location-Based Concept in Activity Log Ontology for Activity Recognition in Smart Home Domain

Activity recognition plays an important role in several research areas. Nevertheless, existing approaches suffer from various problems when people have different lifestyles. To address these shortcomings, this paper proposes an activity log in a context-aware infrastructure ontology that interlinks the user's historical and current context. In this approach, a location-based concept is built into the activity log to produce description logic (DL) rules. The relationships between activities in the same location are investigated to make the results of activity recognition more accurate. We also built a semantic ontology search (SOS) system to evaluate the effectiveness of our proposed ideas. Semantic data, including human activities and activities of daily living (ADL), can be retrieved through the SOS system. The results from the SOS system show the advantages over the existing system when using the location-based concept in the activity log ontology.

Konlakorn Wongpatikaseree, Mitsuru Ikeda, Marut Buranarach, Thepchai Supnithi, Azman Osman Lim, Yasuo Tan

Improving the Performance of the DL-Learner SPARQL Component for Semantic Web Applications

The vision of the Semantic Web is to make use of semantic representations on the largest possible scale - the Web. Large knowledge bases such as DBpedia, OpenCyc, and GovTrack are emerging and freely available as Linked Data and SPARQL endpoints. Exploring and analysing such knowledge bases is a significant hurdle for Semantic Web research and practice. As one possible direction for tackling this problem, we present an approach for obtaining complex class expressions for objects in knowledge bases using machine learning techniques. We describe in detail how we leverage existing techniques to achieve scalability on large knowledge bases available as SPARQL endpoints or Linked Data. The algorithms are made available in the open-source DL-Learner project, and we present several real-life scenarios in which they can be used by Semantic Web applications. Because the method is widely used in several well-known tools, we optimized and benchmarked the existing algorithms, achieving an approximately 3-fold increase in speed in addition to a more robust implementation.

Didier Cherix, Sebastian Hellmann, Jens Lehmann

DashSearch LD: Exploratory Search for Linked Data

Although the large number of datasets gathered as Linked Open Data (LOD) benefits data sharing and reuse, the datasets themselves have become more difficult to understand. Since each dataset has its own data structure, we need to understand datasets individually. In addition, since the entities in datasets are interconnected, we need to understand the interconnections between datasets. In other words, understanding the data is crucial for exploiting LOD. In this paper, we present a novel system called DashSearch LD for understanding and using LOD with an exploratory search approach. The user interactively explores datasets by viewing and selecting entities in the datasets. Specifically, the user manipulates widgets on the screen, moving and overlapping them with a mouse to check entities, view their detailed data, and obtain other entities linked via the widgets.

Takayuki Goto, Hideaki Takeda, Masahiro Hamasaki

Entity-Based Semantic Search on Conversational Transcripts: Semantic Search on Hansard

This paper describes the implementation of a semantic web search engine over conversation-style transcripts. Our choice of data is Hansard, a publicly available conversation-style transcript of parliamentary debates. The current search engine for Hansard is limited to queries based on keywords or phrases and hence lacks the ability to make semantic inferences from user queries. By making use of knowledge such as the relationships between members of parliament, constituencies, terms of office, and topics of debates, the search results can be improved in terms of both relevance and coverage. Our contribution is not algorithmic; instead, we describe how we exploit a collection of external data sources, ontologies, semantic web vocabularies and named entity extraction in analysing the underlying semantics of user queries and in semantically enriching the search index, thereby improving the quality of results.

Obinna Onyimadu, Keiichi Nakata, Ying Wang, Tony Wilson, Kecheng Liu

Special Track

Linked Data in Practice

Development of Linked Open Data for Bioresources

The broad dissemination of information is a key issue in improving access to existing bioresources. We developed Linked Open Data (LOD) for the bioresources available at the RIKEN BioResource Center. The LOD consists of standardized, structured data openly available on the World Wide Web, including published bioresource information for 5,000 mouse strains and 3,600 cell lines. The LOD includes links to publicly available information, such as genes, alleles, and ontologies, providing phenotypic information through the BioLOD website. As a result, information on mouse strains and cell lines has been connected to various data items in public databases and other project-oriented databases. Thus, through the use of LOD, dispersed efforts to produce different databases can be easily combined. Through these efforts, we expect to contribute to the global improvement of access to bioresources.

Hiroshi Masuya, Terue Takatsuki, Yuko Makita, Yuko Yoshida, Yoshiki Mochizuki, Norio Kobayashi, Atsushi Yoshiki, Yukio Nakamura, Tetsuro Toyoda, Yuichi Obata

Towards a Data Hub for Biodiversity with LOD

Because biological studies focus on a huge variety of targets, from molecules to ecosystems, the data produced and used in each field is managed independently, making it difficult to see the relationships among them. We aim to build a data hub with LOD to connect data from different biological fields and to enhance the search and use of data across fields. We build a prototype data hub for taxonomic information on species, which is a key for retrieving data and linking to databases in different fields. We also demonstrate how the data hub can be used with an application that assists search in other databases.

Yoshitaka Minami, Hideaki Takeda, Fumihiro Kato, Ikki Ohmukai, Noriko Arai, Utsugi Jinbo, Motomi Ito, Satoshi Kobayashi, Shoko Kawamoto
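The core idea of a taxonomy-keyed hub can be sketched in a few lines: scientific names, including synonyms, resolve to one hub entry that links records in several field-specific databases. All names and database links below are illustrative, not the paper's actual data.

```python
# Sketch of a taxonomy-keyed data hub: any accepted name or synonym
# resolves to one canonical entry carrying cross-database links.
HUB = {
    "Apis mellifera": {
        "synonyms": ["Apis mellifica"],
        "links": {
            "genomics_db": "gen:12345",   # hypothetical record IDs
            "ecology_db": "eco:67890",
        },
    },
}

def resolve(name):
    """Return (canonical name, cross-database links) for a name or synonym."""
    for canonical, entry in HUB.items():
        if name == canonical or name in entry["synonyms"]:
            return canonical, entry["links"]
    return None, {}

canonical, links = resolve("Apis mellifica")  # synonym resolves to canonical
```

Synonym handling is the crucial design choice here: different fields often cite different names for the same species, so resolving all of them to one canonical key is what makes cross-field linking possible.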

Linking Open Data Resources for Semantic Enhancement of User–Generated Content

This paper describes our experiences in developing a Linking Open Data (LOD) resource for Taiwanese Geographic Names (LOD TGN), extracting Taiwanese place names found in Facebook posts, and linking those place names to entries in LOD TGN. The aim of this study is to enhance the semantics of User-Generated Content (UGC) through the use of LOD resources, so that, for example, the content of Facebook posts becomes more reusable and discoverable. In effect, this study develops a geospatial semantic annotation method for Facebook posts based on LOD resources.

Dong-Po Deng, Guan-Shuo Mai, Cheng-Hsin Hsu, Chin-Lung Chang, Tyng-Ruey Chuang, Kwang-Tsao Shao

Korean Linked Data on the Web: Text to RDF

Interlinking data coming from different sources has been a long-standing goal [4], aiming to increase the reusability, discoverability, and thus the usefulness of information. Nowadays, Linked Open Data (LOD) tackles this issue in the context of the semantic web. However, currently most web data is stored in relational databases and published as unstructured text. This triggers the need for (i) combining current semantic technologies with relational databases; (ii) processing text by integrating several NLP tools, with the ability to query the outcome using the standard semantic web query language, SPARQL; and (iii) linking the outcome with the LOD cloud. The work presented here shows a solution to these needs in the context of the Korean language, but our approach can be adapted to other languages as well.

Martín Rezk, Jungyeul Park, Yoon Yongun, Kyungtae Lim, John Larsen, YoungGyun Hahm, Key-Sun Choi
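The final stage of any text-to-RDF pipeline is serializing what the NLP tools extracted. A minimal sketch, assuming the extraction step already yields (subject, predicate, literal) tuples; the namespace and tuples below are illustrative, not the authors' actual vocabulary.

```python
# Sketch: serialize extracted facts as N-Triples, tagging Korean literals
# with the "ko" language tag so they remain queryable via SPARQL FILTERs.
EX = "http://example.org/"  # hypothetical namespace

def to_ntriple(subj, pred, literal):
    """Format one extracted fact as an N-Triples line with a Korean literal."""
    return f'<{EX}{subj}> <{EX}{pred}> "{literal}"@ko .'

extracted = [
    ("Seoul", "label", "서울"),
    ("Seoul", "country", "대한민국"),
]
lines = [to_ntriple(*fact) for fact in extracted]
```

Once the triples are loaded into any SPARQL-capable store, the language tag lets queries distinguish Korean labels from, say, English ones, which is one place the language-specific processing pays off.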

Database Integration

Issues for Linking Geographical Open Data of GeoNames and Wikipedia

It is now possible to use various geographical open data sources, such as GeoNames and Wikipedia, to construct geographic information systems. In addition, these open data sources are integrated under the concept of Linked Open Data. There have been several attempts to identify links between existing data, but few studies have focused on the quality of such links. In this paper, we introduce an automatic link discovery method for identifying correspondences between GeoNames entries and Wikipedia pages, based on Wikipedia category information. This method finds not only appropriate links but also inconsistencies between the two databases. Based on these integration results, we discuss the types of inconsistencies that must be resolved to make Linked Open Data consistent.

Masaharu Yoshioka, Noriko Kando
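The general shape of category-based link discovery can be sketched with toy data: a GeoNames entry and a Wikipedia page are linked when their names match and their type/category information is compatible, while a name match with conflicting categories is flagged as a potential inconsistency. The field names and matching rule below are illustrative, not the paper's actual method.

```python
# Sketch: category-based link discovery between two geographic databases.
def discover_link(geonames_entry, wiki_page):
    """Return (linked, inconsistent).

    linked: names match and category information overlaps.
    inconsistent: names match but categories conflict (a quality issue).
    """
    names_match = geonames_entry["name"] == wiki_page["title"]
    categories_overlap = bool(
        set(geonames_entry["feature_classes"]) & set(wiki_page["categories"])
    )
    return names_match and categories_overlap, names_match and not categories_overlap

gn = {"name": "Nara", "feature_classes": {"city"}}
wp = {"title": "Nara", "categories": {"city", "prefecture capital"}}
linked, inconsistent = discover_link(gn, wp)
```

The second return value is the interesting one for data quality: a name collision with incompatible categories (e.g. a river vs. a town) signals either a wrong link candidate or an error in one of the source databases.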

Interlinking Korean Resources on the Web

LOD (Linked Open Data) is an international endeavor to interlink structured data on the Web and to create the Web of Data on a global level. In this paper, we report on our experience of applying existing LOD frameworks, most of which are designed to run only in European-language environments, to Korean resources to build linked data. Through the localization of Silk, we identified localized similarity measures as essential for interlinking Korean resources. Specifically, we built new algorithms to measure the distance between Korean strings and between transliterated Korean strings. A series of empirical tests found that the new measures substantially improve the performance of Silk, with high precision for matching Korean strings and high recall for matching transliterated Korean strings. We expect the localization issues described in this paper to be applicable to many non-Western countries.

Soon Gill Hong, Saemi Jang, Young Ho Chung, Mun Yong Yi, Key-Sun Choi
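An illustrative sketch of why Korean needs a localized distance measure: precomposed Hangul syllables can be decomposed into jamo (initial/medial/final consonants and vowels) using their Unicode layout, so that syllables differing in a single jamo count as one edit rather than a whole-character substitution. This is only an example of the kind of measure the paper argues for, not the authors' actual Silk plug-in.

```python
# Sketch: jamo-level edit distance for Korean strings.
def jamo(text):
    """Decompose precomposed Hangul syllables (U+AC00..U+D7A3) into jamo."""
    out = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if 0 <= code <= 11171:
            out.append(("I", code // 588))          # initial consonant
            out.append(("M", (code % 588) // 28))   # medial vowel
            if code % 28:
                out.append(("F", code % 28))        # optional final consonant
        else:
            out.append(("C", ch))                   # non-Hangul passthrough
    return out

def edit_distance(a, b):
    """Standard Levenshtein distance over jamo sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

# "갑" and "값" differ only in the final consonant: one jamo edit,
# rather than one unrelated whole-character substitution.
d = edit_distance(jamo("갑"), jamo("값"))
```

At the character level these two strings look maximally different (no character in common), while at the jamo level they are near-identical, which is why character-level measures tuned for European scripts underperform on Korean.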

Backmatter

Further Information