## About this book

DASFAA is an annual international database conference, located in the Asia-Pacific region, which showcases state-of-the-art R&D activities in database systems and their applications. It provides a forum for technical presentations and discussions among database researchers, developers and users from academia, business and industry. DASFAA 2015, the 20th in the series, was held during April 20-23, 2015 in Hanoi, Vietnam. This year, we carefully selected two workshops, each focusing on specific research issues that contribute to the main themes of the DASFAA conference. This volume contains the final versions of papers accepted for the two workshops, the Second International Workshop on Semantic Computing and Personalization (SeCoP 2015) and the Second International Workshop on Big Data Management and Service (BDMS 2015), and for a Poster Session.

All the workshops were selected via a public call-for-proposals process. The workshop organizers put a tremendous amount of effort into soliciting and selecting papers with a balance of high quality, new ideas and new applications. We asked all workshops to follow a rigid paper selection process, including a procedure to ensure that Program Committee members are excluded from the review of any paper they are involved with. A requirement that the overall paper acceptance rate be no more than 50% was also imposed on all the workshops.

## Table of Contents

### A Novel Method for Clustering Web Search Results with Wikipedia Disambiguation Pages

Organizing search results of an ambiguous query into topics can facilitate information search on the Web. In this paper, we propose a novel method to cluster the search results of an ambiguous query into topics about the query constructed from Wikipedia disambiguation pages (WDP). To improve the clustering result, we propose a concept filtering method to filter semantically unrelated concepts in each topic. Also, we propose the top K full relations (TKFR) algorithm to assign search results to relevant topics based on the similarities between concepts in the results and topics. Compared with clustering methods whose topic labels are extracted from search results, the topics of WDP, which are edited by humans, are much more helpful for navigation. The experimental results show that our method works for ambiguous queries of different lengths and substantially improves on the clustering results of methods that use WDP.

Zhi Huang, Zhendong Niu, Donglei Liu, Wenjuan Niu, Wei Wang

### Integrating Opinion Leader and User Preference for Recommendation

Collaborative filtering (CF) is one of the most well-known and commonly used technologies for recommender systems. However, it suffers from inherent issues such as data sparsity. Many studies have used additional information such as user attributes, tags and social relationships to address these problems. We propose an algorithm named OLrs (Opinion Leaders for Recommender System) based on trust relationships. Specifically, the opinion leaders who have a strong influence on the active user and an accurate evaluation of the recommended item are identified. The prediction for a given item is generated from the ratings of these opinion leaders and the active user. Experimental results on the Epinions data set demonstrate that the prediction accuracy of our method outperforms other approaches.

Dong Wu, Kai Yang, Tao Wang, Weiang Luo, Huaqing Min, Yi Cai

### Learning Trend Analysis and Prediction Based on Knowledge Tracing and Regression Analysis

Estimating students’ knowledge is a fundamental and important task for student modeling in intelligent tutoring systems. Since the concept of knowledge tracing was proposed, there have been many studies focusing on estimating students’ mastery of specific knowledge components, yet few studies paid attention to the analysis and prediction on a student’s overall learning trend in the learning process. Therefore, we propose a method to analyze a student’s learning trend in the learning process and predict students’ performance in future learning. Firstly, we estimate the probability that the student has mastered the knowledge components with the model of Bayesian Knowledge Tracing, and then model students’ learning curves in the overall learning process and predict students’ future performance with Regression Analysis. Experimental results show that this method can be used to fit students’ learning trends well and can provide prediction with reference value for students’ performances in the future learning.

Yali Cai, Zhendong Niu, Yingwang Wang, Ke Niu
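The two steps named in the abstract above, Bayesian Knowledge Tracing followed by regression, can be illustrated with a minimal sketch. The parameter values (`p_slip`, `p_guess`, `p_learn`, `p_init`) and the choice of a straight-line fit are assumptions made for illustration; the paper's actual parameters and regression model are not given here.

```python
def bkt_update(p_mastery, correct, p_slip=0.1, p_guess=0.2, p_learn=0.15):
    """One standard BKT step: condition on the answer, then apply learning."""
    if correct:
        evidence = p_mastery * (1 - p_slip)
        total = evidence + (1 - p_mastery) * p_guess
    else:
        evidence = p_mastery * p_slip
        total = evidence + (1 - p_mastery) * (1 - p_guess)
    posterior = evidence / total
    # The student may also learn the skill during this practice opportunity.
    return posterior + (1 - posterior) * p_learn


def trace(responses, p_init=0.3):
    """Return the mastery estimate after each observed response."""
    p, history = p_init, []
    for correct in responses:
        p = bkt_update(p, correct)
        history.append(p)
    return history


def fit_line(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...; needs len(ys) >= 2."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


history = trace([True, False, True, True])
slope, intercept = fit_line(history)
predicted_next = slope * len(history) + intercept  # extrapolated trend value
```

A linear extrapolation can leave the interval [0, 1], so a real system would likely clip the prediction or fit a bounded learning curve instead.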

### Intensive Maximum Entropy Model for Sentiment Classification of Short Text

The rapid development of social media services has facilitated the communication of opinions through microblogs/tweets, instant messages, online news, and so forth. This article concentrates on the mining of emotions evoked by short text materials. Compared to classical sentiment analysis of long text, sentiment analysis of short text is sometimes more meaningful in social media. We propose an intensive maximum entropy model for sentiment classification, which generates the probability of sentiments conditioned on short text by employing intensive feature functions. Experimental evaluations using real-world data validate the effectiveness of the proposed model on sentiment classification of short text.

Yanghui Rao, Jun Li, Xiyun Xiang, Haoran Xie

### Maintaining Ranking Lists in Dynamic Virtual Environments

Preference queries serve for retrieving a small set of tuples with top aggregated scores over multiple features, from a large set of tuples. We consider the problem of maintaining the ranking lists of items for preference queries in dynamic virtual environments, which is very useful for avatars in virtual environments to continuously monitor interesting items surrounding them. Traditional solutions on preference queries utilize the pre-computed materialized ranking lists to efficiently find top items by only retrieving a prefix of the ranking lists. However, for preference queries in virtual environments, items (tuples) to be ranked change frequently due to the movements and updates of avatars. Creating and maintaining materialized ranking lists in such dynamic scenarios will be extremely expensive. In this paper, we address the problem by proposing a solution as a marriage of continuous range query and continuous top-k query. A preference query is continuously processed by dynamically adding and removing the perceived items of an avatar. Extensive experimental studies show that the proposed techniques are very efficient in handling the continuous updates of ranking lists.

Mingyan Teng, Denghao Ma, Xiaoyong Du
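The combination of a continuous range query and a continuous top-k query described in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' data structure: the class name `PerceivedTopK`, the circular perception range, and the weighted-sum score are all hypothetical.

```python
import heapq
from math import dist


class PerceivedTopK:
    """Keeps scores only for items inside an avatar's perception radius."""

    def __init__(self, k, radius, weights):
        self.k = k
        self.radius = radius
        self.weights = weights
        self.scores = {}  # item_id -> aggregated preference score

    def update(self, avatar_pos, item_id, item_pos, features):
        """Range-query step: add or refresh a perceived item, drop one that left."""
        if dist(avatar_pos, item_pos) <= self.radius:
            self.scores[item_id] = sum(
                w * f for w, f in zip(self.weights, features)
            )
        else:
            self.scores.pop(item_id, None)

    def top(self):
        """Top-k step: rank only the currently perceived items."""
        best = heapq.nlargest(self.k, self.scores.items(), key=lambda kv: kv[1])
        return [item_id for item_id, _ in best]


ranking = PerceivedTopK(k=2, radius=5.0, weights=(0.5, 0.5))
ranking.update((0, 0), "a", (1, 1), (1.0, 1.0))
ranking.update((0, 0), "b", (2, 2), (0.2, 0.2))
ranking.update((0, 0), "c", (3, 0), (0.6, 0.6))
```

The sketch recomputes the top-k from the perceived set on demand, which avoids maintaining a materialized ranking list over all items at the cost of a scan of the perceived set per query.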

### Knowledge Communication Analysis Based on Clustering and Association Rules Mining

With the growth of knowledge sharing, an increasingly large amount of Open-Access academic resources are being stored online. This paper systematically studies the method of mining knowledge communication via Open-Access Journals. We first designed a new framework of knowledge communication analysis based on clustering and association rule mining. Then, we proposed two improved indexes named cited frequency and weighted cited frequency. Extensive evaluations using real-world data validate the effectiveness of the proposed framework of knowledge communication analysis.

Qingyuan Wu, Qi Wu, Sidi Zhao, Mingxue Wei, Fu Lee Wang

### Sentiment Detection of Short Text via Probabilistic Topic Modeling

As an important medium for describing events, short text is effective at conveying emotions and communicating affective states. In this paper, we propose a classification method based on a probabilistic topic model, which greatly improves the performance of sentiment categorization methods on short text. To solve the problems of sparsity and context-dependency, we extract hidden topics behind the text and associate different words through the same topic. Evaluation on sentiment detection of short text verifies the effectiveness of the proposed method.

Zewei Wu, Yanghui Rao, Xin Li, Jun Li, Haoran Xie, Fu Lee Wang

### Schema Matching Based on Source Codes

Schema matching is a critical step in numerous database applications, such as integrating web data sources, loading data warehouses and exchanging information among several authorities. Existing techniques for schema matching are classified as either schema-based, instance-based, or a combination of both. In this paper, we propose a new class of techniques, called schema matching based on source codes. The idea is to exploit the exterior schema extracted from the source codes to find semantic correspondences between attributes in the schemas to be matched. Essentially, the exterior schema is the schema that is exposed to final users and sits in the outermost shell of applications. Thus, it typically contains the complete semantics of the data, which is very helpful for schema matching. We present a framework for schema matching based on source codes, which includes three key components: extracting the exterior schema, evaluating the quality of matching and finding the optimal mapping. We also present some helpful features and rules of the source codes for the implementation of each component, and address the corresponding challenges in detail.

Guohui Ding, Guoren Wang, Chunlong Fan, Shuo Chen

### A Quota-Based Energy Consumption Management Method for Organizations Using Nash Bargaining Solution

The increasing development of energy consumption monitoring systems at public buildings, hospitals, campuses, and factories enables more and more organizations to measure and coordinate the energy consumption activity of individual key energy users within them. To enhance the overall performance of energy consumption under a limited energy budget, the present paper proposes a new criterion, i.e., energy consumption satisfaction degree (ECSD), for an organization to evaluate the satisfaction of each key energy user in the consumption of its allocated energy quota. Inspired by the classical Nash bargaining solution (NBS) in cooperative game theory, we further develop a quota-based energy consumption management method to effectively guarantee the annual energy-saving target of the organization. Numerical simulation shows that, compared with the equal and priority-based energy allocation schemes, the proposed method can maximize the overall satisfaction of all key energy users and, meanwhile, maintain reasonable fairness among them.

Renting Liu, Xuesong Jonathan Tan
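To make the Nash bargaining idea in the abstract above concrete, the sketch below allocates a fixed budget by maximizing the Nash product of each user's gain over a disagreement point. It assumes a linear utility (gain equals quota minus a hypothetical minimum need), which is a stand-in for the paper's ECSD criterion, not a reproduction of it. Under that assumption the maximizer has a closed form: every user receives its minimum need plus an equal share of the surplus.

```python
from math import prod


def nash_bargaining_quota(budget, minimum_needs):
    """Maximize prod(x_i - d_i) subject to sum(x_i) == budget and x_i >= d_i.

    With linear gains the product is largest when all gains are equal
    (AM-GM inequality), so the surplus is split evenly.
    """
    surplus = budget - sum(minimum_needs)
    if surplus < 0:
        raise ValueError("budget does not cover the minimum needs")
    share = surplus / len(minimum_needs)
    return [need + share for need in minimum_needs]


def nash_product(quotas, minimum_needs):
    """Objective value of an allocation, for comparing alternatives."""
    return prod(x - d for x, d in zip(quotas, minimum_needs))


needs = [10.0, 20.0, 30.0]
quotas = nash_bargaining_quota(100.0, needs)
```

With a nonlinear satisfaction function the equal-split result no longer holds in general, and a numerical optimizer would be needed instead of the closed form.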

### Entity Relation Mining in Large-Scale Data

Web-based named-entity relationship extraction has recently become a new research field with tremendous potential. The goal of web-based entity relationship extraction is to explore the relationships between a set of real-world entities. It is a challenging research field with wide application value in related areas of text mining. In this paper, we propose a newly defined framework called Snowball++, based on traditional entity relationship extraction frameworks. In our Snowball++ framework, we focus on many-to-many relations more than one-to-one relations. The system is also implemented in a many-to-many manner, which improves precision and recall. It is worth noting that Snowball++ assigns a specific relation type to each entity-relationship pair and that the whole training process needs only a little manual labor. To build an efficient and scalable system, we implement the Snowball++ framework on the Hadoop platform, which is a fully distributed computing system. The experiments show that our framework and implementation are efficient and effective.

Jingnan Li, Yi Cai, Qixuan Wang, Shuyue Hu, Tao Wang, Huaqing Min

### A Collaborative Filtering Model for Personalized Retweeting Prediction

With the development of social media, its services have significantly changed people’s habits of using the Internet. However, owing to the large amount of information posted by users and the highly frequent updates in social media, users often face the problem of information overload and miss out on content that they may be interested in. Recommender systems, which recommend an item (e.g., a product, a service or a tweet) to users based on their interests, are an effective technique to handle this issue. In this paper, we borrow the matrix factorization model from recommender systems to predict users’ retweeting behaviors in social media. Compared with previous works, we take the relevance of users’ interests, tweets’ content, and publishers’ influence into account simultaneously. Our experimental results on a real-world dataset show that the proposed model achieves desirable performance in characterizing users’ retweeting behaviors and predicting topic diffusion in social media.

Jun Li, Jiamin Qin, Tao Wang, Yi Cai, Huaqing Min

### Finding Paraphrase Facts Based on Coordinate Relationships

We propose a method to acquire paraphrases from the Web in accordance with a given sentence. For example, consider an input sentence “Lemon is a high vitamin c fruit”. Its paraphrases are expressions or sentences that convey the same meaning but are different syntactically, such as “Lemons are rich in vitamin c”, or “Lemons contain a lot of vitamin c”. We aim at finding sentence-level paraphrases from the noisy Web, instead of domain-specific corpora. By observing search results of paraphrases, users are able to estimate the likelihood of the sentence as a fact. We evaluate the proposed method on five distinct semantic relations. Experiments show our average precision is 60.5 %, compared to the TE/ASE method with an average precision of 44.15 %. Besides, we can acquire 3 more paraphrases per input than the TE/ASE method.

Meng Zhao, Hiroaki Ohshima, Katsumi Tanaka

### Emergency Situation Awareness During Natural Disasters Using Density-Based Adaptive Spatiotemporal Clustering

With the increase in the popularity of social media as well as the emergence of easy-to-use geo-mobile applications on smartphones, a huge amount of geo-annotated data is posted on social media sites. To enhance emergency situation awareness, these geo-annotated data are expected to be used in a new medium. In particular, geotagged tweets on Twitter are used by local governments to determine the situation accurately during natural disasters. Geotagged tweets are referred to as georeferenced documents; they include not only a short text message but also the posting time and location. In this paper, we propose a new spatiotemporal analysis method for emergency situation awareness during natural disasters using $$(\epsilon ,\tau )$$-density-based adaptive spatiotemporal clustering. Such clustering can identify bursty local areas by using adaptive spatiotemporal clustering criteria considering local spatiotemporal densities. Extracting $$(\epsilon ,\tau )$$-density-based adaptive spatiotemporal clusters allows the proposed method to analyze emergency situations such as natural disasters in real time. The experimental results showed that the proposed method can analyze emergency situations related to the weather in Japan more sensitively compared with our previous method.

Tatsuhiro Sakai, Keiichi Tamura, Hajime Kitakami
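The basic building block behind the clustering in the abstract above is an $$(\epsilon ,\tau )$$-neighborhood: documents that are close in both space and time. The sketch below shows that neighborhood test and a density check in the usual density-based style. It uses fixed thresholds, planar coordinates and a hypothetical `min_docs` parameter; the adaptive criteria that the paper adds on top are not reproduced here.

```python
from math import hypot


def st_neighbors(docs, i, eps, tau):
    """Indices of documents within distance eps AND time gap tau of docs[i].

    Each document is a tuple (x, y, t); real geotagged data would need a
    geodesic distance instead of hypot.
    """
    x, y, t = docs[i]
    return [
        j
        for j, (xj, yj, tj) in enumerate(docs)
        if j != i and hypot(xj - x, yj - y) <= eps and abs(tj - t) <= tau
    ]


def is_core(docs, i, eps, tau, min_docs):
    """A document is a core point if its neighborhood is dense enough."""
    return len(st_neighbors(docs, i, eps, tau)) >= min_docs


docs = [
    (0.0, 0.0, 0),    # reference tweet
    (0.5, 0.0, 10),   # near in space and time
    (0.4, 0.3, 500),  # near in space, too late
    (5.0, 5.0, 5),    # near in time, too far away
]
```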

### Distributed Data Managing in Health Care Social Network Based on Mobile P2P

Nowadays, more and more public health care information is being stored and transferred on the Internet and mobile devices. However, in developing countries, many rural residents cannot afford the cost of a commercial network. To address the lack of a cheap and stable communication infrastructure directly between hospital servers and rural village residents’ cellphones, our system leverages mobile P2P and social networks to build a health care information system. Based on the open-source P2P framework Alljoyn, the social network engine Elgg (and Elgg Mobile), and the distributed system HBase/Hadoop, we implemented HealthSocialNet, which focuses on immunization and antenatal care with high electrical efficiency and scalability. This research on health care social network systems could help poor areas in developing countries quickly implement and deploy a low-cost personalized health care data management system.

Ye Wang, Hong Liu, Lin Wang

### Survey of MOOC Related Research

MOOCs have expanded quickly in recent years, revealing both advantages and bottlenecks. MOOC platforms try to cope with their massive learning workloads and conduct further research on their MOOC data. E-learning research organizations incorporate MOOC research into their fields to find better learning models for MOOCs. Some MOOC-related research organizations follow the frontier of MOOC development. Research on MOOCs is still under way.

Yanxia Pang, Min Song, YuanYuan Jin, Ying Zhang

### Modeling Large Time Series for Efficient Approximate Query Processing

Evolving customer requirements and increasing competition force business organizations to store increasing amounts of data and query them for information at any given time. Due to the current growth of data volumes, timely extraction of relevant information becomes more and more difficult with traditional methods. In addition, contemporary Decision Support Systems (DSS) favor faster approximations over slower exact results. Generally speaking, processes that require exchange of data become inefficient when connection bandwidth does not increase as fast as the volume of data. In order to tackle these issues, compression techniques have been introduced in many areas of data processing. In this paper, we outline a new system that does not query complete datasets but instead utilizes models to extract the requested information. For time series data we use Fourier and Cosine transformations and piece-wise aggregation to derive the models. These models are initially created from the original data and are kept in the database along with it. Subsequent queries are answered using the stored models rather than scanning and processing the original datasets. In order to support model query processing, we maintain query statistics derived from experiments and when running the system. Our approach can also reduce communication load by exchanging models instead of data. To allow seamless integration of model-based querying into traditional data warehouses, we introduce a SQL compatible query terminology. Our experiments show that querying models is up to 80 % faster than querying over the raw data while retaining a high accuracy.

Kasun S. Perera, Martin Hahmann, Wolfgang Lehner, Torben Bach Pedersen, Christian Thomsen
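The abstract above mentions piece-wise aggregation as one way to derive a model that is queried in place of the raw series. A minimal sketch of that idea follows: the model stores one mean per fixed-length segment, and a range-sum query is answered from those means. The function names and the fixed segment length are assumptions for illustration; the paper's system, its Fourier and Cosine models, and its SQL extension are not shown.

```python
def build_paa_model(series, segment_len):
    """Piece-wise aggregate model: one mean per consecutive segment."""
    return [
        sum(series[i:i + segment_len]) / len(series[i:i + segment_len])
        for i in range(0, len(series), segment_len)
    ]


def approx_range_sum(model, segment_len, start, end):
    """Approximate sum(series[start:end]) using only the stored means.

    Assumes 0 <= start <= end <= len(series). The answer is exact when the
    range is aligned with segment boundaries and approximate otherwise.
    """
    total = 0.0
    for seg, mean in enumerate(model):
        lo = max(start, seg * segment_len)
        hi = min(end, (seg + 1) * segment_len)
        if hi > lo:
            total += mean * (hi - lo)
    return total


series = [1, 3, 5, 7, 2, 4, 6, 8]
model = build_paa_model(series, segment_len=4)  # two means instead of eight values
```

The query touches one stored value per segment rather than every data point, which is where the speed-up of model-based querying comes from; the price is an approximation error for ranges that cut through a segment.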

### Personalized User Value Model and Its Application

With the increase in telecom services, it is becoming increasingly difficult for either an individual customer or a business to find a suitable one. To help telecom service providers recommend suitable telecom services to customers, we propose a five-dimension user value model in this paper. Our model is evaluated on a very large real-life data set. The experimental results show that the precision of our model outperforms the baseline model by 2 %.

Gang Yu, Zhiyan Wang, Yi Cai

### Flexible Aggregation on Heterogeneous Information Networks

With the advent of heterogeneous information networks that consist of multi-type, interconnected nodes, such as bibliographic networks and knowledge graphs, it is important to study flexible aggregation in such networks. In this paper, we investigate the flexible aggregation problem on heterogeneous information networks, which is defined on multiple types of nodes and relations. We develop an efficient heuristic algorithm for aggregation in two phases: informational aggregation and structural aggregation. Extensive experiments on real-world data sets demonstrate the effectiveness and efficiency of the proposed algorithm.

Dan Yin, Hong Gao, Zhaonian Zou, Xianmin Liu, Jianzhong Li

### Discovering Organized POI Groups in a City

With the development of urban modernization, a great number of hot spots, such as buildings, business streets and shopping malls, are scattered over the city and have a great influence on people’s lives and modern civilization. Each of these hot spots consists of a set of points of interest (POIs). In this paper, we propose a new concept, the Organized POI Group (OPG), and present a method to discover such groups. In addition, we classify OPGs into three categories according to their features: building, street and village.

Yanxia Xu, Guanfeng Liu, Hongzhi Yin, Jiajie Xu, Kai Zheng, Lei Zhao

### Multi-roles Affiliation Model for General User Profiling

Online social networks release user attributes, which are important for many applications. Due to the sparsity of such user attributes online, many works focus on profiling user attributes automatically. However, in order to profile a specific user attribute, a unique model is built, and such a model usually does not fit other profiling tasks. In our work, we design a novel, flexible general user profiling model which naturally models users’ friendships together with user attributes. Experiments show that our method profiles multiple attributes simultaneously with better performance.

Lizi Liao, Heyan Huang, Yashen Wang

### Needle in a Haystack: Max/Min Online Aggregation in the Cloud

With the development of social networks, the mobile Internet, etc., an increasing amount of data is being generated, which is beyond the processing ability of traditional data management tools. In many real-life applications, users can accept approximate answers accompanied by accuracy guarantees. One of the most commonly used approaches to approximate query processing is online aggregation. Most existing work on online aggregation in the cloud focuses on aggregation functions such as Count, Sum and Avg, while there is little work on Max/Min online aggregation in the cloud. In this paper, we measure the accuracy of Max/Min online aggregation by using a quantile which is deduced from Chebyshev’s inequality and the central limit theorem. We implement our methods in a cloud online aggregation system called COLA, and the experimental results demonstrate that our method can deliver reasonable online Max/Min estimates within an acceptable time period.

Xiang Ci, Fengming Wang, Xiaofeng Meng
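One way to read the quantile idea in the abstract above: while sampling, report the running maximum together with a lower bound on the fraction of the data that lies below it. The sketch below derives such a bound from Chebyshev's inequality, P(|X - mu| >= k*sigma) <= 1/k^2, using the sample mean and standard deviation as stand-ins for the true ones. This is an illustrative assumption, not the estimator used in COLA, and the central limit theorem part is omitted.

```python
from statistics import mean, pstdev


def chebyshev_quantile_bound(sample):
    """Lower bound on P(X < current max) implied by Chebyshev's inequality.

    The running maximum m lies k standard deviations above the mean, so at
    most 1/k^2 of the distribution can be at or beyond it. The bound is only
    approximate because mean and deviation are estimated from the sample.
    """
    mu = mean(sample)
    sigma = pstdev(sample)
    if sigma == 0:
        return 0.0  # no spread observed, nothing can be concluded
    k = (max(sample) - mu) / sigma
    if k <= 1:
        return 0.0  # Chebyshev is vacuous for k <= 1
    return 1.0 - 1.0 / (k * k)


sample = [1, 2, 3, 4, 100]
bound = chebyshev_quantile_bound(sample)
```

Chebyshev's inequality makes no distributional assumption, so the bound is loose; it tightens only as the observed maximum moves many standard deviations away from the mean.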

### FFD-Index: An Efficient Indexing Scheme for Star Subgraph Matching on Large RDF Graphs

Subgraph matching, a basic SPARQL operation, is known to be NP-complete. Coupled with the rapidly increasing volumes of RDF data, it makes efficient graph query processing a very challenging problem. In this paper, we tackle the important problem of efficient processing of star-shaped subgraph matching queries, which are a core SPARQL query pattern and usually lead to a number of costly join operations. We present a novel method to encode a star-shaped subgraph into a bit string and an indexing mechanism to improve the query answering performance, called FFD-index. Our extensive evaluation shows that FFD-index and the corresponding algorithms are effective in solving star-shaped graph queries and they significantly outperform the state-of-the-art SPARQL query engine RDF-3X.

Xuedong Lyu, Xin Wang, Yuan-Fang Li, Zhiyong Feng
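The abstract above encodes a star-shaped subgraph as a bit string. A generic way to illustrate that idea is a predicate signature: each predicate around the central node sets one bit, and a data star can only match a query star if it contains every query bit. The encoding below is a hypothetical stand-in chosen for illustration; the actual FFD-index encoding and index structure are not described in the abstract.

```python
def star_signature(predicates, vocabulary):
    """Bit string with one bit per distinct predicate on the star's edges."""
    signature = 0
    for predicate in predicates:
        signature |= 1 << vocabulary[predicate]
    return signature


def may_match(query_sig, data_sig):
    """Necessary condition for a match: every query bit is set in the data."""
    return (query_sig & data_sig) == query_sig


# Hypothetical predicate vocabulary mapping each predicate to a bit position.
vocabulary = {"name": 0, "age": 1, "knows": 2, "worksAt": 3}

data_star = star_signature(["name", "age", "knows"], vocabulary)
query_star = star_signature(["name", "knows"], vocabulary)
```

The check is a filter, not a full match: it discards data stars that lack a required predicate with a single bitwise operation, and the surviving candidates still need their object values verified.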

### Leveraging Interactive Knowledge and Unlabeled Data in Gender Classification with Co-training

Conventional approaches to gender classification rely heavily on large amounts of labeled data, which are normally hard and expensive to obtain. In this paper, we propose a co-training approach to address this problem in gender classification. Specifically, we employ both non-interactive and interactive texts, i.e., the message and comment texts, as two different views in our co-training approach to incorporate unlabeled data. Experimental results on a large micro-blog data set demonstrate the appropriateness of leveraging interactive knowledge in gender classification and the effectiveness of the proposed co-training approach.

Jingjing Wang, Yunxia Xue, Shoushan Li, Guodong Zhou

### Interactive Gender Inference in Social Media

In this paper, we define a novel task named interactive gender inference, which aims to utilize interactive text to identify the genders of two interacting users. To address this task, we propose a two-stage approach that incorporates the dependency among the interactive samples sharing identical users. Specifically, we first apply a standard four-category classification algorithm to get a preliminary result, and then propose a global optimization algorithm to achieve better performance. Evaluation demonstrates the effectiveness of our proposed approach to interactive gender inference.

Zhu Zhu, Jingjing Wang, Shoushan Li, Guodong Zhou

### Joint Sentiment and Emotion Classification with Integer Linear Programming

As two foundational tasks in sentiment analysis, sentiment classification and emotion classification have been considered separately and studied independently in the literature. In this paper, we put forward an Integer Linear Programming (ILP)-driven joint learning approach to leveraging the relationship between these two tasks. Empirical study verifies the appropriateness and effectiveness of our proposed approach to joint sentiment and emotion classification.

Rong Wang, Shoushan Li, Guodong Zhou, Hanxiao Shi

### Mining Wikileaks Data to Identify Sentiment Polarities in International Relationships

The infamous Wikileaks cables are a large-scale resource for analyzing international relationships. We use sentiment analysis on this dataset to extract opinion polarities in the international scenario. We use an unsupervised approach based on a standard sentiment lexicon with modifiers to mine opinion polarities among the cables to and from embassies/consulates of the USA. Sharp changes in opinion polarities are mapped to international events happening around the time of the cable at the location of the embassy/consulate, and a positive/negative correlation is drawn. The dataset consists of 232,410 cables from 1966 up to October 2009 concerning 272 embassies and consulates across the world. The top 28 of the spikes/dips in polarity changes coming from 20 embassies/consulates are then evaluated. Our results show that there is a strong correlation (76 %) between our findings and sentiments surrounding actual events. For example, our study was able to correctly identify suicide terrorist attacks outside the American embassy in Casablanca. It could also highlight a cable that referred to a terrorist who was later arrested in New Delhi possessing secret documents related to the Indian Army.

Arpit Jain, Arnab Bhattacharya
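The unsupervised scoring described in the abstract above, a sentiment lexicon combined with modifiers, can be sketched in a few lines. The word lists and weights below are tiny hypothetical examples, not the lexicon used in the paper, and the rule that a modifier applies to the next sentiment word is an assumed simplification.

```python
# Hypothetical miniature lexicon and modifier tables, for illustration only.
LEXICON = {"good": 1.0, "strong": 0.5, "bad": -1.0, "hostile": -1.5}
NEGATORS = {"not", "no", "never"}
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}


def polarity(text):
    """Sum lexicon scores, letting preceding modifiers scale or flip them."""
    score = 0.0
    factor = 1.0  # accumulated effect of modifiers seen since the last word
    for token in text.lower().split():
        word = token.strip(".,;:!?")
        if word in NEGATORS:
            factor *= -1.0
        elif word in INTENSIFIERS:
            factor *= INTENSIFIERS[word]
        elif word in LEXICON:
            score += factor * LEXICON[word]
            factor = 1.0
        else:
            factor = 1.0  # a neutral word ends the modifier's scope
    return score
```

Scoring each cable this way and tracking the value over time per embassy would give the kind of polarity series in which spikes and dips can then be located.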

### Extracting Indoor Spatial Objects from CAD Models: A Database Approach

With the increasing development of indoor positioning technologies such as Wi-Fi and RFID, indoor location-based services (LBSs) have been a hot topic in recent years. Unlike GPS-based outdoor LBSs, indoor LBSs lack sufficient indoor maps, which are their foundation. In this paper, we present a database approach to extract indoor spatial objects, e.g., rooms and doors, from CAD models, and then transform them into an indoor moving-object database. With this mechanism, we are able to efficiently generate indoor maps and support indoor-space queries. In addition, we implement a prototype system to demonstrate the feasibility of our proposal. It shows that our approach achieves high precision in extracting indoor spatial objects and can support indoor spatial queries effectively.

Dazhou Xu, Peiquan Jin, Xiaoxiang Zhang, Jiang Du, Lihua Yue

### Incremental Class Discriminant Analysis on Interval-Valued Emitter Signal Parameters

Emitter signal parameter analysis has been widely recognized as one crucial task for communication, electronic reconnaissance and radar intelligence analysis. However, the parameter measurements are characteristic of uncertainty in the form of intervals. In addition, the measurements are typically accumulated continuously. Existing machine learning methods for interval-valued data are unfit in such a case as they generally assume a uniform distribution and are restricted to static data analysis. To address the above problems, we bring forward an incremental class discriminant analysis method on interval-valued emitter signal parameters. Experimental results have validated its effectiveness.

Xin Xu, Wei Wang, Jiaheng Lu

### Visualization Tool for Boundary Image Matching Based on Time-Series Data

In this paper, we propose a visualization tool for boundary image matching based on time-series matching techniques. The proposed tool works as a client-server model that first converts boundary images to time-series data and then exploits efficient time-series matching techniques. The client shows the matching results with different charts or graphs to provide various viewpoints. The server efficiently performs the time-series matching using the multidimensional index, which supports a huge number of image time-series. By adapting visualization techniques on time-series matching, we can easily and intuitively understand the boundary matching results as well as time-series matching results. In particular, our polar chart, which represents 1-D time-series to 2-D boundary images, may give a strong intuition of understanding various trends of time-series data. We provide five different visualization methods, and we believe that those methods will be very helpful to understand the matching results of image time-series.

Seongwoo Moon, Sanghun Lee, Bum-Soo Kim, Yang-Sae Moon
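The conversion of a boundary image into time-series data mentioned in the abstract above is commonly done with a centroid distance function: walk along the boundary and record each point's distance from the shape's center. The sketch below assumes that technique and approximates the centroid by the mean of the boundary points; the abstract does not state which conversion the tool actually uses.

```python
from math import hypot


def boundary_to_series(points):
    """Distance from the (approximate) centroid to each boundary point.

    `points` is an ordered list of (x, y) samples along the boundary; the
    result is a 1-D series with one value per sample.
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [hypot(x - cx, y - cy) for x, y in points]


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
series = boundary_to_series(square)
```

Plotting such a series on a polar chart, with the sample index as the angle and the value as the radius, redraws an outline resembling the original boundary, which is consistent with the polar chart view described in the abstract.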

### Performance Analysis of Hadoop-Based SQL and NoSQL for Processing Log Data

Recently, many companies and research organizations have been seeking scalable solutions using Hadoop ecosystems. Log data management, with its large-scale and real-time properties, is one of the appropriate applications on top of Hadoop. In this paper, we focus on SQL and NoSQL choices for building a Hadoop-based log data management system. For this purpose, we first select major products supporting SQL and NoSQL, and we then present an appropriate schema for each product by considering its own characteristics. All the schemas are for real-time monitoring and analysis of the log data. For each product, we implement insertion and selection operations on log data in Hadoop, and we analyze the performance of these operations. Analysis results show that MariaDB and MongoDB are fast in insertion, and PostgreSQL and HBase are fast in selection. We believe that our evaluation results will be very helpful for users choosing Hadoop SQL and NoSQL products for handling large-scale and real-time log data.

Siwoon Son, Myeong-Seon Gil, Yang-Sae Moon, Hee-Sun Won

### SVIS: Large Scale Video Data Ingestion into Big Data Platform

Utilizing big data processing platforms to analyze and extract insights from unstructured video streams is becoming an emerging trend in the video surveillance area. As the first step, efficiently ingesting video sources into a big data platform is a most demanding and challenging problem. However, existing data loading or ingestion tools either lack video ingestion capability or cannot handle such huge volumes of video data. In this paper, we present SVIS, a highly scalable and extendable video data ingestion system which can quickly ingest different kinds of video sources into centralized big data stores. SVIS embeds rich video content processing functionalities, e.g., video transcoding and object detection. As a result, the ingested data will have the desired formats (i.e., structured data, well-encoded video sequence files) and hence can be analyzed directly. With a highly scalable architecture and an intelligent scheduling engine, SVIS can be dynamically scaled out to handle large-scale online camera streams and intensive ingestion jobs. SVIS is also highly extendable. It defines various interfaces to enable embedding user-defined modules to support new types of video sources and data sinks. Experimental results show that the SVIS system has high efficiency and good scalability.

Xiaoyan Guo, Yu Cao, Jun Tao

### A$^3$SAR: Context-Aware Spatial Augmented Reality for Anywhere, Anyone, and Analysis

Internet- and camera-equipped mobile devices with versatile capabilities and low costs make a Spatial Augmented Reality (SAR) platform possible. In general, SAR enhances the perception of the real world by overlaying data on top of the view. It supports applications in medical management, urban projects, online gaming, etc. Nevertheless, existing systems mostly focus on visualizing formatted information on mobile devices and fall short in exploiting the profound nature of reality. In this paper, we propose a novel context-aware platform, called A$^3$SAR, which augments reality under three contexts: Anywhere Augmentation, Anyone Augmentation, and Analysis Augmentation. A$^3$SAR aims at seamlessly integrating the virtual and real worlds by incorporating emerging technologies from different dimensions. For the Anywhere Augmentation, the overlay is constructed based on the semantics extracted from websites with geospatial information (e.g., upcoming shuttles at a bus station, or the history of an antique in a museum). For the Anyone Augmentation, the overlay is built according to users' preferences and profiles (e.g., for a piano, visualizing music for a player but maintenance instructions for a tuner). More than just loading pre-existing information, the Analysis Augmentation also provides analytical data dynamically (e.g., visualizing the most endangered spots in a fire accident). However, challenges arise in several aspects: (1) efficiency in handling concurrent service requests, especially analytical tasks; (2) overlay accuracy in the presence of noisy information; (3) semantic extraction from heterogeneous sources. We propose a series of technical solutions: we design an intelligent engine for efficient analytical overlays; we improve the calibration accuracy by addressing spatial imprecision; and we tackle the heterogeneous modeling problem with a Semantic Web based solution. The real and the virtual, two worlds in parallel, intersect at SAR and converge within A$^3$SAR.

Benjin Mei, Dehai Liu, Xike Xie, Jinchuan Chen, Xiaoyong Du

### Semi-supervised Clustering Method for Multi-density Data

Finding clusters is a challenging problem, especially when the clusters have widely varied shapes, sizes, and densities. Density-based clustering methods are the most important owing to their strong ability to detect arbitrarily shaped clusters. However, they depend on two user-specified parameters (Eps and Minpts) that define a single density. Moreover, most of these methods are unsupervised and therefore cannot improve the clustering quality by utilizing a small amount of prior knowledge. In this paper, we show how background knowledge can be used to bias a density-based clustering method for multi-density data. Experimental results confirm that the proposed method gives better results than other semi-supervised and unsupervised clustering algorithms.

Walid Atwa, Kan Li
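
To make the single-density limitation mentioned in the abstract concrete, here is a minimal sketch of the standard DBSCAN-style core-point test. It is not the authors' method. The data, the function name `core_points`, and the parameter values are hypothetical, and the neighbour count is assumed to include the point itself.

```python
from math import dist


def core_points(points, eps, min_pts):
    """Return indices of points with at least min_pts neighbours
    (the point itself included) within distance eps."""
    cores = []
    for i, p in enumerate(points):
        neighbours = sum(1 for q in points if dist(p, q) <= eps)
        if neighbours >= min_pts:
            cores.append(i)
    return cores


# Hypothetical data: a dense group (spacing 0.1) and a sparse group (spacing 1.0).
dense = [(0.1 * k, 0.0) for k in range(5)]          # indices 0..4
sparse = [(10.0 + 1.0 * k, 0.0) for k in range(5)]  # indices 5..9
points = dense + sparse

# Eps tuned to the dense group: no sparse point qualifies as a core point.
tight = core_points(points, eps=0.15, min_pts=3)
# Eps tuned to the sparse group: interior points of both groups qualify.
loose = core_points(points, eps=1.5, min_pts=3)

print(tight)  # [1, 2, 3]
print(loose)  # [0, 1, 2, 3, 4, 6, 7, 8]
```

With the tight setting the whole sparse group would be treated as noise. The loose setting works on this toy data only because the groups are far apart; dense clusters lying closer than `eps` to each other would be merged under it.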

### Towards a Pattern-Based Query Language for Document Databases

Document databases are becoming popular, but how to express complex document queries to obtain useful information from documents remains an important topic of study. In this paper, we describe the design issues of a pattern-based document database query language named JPQ, which uses various expressive patterns to extract and construct document fragments following a JSON-like document data model. It adopts tree-like extraction patterns with a coherent pattern composition mechanism to extract data elements from hierarchically structured documents and maintain the logical relationships among the elements. Based on these relationships, JPQ deploys a deductive mechanism to declaratively specify data transformation requests and also considers data filtering on hierarchical data structures.

Xuhui Li, Zhengqi Liu, Mengchi Liu, Xiaoying Wu, Shanfeng Zhu
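
The general idea of a tree-like extraction pattern over a JSON-like document can be illustrated with a small sketch. This is not JPQ syntax and not the authors' implementation. The pattern encoding (nested dicts with `$name` variables), the function `match`, and the sample document are all assumptions made for illustration.

```python
def match(pattern, doc):
    """Match a tree-shaped pattern against a JSON-like document.

    Returns a list of variable bindings (dicts), one per successful match.
    """
    if isinstance(pattern, str) and pattern.startswith("$"):
        return [{pattern[1:]: doc}]  # variable: bind whatever value is here
    if isinstance(doc, list):
        # a pattern applied to an array is tried on each element independently
        return [b for item in doc for b in match(pattern, item)]
    if isinstance(pattern, dict):
        if not isinstance(doc, dict):
            return []
        results = [{}]
        for key, sub in pattern.items():
            if key not in doc:
                return []
            sub_bindings = match(sub, doc[key])
            # combine bindings so values from the same subtree stay together
            results = [{**r, **s} for r in results for s in sub_bindings]
        return results
    return [{}] if pattern == doc else []  # constant: acts as a filter


# Hypothetical document and patterns.
doc = {"library": {"books": [
    {"title": "Graphs", "author": {"name": "Ada"}, "year": 2014},
    {"title": "Trees", "author": {"name": "Bob"}, "year": 2015},
]}}

pattern = {"library": {"books": {"title": "$t", "author": {"name": "$a"}}}}
filtered = {"library": {"books": {"title": "$t", "year": 2015}}}

print(match(pattern, doc))
print(match(filtered, doc))
```

Each binding pairs a title with the author from the same book, which is one simple way to keep the logical relationships among extracted elements. The constant `2015` in the second pattern acts as a filter on the hierarchical structure.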

### Backmatter
