Probabilistic Voronoi diagrams for probabilistic moving nearest neighbor queries

https://doi.org/10.1016/j.datak.2012.02.001Get rights and content

Abstract

A large spectrum of applications such as location based services and environmental monitoring demand efficient query processing on uncertain databases. In this paper, we propose the probabilistic Voronoi diagram (PVD) for processing moving nearest neighbor queries on uncertain data, namely the probabilistic moving nearest neighbor (PMNN) queries. A PMNN query finds the most probable nearest neighbor of a moving query point continuously. To process PMNN queries efficiently, we provide two techniques: a pre-computation approach and an incremental approach. In the pre-computation approach, we develop an algorithm to efficiently evaluate PMNN queries based on the pre-computed PVD for the entire data set. In the incremental approach, we propose an incremental probabilistic safe region based technique that does not require to pre-compute the whole PVD to answer the PMNN query. In this incremental approach, we exploit the knowledge for a known region to compute the lower bound of the probability of an object being the nearest neighbor. Experimental results show that our approaches significantly outperform a sampling based approach by orders of magnitude in terms of I/O, query processing time, and communication overheads.

Introduction

Uncertainty is an inherent property in many database applications that include location based services [1], environmental monitoring [2], and feature extraction systems [3]. The inaccuracy or imprecision of data capturing devices, the privacy concerns of users, and the limitations on bandwidth and battery power introduce uncertainties in different attributes such as the location of an object or the measured value of a sensor. The values of these attributes are stored in a database, known as an uncertain database.

In recent years, query processing on an uncertain database has received significant attention from the research community due to its wide range of applications. Consider a location based application where the location information of users may need to be pre-processed before publishing due to the privacy concern of users. Alternatively, a user may want to provide her position as a larger region in order to prevent her location to be identified to a particular site. In such cases, locations of users are stored as uncertain attributes such as regions instead of points in the database. An application that deals with the location of objects (e.g., post office, hospital) obtained from satellite images is another example of an uncertain database. Since the location information may not be possible to identify accurately from the satellite images due to noisy transmission, locations of objects need to be represented as regions denoting the probable locations of objects. Likewise, in a biological database, objects identified from microscopic images need to be presented as uncertain attributes due to inaccuracies of data capturing devices.

In this paper, we propose a novel concept called Probabilistic Voronoi Diagram (PVD), which has a potential to efficiently process Nearest Neighbor (NN) queries on an uncertain database. The PVD for a given set of uncertain objects o1, o2, …, on partitions the data space into a set of Probabilistic Voronoi Cells (PVCs) based on the probability measure. Each cell PVC(oi) is a region in the data space, where each data point in this region has a higher probability of being the NN to oi than any other object.

A nearest neighbor query on an uncertain database, called a Probabilistic Nearest Neighbor (PNN) query, returns a set of objects, where each object has a non-zero probability of being the nearest to a query point. A common variant of the PNN query that finds the most probable NN to a given query point is also called a top-1-PNN query. Existing research focuses on efficient processing of PNN queries [4], [5], [6], [7] and its variants [8], [9], [10], [11], [12] for a static query point. In this paper, we are interested in answering Probabilistic Moving Nearest Neighbor (PMNN) queries on an uncertain database, where data objects are static, the query is moving, and the future path of the moving query is unknown. A PMNN query returns the most probable nearest object for a moving query point continuously.

A straightforward approach for evaluating a PMNN query is to use a sampling-based method, which processes the PMNN query as a sequence of PNN queries at sampled locations on the query path. However, to obtain up-to-date answers, a high sampling rate is required, which makes the sampling-based approach inefficient due to the frequent processing of PNN queries.

To avoid high processing cost of the sampling based approach and to provide continuous results, recent approaches for continuous NN query processing on a point data set rely on safe-region based techniques, e.g., Voronoi diagram [13]. In a Voronoi diagram based approach, the data space is partitioned into disjoint Voronoi cells where all points inside a cell have the same NN. Then, the NN of a query point is reduced to identifying the cell for the query point, and the result of a moving query point remains valid as long as it remains inside that cell. Motivated by the safe-region based paradigm, in this paper we propose a Voronoi diagram based approach for processing a PMNN query on a set of uncertain objects.

Voronoi diagrams for uncertain objects [6], [14] based on a simple distance metric, such as the minimum and maximum distances to objects, result in a large neutral region that contains those points for which no specific NN object is defined. Thus, these are not suitable for processing a PMNN query. In this paper, we propose the PVD that divides the space based on a probability measure rather than using just a simple distance metric.

A naive approach to compute the PVD is to find the top-1-PNN for every possible location in the data space using existing static PNN query processing techniques [4], [5], [8], which is an impractical solution due to high computational overhead. In this paper, we propose a practical solution to compute the PVD for a set of uncertain objects. The key idea of our approach is to efficiently compute the probabilistic bisectors between two neighboring objects that form the basis of PVCs for the PVD.

After computing the PVD, the most probable NN can be determined by simply identifying the PVC in which the query point is currently located. The result of the query does not change as long as the moving query point remains in the current PVC. A user sends its request as soon as it exits the PVC. Thus, in contrast to the sampling based approach, the PVD ensures the most probable NN for every point of a moving query path is available. Since this approach requires the pre-computation of the whole PVD, we name it the pre-computation approach in this paper.

The pre-computation approach needs to access all the objects from the database to compute the entire PVD. In addition, the PVD needs to be re-computed for any updates (insertion or deletion) to the database. Thus the pre-computation approach may not be suitable for the cases when the query is confined into a small region in the data space or when there are frequent updates in the database. For such cases, we propose an incremental algorithm based on the concept of local PVD. In this approach, a set of surrounding objects and an associated search space, called known region, with respect to the current query position are retrieved from the database. Objects are retrieved based on their probabilistic NN rankings from the current query location. Then, we compute the local PVD based only on the retrieved data set, and develop a probabilistic safe region based PMNN query processing technique. The probabilistic safe region defines a region for an uncertain object where the object is guaranteed to be the most probable nearest neighbor. This probabilistic safe region enables a user to utilize the retrieved data more efficiently and reduces the communication overheads when a client is connected to the server through a wireless link. The process needs to be repeated as soon as the retrieved data set cannot provide the required answer for the moving query point. We name this PMNN query processing technique the incremental approach in this paper.

In summary, we make the following contributions in this paper:

  • We formulate the Probabilistic Voronoi Diagram (PVD) for uncertain objects and propose techniques to compute the PVD.

  • We provide an algorithm for evaluating PMNN queries based on the pre-computed PVD.

  • We propose an incremental algorithm for evaluating PMNN queries based on the concept of local PVD.

  • We conduct an extensive experimental study which shows that our PVD based approaches outperform the sampling based approach significantly.

The rest of the paper is organized as follows. Section 2 discusses preliminaries and the problem setup. Section 3 reviews related work. In Section 4, we formulate the concept of PVD and present methods to compute it, focusing on one and two dimensional spaces. In Section 5, we present two techniques: pre-computation approach and incremental approach for processing PMNN queries. Section 6 reports our experimental results and Section 7 concludes the paper.

Section snippets

Preliminaries and problem setup

Let O be a set of uncertain objects in a d-dimensional data space. An uncertain object oi  O, 1  i  |O|, is represented by a d-dimensional uncertain range Ri and a probability density function (pdf) fi(u) that satisfies ∫ Ri fi(u)du = 1 for u  Ri. If u  Ri, then fi(u) = 0. We assume that the pdf of uncertain objects follow uniform distributions for the sake of easy explication. Our concept of PVD is applicable for other types of distributions. We briefly discuss PVDs for other distributions in Section 4.3

Background

In this section, we first give an overview of existing PNN query processing techniques on uncertain databases that are closely related to our work. Then we present existing work on Voronoi diagrams.

Probabilistic Voronoi Diagram

A Probabilistic Voronoi Diagram (PVD) is defined as follows:

Definition 1

(PVD) Let O be a set of uncertain objects in a d-dimensional data space. The probabilistic Voronoi diagram partitions the data space into a set of disjoint regions, called Probabilistic Voronoi Cells (PVCs). The PVC of an object oi  O is a region or a set of regions, denoted by PVC(oi), such that p(oi, q) > p(oj, q) for any point q  PVC(oi) and for any object oj  O  {oi}, where p(oi, q) and p(oj, q) are the probabilities of oi and oj of being

Processing PMNN queries

In this section, based on the concept of PVD we propose two techniques: a pre-computation approach and an incremental approach for answering PMNN queries. In the pre-computation approach, we first create the PVD for the whole data set and then index the PVCs for answering PMNN queries. We name the pre-computation based technique for processing PMNN queries as P-PVD-PMNN. On the other hand, in the incremental approach, we retrieve a set of surrounding objects with respect the current query

Experimental study

We compare our PVD based approaches for the PMNN query (P-PVD-PMNN and I-PVD-PMNN) with a sampling based approach (Naive-PMNN), which processes a PMNN query as a sequence of static PNN queries at sampled locations. Though in Naive-PMNN we use the most recent technique of static top-1-PNN queries [8], any existing technique for static PNN queries [4], [5] can be used. Note that, by using the existing method in [6], for each uncertain object oi, we could only define a region (or UV-cell) where oi

Summary

In this paper, we have introduced the concept of Probabilistic Voronoi Diagrams (PVDs). A PVD divides the data space using a probability measure. Based on the PVD, we developed two different techniques: a pre-computation approach and an incremental approach, for efficient processing of Probabilistic Moving Nearest Neighbor (PMNN) queries. Our experimental results show that our techniques outperform the sampling based approach by up to two orders of magnitude in our evaluation metrics.

Our work

Mohammed Eunus Ali is currently an Assistant Professor in the Department of Computer Science and Engineering (CSE) at Bangladesh University of Engineering and Technology (BUET), Dhaka. He received his PhD degree in Computer Science from the University of Melbourne in 2010. He obtained his B.Sc. and M.Sc. degrees in Computer Science and Engineering from Bangladesh University of Engineering and Technology, Dhaka, Bangladesh in 1999 and 2002, respectively. He served as a Lecturer in Computer

References (46)

  • R. Cheng et al.

    UV-Diagram: A Voronoi Diagram for Uncertain Data

  • H.-P. Kriegel et al.

    Probabilistic nearest-neighbor query on uncertain objects

  • G. Beskales et al.

    Efficient search for the top-k probable nearest neighbors in uncertain databases

    Proceedings of the VLDB Endow

    (2008)
  • C. Re et al.

    Efficient top-k query evaluation on probabilistic data

  • M.A. Soliman et al.

    Top-k query processing in uncertain databases

  • Y. Qi et al.

    Threshold query optimization for uncertain data

  • A. Knight et al.

    Efficient range query processing on uncertain data

  • A. Okabe et al.

    Spatial Tessellations: Concepts and Applications of Voronoi Diagrams

    (2000)
  • W. Evans et al.

    Guaranteed Voronoi Diagrams of Uncertain Sites

  • R. Cheng et al.

    Probabilistic verifiers: evaluating constrained nearest-neighbor queries over uncertain data

  • G. Trajcevski et al.

    Continuous probabilistic nearest-neighbor queries for uncertain trajectories

  • X. Lian et al.

    Probabilistic group nearest neighbor queries in uncertain databases

    IEEE TKDE

    (2008)
  • X. Dai et al.

    Probabilistic spatial queries on existentially uncertain data

  • Cited by (15)

    • Moving view field nearest neighbor queries

      2019, Data and Knowledge Engineering
      Citation Excerpt :

      They use a branch-and-bound approach with mindist, which is the minimum distance between a given query point and a data object (or MBR of the R-tree) to decrease the number of redundant processing. MNN queries [11–17] continuously find the nearest data object to a given query point. These studies observed that the result of the MNN query does not change if the moved query point is in a safe region [14].

    • Network Voronoi Diagram on uncertain objects for nearest neighbor queries

      2015, Information Sciences
      Citation Excerpt :

      Cheng et al. [9] answered PNN based on UV-diagram, which outperforms the R-tree based methods. Ali et al. [3] provided a pre-computation approach and an incremental approach to handle moving nearest neighbor queries on uncertain data in Euclidean space. As an important technique for facilitating NN related queries, VD [36] has been extensively studied and many variants and construction algorithms have been proposed.

    • Countering overlapping rectangle privacy attack for moving kNN queries

      2013, Information Systems
      Citation Excerpt :

      The threat to privacy is becoming more urgent as positioning devices become more precise, and a lack of addressing privacy issues may significantly impair the proliferation of LBSs [1,2]. An important class of LBSs is supported by the moving k nearest neighbor (MkNN) query [3,4], which continuously returns the k nearest data objects for a moving user. For example, a tourist may want to observe the five nearest restaurants continuously while exploring a city so that she can drop in to a preferred one of the nearest five at any time.

    • Computational geometry With independent and dependent uncertainties

      2022, Computational Geometry With Independent And Dependent Uncertainties
    • Voronoi Diagram and Delaunay Triangulation with Independent and Dependent Geometric Uncertainties

      2021, International Journal of Computational Geometry and Applications
    • Query method for nearest region of spatial line segment based on hilbert curve grid

      2019, International Journal of Innovative Computing, Information and Control
    View all citing articles on Scopus

    Mohammed Eunus Ali is currently an Assistant Professor in the Department of Computer Science and Engineering (CSE) at Bangladesh University of Engineering and Technology (BUET), Dhaka. He received his PhD degree in Computer Science from the University of Melbourne in 2010. He obtained his B.Sc. and M.Sc. degrees in Computer Science and Engineering from Bangladesh University of Engineering and Technology, Dhaka, Bangladesh in 1999 and 2002, respectively. He served as a Lecturer in Computer Science and Engineering at Bangladesh University of Engineering and Technology in 2000–2003. He also worked as a Research Fellow in the University of Melbourne in 2010–2011. Dr. Eunus's research areas cover a wide range of topic in database systems and information management that include spatial and multimedia databases, biological and high-dimensional databases, and large-scale distributed data management. He is also involved in different projects that investigate the role of ICT for the development of developing countries like Bangladesh. In particular, he concentrates on conducting research and developing new data management techniques suitable for socio-economic perspective of Bangladesh.

    Egemen Tanin is a senior lecturer in the Department of Computing and Information Systems, the University of Melbourne. He received his PhD degree in computer science from the University of Maryland, College Park, Maryland, where he also held a postdoctoral research associate position from 2001 until 2003. His areas of research include spatial data management and database visualization.

    Rui Zhang obtained his Bachelor's degree from Tsinghua University and PhD from National University of Singapore. Currently he is a senior lecturer in the Department of Computing and Information Systems, the University of Melbourne. His research interest is data and information management in general, particularly in areas of indexing techniques, spatio-temporal databases, web data, data streams and sequence databases. He has published extensively in such areas and regularly serves as PC members of top conferences such as SIGMOD, VLDB, ICDE, SIGKDD, etc. He is the Vice Chair for ACM SIGSPATIAL Australian Chapter, and he is a co-chair for Australasian Database Conference 2012 and 2013.

    Ramamohanarao Kotagiri received his degrees B.E. at Andhra University, ME at the Indian Institute of Science, Bangalore and Ph.D. at Monash University. He was awarded the Alexander von Humboldt Fellowship in 1983. He has been at the University Melbourne since 1980 and was appointed a Professor in Computer Science in 1989. Rao held several senior positions including Head of Computer Science and Software Engineering, Head of the School of Electrical Engineering and Computer Science at the University of Melbourne, Deputy Director of Centre for Ultra Broadband Information Networks, Co-Director of the Key Centre for Knowledge-Based Systems, and Research Director for the Cooperative Research Centre for Intelligent Decision Systems. He served as a member of the Australian Research Council Information Technology Panel. He served on the Prime Ministers Science, Engineering and Innovation Council working party on Data for Scientists. At present he is on the Editorial Boards for Universal Computer Science, KAIS, IEEE TKDE, Journal of Statistical Analysis and Data Mining and VLDB Journal. He served as a program committee member of several International conferences including SIGMOD, IEEE ICDM, VLDB, ICLP and ICDE. He was the program Co-Chair for VLDB, PAKDD, DASFAA and DOOD conferences. He is a steering committee member of IEEE ICDM, PAKDD and DASFAA. Rao is a fellow of the Institute of Engineers Australia, Australian Academy Technological Sciences and Engineering and Australian Academy of Science. He has research interests in the areas of database systems, logic based systems, agent oriented systems, information retrieval, data mining, intrusion detection and machine learning.

    View full text