skip to main content
10.1145/2213836.2213899acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Efficient external-memory bisimulation on DAGs

Authors Info & Claims
Published:20 May 2012Publication History

ABSTRACT

In this paper we introduce the first efficient external-memory algorithm to compute the bisimilarity equivalence classes of a directed acyclic graph (DAG). DAGs are commonly used to model data in a wide variety of practical applications, ranging from XML documents and data provenance models, to web taxonomies and scientific workflows. In the study of efficient reasoning over massive graphs, the notion of node bisimilarity plays a central role. For example, grouping together bisimilar nodes in an XML data set is the first step in many sophisticated approaches to building indexing data structures for efficient XPath query evaluation. To date, however, only internal-memory bisimulation algorithms have been investigated. As the size of real-world DAG data sets often exceeds available main memory, storage in external memory becomes necessary. Hence, there is a practical need for an efficient approach to computing bisimulation in external memory.

Our general algorithm has a worst-case IO-complexity of O(Sort(|N| + |E|)), where |N| and |E| are the numbers of nodes and edges, resp., in the data graph and Sort(n) is the number of accesses to external memory needed to sort an input of size n. We also study specializations of this algorithm to common variations of bisimulation for tree-structured XML data sets. We empirically verify efficient performance of the algorithms on graphs and XML documents having billions of nodes and edges, and find that the algorithms can process such graphs efficiently even when very limited internal memory is available. The proposed algorithms are simple enough for practical implementation and use, and open the door for further study of external-memory bisimulation algorithms. To this end, the full open-source C++ implementation has been made freely available.

References

  1. S. Abiteboul, P. Buneman, and D. Suciu. Data on the Web: from relations to semistructured data and XML. Morgan Kaufmann, San Francisco, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. A. Aggarwal and J. S. Vitter. The Input/Output complexity of sorting and related problems. Communications of the ACM, 31(9):1116--1127, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. D. Ajwani, A. Cosgaya-Lozano, and N. Zeh. Engineering a topological sorting algorithm for massive graphs. Proc. Algorithm Engineering and Experimentation (ALENEX), 2011.Google ScholarGoogle ScholarCross RefCross Ref
  4. L. Arge. The buffer tree: A technique for designing batched external data structures. Algorithmica, 37(1):1--24, 2003.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. L. Arge, P. Ferragina, R. Grossi, and J. S. Vitter. On sorting strings in external memory (extended abstract). In Proc. ACM Symp. on Theory of Computation (STOC), pages 540--548, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. M. Ben-Ari, T. Milo, and E. Verbin. Querying DAG-shaped execution traces through views. In WebDB, Providence, RI, USA, 2009.Google ScholarGoogle Scholar
  7. P. Buneman, M. Grohe, and C. Koch. Path queries on compressed XML. In VLDB, pages 141--152, Berlin, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Y.-J. Chiang, M. T. Goodrich, E. F. Grove, R. Tamassia, D. E. Vengroff, and J. S. Vitter. External-memory graph algorithms. In Proc. Symp. on Discrete Algorithms (SODA), pages 139--149, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. V. Christophides, G. Karvounarakis, D. Plexousakis, M. Scholl, and S. Tourtounis. Optimizing taxonomic semantic web queries using labeling schemes. J. Web Sem., 1(2):207--228, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  10. R. Dementiev, L. Kettner, and P. Sanders. STXXL: standard template library for XXL data sets. Software: Practice and Experience, 38(6):589--637, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. J. Deng, B. Choi, J. Xu, and S. S. Bhowmick. Optimizing incremental maintenance of minimal bisimulation of cyclic graphs. In DASFAA, pages 543--557, Hong Kong, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. D. Deutch and T. Milo. A quest for beauty and wealth (or, business processes for database researchers). In Proc. ACM Symp. on Principles of Database Systems (PODS), pages 1--12, Athens, Greece, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. A. Dovier, C. Piazza, and A. Policriti. An efficient algorithm for computing bisimulation equivalence. Theor. Comput. Sci., 311(1--3):221--256, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. G. H. L. Fletcher, D. Van Gucht, Y. Wu, M. Gyssens, S. Brenes, and J. Paredaens. A methodology for coupling fragments of XPath with structural indexes for XML documents. Inf. Syst., 34(7):657--670, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. N. Grimsmo, T. A. Bjørklund, and M. L. Hetland. Linear computation of the maximum simultaneous forward and backward bisimulation for node-labeled trees. In XSym, pages 18--32, Singapore, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. Hellings. Bisimulation partitioning and partition maintenance on very large directed acyclic graphs. Master's thesis, Eindhoven Univ. of Technology, 2011.Google ScholarGoogle Scholar
  17. R. Kaushik, P. Bohannon, J. F. Naughton, and H. F. Korth. Covering indexes for branching path queries. In Proc. SIGMOD International Conference on Management of Data, pages 133--144, Madison, Wisconsin, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. R. Kaushik, P. Bohannon, J. F. Naughton, and P. Shenoy. Updates for structure indexes. In VLDB, pages 239--250, Hong Kong, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. R. Kaushik, P. Shenoy, P. Bohannon, and E. Gudes. Exploiting local similarity for indexing paths in graph-structured data. In Proc. IEEE Int. Conf. on Data Engineering (ICDE), pages 129--140, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. U. Meyer, P. Sanders, and J. Sibeyn, editors. Algorithms for memory hierarchies: advanced lectures. Springer-Verlag, Berlin, Heidelberg, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. T. Milo and D. Suciu. Index structures for path expressions. In Proc. Int. Conf. on Database Theory (ICDT), pages 277--295, Jerusalem, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. L. Moreau. The foundations for provenance on the web. Foundations and Trends in Web Science, 2(2--3):99--41, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. K.-K. Muniswamy-Reddy and M. Seltzer. Provenance as first class cloud data. SIGOPS Oper. Syst. Rev., 43:11--16, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. R. Paige and R. E. Tarjan. Three partition refinement algorithms. SIAM J. Comput., 16(6):973--989, 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. G. Palla, I. J. Farkas, P. Pollner, I. Derényi, and T. Vicsek. Fundamental statistical features and self-similar properties of tagged networks. New Journal of Physics, 10(12):123026, 2008.Google ScholarGoogle ScholarCross RefCross Ref
  26. D. Saha. An incremental bisimulation algorithm. In Proc. Found. of Software Technology and Theoretical Computer Science (FSTTCS), pages 204--215, New Delhi, India, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. D. Sangiorgi. On the origins of bisimulation and coinduction. ACM Trans. Program. Lang. Syst., 31:15:1--15:41, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. B. Smith et al. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnology, 25:1251--1255, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  29. W. Wang, H. Jiang, H. Wang, X. Lin, H. Lu, and J. Li. Efficient processing of XML path queries using the disk-based F&B index. In VLDB, pages 145--156, Trondheim, Norway, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. J. Yu and R. Buyya. A taxonomy of workflow management systems for grid computing. Journal of Grid Computing, 3:171--200, 2005.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. Efficient external-memory bisimulation on DAGs

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        SIGMOD '12: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
        May 2012
        886 pages
        ISBN:9781450312479
        DOI:10.1145/2213836

        Copyright © 2012 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 20 May 2012

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        SIGMOD '12 Paper Acceptance Rate48of289submissions,17%Overall Acceptance Rate785of4,003submissions,20%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader