Harp: a distributed query system for legacy public libraries and structured databases

Published: 01 July 1999

Abstract

The main purpose of a digital library is to give users easy access to an enormous amount of globally networked information. Typically, this information includes preexisting public library catalog data, digitized document collections, and other databases. In this article, we describe the distributed query system of a digital library prototype known as HARP. In the HARP project, we designed and implemented a distributed query processor and its query front end to support integrated queries over preexisting public library catalogs and structured databases. The article describes our experience designing an extended SQL query language known as HarpSQL, presents the design and implementation of the distributed query system, and highlights what we learned in designing and developing the distributed query processor and its user interface. We believe that our prototyping effort will provide useful lessons for the development of a complete digital library infrastructure.
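To make the idea of an integrated query concrete, the following Python sketch filters catalog records by a keyword and joins the survivors against a relational table. It is an illustration under stated assumptions, not HarpSQL: the flat tag-to-text record layout, the `extract` and `contain` helpers, the `staff` table, and all sample data are hypothetical, and the helpers' semantics are guesses at what field extraction and keyword containment on a catalog record might look like.

```python
import sqlite3

# Hypothetical MARC-like records (tag -> field text). Real MARC records carry
# indicators and subfields; this flat layout is an assumption for illustration.
CATALOG = [
    {"001": "bib-1", "100": "Lee, A.", "245": "Distributed Query Processing"},
    {"001": "bib-2", "100": "Tan, B.", "245": "Routing Basics"},
    {"001": "bib-3", "100": "Ng, C.", "245": "Query Languages"},
]


def extract(record, tag):
    """Return the text of one field, or None when absent (assumed semantics)."""
    return record.get(tag)


def contain(record, tag, keyword):
    """Case-insensitive keyword test on one field (assumed semantics)."""
    value = extract(record, tag)
    return value is not None and keyword.lower() in value.lower()


def integrated_query(conn, catalog, keyword):
    """Select records whose title (tag 245) contains keyword, then join each
    one to the relational `staff` table on author name (tag 100)."""
    results = []
    for record in catalog:
        if not contain(record, "245", keyword):
            continue
        author = extract(record, "100")
        rows = conn.execute(
            "SELECT dept FROM staff WHERE name = ?", (author,)
        ).fetchall()
        for (dept,) in rows:
            results.append((extract(record, "245"), dept))
    return results


def demo():
    """Build a small in-memory database and run one integrated query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (name TEXT, dept TEXT)")
    conn.executemany(
        "INSERT INTO staff VALUES (?, ?)",
        [("Lee, A.", "Databases"), ("Tan, B.", "Networks")],
    )
    try:
        return integrated_query(conn, CATALOG, "query")
    finally:
        conn.close()
```

Calling `demo()` keeps only the records whose title matches and whose author also appears in `staff`. A real system would push the catalog selection to the remote library server rather than scanning records locally.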


Reviews

Svetlana Segarceanu

Harp is a prototype distributed query system for a digital library. This work addresses the query-processing problem involving both legacy library catalogues and structured data in SQL databases. The design of the system takes into account the model of legacy public library catalogues, specifically the representation of the bibliographic records in machine-readable cataloguing (MARC) format; the query capabilities of the remote access protocol Z39.50; and the integration of different types of data by extended predicates.

The introduction presents the main objectives in digital library research and the research issues considered here. Section 2 relates this work to other research in the field. Section 3 describes the HarpSQL query language for writing integrated queries to bibliographic and SQL databases. Its features include foreign SQL and bibliographic tables, the MARCString data type, and the definition of the Extract and Contain predicates to suit different selection criteria on the MARCString attributes and perform the join operation on BIB tables and SQL tables. Section 4 introduces the distributed query processing architecture, which consists of two types of processes: query managers that associate the query with a specific graph, and query agents that process the subqueries of the graph. Based on this, section 5 presents the query processing strategy. Section 6 covers implementation issues for the distributed query processor. The query manager and agents are implemented as separate processes, and the query manager and the agent processes communicate using message queues. The HarpSQL server is realized by extending the POSTGRES database system. Section 7 gives the design and implementation of a graphical query formulation tool. Possible extensions to HarpSQL and conclusions are given in the last two sections.

This work integrates the research efforts to ease remote, simultaneous access to legacy library collections and other archival data on the Internet. It focuses on distributed query processing, a less explored area of this field. The paper presents interesting and worthwhile research in multidatabase query processing.
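The manager-and-agent arrangement described in the review can be sketched in Python as follows. This is a minimal illustration, not the Harp implementation: it uses threads and `queue.Queue` in place of separate processes and operating-system message queues, it sends a flat set of subqueries rather than a query graph, and the names `agent`, `run_query`, and `demo`, along with all sample data, are hypothetical.

```python
import queue
import threading


def agent(name, source, inbox, outbox):
    """Query agent: evaluate each subquery it receives against its own source."""
    while True:
        message = inbox.get()
        if message is None:  # sentinel: the manager asks the agent to stop
            break
        query_id, predicate = message
        rows = [row for row in source if predicate(row)]
        outbox.put((query_id, name, rows))


def run_query(sources, subqueries):
    """Query manager: send one subquery to each named agent, gather the
    replies from a shared queue, and return the partial results by source."""
    outbox = queue.Queue()
    inboxes, workers = {}, []
    for name, source in sources.items():
        inboxes[name] = queue.Queue()
        worker = threading.Thread(
            target=agent, args=(name, source, inboxes[name], outbox)
        )
        worker.start()
        workers.append(worker)

    for name, predicate in subqueries.items():
        inboxes[name].put(("q1", predicate))

    results = {}
    for _ in subqueries:
        _, name, rows = outbox.get()
        results[name] = rows

    for inbox in inboxes.values():
        inbox.put(None)
    for worker in workers:
        worker.join()
    return results


def demo():
    """Run two subqueries in parallel, then join the partial results."""
    sources = {
        "bib": [{"author": "Lee, A.", "title": "Distributed Query Processing"}],
        "sql": [
            {"name": "Lee, A.", "dept": "Databases"},
            {"name": "Tan, B.", "dept": "Networks"},
        ],
    }
    subqueries = {
        "bib": lambda row: "query" in row["title"].lower(),
        "sql": lambda row: row["dept"] == "Databases",
    }
    partial = run_query(sources, subqueries)
    return [
        (b["title"], s["dept"])
        for b in partial["bib"]
        for s in partial["sql"]
        if b["author"] == s["name"]
    ]
```

Here the manager performs the final join itself after both agents reply. In a graph-based plan, that join would presumably be another node assigned to an agent, but the review does not give enough detail to say how Harp does it.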
