skip to main content
article
Free Access

A probabilistic relational algebra for the integration of information retrieval and database systems

Published:01 January 1997Publication History
Skip Abstract Section

Abstract

We present a probabilistic relational algebra (PRA) which is a generalization of standard relational algebra. In PRA, tuples are assigned probabilistic weights giving the probability that a tuple belongs to a relation. Based on intensional semantics, the tuple weights of the result of a PRA expression always conform to the underlying probabilistic model. We also show for which expressions extensional semantics yields the same results. Furthermore, we discuss complexity issues and indicate possibilities for optimization. With regard to databases, the approach allows for representing imprecise attribute values, whereas for information retrieval, probabilistic document indexing and probabilistic search term weighting can be modeled. We introduce the concept of vague predicates which yield probabilistic weights instead of Boolean values, thus allowing for queries with vague selection conditions. With these features, PRA implements uncertainty and vagueness in combination with the relational model.

References

  1. BARBARA, D., GARCIA-MOLLINA, H., AND PORTER, D. 1992. The management of probabilistic data. IEEE. Trans. Knowl. Data Eng. 4, 5, 487-502. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. BILLINGSLEY, P. 1979. Probability and Measure. John Wiley and Sons, New York.Google ScholarGoogle Scholar
  3. BLAIR, D.C. 1988. An extended relational document retrieval model. Inf. Process. Manage. 24, 3, 349-371. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. BUCKLEY, C. 1985. Implementation of the SMART information retrieval system. Tech. Rep. 85-686, Dept. of Computer Science, Cornell. Univ., Ithaca, N.Y. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. CAVALLO, R. AND PITTARELLI, M. 1987. The theory of probabilistic databases. In Proceedings of the 13th International Conference on Very Large Databases. Morgan Kaufmann, San Mateo, Calif., 71-81. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. CODD, E. F. 1986. Missing information (applicable and inapplicable) in relational databases. SIGMOD Rec. 15, 4, 53-78. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. CROFT, W., TURTLE, H., AND LEWIS, D. 1991. The use of phrases and structured queries in information retrieval. In Proceedings of the 14th Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 32-45. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. DESAI, B., GOYAL, P., AND SADRI, F. 1987. Non-first-normal form universal relations: An application to information retrieval systems. Inf. Syst. 12, 1, 49-55. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. FLICKNER, M., SAWHNEY, H., NIBLACK, W., ASHLEY, J., HUANG, Q., DOM, N., GORKANI, M., HAFNER, J., LEE, D., PETKOVIC, D., STEELE, D., AND YANKER, P. 1995. Query by image and video content: The QBIC system. Computer 28, 9, 23-32. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. FUHR, N. 1990. A probabilistic framework for vague queries and imprecise information in databases. In Proceedings of the 16th International Conference on Very Large Databases, D. McLeod, R. Sacks-Davis, and H. Schek, Eds. Morgan Kaufmann, San Mateo, Calif., 696- 707. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. FUHR, N. 1992a. Integration of probabilistic fact and text retrieval. In Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, N. Belkin, P. Ingwersen, and M. Petjersen, Eds. ACM, New York, 211-222. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. FUHR, N. 1992b. Probabilistic models in information retrieval. Comput. J. 35, 3, 243-255. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. FUHR, N. 1993. A probabilistic relational model for the integration of IR and databases. In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 309-317. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. FUHR, N. 1995. Probabilistic Datalog--A logic for powerful retrieval method. In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 282-290. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. FUHR, N. AND BUCKLEY, C. 1991. A probabilistic learning approach for document indexing. ACM Trans. Inf. Syst. 9, 3 (July), 223-248. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. FUHR, N. AND ROLLEKE, T. 1996. A probabilistic NF2 relational algebra for integrated information retrieval and database systems. In Proceedings of the 2nd World Conference on Integrated Design and Process Technology (IDPT). Soc. for Design and Process Science, Austin, Tex.Google ScholarGoogle Scholar
  17. GESSERT, G. 1990. Four valued logic for relational database systems. SIGMOD Rec. 19, 1, 29 -35. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. GU, J., THIEL, U., AND ZHAO, J. 1993. Efficient retrieval of complex objects: Query processing in a hybrid DB and IR system. In Proceedings 1. GI-Fachtagung Information Retrieval. Universit#tsverlag Konstanz, Konstanz, Germany.Google ScholarGoogle Scholar
  19. HAINES, D. AND CROFT, B. 1993. Relevance feedback and inference networks. In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, R. Korfhage, E. Rasmussen, and P. Willett, Eds. ACM, New York, 2-11. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. HARMAN, D. 1995. Overview of the second text retrieval conference (TREC-2). Inf. Process. Manage. 31, 103, 271-290. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. IMIELINSKI, T. 1989. Incomplete information in logical databases. Data Eng. Bull. 12, 2, 29-40.Google ScholarGoogle Scholar
  22. IMIELINSKI, T. AND LIPSKI, W. 1984. Incomplete information in relational databases. J. ACM 31, 4, 761-791. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. JARKE, M. AND KOCH, g. 1984. Query optimization in database systems. ACM Comput. Surv. 16, 111-152. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. KIM, W., Ed. 1989. Special issue on imprecision in databases. Data Eng. Bull. 12, 2.Google ScholarGoogle Scholar
  25. LEE, D. AND KIM, M. 1993. Accommodating subjective vagueness through a fuzzy extension to the relational data model. Inf. Syst. 18, 6, 363-374. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. LEE, S. 1992. An extended relational database model for uncertain and imprecise information. In Proceedings of the 18th VLDB Conference. Morgan Kaufmann, San Mateo, Calif., 211-220. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. LIPSKI, W. 1979. On semantic issues connected with incomplete information databases. ACM Trans. Database Syst. 4, 3, 262-296. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. LOEFFEN, A. 1994. Text databases: A survey of text models and systems. SIGMOD Rec. 23, 1, 97-106. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. MACLEOD, I. 1991. Text retrieval and the relational model. J. Am. Soc. Inf. Sci. 42, 3, 155-165.Google ScholarGoogle ScholarCross RefCross Ref
  30. MOTRO, A. 1988. VAGUE: A user interface to relational databases that permits vague queries. ACM Trans. Off. Inf. Syst. 6, 3, 187-214. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. MOTRO, A. 1990. Accommodating imprecision in database systems: Issues and solutions. SIGMOD Rec. 19, 4, 69. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. PEARL, J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, Calif. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. PFEIFER, U. AND FUHR, N. 1995. Efficient processing of vague queries using a data stream approach. In Proceedings of the 18th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 189-198. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. PFEIFER, U., FUHR, N., AND HUYNH, T. 1995. Searching structured documents with the enhanced retrieval functionality of freeWAIS-sf and Sfgate. In Computer Networks and ISDN Systems: Proceedings of the 3rd International World-Wide Web Conference, D. Kroemker, Ed. Elsevier, Amsterdam, 1027-1036. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. PRADE, H. AND TESTEMALE, C. 1984. Generalizing database relational algebra for the treatment of incomplete/uncertain information and vague queries. Inf. Sci. 34, 115-143.Google ScholarGoogle ScholarCross RefCross Ref
  36. RABITTI, F. AND SAVINO, P. 1990. Retrieval of multimedia documents by imprecise query specification. In Advances in Database Technology--EDBT '90, F. Bancilhon, C. Thanos, and D. Tsichritzis, Eds. Springer-Verlag, Berlin, 203-218. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. RAGHAVAN, V., SAXTON, L., WONG, S., AND TING, S. 1986. A unified architecture for the integration of data base management and information retrieval systems. In Information Processing 86, H.-J. Kugler, Ed. Elsevier, Amsterdam, 1049-1054.Google ScholarGoogle Scholar
  38. REITER, R. 1984. Towards a logical reconstruction of relational database theory. In On Conceptual Modelling, M. Brodie, J. Mylopoulous, and J. Schmidt, Eds. Springer, New York.Google ScholarGoogle Scholar
  39. ROLLEKE, T. 1994. Equivalences of the probabilistic relational algebra. Tech. Rep., Dept. of Computer Science, Univ. of Dortmund, Dortmund, Germany.Google ScholarGoogle Scholar
  40. SALTON, G. AND BUCKLEY, C. 1988. Term weighting approaches in automatic text retrieval. Inf. Process. Manage. 24, 5, 513-523. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. SAXTON, L. AND RAGHAVAN, V. 1990. Design of an integrated information retrieval database management system. IEEE Trans. Knowl. Data Eng. 2, 2, 210-219. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. SCHEK, H.-J. AND PISTOR, P. 1982. Data structures for an integrated database management and information retrieval system. In Proceedings of the 8th International Conference on Very Large Data Bases. Morgan Kaufmann, San Mateo, Calif., 197-207. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. TAKAHASHI, Y. 1993. Fuzzy database query languages and their relational completeness theorem. IEEE Trans. Knowl. Data Eng. 5, 1, 122. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. VAN RIJSBERGEN, C.J. 1986. A non-classical logic for information retrieval. Comput. J. 29, 6, 481-485.Google ScholarGoogle ScholarCross RefCross Ref
  45. VASSILIOU, Y. 1979. Null values in database management--A denotational semantics approach. In Proceedings of the ACM SIGMOD International Conference on the Management of Data. ACM, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. VOSSEN, a. 1991. Data Models, Database Languages and Database Management Systems. Addison-Wesley, Reading, Mass. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. YUE, K. 1991. A more general model for handling missing information using a 3-valued logic. SIGMOD Rec. 20, 3, 43-49. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. ZEMANKOVA, M. AND KANDEL, A. 1985. Implementing imprecision in information systems. Inf. Sci. 37, 107-141. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. A probabilistic relational algebra for the integration of information retrieval and database systems

      Recommendations

      Reviews

      Fazli Can

      The information retrieval (IR) literature contains various studies on the integration of IR and database management systems. This paper presents a probabilistic relational algebra (PRA) for this purpose. PRA is a logical data model based on intensional semantics. It allows probabilistic document indexing and search term weighting. The authors provide a detailed discussion of several aspects of their approach, including a discussion of PRA in terms of its operations and as a generalization of relational algebra. Computation of event probabilities, handling of imprecise attribute values, vague predicates, and implementation issues are studied, and a survey of related work is provided. The work is interesting, but implementations in databases of realistic size are required to validate its practical value.

      Access critical reviews of Computing literature here

      Become a reviewer for Computing Reviews.

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Transactions on Information Systems
        ACM Transactions on Information Systems  Volume 15, Issue 1
        Jan. 1997
        101 pages
        ISSN:1046-8188
        EISSN:1558-2868
        DOI:10.1145/239041
        Issue’s Table of Contents

        Copyright © 1997 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 January 1997
        Published in tois Volume 15, Issue 1

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • article

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader