Abstract
We present a probabilistic relational algebra (PRA) which is a generalization of standard relational algebra. In PRA, tuples are assigned probabilistic weights giving the probability that a tuple belongs to a relation. Based on intensional semantics, the tuple weights of the result of a PRA expression always conform to the underlying probabilistic model. We also show for which expressions extensional semantics yields the same results. Furthermore, we discuss complexity issues and indicate possibilities for optimization. With regard to databases, the approach allows for representing imprecise attribute values, whereas for information retrieval, probabilistic document indexing and probabilistic search term weighting can be modeled. We introduce the concept of vague predicates which yield probabilistic weights instead of Boolean values, thus allowing for queries with vague selection conditions. With these features, PRA implements uncertainty and vagueness in combination with the relational model.
- BARBARA, D., GARCIA-MOLLINA, H., AND PORTER, D. 1992. The management of probabilistic data. IEEE. Trans. Knowl. Data Eng. 4, 5, 487-502. Google ScholarDigital Library
- BILLINGSLEY, P. 1979. Probability and Measure. John Wiley and Sons, New York.Google Scholar
- BLAIR, D.C. 1988. An extended relational document retrieval model. Inf. Process. Manage. 24, 3, 349-371. Google ScholarDigital Library
- BUCKLEY, C. 1985. Implementation of the SMART information retrieval system. Tech. Rep. 85-686, Dept. of Computer Science, Cornell. Univ., Ithaca, N.Y. Google ScholarDigital Library
- CAVALLO, R. AND PITTARELLI, M. 1987. The theory of probabilistic databases. In Proceedings of the 13th International Conference on Very Large Databases. Morgan Kaufmann, San Mateo, Calif., 71-81. Google ScholarDigital Library
- CODD, E. F. 1986. Missing information (applicable and inapplicable) in relational databases. SIGMOD Rec. 15, 4, 53-78. Google ScholarDigital Library
- CROFT, W., TURTLE, H., AND LEWIS, D. 1991. The use of phrases and structured queries in information retrieval. In Proceedings of the 14th Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 32-45. Google ScholarDigital Library
- DESAI, B., GOYAL, P., AND SADRI, F. 1987. Non-first-normal form universal relations: An application to information retrieval systems. Inf. Syst. 12, 1, 49-55. Google ScholarDigital Library
- FLICKNER, M., SAWHNEY, H., NIBLACK, W., ASHLEY, J., HUANG, Q., DOM, N., GORKANI, M., HAFNER, J., LEE, D., PETKOVIC, D., STEELE, D., AND YANKER, P. 1995. Query by image and video content: The QBIC system. Computer 28, 9, 23-32. Google ScholarDigital Library
- FUHR, N. 1990. A probabilistic framework for vague queries and imprecise information in databases. In Proceedings of the 16th International Conference on Very Large Databases, D. McLeod, R. Sacks-Davis, and H. Schek, Eds. Morgan Kaufmann, San Mateo, Calif., 696- 707. Google ScholarDigital Library
- FUHR, N. 1992a. Integration of probabilistic fact and text retrieval. In Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, N. Belkin, P. Ingwersen, and M. Petjersen, Eds. ACM, New York, 211-222. Google ScholarDigital Library
- FUHR, N. 1992b. Probabilistic models in information retrieval. Comput. J. 35, 3, 243-255. Google ScholarDigital Library
- FUHR, N. 1993. A probabilistic relational model for the integration of IR and databases. In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 309-317. Google ScholarDigital Library
- FUHR, N. 1995. Probabilistic Datalog--A logic for powerful retrieval method. In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 282-290. Google ScholarDigital Library
- FUHR, N. AND BUCKLEY, C. 1991. A probabilistic learning approach for document indexing. ACM Trans. Inf. Syst. 9, 3 (July), 223-248. Google ScholarDigital Library
- FUHR, N. AND ROLLEKE, T. 1996. A probabilistic NF2 relational algebra for integrated information retrieval and database systems. In Proceedings of the 2nd World Conference on Integrated Design and Process Technology (IDPT). Soc. for Design and Process Science, Austin, Tex.Google Scholar
- GESSERT, G. 1990. Four valued logic for relational database systems. SIGMOD Rec. 19, 1, 29 -35. Google ScholarDigital Library
- GU, J., THIEL, U., AND ZHAO, J. 1993. Efficient retrieval of complex objects: Query processing in a hybrid DB and IR system. In Proceedings 1. GI-Fachtagung Information Retrieval. Universit#tsverlag Konstanz, Konstanz, Germany.Google Scholar
- HAINES, D. AND CROFT, B. 1993. Relevance feedback and inference networks. In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, R. Korfhage, E. Rasmussen, and P. Willett, Eds. ACM, New York, 2-11. Google ScholarDigital Library
- HARMAN, D. 1995. Overview of the second text retrieval conference (TREC-2). Inf. Process. Manage. 31, 103, 271-290. Google ScholarDigital Library
- IMIELINSKI, T. 1989. Incomplete information in logical databases. Data Eng. Bull. 12, 2, 29-40.Google Scholar
- IMIELINSKI, T. AND LIPSKI, W. 1984. Incomplete information in relational databases. J. ACM 31, 4, 761-791. Google ScholarDigital Library
- JARKE, M. AND KOCH, g. 1984. Query optimization in database systems. ACM Comput. Surv. 16, 111-152. Google ScholarDigital Library
- KIM, W., Ed. 1989. Special issue on imprecision in databases. Data Eng. Bull. 12, 2.Google Scholar
- LEE, D. AND KIM, M. 1993. Accommodating subjective vagueness through a fuzzy extension to the relational data model. Inf. Syst. 18, 6, 363-374. Google ScholarDigital Library
- LEE, S. 1992. An extended relational database model for uncertain and imprecise information. In Proceedings of the 18th VLDB Conference. Morgan Kaufmann, San Mateo, Calif., 211-220. Google ScholarDigital Library
- LIPSKI, W. 1979. On semantic issues connected with incomplete information databases. ACM Trans. Database Syst. 4, 3, 262-296. Google ScholarDigital Library
- LOEFFEN, A. 1994. Text databases: A survey of text models and systems. SIGMOD Rec. 23, 1, 97-106. Google ScholarDigital Library
- MACLEOD, I. 1991. Text retrieval and the relational model. J. Am. Soc. Inf. Sci. 42, 3, 155-165.Google ScholarCross Ref
- MOTRO, A. 1988. VAGUE: A user interface to relational databases that permits vague queries. ACM Trans. Off. Inf. Syst. 6, 3, 187-214. Google ScholarDigital Library
- MOTRO, A. 1990. Accommodating imprecision in database systems: Issues and solutions. SIGMOD Rec. 19, 4, 69. Google ScholarDigital Library
- PEARL, J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, Calif. Google ScholarDigital Library
- PFEIFER, U. AND FUHR, N. 1995. Efficient processing of vague queries using a data stream approach. In Proceedings of the 18th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 189-198. Google ScholarDigital Library
- PFEIFER, U., FUHR, N., AND HUYNH, T. 1995. Searching structured documents with the enhanced retrieval functionality of freeWAIS-sf and Sfgate. In Computer Networks and ISDN Systems: Proceedings of the 3rd International World-Wide Web Conference, D. Kroemker, Ed. Elsevier, Amsterdam, 1027-1036. Google ScholarDigital Library
- PRADE, H. AND TESTEMALE, C. 1984. Generalizing database relational algebra for the treatment of incomplete/uncertain information and vague queries. Inf. Sci. 34, 115-143.Google ScholarCross Ref
- RABITTI, F. AND SAVINO, P. 1990. Retrieval of multimedia documents by imprecise query specification. In Advances in Database Technology--EDBT '90, F. Bancilhon, C. Thanos, and D. Tsichritzis, Eds. Springer-Verlag, Berlin, 203-218. Google ScholarDigital Library
- RAGHAVAN, V., SAXTON, L., WONG, S., AND TING, S. 1986. A unified architecture for the integration of data base management and information retrieval systems. In Information Processing 86, H.-J. Kugler, Ed. Elsevier, Amsterdam, 1049-1054.Google Scholar
- REITER, R. 1984. Towards a logical reconstruction of relational database theory. In On Conceptual Modelling, M. Brodie, J. Mylopoulous, and J. Schmidt, Eds. Springer, New York.Google Scholar
- ROLLEKE, T. 1994. Equivalences of the probabilistic relational algebra. Tech. Rep., Dept. of Computer Science, Univ. of Dortmund, Dortmund, Germany.Google Scholar
- SALTON, G. AND BUCKLEY, C. 1988. Term weighting approaches in automatic text retrieval. Inf. Process. Manage. 24, 5, 513-523. Google ScholarDigital Library
- SAXTON, L. AND RAGHAVAN, V. 1990. Design of an integrated information retrieval database management system. IEEE Trans. Knowl. Data Eng. 2, 2, 210-219. Google ScholarDigital Library
- SCHEK, H.-J. AND PISTOR, P. 1982. Data structures for an integrated database management and information retrieval system. In Proceedings of the 8th International Conference on Very Large Data Bases. Morgan Kaufmann, San Mateo, Calif., 197-207. Google ScholarDigital Library
- TAKAHASHI, Y. 1993. Fuzzy database query languages and their relational completeness theorem. IEEE Trans. Knowl. Data Eng. 5, 1, 122. Google ScholarDigital Library
- VAN RIJSBERGEN, C.J. 1986. A non-classical logic for information retrieval. Comput. J. 29, 6, 481-485.Google ScholarCross Ref
- VASSILIOU, Y. 1979. Null values in database management--A denotational semantics approach. In Proceedings of the ACM SIGMOD International Conference on the Management of Data. ACM, New York. Google ScholarDigital Library
- VOSSEN, a. 1991. Data Models, Database Languages and Database Management Systems. Addison-Wesley, Reading, Mass. Google ScholarDigital Library
- YUE, K. 1991. A more general model for handling missing information using a 3-valued logic. SIGMOD Rec. 20, 3, 43-49. Google ScholarDigital Library
- ZEMANKOVA, M. AND KANDEL, A. 1985. Implementing imprecision in information systems. Inf. Sci. 37, 107-141. Google ScholarDigital Library
Index Terms
- A probabilistic relational algebra for the integration of information retrieval and database systems
Recommendations
A probabilistic relational model and algebra
Although the relational model for databases provides a great range of advantages over other data models, it lacks a comprehensive way to handle incomplete and uncertain data. Uncertainty in data values, however, is pervasive in all real-world ...
A relational database model and algebra integrating fuzzy attributes and probabilistic tuples
AbstractAlthough there have been many fuzzy or probabilistic relational database models proposed for representing and handling imprecise and uncertain information of objects in real-world applications, models combining the relevance and ...
XIRQL: An XML query language based on information retrieval concepts
XIRQL ("circle") is an XML query language that incorporates imprecision and vagueness for both structural and content-oriented query conditions. The corresponding uncertainty is handled by a consistent probabilistic model. The core features of XIRQL are ...
Comments