
Conceptual schema analysis: techniques and applications

Published: 01 September 1998

Abstract

The problem of analyzing and classifying conceptual schemas is becoming increasingly important due to the availability of a large number of schemas related to existing applications. Schema analysis and classification can serve different purposes: extracting information on the intensional properties of legacy systems in order to restructure them or migrate them to new architectures; building libraries of reference conceptual components to be used in developing new applications in a given domain; and identifying information flows and possible data replication within an organization. This article proposes a set of techniques for schema analysis and classification that can be used separately or in combination. The techniques allow the analyst to derive significant properties from schemas while keeping human intervention to a minimum. In particular, the article presents techniques for associating descriptors with schemas, for abstracting reference conceptual schemas based on schema clustering, and for determining schema similarity. It also illustrates a methodology for systematic schema analysis whose purpose is to identify the similar and potentially reusable parts of a set of schemas and abstract them into reference components. Finally, it describes the experience gained from applying the proposed techniques and methodology to a large set of Entity-Relationship conceptual schemas of information systems in the Italian Public Administration domain.
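The abstract names three building blocks (descriptors associated with schemas, schema similarity, and schema clustering) without defining them on this page. The following is a generic sketch under stated assumptions, not the authors' method: each Entity-Relationship schema is reduced to a set of normalized element names (its descriptors), a small hypothetical thesaurus merges synonyms, similarity is the Dice coefficient (a standard set-overlap measure swapped in for illustration), and schemas are grouped by single-link clustering at a chosen threshold. All names (`THESAURUS`, `descriptors`, `dice`, `cluster`, and the example schemas) are hypothetical.

```python
from itertools import combinations

# Hypothetical thesaurus mapping synonyms to a canonical term (illustrative only).
THESAURUS = {"employee": "person", "citizen": "person", "clerk": "person",
             "office": "department", "bureau": "department"}


def descriptors(schema: dict[str, list[str]]) -> set[str]:
    """Return the normalized descriptor set of an ER schema.

    `schema` maps element kinds (e.g. "entities", "relationships") to the
    names of the elements of that kind. Names are lowercased and replaced
    by their canonical thesaurus term when one exists.
    """
    terms = set()
    for names in schema.values():
        for name in names:
            term = name.strip().lower()
            terms.add(THESAURUS.get(term, term))
    return terms


def dice(a: set[str], b: set[str]) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|); 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def cluster(schemas: dict[str, dict[str, list[str]]],
            threshold: float) -> list[set[str]]:
    """Single-link clustering: two schemas share a cluster if a chain of
    pairs with similarity >= threshold connects them (union-find)."""
    desc = {name: descriptors(s) for name, s in schemas.items()}
    parent = {name: name for name in schemas}

    def find(x: str) -> str:
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(schemas, 2):
        if dice(desc[a], desc[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in schemas:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Hypothetical example schemas from a public-administration-like domain.
example = {
    "registry_office": {"entities": ["Citizen", "Birth", "Address"],
                        "relationships": ["Resides"]},
    "tax_office": {"entities": ["Person", "Address", "TaxReturn"],
                   "relationships": ["Resides", "Files"]},
    "payroll": {"entities": ["Employee", "Department", "Salary"],
                "relationships": ["WorksIn"]},
}

print(cluster(example, threshold=0.5))
```

With a threshold of 0.5, the registry and tax schemas (Dice of about 0.67, sharing `person`, `address` and `resides`) fall into one cluster, while `payroll` (about 0.25 and 0.22 against the others) stays on its own; such a cluster is the kind of candidate an analyst might then abstract into a reference component. Single-link was chosen here only because it reduces to connected components of the threshold graph; other linkage criteria or similarity measures would give different groupings.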



      Reviews

      Fazli Can

      In this lengthy paper, the authors introduce semi-automatic conceptual schema analysis techniques. Their techniques include steps such as indexing (feature selection), thesaurus construction, and schema clustering. They envision that the techniques can be used for schema reuse in application development, in the analysis of legacy systems, to enforce interoperability among heterogeneous distributed systems, and in the analysis of semi-structured information (such as information presented on the Web). The applications to practical situations are presented without any quantitative comparison with manual methods. As the size and variety of available information increase, the methods needed to organize and retrieve it must adapt; otherwise we start to reinvent, fail to take advantage of what is available, and repeat the mistakes of the past. In this respect, the efforts of this study are worthwhile; however, the techniques still need validation through rigorous comparison with manual methods. The paper is excessively long and repetitious: sections 2 and 3 and, similarly, sections 4 and 5 could have been combined and shortened. The paper should be useful to practitioners and researchers.
