research-article

Conceptual data model-based software size estimation for information systems

Published: 14 October 2009

Abstract

Size estimation plays a key role in effort estimation, which in turn has a crucial impact on software projects in industry. Some of the information required by existing software sizing methods is difficult to predict in the early stages of development. A conceptual data model, by contrast, is widely used during early requirements analysis for information systems. Lines of code (LOC) is a commonly used measure of software size. This article proposes a novel method that estimates LOC for information systems from their conceptual data models using a multiple linear regression model. We have validated the proposed method using samples from both the software industry and open-source systems.
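To make the idea of a regression-based size model concrete, here is a minimal sketch of an ordinary least squares fit. It assumes the three predictors named in the reviews on this page (number of classes C, number of relationship types R, and average attributes per class, called A here). The function names, the coefficients, and the data are all hypothetical: the data is synthetic, generated from arbitrary coefficients, and is not taken from the article. The article's actual fitting and validation procedure is not reproduced here.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_size_model(rows, kloc):
    """Least squares fit of kloc ~ b0 + b1*C + b2*R + b3*A.

    rows is a list of (C, R, A) tuples, one per past project.
    Returns [b0, b1, b2, b3], found by solving the normal equations.
    """
    X = [[1.0, c, r, a] for c, r, a in rows]
    p = 4
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in zip(X, kloc)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict_kloc(coef, c, r, a):
    """Apply a fitted (or assumed) linear size model to one data model."""
    return coef[0] + coef[1] * c + coef[2] * r + coef[3] * a


# ASSUMPTION: arbitrary coefficients used only to generate synthetic data.
ASSUMED_COEF = [2.0, 0.5, 0.1, 0.3]
# ASSUMPTION: made-up (C, R, A) values, not real projects.
SYNTHETIC_ROWS = [
    (10, 12, 5.0), (15, 20, 6.5), (22, 25, 4.8),
    (30, 41, 7.1), (41, 50, 6.0), (55, 70, 5.5),
]
SYNTHETIC_KLOC = [predict_kloc(ASSUMED_COEF, *row) for row in SYNTHETIC_ROWS]

if __name__ == "__main__":
    coef = fit_size_model(SYNTHETIC_ROWS, SYNTHETIC_KLOC)
    print("fitted coefficients:", coef)
    print("predicted KLOC for C=20, R=30, A=5.0:", predict_kloc(coef, 20, 30, 5.0))
```

Because the synthetic targets are noise-free, the fit simply recovers the assumed coefficients. With real project data the residuals would be nonzero, and the fit would need the usual regression diagnostics before being trusted.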



    Reviews

    Alan Raymond Hevner

    The authors propose an approach for estimating software size from conceptual data models of information systems. Three independent variables characterize the conceptual data model: C, "the total number of classes"; R, "the total number of unidirectional relationship types"; and a third variable, "the average number of attributes per class." Drawing data from actual development projects in industry and open-source repositories, the authors build multiple linear regression models for size estimation in different system environments: industry Visual Basic systems, open-source PHP systems, industry Java systems, and open-source Java systems. The derived regression models are validated to predict system size, measured in lines of code, within an acceptable range of performance. Furthermore, the paper demonstrates that the new size-prediction approach to cost estimation is comparable to the use of function points. The key advantages of using conceptual data models for cost estimation are the parsimonious use of only three parameters and the fact that such conceptual models are "more readily available in the early stage of software development" than many of the function point parameters. Thus, this cost estimation approach should appeal to managers of information system development projects with well-defined conceptual data models. Online Computing Reviews Service

    Larry Bernstein

    This scholarly paper makes several important contributions to the software engineering problem of how to size a system. Remember that size is the principal factor when determining the cost of system development. The authors provide models for sizing certain types of applications. They also indicate how to improve productivity. Their approach is reasonable, the paper is a good read, and it dovetails with my own sizing experience. I especially like the way the authors carefully qualify their results. The authors make the wonderful observation that "the number of attributes in classes does not affect the complexity as much as the number of classes and relationship types." This insight suggests how software engineers can constrain software design in order to minimize size. They clearly define the nature of the sizing problem. Section 3.1 restricts their findings to information systems. An information system is defined as "a database application that supports business processes and functions through maintaining a database using a standard database management system (DBMS)." In an entity-relationship model, information systems have the following properties: a graphical user interface (GUI); business logic embedded in the database; report generators; the ability to update data structures to reflect business processes; simple data functions; and error correction programs. The researchers confess that without careful data controls, their results were inconsistent. Then, they explain how to get valid data. Their results are based on a good set of data with Java code. Although the authors fail to define a physical line of code, I think they count the number of carriage returns, subtracting comment lines and blank lines. They do not indicate whether the lines of code include data definition statements. They use Code Counter Pro 1.21, which implicitly defines a line of code.
    Their model contains a set of equations computing KLOC (thousands of lines of code) as a linear function of the number of classes, the number of relationship types, and the average number of attributes per class. Each equation deals with a carefully defined subset of information systems. They do not study nonfeature code needed to satisfy architectural constraints such as reliability, performance, recovery, operability, scalability, and maintainability. Also, when function points are used for sizing, the authors report that reuse level, security, GUI, and external reports and inquiry need to be added to adjustment factors. There is a small but important error in Section 5.1.2: the conversion factor of 53 is not prescribed; it is an industry approximation and needs to be calibrated individually for each software shop. Overall, this paper presents a thoughtful approach to software sizing. If your application fits within the authors' constraints, you can profitably use their approach. (Heed their warning and do not apply their model carelessly.) I look forward to future papers on these continuing research efforts and insights. Online Computing Reviews Service
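The reviewer's point about calibrating the conversion factor can be sketched as follows. This is an illustration under stated assumptions, not the article's procedure: it assumes the factor of 53 denotes lines of code per function point, and it calibrates a shop-specific factor as a least-squares slope through the origin over the shop's own completed projects. The function names and the idea of a `history` list are hypothetical.

```python
def calibrate_loc_per_fp(history):
    """Estimate a shop-specific LOC-per-function-point factor.

    history is a list of (function_points, lines_of_code) pairs from
    completed projects. The result is the slope of the least-squares
    line through the origin: sum(fp * loc) / sum(fp * fp).
    """
    numerator = sum(fp * loc for fp, loc in history)
    denominator = sum(fp * fp for fp, _ in history)
    if denominator == 0:
        raise ValueError("history must contain a project with nonzero function points")
    return numerator / denominator


def loc_from_function_points(function_points, loc_per_fp=53.0):
    """Convert function points to LOC.

    ASSUMPTION: the default of 53 is the industry approximation mentioned
    in the review, read here as LOC per function point; pass a calibrated
    value instead whenever local history is available.
    """
    return function_points * loc_per_fp
```

A shop would call `calibrate_loc_per_fp` on its own project records and pass the result as `loc_per_fp`, rather than relying on the default.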


    • Published in

      ACM Transactions on Software Engineering and Methodology, Volume 19, Issue 2
      October 2009
      115 pages
      ISSN: 1049-331X
      EISSN: 1557-7392
      DOI: 10.1145/1571629

      Copyright © 2009 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 14 October 2009
      • Accepted: 1 April 2008
      • Revised: 1 June 2007
      • Received: 1 June 2006


      Qualifiers

      • research-article
      • Research
      • Refereed
