research-article

Conceptual data model-based software size estimation for information systems

Authors:
Hee Beng Kuan Tan, Nanyang Technological University, Nanyang Avenue, Singapore
Yuan Zhao, Nanyang Technological University, Nanyang Avenue, Singapore
Hongyu Zhang, Tsinghua University, Beijing, China

ACM Transactions on Software Engineering and Methodology, Volume 19, Issue 2, Article No. 4, pp. 1–37. https://doi.org/10.1145/1571629.1571630

Published: 14 October 2009


Abstract

Size estimation plays a key role in effort estimation that has a crucial impact on software projects in the software industry. Some information required by existing software sizing methods is difficult to predict in the early stage of software development. A conceptual data model is widely used in the early stage of requirements analysis for information systems. Lines of code (LOC) is a commonly used software size measure. This article proposes a novel LOC estimation method for information systems from their conceptual data models through using a multiple linear regression model. We have validated the proposed method using samples from both the software industry and open-source systems.


Index Terms

Conceptual data model-based software size estimation for information systems
1. Software and its engineering
  1. Software creation and management
    1. Software development techniques
      1. Reusability
        Software product lines


Reviews

Reviewer: Alan Raymond Hevner

The authors propose an approach for estimating software size from conceptual data models of information systems. Three independent variables characterize the conceptual data model: C, "the total number of classes"; R, "the total number of unidirectional relationship types"; and Ā, "the average number of attributes per class." Drawing data from actual development projects in industry and open-source repositories, the authors build multiple linear regression models for size estimation in different system environments, such as industry Visual Basic systems, open-source PHP systems, industry Java systems, and open-source Java systems. The derived regression models are validated to predict system size in terms of number of lines of code, within an acceptable range of performance. Furthermore, the paper demonstrates that the new approach for cost estimation that predicts software size is comparable to the use of function points.

The key advantages of using conceptual data models for cost estimation are the parsimonious use of only three parameters and the fact that such conceptual models are "more readily available in the early stage of software development" than many of the function point parameters. Thus, this cost estimation approach should appeal to managers of information system development projects with well-defined conceptual data models.

Online Computing Reviews Service
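The regression setup described in this review can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit of KLOC against the three predictors (number of classes, number of unidirectional relationship types, and average attributes per class, called `a_bar` here). The function names, the coefficients in `TRUE_COEFFS`, and the sample rows are hypothetical and synthetic; they are not the paper's data or its fitted models.

```python
from typing import List, Sequence, Tuple


def fit_linear_model(rows: Sequence[Tuple[float, float, float]],
                     kloc: Sequence[float]) -> List[float]:
    """Least-squares fit of KLOC = b0 + b1*C + b2*R + b3*a_bar."""
    # Design matrix with a leading 1 for the intercept term.
    X = [[1.0, c, r, a] for c, r, a in rows]
    n = 4
    # Augmented normal equations: (X^T X | X^T y).
    aug = [[sum(x[i] * x[j] for x in X) for j in range(n)]
           + [sum(x[i] * y for x, y in zip(X, kloc))]
           for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def predict_kloc(coeffs: Sequence[float], c: float, r: float, a_bar: float) -> float:
    """Apply the fitted linear model to one conceptual data model."""
    b0, b1, b2, b3 = coeffs
    return b0 + b1 * c + b2 * r + b3 * a_bar


# Synthetic illustration only: sizes are generated from made-up coefficients.
TRUE_COEFFS = (2.0, 0.5, 0.3, 0.1)
samples = [(10, 12, 5.0), (20, 25, 6.5), (15, 14, 4.0),
           (30, 40, 7.0), (8, 6, 3.5), (25, 22, 8.0)]
kloc = [predict_kloc(TRUE_COEFFS, c, r, a) for c, r, a in samples]

coeffs = fit_linear_model(samples, kloc)
estimate = predict_kloc(coeffs, 18, 20, 5.5)
```

In practice a separate set of coefficients would be fitted for each system environment (for example, industry Java versus open-source PHP), since the review notes that the authors build a distinct model per environment.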

Reviewer: Larry Bernstein

This scholarly paper makes several important contributions to the software engineering problem of how to size a system. Remember that size is the principal factor when determining the cost of system development. The authors provide models for sizing certain types of applications. They also indicate how to improve productivity. Their approach is reasonable, a good read, and dovetails with my own sizing experience. I especially like the way the authors carefully qualify their results. The authors make the wonderful observation that "the number of attributes in classes does not affect the complexity as much as the number of classes and relationship types." This insight suggests how software engineers can constrain software design in order to minimize size.

They clearly define the nature of the sizing problem. Section 3.1 restricts their findings to information systems. An information system is defined as "a database application that supports business processes and functions through maintaining a database using a standard database management system (DBMS)." In an entity-relationship model, information systems have the following properties: a graphical user interface (GUI); business logic embedded in the database; report generators; the ability to update data structures to reflect business processes; simple data functions; and error correction programs.

The researchers confess that without careful data controls, their results were inconsistent. Then, they explain how to get valid data. Their results are based on a good set of data with Java code. Although the authors fail to define a physical line of code, I think they count the number of carriage returns, subtracting comment lines and blank lines. They do not indicate whether the lines of code include data definition statements. They use Code Counter Pro 1.21, which implicitly defines a line of code.

Their model contains a set of equations computing KLOC, the metric of 1,000 lines of code, as a linear function of the number of classes, the number of relationship types, and the average number of attributes per class. Each equation deals with a carefully defined subset of information systems. They do not study nonfeature code needed to satisfy architectural constraints such as reliability, performance, recovery, operability, scalability, and maintainability. Also, when function points are used for sizing, the authors report that reuse level, security, GUI, and external reports and inquiry need to be added to adjustment factors. There is a small but important error in Section 5.1.2: the conversion factor of 53 is not prescribed; it is an industry approximation and needs to be calibrated individually for each software shop.

Overall, this paper presents a thoughtful approach to software sizing. If your application fits within the authors' constraints, you can profitably use their approach. (Heed their warning and do not apply their model carelessly.) I look forward to future papers on these continuing research efforts and insights.

Online Computing Reviews Service
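The reviewer's guess at what a physical line of code means (lines left after subtracting comment lines and blank lines) and the remark about calibrating the function-point conversion factor can be sketched as follows. This is an illustrative approximation, not the counting rule of Code Counter Pro or of the paper. Both function names are hypothetical, and the counter assumes Java-style `//` and `/* */` comments and does not handle comment markers that appear inside string literals.

```python
def count_physical_loc(source: str) -> int:
    """Count lines that hold code, skipping blank and comment-only lines."""
    count = 0
    in_block = False  # True while inside a /* ... */ comment
    for raw in source.splitlines():
        line = raw.strip()
        has_code = False
        i = 0
        while i < len(line):
            if in_block:
                end = line.find("*/", i)
                if end == -1:
                    i = len(line)          # comment continues on the next line
                else:
                    in_block = False
                    i = end + 2
            elif line.startswith("//", i):
                break                      # rest of the line is a comment
            elif line.startswith("/*", i):
                in_block = True
                i += 2
            else:
                if not line[i].isspace():
                    has_code = True
                i += 1
        if has_code:
            count += 1
    return count


def loc_from_function_points(fp: float, loc_per_fp: float = 53.0) -> float:
    """Convert function points to LOC.

    The default of 53 is the factor quoted in the review; it is an industry
    approximation and should be replaced by a locally calibrated value.
    """
    return fp * loc_per_fp


sample = """\
// header comment
package demo;

/* block
   comment */
public class A {
    int x = 1; // trailing comment
    /* inline */ int y = 2;
}
"""
physical_loc = count_physical_loc(sample)
```

Passing a shop-specific `loc_per_fp` (for example, one derived from completed local projects) is the calibration step the reviewer asks for; the default should be treated as a placeholder.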


Published in

ACM Transactions on Software Engineering and Methodology Volume 19, Issue 2
October 2009
115 pages
ISSN:1049-331X
EISSN:1557-7392
DOI:10.1145/1571629

Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 14 October 2009
- Accepted: 1 April 2008
- Revised: 1 June 2007
- Received: 1 June 2006
Author Tags
Software sizing
conceptual data model
line of code (LOC)
multiple linear regression model
Qualifiers
- research-article
- Research
- Refereed
