Abstract
Spreadsheet-based Information Systems are now widely used in industry to support different phases of production processes. Their intensive use is mainly due to their ease of use, which allows even inexperienced programmers to develop Information Systems. Development is further aided by integrated scripting languages (e.g. Visual Basic for Applications, LibreOffice Basic, JavaScript) that offer features for implementing Rapid Application Development processes. Although Spreadsheet-based Information Systems can be developed with a very short time to market, they are usually poorly documented or, in some cases, not documented at all. As a consequence, they are very difficult to comprehend, maintain, or migrate towards other architectures, such as Database Oriented Information Systems or Web Applications. Abstracting a data model from the source spreadsheet files is a fundamental activity of the migration process towards different architectures. In this work we present a heuristic-based reverse engineering process for inferring a data model from an Excel-based information system. The process is fully automatic and consists of seven sequential steps. Both the applicability and the effectiveness of the proposed process were assessed in an experiment we conducted in the automotive industrial context. The process was successfully used to obtain the UML class diagrams representing the conceptual data models of three different Spreadsheet-based Information Systems. The paper presents the results of the experiment and the lessons we learned from it.
Acknowledgements
This work was carried out in the context of the research projects IESWECAN (Informatics for Embedded SoftWare Engineering of Construction and Agricultural machiNes - PON01-01516) and APPS4SAFETY (Active Preventive Passive Solutions for Safety - PON03PE_00159_3), both partially funded by the Italian Ministry for University and Research (MIUR).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Amalfitano, D., Fasolino, A.R., Tramontana, P., De Simone, V., Di Mare, G., Scala, S. (2015). A Reverse Engineering Process for Inferring Data Models from Spreadsheet-based Information Systems: An Automotive Industrial Experience. In: Helfert, M., Holzinger, A., Belo, O., Francalanci, C. (eds) Data Management Technologies and Applications. DATA 2014. Communications in Computer and Information Science, vol 178. Springer, Cham. https://doi.org/10.1007/978-3-319-25936-9_9
DOI: https://doi.org/10.1007/978-3-319-25936-9_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25935-2
Online ISBN: 978-3-319-25936-9
eBook Packages: Computer Science, Computer Science (R0)