Published in: Innovations in Systems and Software Engineering 1/2018

20 December 2017 | Original Paper

Investigating structural metrics for understandability prediction of data warehouse multidimensional schemas using machine learning techniques

Authors: Anjana Gosain, Jaspreeti Singh

Abstract

Data warehouse (DW) quality metrics help in evaluating quality attributes and in building classification models that predict whether multidimensional (MD) schemas are understandable or non-understandable, thereby assisting in DW maintenance. To evaluate DW MD schema quality, we earlier proposed a set of metrics based on important aspects of dimension hierarchies and their sharing (such as sharing of a few hierarchy levels within a dimension, and sharing of a few hierarchy levels between dimensions, within and across facts). Such sharing may increase the structural complexity of MD schemas and thereby affect their quality. A preliminary empirical validation of these metrics using classical statistical techniques (correlation and linear regression) indicated that some of them are possible understandability indicators. However, machine learning (ML) techniques can better model the complex associations between DW structural metrics and quality attributes. Therefore, this work employs five ML classifiers [J48, partial decision trees (PART), Naïve Bayes, support vector machines (SVM) and logistic regression] to empirically investigate whether accurate prediction models, based on our structural metrics, can be built and used as understandability predictors. The results reveal that four of our metrics are good predictors of the understandability of DW MD schemas. The experimentation further compared the classifiers, mainly using five performance measures: accuracy, precision, sensitivity, specificity and area under the receiver operating characteristic curve. The study confirmed the predictive capability of ML techniques for understandability prediction of DW MD schemas. The results also suggest that the SVM and Naïve Bayes classifiers perform better than the other classifiers included in the study. Further, the commonly used logistic regression technique gave results that were reasonably competitive with the more sophisticated techniques. However, the tree-based (J48) and rule-based (PART) techniques performed significantly worse than the best performing techniques.
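
As a rough illustration of the five performance measures named in the abstract, the following Python sketch computes accuracy, precision, sensitivity, specificity and the area under the ROC curve for a binary understandable/non-understandable prediction. It is not the authors' procedure or tooling. The function names, the 0.5 decision threshold, the choice of label 1 for "understandable", and the toy labels and scores are all assumptions made up for this example; none of the numbers come from the study.

```python
# Minimal sketch of the five performance measures for a binary classifier.
# Assumption: label 1 = "understandable" (positive class), 0 = "non-understandable".


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn), treating label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def performance_measures(y_true, y_pred):
    """Accuracy, precision, sensitivity (recall) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": safe_div(tp, tp + fp),
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
    }


def auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive example is scored above a random negative one (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (invented for illustration, not taken from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(performance_measures(y_true, y_pred))
print(auc(y_true, scores))
```

On this toy data the four threshold-based measures each come out at 0.75, while the AUC is 0.9375. The AUC is computed from the scores rather than from the thresholded predictions, which is why it can rank classifiers differently from accuracy.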

Footnotes
1. WEKA (Waikato Environment for Knowledge Analysis). http://www.cs.waikato.ac.nz/~ml/weka/.
2. CGPA stands for cumulative grade point average.

Metadata
Title
Investigating structural metrics for understandability prediction of data warehouse multidimensional schemas using machine learning techniques
Authors
Anjana Gosain
Jaspreeti Singh
Publication date
20 December 2017
Publisher
Springer London
Published in
Innovations in Systems and Software Engineering / Issue 1/2018
Print ISSN: 1614-5046
Electronic ISSN: 1614-5054
DOI
https://doi.org/10.1007/s11334-017-0308-z