Published in: Advances in Data Analysis and Classification 3/2023

05-09-2022 | Regular Article

Robust regression for interval-valued data based on midpoints and log-ranges

Authors: Qing Zhao, Huiwen Wang, Shanshan Wang

Abstract

As data collection technologies advance, flexible modelling of interval-valued data has become of great practical importance. This paper proposes a new robust regression model for interval-valued data based on the midpoints and log-ranges of the dependent intervals, and estimates its parameters with the Huber loss function to handle possible outliers in a data set. In addition, the logarithmic transformation removes the non-negativity constraints that traditional models of ranges require, which makes it easier to apply a variety of regression methods. We conduct extensive Monte Carlo simulation experiments to compare the finite-sample performance of our model with that of existing regression methods for interval-valued data. The results indicate that the proposed method performs competitively, especially when the data set contains outliers and when both the midpoints and the ranges of the independent variables are related to those of the dependent variable. Finally, we apply the method to two empirical interval-valued data sets to illustrate its effectiveness.
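The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the model equations, so the sketch assumes a single interval-valued predictor whose midpoint and log-range both enter two separate linear models (one for the response midpoint, one for the response log-range). The Huber fit is done here by iteratively reweighted least squares with a MAD scale estimate and the conventional tuning constant 1.345; these choices, the function names `huber_irls`, `fit_interval_model` and `predict_interval`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def huber_irls(X, y, delta=1.345, n_iter=100, tol=1e-8):
    """Linear fit under the Huber loss via iteratively reweighted least squares."""
    X1 = np.column_stack([np.ones(len(y)), X])        # prepend an intercept column
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]      # ordinary least squares start
    for _ in range(n_iter):
        r = y - X1 @ beta
        # Robust residual scale from the median absolute deviation (assumed choice)
        scale = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)
        u = np.abs(r) / scale
        # Huber weights: 1 inside the threshold, delta/u outside it
        w = np.where(u <= delta, 1.0, delta / np.maximum(u, delta))
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X1 * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

def fit_interval_model(x_lo, x_hi, y_lo, y_hi):
    """Fit midpoint and log-range models; intervals must have positive width."""
    X = np.column_stack([(x_lo + x_hi) / 2, np.log(x_hi - x_lo)])
    beta_mid = huber_irls(X, (y_lo + y_hi) / 2)
    beta_logr = huber_irls(X, np.log(y_hi - y_lo))
    return beta_mid, beta_logr

def predict_interval(beta_mid, beta_logr, x_lo, x_hi):
    """Predict response intervals; exp() keeps the predicted range positive."""
    X1 = np.column_stack([np.ones(len(x_lo)), (x_lo + x_hi) / 2, np.log(x_hi - x_lo)])
    mid = X1 @ beta_mid
    width = np.exp(X1 @ beta_logr)
    return mid - width / 2, mid + width / 2

# Synthetic demonstration data (hypothetical, not from the paper)
gen = np.random.default_rng(0)
n = 200
x_mid = gen.uniform(0, 10, n)
x_rng = gen.uniform(0.5, 2.0, n)
y_mid = 1.0 + 2.0 * x_mid + 0.5 * np.log(x_rng) + gen.normal(0, 0.1, n)
y_rng = np.exp(0.2 + 0.1 * x_mid + 0.8 * np.log(x_rng) + gen.normal(0, 0.05, n))
y_mid[:10] += 50.0                                    # contaminate 5% of the midpoints

x_lo, x_hi = x_mid - x_rng / 2, x_mid + x_rng / 2
y_lo, y_hi = y_mid - y_rng / 2, y_mid + y_rng / 2
beta_mid, beta_logr = fit_interval_model(x_lo, x_hi, y_lo, y_hi)
lo, hi = predict_interval(beta_mid, beta_logr, x_lo, x_hi)
```

Because the range is modelled on the log scale and back-transformed with the exponential, every predicted upper bound lies above its lower bound without any constraint on the coefficients, which is the benefit the abstract attributes to the logarithmic transformation.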

Metadata
Title
Robust regression for interval-valued data based on midpoints and log-ranges
Authors
Qing Zhao
Huiwen Wang
Shanshan Wang
Publication date
05-09-2022
Publisher
Springer Berlin Heidelberg
Published in
Advances in Data Analysis and Classification / Issue 3/2023
Print ISSN: 1862-5347
Electronic ISSN: 1862-5355
DOI
https://doi.org/10.1007/s11634-022-00518-2
