
2013 | Original Paper | Book Chapter

20. Factors That Can Affect Model Performance

Authors: Max Kuhn, Kjell Johnson

Published in: Applied Predictive Modeling

Publisher: Springer New York

Abstract

Several of the preceding chapters have focused on technical pitfalls of predictive models, such as over-fitting and class imbalances. Often, true success may depend on aspects of the problem that are not directly related to the model itself. This chapter discusses topics such as: Type III errors (answering the wrong question, Section 20.1), the effect of unwanted noise in the response (Section 20.2) and in the predictors (Section 20.3), the impact of discretizing continuous outcomes (Section 20.4), extrapolation (Section 20.5), and the impact of a large number of samples (Section 20.6). In the Computing Section (20.7) we illustrate the implementation of an algorithm for determining samples’ similarity to the training set. Finally, exercises are provided at the end of the chapter to solidify the concepts.
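The Computing section (20.7) describes an algorithm for scoring how similar a new sample is to the training set, which is one way to detect extrapolation (Section 20.5). A minimal sketch of that idea in Python follows; it is a hypothetical illustration, not the book's own implementation: standardize the predictors using training-set statistics, then score a new sample by its average distance to its k nearest training samples. Large scores flag samples that lie outside the region covered by the training data.

```python
import math

def standardize(train):
    """Column means and standard deviations computed from the training set.
    Assumes no predictor is constant (sd > 0 for every column)."""
    n, p = len(train), len(train[0])
    means = [sum(row[j] for row in train) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in train) / (n - 1))
           for j in range(p)]
    return means, sds

def scale(x, means, sds):
    """Center and scale one sample with the training statistics."""
    return [(x[j] - means[j]) / sds[j] for j in range(len(x))]

def knn_dissimilarity(train, x, k=3):
    """Average Euclidean distance from x to its k closest training samples,
    in the scaled predictor space. Larger values suggest extrapolation."""
    means, sds = standardize(train)
    scaled_train = [scale(row, means, sds) for row in train]
    z = scale(x, means, sds)
    dists = sorted(math.dist(z, row) for row in scaled_train)
    return sum(dists[:k]) / k
```

In practice the same statistic would be computed for every training sample (against the remaining ones) so that a new sample's score can be compared to that reference distribution.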


Footnotes
1
In marketing, Rzepakowski and Jaroszewicz (2012) defined the following classes of marketing models: “In the propensity models, historical information about purchases (or other success measures like visits) is used, while in the response models, all customers have been subject to a pilot campaign.”
 
2
Note that, in this case, the noise in the predictors is systematic rather than random; that is, we can attribute the source of variation to a specific cause.
 
3
An example of this is the job scheduling data, where the execution time of a job was binned into four groups. In this case, the queuing system cannot utilize estimates of the exact job length but can use the binned version of this outcome.
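The binning described above can be sketched in a few lines of Python; the cut points and group labels here are hypothetical, not the ones used for the job scheduling data:

```python
import bisect

def bin_outcome(value, edges, labels):
    """Map a continuous outcome to one of len(edges) + 1 categories
    using right-open bins: [-inf, edges[0]), [edges[0], edges[1]), ..."""
    return labels[bisect.bisect_right(edges, value)]

# Hypothetical cut points (in seconds) defining four job-length classes
edges = [60, 600, 3600]                        # 1 min, 10 min, 1 hour
labels = ["very fast", "fast", "moderate", "long"]
```

A model can then be trained on the categorical labels even though the queuing system never sees the underlying continuous execution times.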
 
Zurück zum Zitat Everitt B, Landau S, Leese M, Stahl D (2011). Cluster Analysis. Wiley. Everitt B, Landau S, Leese M, Stahl D (2011). Cluster Analysis. Wiley.
Ewald B (2006). “Post Hoc Choice of Cut Points Introduced Bias to Diagnostic Research.” Journal of Clinical Epidemiology, 59(8), 798–801.
Fanning K, Cogger K (1998). “Neural Network Detection of Management Fraud Using Published Financial Data.” International Journal of Intelligent Systems in Accounting, Finance & Management, 7(1), 21–41.
Faraway J (2005). Linear Models with R. Chapman & Hall/CRC, Boca Raton.
Fisher R (1936). “The Use of Multiple Measurements in Taxonomic Problems.” Annals of Eugenics, 7(2), 179–188.
Forina M, Casale M, Oliveri P, Lanteri S (2009). “CAIMAN brothers: A Family of Powerful Classification and Class Modeling Techniques.” Chemometrics and Intelligent Laboratory Systems, 96(2), 239–245.
Frank E, Wang Y, Inglis S, Holmes G (1998). “Using Model Trees for Classification.” Machine Learning.
Frank E, Witten I (1998). “Generating Accurate Rule Sets Without Global Optimization.” Proceedings of the Fifteenth International Conference on Machine Learning, pp. 144–151.
Free Software Foundation (June 2007). GNU General Public License.
Freund Y, Schapire R (1996). “Experiments with a New Boosting Algorithm.” Machine Learning: Proceedings of the Thirteenth International Conference, pp. 148–156.
Friedman J (1989). “Regularized Discriminant Analysis.” Journal of the American Statistical Association, 84(405), 165–175.
Friedman J (2001). “Greedy Function Approximation: A Gradient Boosting Machine.” Annals of Statistics, 29(5), 1189–1232.
Friedman J, Hastie T, Tibshirani R (2000). “Additive Logistic Regression: A Statistical View of Boosting.” Annals of Statistics, 28(2), 337–374.
Friedman J, Hastie T, Tibshirani R (2010). “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software, 33(1), 1–22.
Geisser S (1993). Predictive Inference: An Introduction. Chapman and Hall.
Geladi P, Kowalski B (1986). “Partial Least-Squares Regression: A Tutorial.” Analytica Chimica Acta, 185, 1–17.
Geladi P, Manley M, Lestander T (2003). “Scatter Plotting in Multivariate Data Analysis.” Journal of Chemometrics, 17(8–9), 503–511.
Gentleman R (2008). R Programming for Bioinformatics. CRC Press.
Gentleman R, Carey V, Bates D, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber M, Iacus S, Irizarry R, Leisch F, Li C, Mächler M, Rossini A, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JY, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5(10), R80.
Giuliano K, DeBiasio R, Dunlay R, Gough A, Volosky J, Zock J, Pavlakis G, Taylor D (1997). “High–Content Screening: A New Approach to Easing Key Bottlenecks in the Drug Discovery Process.” Journal of Biomolecular Screening, 2(4), 249–259.
Goldberg D (1989). Genetic Algorithms in Search, Optimization, and Machine Learning. Addison–Wesley, Boston.
Golub G, Heath M, Wahba G (1979). “Generalized Cross–Validation as a Method for Choosing a Good Ridge Parameter.” Technometrics, 21(2), 215–223.
Good P (2000). Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses. Springer.
Gowen A, Downey G, Esquerre C, O’Donnell C (2010). “Preventing Over–Fitting in PLS Calibration Models of Near-Infrared (NIR) Spectroscopy Data Using Regression Coefficients.” Journal of Chemometrics, 25, 375–381.
Graybill F (1976). Theory and Application of the Linear Model. Wadsworth & Brooks, Pacific Grove, CA.
Guo Y, Hastie T, Tibshirani R (2007). “Regularized Linear Discriminant Analysis and its Application in Microarrays.” Biostatistics, 8(1), 86–100.
Gupta S, Hanssens D, Hardie B, Kahn W, Kumar V, Lin N, Ravishanker N, Sriram S (2006). “Modeling Customer Lifetime Value.” Journal of Service Research, 9(2), 139–155.
Guyon I, Elisseeff A (2003). “An Introduction to Variable and Feature Selection.” The Journal of Machine Learning Research, 3, 1157–1182.
Guyon I, Weston J, Barnhill S, Vapnik V (2002). “Gene Selection for Cancer Classification Using Support Vector Machines.” Machine Learning, 46(1), 389–422.
Hall M, Smith L (1997). “Feature Subset Selection: A Correlation Based Filter Approach.” International Conference on Neural Information Processing and Intelligent Information Systems, pp. 855–858.
Hall P, Hyndman R, Fan Y (2004). “Nonparametric Confidence Intervals for Receiver Operating Characteristic Curves.” Biometrika, 91, 743–750.
Hampel H, Frank R, Broich K, Teipel S, Katz R, Hardy J, Herholz K, Bokde A, Jessen F, Hoessler Y (2010). “Biomarkers for Alzheimer’s Disease: Academic, Industry and Regulatory Perspectives.” Nature Reviews Drug Discovery, 9(7), 560–574.
Hand D, Till R (2001). “A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems.” Machine Learning, 45(2), 171–186.
Hanley J, McNeil B (1982). “The Meaning and Use of the Area under a Receiver Operating Characteristic (ROC) Curve.” Radiology, 143(1), 29–36.
Hardle W, Werwatz A, Müller M, Sperlich S (2004). “Nonparametric Density Estimation.” In “Nonparametric and Semiparametric Models,” pp. 39–83. Springer Berlin Heidelberg.
Harrell F (2001). Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. Springer, New York.
Hastie T, Pregibon D (1990). “Shrinking Trees.” Technical report, AT&T Bell Laboratories.
Hastie T, Tibshirani R (1990). Generalized Additive Models. Chapman & Hall/CRC.
Hastie T, Tibshirani R (1996). “Discriminant Analysis by Gaussian Mixtures.” Journal of the Royal Statistical Society, Series B, pp. 155–176.
Hastie T, Tibshirani R, Buja A (1994). “Flexible Discriminant Analysis by Optimal Scoring.” Journal of the American Statistical Association, 89(428), 1255–1270.
Hastie T, Tibshirani R, Friedman J (2008). The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, 2nd edition.
Hawkins D (2004). “The Problem of Overfitting.” Journal of Chemical Information and Computer Sciences, 44(1), 1–12.
Hawkins D, Basak S, Mills D (2003). “Assessing Model Fit by Cross–Validation.” Journal of Chemical Information and Computer Sciences, 43(2), 579–586.
Henderson H, Velleman P (1981). “Building Multiple Regression Models Interactively.” Biometrics, pp. 391–411.
Hesterberg T, Choi N, Meier L, Fraley C (2008). “Least Angle and L1 Penalized Regression: A Review.” Statistics Surveys, 2, 61–93.
Heyman R, Slep A (2001). “The Hazards of Predicting Divorce Without Cross-validation.” Journal of Marriage and the Family, 63(2), 473.
Hill A, LaPan P, Li Y, Haney S (2007). “Impact of Image Segmentation on High–Content Screening Data Quality for SK–BR-3 Cells.” BMC Bioinformatics, 8(1), 340.
Ho T (1998). “The Random Subspace Method for Constructing Decision Forests.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), 832–844.
Holland J (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI.
Holland J (1992). Adaptation in Natural and Artificial Systems. MIT Press, Cambridge, MA.
Holmes G, Hall M, Frank E (1993). “Generating Rule Sets from Model Trees.” In “Australian Joint Conference on Artificial Intelligence.”
Hothorn T, Hornik K, Zeileis A (2006). “Unbiased Recursive Partitioning: A Conditional Inference Framework.” Journal of Computational and Graphical Statistics, 15(3), 651–674.
Hothorn T, Leisch F, Zeileis A, Hornik K (2005). “The Design and Analysis of Benchmark Experiments.” Journal of Computational and Graphical Statistics, 14(3), 675–699.
Hsieh W, Tang B (1998). “Applying Neural Network Models to Prediction and Data Analysis in Meteorology and Oceanography.” Bulletin of the American Meteorological Society, 79(9), 1855–1870.
Hsu C, Lin C (2002). “A Comparison of Methods for Multiclass Support Vector Machines.” IEEE Transactions on Neural Networks, 13(2), 415–425.
Huang C, Chang B, Cheng D, Chang C (2012). “Feature Selection and Parameter Optimization of a Fuzzy-Based Stock Selection Model Using Genetic Algorithms.” International Journal of Fuzzy Systems, 14(1), 65–75.
Huuskonen J (2000). “Estimation of Aqueous Solubility for a Diverse Set of Organic Compounds Based on Molecular Topology.” Journal of Chemical Information and Computer Sciences, 40(3), 773–777.
Ihaka R, Gentleman R (1996). “R: A Language for Data Analysis and Graphics.” Journal of Computational and Graphical Statistics, 5(3), 299–314.
Jeatrakul P, Wong K, Fung C (2010). “Classification of Imbalanced Data By Combining the Complementary Neural Network and SMOTE Algorithm.” Neural Information Processing: Models and Applications, pp. 152–159.
Jerez J, Molina I, Garcia-Laencina P, Alba R, Ribelles N, Martin M, Franco L (2010). “Missing Data Imputation Using Statistical and Machine Learning Methods in a Real Breast Cancer Problem.” Artificial Intelligence in Medicine, 50, 105–115.
John G, Kohavi R, Pfleger K (1994). “Irrelevant Features and the Subset Selection Problem.” Proceedings of the Eleventh International Conference on Machine Learning, 129, 121–129.
Johnson K, Rayens W (2007). “Modern Classification Methods for Drug Discovery.” In A Dmitrienko, C Chuang-Stein, R D’Agostino (eds.), “Pharmaceutical Statistics Using SAS: A Practical Guide,” pp. 7–43. Cary, NC: SAS Institute Inc.
Johnson R, Wichern D (2001). Applied Multivariate Statistical Analysis. Prentice Hall.
Jolliffe I, Trendafilov N, Uddin M (2003). “A Modified Principal Component Technique Based on the Lasso.” Journal of Computational and Graphical Statistics, 12(3), 531–547.
Kansy M, Senner F, Gubernator K (1998). “Physiochemical High Throughput Screening: Parallel Artificial Membrane Permeation Assay in the Description of Passive Absorption Processes.” Journal of Medicinal Chemistry, 41, 1007–1010.
Karatzoglou A, Smola A, Hornik K, Zeileis A (2004). “kernlab - An S4 Package for Kernel Methods in R.” Journal of Statistical Software, 11(9), 1–20.
Kearns M, Valiant L (1989). “Cryptographic Limitations on Learning Boolean Formulae and Finite Automata.” In “Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing.”
Kim J, Basak J, Holtzman D (2009). “The Role of Apolipoprotein E in Alzheimer’s Disease.” Neuron, 63(3), 287–303.
Kim JH (2009). “Estimating Classification Error Rate: Repeated Cross–Validation, Repeated Hold–Out and Bootstrap.” Computational Statistics & Data Analysis, 53(11), 3735–3745.
Kimball A (1957). “Errors of the Third Kind in Statistical Consulting.” Journal of the American Statistical Association, 52, 133–142.
Kira K, Rendell L (1992). “The Feature Selection Problem: Traditional Methods and a New Algorithm.” Proceedings of the National Conference on Artificial Intelligence, pp. 129–129.
Kline DM, Berardi VL (2005). “Revisiting Squared–Error and Cross–Entropy Functions for Training Neural Network Classifiers.” Neural Computing and Applications, 14(4), 310–318.
Kohavi R (1995). “A Study of Cross–Validation and Bootstrap for Accuracy Estimation and Model Selection.” International Joint Conference on Artificial Intelligence, 14, 1137–1145.
Kohavi R (1996). “Scaling Up the Accuracy of Naive–Bayes Classifiers: A Decision–Tree Hybrid.” In “Proceedings of the Second International Conference on Knowledge Discovery and Data Mining,” volume 7.
Kohonen T (1995). Self–Organizing Maps. Springer.
Kononenko I (1994). “Estimating Attributes: Analysis and Extensions of Relief.” In F Bergadano, L De Raedt (eds.), “Machine Learning: ECML–94,” volume 784, pp. 171–182. Springer Berlin / Heidelberg.
Kuhn M (2008). “Building Predictive Models in R Using the caret Package.” Journal of Statistical Software, 28(5).
Kuiper S (2008). “Introduction to Multiple Regression: How Much Is Your Car Worth?” Journal of Statistics Education, 16(3).
Kvålseth T (1985). “Cautionary Note About R².” American Statistician, 39(4), 279–285.
Lachiche N, Flach P (2003). “Improving Accuracy and Cost of Two–Class and Multi–Class Probabilistic Classifiers using ROC Curves.” In “Proceedings of the Twentieth International Conference on Machine Learning,” volume 20, pp. 416–424.
Larose D (2006). Data Mining Methods and Models. Wiley.
Lavine B, Davidson C, Moores A (2002). “Innovative Genetic Algorithms for Chemoinformatics.” Chemometrics and Intelligent Laboratory Systems, 60(1), 161–171.
Leach A, Gillet V (2003). An Introduction to Chemoinformatics. Springer.
Leisch F (2002a). “Sweave: Dynamic Generation of Statistical Reports Using Literate Data Analysis.” In W Härdle, B Rönz (eds.), “Compstat 2002 — Proceedings in Computational Statistics,” pp. 575–580. Physica Verlag, Heidelberg.
Leisch F (2002b). “Sweave, Part I: Mixing R and LaTeX.” R News, 2(3), 28–31.
Levy S (2010). “The AI Revolution is On.” Wired.
Li J, Fine JP (2008). “ROC Analysis with Multiple Classes and Multiple Tests: Methodology and Its Application in Microarray Studies.” Biostatistics, 9(3), 566–576.
Lindgren F, Geladi P, Wold S (1993). “The Kernel Algorithm for PLS.” Journal of Chemometrics, 7, 45–59.
Ling C, Li C (1998). “Data Mining for Direct Marketing: Problems and Solutions.” In “Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining,” pp. 73–79.
Lipinski C, Lombardo F, Dominy B, Feeney P (1997). “Experimental and Computational Approaches To Estimate Solubility and Permeability In Drug Discovery and Development Settings.” Advanced Drug Delivery Reviews, 23, 3–25.
Liu B (2007). Web Data Mining. Springer Berlin / Heidelberg.
Liu Y, Rayens W (2007). “PLS and Dimension Reduction for Classification.” Computational Statistics, pp. 189–208.
Lo V (2002). “The True Lift Model: A Novel Data Mining Approach To Response Modeling in Database Marketing.” ACM SIGKDD Explorations Newsletter, 4(2), 78–86.
Lodhi H, Saunders C, Shawe-Taylor J, Cristianini N, Watkins C (2002). “Text Classification Using String Kernels.” The Journal of Machine Learning Research, 2, 419–444.
Loh WY (2002). “Regression Trees With Unbiased Variable Selection and Interaction Detection.” Statistica Sinica, 12, 361–386.
Loh WY (2010). “Tree–Structured Classifiers.” Wiley Interdisciplinary Reviews: Computational Statistics, 2, 364–369.
Loh WY, Shih YS (1997). “Split Selection Methods for Classification Trees.” Statistica Sinica, 7, 815–840.
Mahé P, Ueda N, Akutsu T, Perret J, Vert J (2005). “Graph Kernels for Molecular Structure–Activity Relationship Analysis with Support Vector Machines.” Journal of Chemical Information and Modeling, 45(4), 939–951.
Mahé P, Vert J (2009). “Graph Kernels Based on Tree Patterns for Molecules.” Machine Learning, 75(1), 3–35.
Zurück zum Zitat Maindonald J, Braun J (2007). Data Analysis and Graphics Using R. Cambridge University Press, 2nd edition. Maindonald J, Braun J (2007). Data Analysis and Graphics Using R. Cambridge University Press, 2nd edition.
Zurück zum Zitat Mandal A, Johnson K, Wu C, Bornemeier D (2007). “Identifying Promising Compounds in Drug Discovery: Genetic Algorithms and Some New Statistical Techniques.” Journal of Chemical Information and Modeling, 47(3), 981–988.CrossRef Mandal A, Johnson K, Wu C, Bornemeier D (2007). “Identifying Promising Compounds in Drug Discovery: Genetic Algorithms and Some New Statistical Techniques.” Journal of Chemical Information and Modeling, 47(3), 981–988.CrossRef
Zurück zum Zitat Mandal A, Wu C, Johnson K (2006). “SELC: Sequential Elimination of Level Combinations by Means of Modified Genetic Algorithms.” Technometrics, 48(2), 273–283.MathSciNetCrossRef Mandal A, Wu C, Johnson K (2006). “SELC: Sequential Elimination of Level Combinations by Means of Modified Genetic Algorithms.” Technometrics, 48(2), 273–283.MathSciNetCrossRef
Zurück zum Zitat Martin J, Hirschberg D (1996). “Small Sample Statistics for Classification Error Rates I: Error Rate Measurements.” Department of Informatics and Computer Science Technical Report. Martin J, Hirschberg D (1996). “Small Sample Statistics for Classification Error Rates I: Error Rate Measurements.” Department of Informatics and Computer Science Technical Report.
Zurück zum Zitat Martin T, Harten P, Young D, Muratov E, Golbraikh A, Zhu H, Tropsha A (2012). “Does Rational Selection of Training and Test Sets Improve the Outcome of QSAR Modeling?” Journal of Chemical Information and Modeling, 52(10), 2570–2578.CrossRef Martin T, Harten P, Young D, Muratov E, Golbraikh A, Zhu H, Tropsha A (2012). “Does Rational Selection of Training and Test Sets Improve the Outcome of QSAR Modeling?” Journal of Chemical Information and Modeling, 52(10), 2570–2578.CrossRef
Zurück zum Zitat Massy W (1965). “Principal Components Regression in Exploratory Statistical Research.” Journal of the American Statistical Association, 60, 234–246.CrossRef Massy W (1965). “Principal Components Regression in Exploratory Statistical Research.” Journal of the American Statistical Association, 60, 234–246.CrossRef
Zurück zum Zitat McCarren P, Springer C, Whitehead L (2011). “An Investigation into Pharmaceutically Relevant Mutagenicity Data and the Influence on Ames Predictive Potential.” Journal of Cheminformatics, 3(51). McCarren P, Springer C, Whitehead L (2011). “An Investigation into Pharmaceutically Relevant Mutagenicity Data and the Influence on Ames Predictive Potential.” Journal of Cheminformatics, 3(51).
Zurück zum Zitat McClish D (1989). “Analyzing a Portion of the ROC Curve.” Medical Decision Making, 9, 190–195.CrossRef McClish D (1989). “Analyzing a Portion of the ROC Curve.” Medical Decision Making, 9, 190–195.CrossRef
Zurück zum Zitat Melssen W, Wehrens R, Buydens L (2006). “Supervised Kohonen Networks for Classification Problems.” Chemometrics and Intelligent Laboratory Systems, 83(2), 99–113.CrossRef Melssen W, Wehrens R, Buydens L (2006). “Supervised Kohonen Networks for Classification Problems.” Chemometrics and Intelligent Laboratory Systems, 83(2), 99–113.CrossRef
Zurück zum Zitat Mente S, Lombardo F (2005). “A Recursive–Partitioning Model for Blood–Brain Barrier Permeation.” Journal of Computer–Aided Molecular Design, 19(7), 465–481.CrossRef Mente S, Lombardo F (2005). “A Recursive–Partitioning Model for Blood–Brain Barrier Permeation.” Journal of Computer–Aided Molecular Design, 19(7), 465–481.CrossRef
Zurück zum Zitat Menze B, Kelm B, Splitthoff D, Koethe U, Hamprecht F (2011). “On Oblique Random Forests.” Machine Learning and Knowledge Discovery in Databases, pp. 453–469. Menze B, Kelm B, Splitthoff D, Koethe U, Hamprecht F (2011). “On Oblique Random Forests.” Machine Learning and Knowledge Discovery in Databases, pp. 453–469.
Zurück zum Zitat Mevik B, Wehrens R (2007). “The pls Package: Principal Component and Partial Least Squares Regression in R.” Journal of Statistical Software, 18(2), 1–24.CrossRef Mevik B, Wehrens R (2007). “The pls Package: Principal Component and Partial Least Squares Regression in R.” Journal of Statistical Software, 18(2), 1–24.CrossRef
Zurück zum Zitat Michailidis G, de Leeuw J (1998). “The Gifi System Of Descriptive Multivariate Analysis.” Statistical Science, 13, 307–336.MathSciNetMATHCrossRef Michailidis G, de Leeuw J (1998). “The Gifi System Of Descriptive Multivariate Analysis.” Statistical Science, 13, 307–336.MathSciNetMATHCrossRef
Zurück zum Zitat Min S, Lee J, Han I (2006). “Hybrid Genetic Algorithms and Support Vector Machines for Bankruptcy Prediction.” Expert Systems with Applications, 31(3), 652–660.CrossRef Min S, Lee J, Han I (2006). “Hybrid Genetic Algorithms and Support Vector Machines for Bankruptcy Prediction.” Expert Systems with Applications, 31(3), 652–660.CrossRef
Zurück zum Zitat Mitchell M (1998). An Introduction to Genetic Algorithms. MIT Press. Mitchell M (1998). An Introduction to Genetic Algorithms. MIT Press.
Zurück zum Zitat Molinaro A (2005). “Prediction Error Estimation: A Comparison of Resampling Methods.” Bioinformatics, 21(15), 3301–3307.CrossRef Molinaro A (2005). “Prediction Error Estimation: A Comparison of Resampling Methods.” Bioinformatics, 21(15), 3301–3307.CrossRef
Zurück zum Zitat Molinaro A, Lostritto K, Van Der Laan M (2010). “partDSA: Deletion/Substitution/Addition Algorithm for Partitioning the Covariate Space in Prediction.” Bioinformatics, 26(10), 1357–1363.CrossRef Molinaro A, Lostritto K, Van Der Laan M (2010). “partDSA: Deletion/Substitution/Addition Algorithm for Partitioning the Covariate Space in Prediction.” Bioinformatics, 26(10), 1357–1363.CrossRef
Zurück zum Zitat Montgomery D, Runger G (1993). “Gauge Capability and Designed Experiments. Part I: Basic Methods.” Quality Engineering, 6(1), 115–135.CrossRef Montgomery D, Runger G (1993). “Gauge Capability and Designed Experiments. Part I: Basic Methods.” Quality Engineering, 6(1), 115–135.CrossRef
Zurück zum Zitat Muenchen R (2009). R for SAS and SPSS Users. Springer. Muenchen R (2009). R for SAS and SPSS Users. Springer.
Zurück zum Zitat Myers R (1994). Classical and Modern Regression with Applications. PWS-KENT Publishing Company, Boston, MA, second edition. Myers R (1994). Classical and Modern Regression with Applications. PWS-KENT Publishing Company, Boston, MA, second edition.
Zurück zum Zitat Myers R, Montgomery D (2009). Response Surface Methodology: Process and Product Optimization Using Designed Experiments. Wiley, New York, NY.MATH Myers R, Montgomery D (2009). Response Surface Methodology: Process and Product Optimization Using Designed Experiments. Wiley, New York, NY.MATH
Zurück zum Zitat Neal R (1996). Bayesian Learning for Neural Networks. Springer-Verlag. Neal R (1996). Bayesian Learning for Neural Networks. Springer-Verlag.
Zurück zum Zitat Nelder J, Mead R (1965). “A Simplex Method for Function Minimization.” The Computer Journal, 7(4), 308–313.MATHCrossRef Nelder J, Mead R (1965). “A Simplex Method for Function Minimization.” The Computer Journal, 7(4), 308–313.MATHCrossRef
Zurück zum Zitat Netzeva T, Worth A, Aldenberg T, Benigni R, Cronin M, Gramatica P, Jaworska J, Kahn S, Klopman G, Marchant C (2005). “Current Status of Methods for Defining the Applicability Domain of (Quantitative) Structure–Activity Relationships.” In “The Report and Recommendations of European Centre for the Validation of Alternative Methods Workshop 52,” volume 33, pp. 1–19. Netzeva T, Worth A, Aldenberg T, Benigni R, Cronin M, Gramatica P, Jaworska J, Kahn S, Klopman G, Marchant C (2005). “Current Status of Methods for Defining the Applicability Domain of (Quantitative) Structure–Activity Relationships.” In “The Report and Recommendations of European Centre for the Validation of Alternative Methods Workshop 52,” volume 33, pp. 1–19.
Zurück zum Zitat Niblett T (1987). “Constructing Decision Trees in Noisy Domains.” In I Bratko, N Lavrač (eds.), “Progress in Machine Learning: Proceedings of EWSL–87,” pp. 67–78. Sigma Press, Bled, Yugoslavia. Niblett T (1987). “Constructing Decision Trees in Noisy Domains.” In I Bratko, N Lavrač (eds.), “Progress in Machine Learning: Proceedings of EWSL–87,” pp. 67–78. Sigma Press, Bled, Yugoslavia.
Zurück zum Zitat Olden J, Jackson D (2000). “Torturing Data for the Sake of Generality: How Valid Are Our Regression Models?” Ecoscience, 7(4), 501–510. Olden J, Jackson D (2000). “Torturing Data for the Sake of Generality: How Valid Are Our Regression Models?” Ecoscience, 7(4), 501–510.
Zurück zum Zitat Olsson D, Nelson L (1975). “The Nelder–Mead Simplex Procedure for Function Minimization.” Technometrics, 17(1), 45–51.MATHCrossRef Olsson D, Nelson L (1975). “The Nelder–Mead Simplex Procedure for Function Minimization.” Technometrics, 17(1), 45–51.MATHCrossRef
Zurück zum Zitat Osuna E, Freund R, Girosi F (1997). “Support Vector Machines: Training and Applications.” Technical report, MIT Artificial Intelligence Laboratory. Osuna E, Freund R, Girosi F (1997). “Support Vector Machines: Training and Applications.” Technical report, MIT Artificial Intelligence Laboratory.
Zurück zum Zitat Ozuysal M, Calonder M, Lepetit V, Fua P (2010). “Fast Keypoint Recognition Using Random Ferns.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(3), 448–461.CrossRef Ozuysal M, Calonder M, Lepetit V, Fua P (2010). “Fast Keypoint Recognition Using Random Ferns.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(3), 448–461.CrossRef
Zurück zum Zitat Park M, Hastie T (2008). “Penalized Logistic Regression for Detecting Gene Interactions.” Biostatistics, 9(1), 30.MATHCrossRef Park M, Hastie T (2008). “Penalized Logistic Regression for Detecting Gene Interactions.” Biostatistics, 9(1), 30.MATHCrossRef
Zurück zum Zitat Pepe MS, Longton G, Janes H (2009). “Estimation and Comparison of Receiver Operating Characteristic Curves.” Stata Journal, 9(1), 1–16. Pepe MS, Longton G, Janes H (2009). “Estimation and Comparison of Receiver Operating Characteristic Curves.” Stata Journal, 9(1), 1–16.
Zurück zum Zitat Perrone M, Cooper L (1993). “When Networks Disagree: Ensemble Methods for Hybrid Neural Networks.” In RJ Mammone (ed.), “Artificial Neural Networks for Speech and Vision,” pp. 126–142. Chapman & Hall, London. Perrone M, Cooper L (1993). “When Networks Disagree: Ensemble Methods for Hybrid Neural Networks.” In RJ Mammone (ed.), “Artificial Neural Networks for Speech and Vision,” pp. 126–142. Chapman & Hall, London.
Zurück zum Zitat Piersma A, Genschow E, Verhoef A, Spanjersberg M, Brown N, Brady M, Burns A, Clemann N, Seiler A, Spielmann H (2004). “Validation of the Postimplantation Rat Whole-embryo Culture Test in the International ECVAM Validation Study on Three In Vitro Embryotoxicity Tests.” Alternatives to Laboratory Animals, 32, 275–307. Piersma A, Genschow E, Verhoef A, Spanjersberg M, Brown N, Brady M, Burns A, Clemann N, Seiler A, Spielmann H (2004). “Validation of the Postimplantation Rat Whole-embryo Culture Test in the International ECVAM Validation Study on Three In Vitro Embryotoxicity Tests.” Alternatives to Laboratory Animals, 32, 275–307.
Platt J (2000). "Probabilistic Outputs for Support Vector Machines and Comparison to Regularized Likelihood Methods." In B Bartlett, B Schölkopf, D Schuurmans, A Smola (eds.), "Advances in Kernel Methods: Support Vector Learning," pp. 61–74. MIT Press, Cambridge, MA.
Provost F, Domingos P (2003). "Tree Induction for Probability–Based Ranking." Machine Learning, 52(3), 199–215.
Provost F, Fawcett T, Kohavi R (1998). "The Case Against Accuracy Estimation for Comparing Induction Algorithms." Proceedings of the Fifteenth International Conference on Machine Learning, pp. 445–453.
Quinlan R (1987). "Simplifying Decision Trees." International Journal of Man–Machine Studies, 27(3), 221–234.
Quinlan R (1992). "Learning with Continuous Classes." Proceedings of the 5th Australian Joint Conference On Artificial Intelligence, pp. 343–348.
Quinlan R (1993a). "Combining Instance–Based and Model–Based Learning." Proceedings of the Tenth International Conference on Machine Learning, pp. 236–243.
Quinlan R (1993b). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers.
Quinlan R (1996a). "Bagging, Boosting, and C4.5." In "Proceedings of the Thirteenth National Conference on Artificial Intelligence."
Quinlan R (1996b). "Improved Use of Continuous Attributes in C4.5." Journal of Artificial Intelligence Research, 4, 77–90.
Quinlan R, Rivest R (1989). "Inferring Decision Trees Using the Minimum Description Length Principle." Information and Computation, 80(3), 227–248.
Radcliffe N, Surry P (2011). "Real–World Uplift Modelling With Significance–Based Uplift Trees." Technical report, Stochastic Solutions.
Rännar S, Lindgren F, Geladi P, Wold S (1994). "A PLS Kernel Algorithm for Data Sets with Many Variables and Fewer Objects. Part 1: Theory and Algorithm." Journal of Chemometrics, 8, 111–125.
R Development Core Team (2008). R: Regulatory Compliance and Validation Issues: A Guidance Document for the Use of R in Regulated Clinical Trial Environments. R Foundation for Statistical Computing, Vienna, Austria.
R Development Core Team (2010). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Reshef D, Reshef Y, Finucane H, Grossman S, McVean G, Turnbaugh P, Lander E, Mitzenmacher M, Sabeti P (2011). "Detecting Novel Associations in Large Data Sets." Science, 334(6062), 1518–1524.
Richardson M, Dominowska E, Ragno R (2007). "Predicting Clicks: Estimating the Click–Through Rate for New Ads." In "Proceedings of the 16th International Conference on the World Wide Web," pp. 521–530.
Ripley B (1995). "Statistical Ideas for Selecting Network Architectures." Neural Networks: Artificial Intelligence and Industrial Applications, pp. 183–190.
Ripley B (1996). Pattern Recognition and Neural Networks. Cambridge University Press.
Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, Muller M (2011). "pROC: An Open-Source Package for R and S+ to Analyze and Compare ROC Curves." BMC Bioinformatics, 12(1), 77.
Robnik-Sikonja M, Kononenko I (1997). "An Adaptation of Relief for Attribute Estimation in Regression." Proceedings of the Fourteenth International Conference on Machine Learning, pp. 296–304.
Rodriguez M (2011). "The Failure of Predictive Modeling and Why We Follow the Herd." Technical report, Concepcion, Martinez & Bellido.
Ruczinski I, Kooperberg C, Leblanc M (2003). "Logic Regression." Journal of Computational and Graphical Statistics, 12(3), 475–511.
Rumelhart D, Hinton G, Williams R (1986). "Learning Internal Representations by Error Propagation." In "Parallel Distributed Processing: Explorations in the Microstructure of Cognition," The MIT Press.
Rzepakowski P, Jaroszewicz S (2012). "Uplift Modeling in Direct Marketing." Journal of Telecommunications and Information Technology, 2, 43–50.
Saar-Tsechansky M, Provost F (2007a). "Decision–Centric Active Learning of Binary–Outcome Models." Information Systems Research, 18(1), 4–22.
Saar-Tsechansky M, Provost F (2007b). "Handling Missing Values When Applying Classification Models." Journal of Machine Learning Research, 8, 1625–1657.
Saeys Y, Inza I, Larranaga P (2007). "A Review of Feature Selection Techniques in Bioinformatics." Bioinformatics, 23(19), 2507–2517.
Schapire R (1990). "The Strength of Weak Learnability." Machine Learning, 5(2), 197–227.
Schmidberger M, Morgan M, Eddelbuettel D, Yu H, Tierney L, Mansmann U (2009). "State–of–the–Art in Parallel Computing with R." Journal of Statistical Software, 31(1).
Serneels S, Nolf ED, Espen PV (2006). "Spatial Sign Pre-processing: A Simple Way to Impart Moderate Robustness to Multivariate Estimators." Journal of Chemical Information and Modeling, 46(3), 1402–1409.
Shachtman N (2011). "Pentagon's Prediction Software Didn't Spot Egypt Unrest." Wired.
Siegel E (2011). "Uplift Modeling: Predictive Analytics Can't Optimize Marketing Decisions Without It." Technical report, Prediction Impact Inc.
Simon R, Radmacher M, Dobbin K, McShane L (2003). "Pitfalls in the Use of DNA Microarray Data for Diagnostic and Prognostic Classification." Journal of the National Cancer Institute, 95(1), 14–18.
Smola A (1996). "Regression Estimation with Support Vector Learning Machines." Master's thesis, Technische Universität München.
Spector P (2008). Data Manipulation with R. Springer.
Steyerberg E (2010). Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. Springer, softcover reprint of the 2009 edition.
Stone M, Brooks R (1990). "Continuum Regression: Cross-validated Sequentially Constructed Prediction Embracing Ordinary Least Squares, Partial Least Squares, and Principal Component Regression." Journal of the Royal Statistical Society, Series B, 52, 237–269.
Strobl C, Boulesteix A, Zeileis A, Hothorn T (2007). "Bias in Random Forest Variable Importance Measures: Illustrations, Sources and a Solution." BMC Bioinformatics, 8(1), 25.
Suykens J, Vandewalle J (1999). "Least Squares Support Vector Machine Classifiers." Neural Processing Letters, 9(3), 293–300.
Tetko I, Tanchuk V, Kasheva T, Villa A (2001). "Estimation of Aqueous Solubility of Chemical Compounds Using E–State Indices." Journal of Chemical Information and Computer Sciences, 41(6), 1488–1493.
Tibshirani R (1996). "Regression Shrinkage and Selection via the Lasso." Journal of the Royal Statistical Society, Series B (Methodological), 58(1), 267–288.
Tibshirani R, Hastie T, Narasimhan B, Chu G (2002). "Diagnosis of Multiple Cancer Types by Shrunken Centroids of Gene Expression." Proceedings of the National Academy of Sciences, 99(10), 6567–6572.
Tibshirani R, Hastie T, Narasimhan B, Chu G (2003). "Class Prediction by Nearest Shrunken Centroids, with Applications to DNA Microarrays." Statistical Science, 18(1), 104–117.
Ting K (2002). "An Instance–Weighting Method to Induce Cost–Sensitive Trees." IEEE Transactions on Knowledge and Data Engineering, 14(3), 659–665.
Tipping M (2001). "Sparse Bayesian Learning and the Relevance Vector Machine." Journal of Machine Learning Research, 1, 211–244.
Titterington M (2010). "Neural Networks." Wiley Interdisciplinary Reviews: Computational Statistics, 2(1), 1–8.
Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman R (2001). "Missing Value Estimation Methods for DNA Microarrays." Bioinformatics, 17(6), 520–525.
Tumer K, Ghosh J (1996). "Analysis of Decision Boundaries in Linearly Combined Neural Classifiers." Pattern Recognition, 29(2), 341–348.
US Commodity Futures Trading Commission and US Securities & Exchange Commission (2010). Findings Regarding the Market Events of May 6, 2010.
Valiant L (1984). "A Theory of the Learnable." Communications of the ACM, 27, 1134–1142.
Van Der Putten P, Van Someren M (2004). "A Bias–Variance Analysis of a Real World Learning Problem: The CoIL Challenge 2000." Machine Learning, 57(1), 177–195.
Van Hulse J, Khoshgoftaar T, Napolitano A (2007). "Experimental Perspectives on Learning from Imbalanced Data." In "Proceedings of the 24th International Conference on Machine Learning," pp. 935–942.
Vapnik V (2010). The Nature of Statistical Learning Theory. Springer.
Varma S, Simon R (2006). "Bias in Error Estimation When Using Cross–Validation for Model Selection." BMC Bioinformatics, 7(1), 91.
Varmuza K, He P, Fang K (2003). "Boosting Applied to Classification of Mass Spectral Data." Journal of Data Science, 1, 391–404.
Venables W, Ripley B (2002). Modern Applied Statistics with S. Springer.
Venables W, Smith D, the R Development Core Team (2003). An Introduction to R. R Foundation for Statistical Computing, Vienna, Austria, version 1.6.2 edition. ISBN 3-901167-55-2, URL http://www.R-project.org.
Venkatraman E (2000). "A Permutation Test to Compare Receiver Operating Characteristic Curves." Biometrics, 56(4), 1134–1138.
Veropoulos K, Campbell C, Cristianini N (1999). "Controlling the Sensitivity of Support Vector Machines." Proceedings of the International Joint Conference on Artificial Intelligence, 1999, 55–60.
Wager TT, Hou X, Verhoest PR, Villalobos A (2010). "Moving Beyond Rules: The Development of a Central Nervous System Multiparameter Optimization (CNS MPO) Approach To Enable Alignment of Druglike Properties." ACS Chemical Neuroscience, 1(6), 435–449.
Wallace C (2005). Statistical and Inductive Inference by Minimum Message Length. Springer–Verlag.
Wang C, Venkatesh S (1984). "Optimal Stopping and Effective Machine Complexity in Learning." Advances in NIPS, pp. 303–310.
Zurück zum Zitat Wang Y, Witten I (1997). “Inducing Model Trees for Continuous Classes.” Proceedings of the Ninth European Conference on Machine Learning, pp. 128–137. Wang Y, Witten I (1997). “Inducing Model Trees for Continuous Classes.” Proceedings of the Ninth European Conference on Machine Learning, pp. 128–137.
Zurück zum Zitat Weiss G, Provost F (2001a). “The Effect of Class Distribution on Classifier Learning: An Empirical Study.” Department of Computer Science, Rutgers University. Weiss G, Provost F (2001a). “The Effect of Class Distribution on Classifier Learning: An Empirical Study.” Department of Computer Science, Rutgers University.
Zurück zum Zitat Weiss G, Provost F (2001b). “The Effect of Class Distribution On Classifier Learning: An Empirical Study.” Technical Report ML-TR-44, Department of Computer Science, Rutgers University. Weiss G, Provost F (2001b). “The Effect of Class Distribution On Classifier Learning: An Empirical Study.” Technical Report ML-TR-44, Department of Computer Science, Rutgers University.
Zurück zum Zitat Westfall P, Young S (1993). Resampling–Based Multiple Testing: Examples and Methods for P–Value Adjustment. Wiley. Westfall P, Young S (1993). Resampling–Based Multiple Testing: Examples and Methods for P–Value Adjustment. Wiley.
Zurück zum Zitat Westphal C (2008). Data Mining for Intelligence, Fraud & Criminal Detection: Advanced Analytics & Information Sharing Technologies. CRC Press. Westphal C (2008). Data Mining for Intelligence, Fraud & Criminal Detection: Advanced Analytics & Information Sharing Technologies. CRC Press.
Zurück zum Zitat Whittingham M, Stephens P, Bradbury R, Freckleton R (2006). “Why Do We Still Use Stepwise Modelling in Ecology and Behaviour?” Journal of Animal Ecology, 75(5), 1182–1189.CrossRef Whittingham M, Stephens P, Bradbury R, Freckleton R (2006). “Why Do We Still Use Stepwise Modelling in Ecology and Behaviour?” Journal of Animal Ecology, 75(5), 1182–1189.CrossRef
Zurück zum Zitat Willett P (1999). “Dissimilarity–Based Algorithms for Selecting Structurally Diverse Sets of Compounds.” Journal of Computational Biology, 6(3), 447–457.MathSciNetCrossRef Willett P (1999). “Dissimilarity–Based Algorithms for Selecting Structurally Diverse Sets of Compounds.” Journal of Computational Biology, 6(3), 447–457.MathSciNetCrossRef
Zurück zum Zitat Williams G (2011). Data Mining with Rattle and R : The Art of Excavating Data for Knowledge Discovery. Springer. Williams G (2011). Data Mining with Rattle and R : The Art of Excavating Data for Knowledge Discovery. Springer.
Zurück zum Zitat Witten D, Tibshirani R (2009). “Covariance–Regularized Regression and Classification For High Dimensional Problems.” Journal of the Royal Statistical Society. Series B (Statistical Methodology), 71(3), 615–636.MathSciNetMATHCrossRef Witten D, Tibshirani R (2009). “Covariance–Regularized Regression and Classification For High Dimensional Problems.” Journal of the Royal Statistical Society. Series B (Statistical Methodology), 71(3), 615–636.MathSciNetMATHCrossRef
Zurück zum Zitat Witten D, Tibshirani R (2011). “Penalized Classification Using Fisher’s Linear Discriminant.” Journal of the Royal Statistical Society. Series B (Statistical Methodology), 73(5), 753–772.MathSciNetMATHCrossRef Witten D, Tibshirani R (2011). “Penalized Classification Using Fisher’s Linear Discriminant.” Journal of the Royal Statistical Society. Series B (Statistical Methodology), 73(5), 753–772.MathSciNetMATHCrossRef
Zurück zum Zitat Wold H (1966). “Estimation of Principal Components and Related Models by Iterative Least Squares.” In P Krishnaiah (ed.), “Multivariate Analyses,” pp. 391–420. Academic Press, New York. Wold H (1966). “Estimation of Principal Components and Related Models by Iterative Least Squares.” In P Krishnaiah (ed.), “Multivariate Analyses,” pp. 391–420. Academic Press, New York.
Zurück zum Zitat Wold H (1982). “Soft Modeling: The Basic Design and Some Extensions.” In K Joreskog, H Wold (eds.), “Systems Under Indirect Observation: Causality, Structure, Prediction,” pt. 2, pp. 1–54. North–Holland, Amsterdam. Wold H (1982). “Soft Modeling: The Basic Design and Some Extensions.” In K Joreskog, H Wold (eds.), “Systems Under Indirect Observation: Causality, Structure, Prediction,” pt. 2, pp. 1–54. North–Holland, Amsterdam.
Zurück zum Zitat Wold S (1995). “PLS for Multivariate Linear Modeling.” In H van de Waterbeemd (ed.), “Chemometric Methods in Molecular Design,” pp. 195–218. VCH, Weinheim. Wold S (1995). “PLS for Multivariate Linear Modeling.” In H van de Waterbeemd (ed.), “Chemometric Methods in Molecular Design,” pp. 195–218. VCH, Weinheim.
Zurück zum Zitat Wold S, Johansson M, Cocchi M (1993). “PLS–Partial Least-Squares Projections to Latent Structures.” In H Kubinyi (ed.), “3D QSAR in Drug Design,” volume 1, pp. 523–550. Kluwer Academic Publishers, The Netherlands. Wold S, Johansson M, Cocchi M (1993). “PLS–Partial Least-Squares Projections to Latent Structures.” In H Kubinyi (ed.), “3D QSAR in Drug Design,” volume 1, pp. 523–550. Kluwer Academic Publishers, The Netherlands.
Zurück zum Zitat Wold S, Martens H, Wold H (1983). “The Multivariate Calibration Problem in Chemistry Solved by the PLS Method.” In “Proceedings from the Conference on Matrix Pencils,” Springer–Verlag, Heidelberg.MATH Wold S, Martens H, Wold H (1983). “The Multivariate Calibration Problem in Chemistry Solved by the PLS Method.” In “Proceedings from the Conference on Matrix Pencils,” Springer–Verlag, Heidelberg.MATH
Zurück zum Zitat Wolpert D (1996). “The Lack of a priori Distinctions Between Learning Algorithms.” Neural Computation, 8(7), 1341–1390.CrossRef Wolpert D (1996). “The Lack of a priori Distinctions Between Learning Algorithms.” Neural Computation, 8(7), 1341–1390.CrossRef
Zurück zum Zitat Yeh I (1998). “Modeling of Strength of High-Performance Concrete Using Artificial Neural Networks.” Cement and Concrete research, 28(12), 1797–1808.CrossRef Yeh I (1998). “Modeling of Strength of High-Performance Concrete Using Artificial Neural Networks.” Cement and Concrete research, 28(12), 1797–1808.CrossRef
Zurück zum Zitat Yeh I (2006). “Analysis of Strength of Concrete Using Design of Experiments and Neural Networks.” Journal of Materials in Civil Engineering, 18, 597–604.CrossRef Yeh I (2006). “Analysis of Strength of Concrete Using Design of Experiments and Neural Networks.” Journal of Materials in Civil Engineering, 18, 597–604.CrossRef
Zurück zum Zitat Youden W (1950). “Index for Rating Diagnostic Tests.” Cancer, 3(1), 32–35.CrossRef Youden W (1950). “Index for Rating Diagnostic Tests.” Cancer, 3(1), 32–35.CrossRef
Zurück zum Zitat Zadrozny B, Elkan C (2001). “Obtaining Calibrated Probability Estimates from Decision Trees and Naive Bayesian Classifiers.” In “Proceedings of the 18th International Conference on Machine Learning,” pp. 609–616. Morgan Kaufmann. Zadrozny B, Elkan C (2001). “Obtaining Calibrated Probability Estimates from Decision Trees and Naive Bayesian Classifiers.” In “Proceedings of the 18th International Conference on Machine Learning,” pp. 609–616. Morgan Kaufmann.
Zurück zum Zitat Zeileis A, Hothorn T, Hornik K (2008). “Model–Based Recursive Partitioning.” Journal of Computational and Graphical Statistics, 17(2), 492–514.MathSciNetCrossRef Zeileis A, Hothorn T, Hornik K (2008). “Model–Based Recursive Partitioning.” Journal of Computational and Graphical Statistics, 17(2), 492–514.MathSciNetCrossRef
Zurück zum Zitat Zhu J, Hastie T (2005). “Kernel Logistic Regression and the Import Vector Machine.” Journal of Computational and Graphical Statistics, 14(1), 185–205.MathSciNetCrossRef Zhu J, Hastie T (2005). “Kernel Logistic Regression and the Import Vector Machine.” Journal of Computational and Graphical Statistics, 14(1), 185–205.MathSciNetCrossRef
Zurück zum Zitat Zou H, Hastie T (2005). “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society, Series B, 67(2), 301–320.MathSciNetMATHCrossRef Zou H, Hastie T (2005). “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society, Series B, 67(2), 301–320.MathSciNetMATHCrossRef
Zurück zum Zitat Zou H, Hastie T, Tibshirani R (2004). “Sparse Principal Component Analysis.” Journal of Computational and Graphical Statistics, 15, 2006.MathSciNet Zou H, Hastie T, Tibshirani R (2004). “Sparse Principal Component Analysis.” Journal of Computational and Graphical Statistics, 15, 2006.MathSciNet
Metadata
Title
Factors That Can Affect Model Performance
Authors
Max Kuhn
Kjell Johnson
Copyright Year
2013
Publisher
Springer New York
DOI
https://doi.org/10.1007/978-1-4614-6849-3_20