Methods Inf Med 2008; 47(02): 167-173
DOI: 10.3414/ME0447
Original Article
Schattauer GmbH

Double-smoothing in Kernel Hazard Rate Estimation

R. Weißbach¹, A. Pfahlberg², O. Gefeller²

¹ Institut für Wirtschafts- und Sozialstatistik, Fachbereich Statistik, Universität Dortmund, Dortmund, Germany
² Institut für Medizininformatik, Biometrie und Epidemiologie, Medizinische Fakultät, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany

Publication History

Received: 11 August 2006
Accepted: 05 June 2007
Publication Date: 18 January 2018 (online)

Summary

Objectives: In oncological studies, the hazard rate can be used to differentiate subgroups of the study population according to their patterns of survival risk over time. Nonparametric curve estimation has been suggested as an exploratory means of revealing such patterns. The choice of the type of smoothing parameter is critical to the practical performance of such estimators. In this paper, we study data-adaptive smoothing.

Methods: A decade ago, the nearest-neighbor bandwidth was introduced for censored data in survival analysis. It is specified by a single parameter, the number of nearest neighbors. Bandwidth selection in this setting has rarely been investigated, although its heuristic advantages over the frequently studied fixed bandwidth are evident. The asymptotic relationship between the fixed and the nearest-neighbor bandwidth can be exploited to derive new selection approaches.
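The difference between the two bandwidth types can be made concrete with a kernel-smoothed Nelson-Aalen estimator of the hazard rate, lambda_hat(t) = (1/b) * sum_i K((t - T_(i)) / b) * delta_(i) / (n - i + 1), where T_(1) < ... < T_(n) are the ordered observation times and delta_(i) = 1 marks an uncensored observation. The following Python sketch is illustrative only and is not the authors' double-smoothing algorithm. It assumes an Epanechnikov kernel and no tied times, and it defines the nearest-neighbor bandwidth at t as the distance from t to the k-th nearest uncensored time; this is only one of several possible definitions for censored data, and all function names are hypothetical.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2) for |u| <= 1, else 0."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)


def nelson_aalen_jumps(times, events):
    """Sorted observation times and Nelson-Aalen jumps delta_(i) / (n - i + 1).

    Assumption: no tied observation times, so the risk set just before the
    i-th ordered time (i = 1, ..., n) contains n - i + 1 subjects.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    order = np.argsort(times)
    n = len(times)
    at_risk = n - np.arange(n)  # n, n-1, ..., 1
    return times[order], events[order] / at_risk


def kernel_hazard(t, times, events, b):
    """Kernel-smoothed Nelson-Aalen hazard estimate at t with bandwidth b > 0."""
    t_sorted, jumps = nelson_aalen_jumps(times, events)
    return float(np.sum(epanechnikov((t - t_sorted) / b) * jumps) / b)


def nn_bandwidth(t, times, events, k):
    """Hypothetical NN distance: |t - T| for the k-th nearest uncensored time T."""
    uncensored = np.asarray(times, dtype=float)[np.asarray(events) == 1]
    if not 1 <= k <= len(uncensored):
        raise ValueError("k must lie between 1 and the number of uncensored times")
    return float(np.sort(np.abs(uncensored - t))[k - 1])


def nn_kernel_hazard(t, times, events, k):
    """Nearest-neighbor estimate: the bandwidth adapts to the local event density."""
    return kernel_hazard(t, times, events, nn_bandwidth(t, times, events, k))


# Usage: four uncensored times; at t = 2.5 the 3rd-nearest event is 1.5 away,
# so the fixed bandwidth b = 1.5 and the NN bandwidth with k = 3 coincide.
times, events = [1.0, 2.0, 3.0, 4.0], [1, 1, 1, 1]
print(kernel_hazard(2.5, times, events, b=1.5))   # approx. 0.3704 (= 10/27)
print(nn_kernel_hazard(2.5, times, events, k=3))  # approx. 0.3704 (= 10/27)
```

With a fixed bandwidth, b is the same at every t; with the nearest-neighbor bandwidth, only k is chosen and the smoothing window widens automatically where uncensored events are sparse. When t may coincide with an event time, k should be at least 2, since k = 1 then yields a zero bandwidth.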

Results: We develop a new selection algorithm, termed double-smoothing, for the nearest-neighbor bandwidth in hazard rate estimation. Our approach uses a finite-sample approximation of the asymptotic relationship between the fixed and the nearest-neighbor bandwidth. In doing so, we identify the nearest-neighbor bandwidth as an additional smoothing step that achieves further data adaptation after fixed-bandwidth smoothing. We illustrate the new algorithm in a clinical study and compare its outcome with the traditional fixed-bandwidth result to demonstrate the practical performance of the technique.

Conclusion: The double-smoothing approach enlarges the methodological repertoire for selecting smoothing parameters in nonparametric hazard rate estimation. The slight increase in computational effort is rewarded with substantially greater estimation stability, which underlines the benefit of the technique for biostatistical applications.

 