Abstract
The body of literature on classification methods which estimate boundaries between the groups (classes) by optimizing a function of the L p -norm distances of observations in each group from these boundaries, is maturing fast. The number of published research articles on this topic, especially on mathematical programming (MP) formulations and techniques for L p -norm classification, is now sizable. This paper highlights historical developments that have defined the field, and looks ahead at challenges that may shape new research directions in the next decade. In the first part, the paper summarizes basic concepts and ideas, and briefly reviews past research. Throughout, an attempt is made to integrate a number of the most important L p -norm methods proposed to date within a unified framework, emphasizing their conceptual differences and similarities, rather than focusing on mathematical detail. In the second part, the paper discusses several potential directions for future research in this area. The long-term prospects of L p -norm classification (and discriminant) research may well hinge upon whether or not the channels of communication between on the one hand researchers active in L p -norm classification, who tend to have their roots primarily in the decision sciences, the management sciences, computer science and engineering, and on the other hand practitioners and researchers in the statistical classification community, will be improved. This paper offers potential reasons for the lack of communication between these groups, and suggests ways in which L p -norm research may be strengthened from a statistical viewpoint. The results obtained in L p -norm classification studies are clearly relevant and of importance to all researchers and practitioners active in classification and discriminant analysis. The paper also briefly discusses artificial neural networks, a promising non-traditional method for classification which has recently emerged, and suggests that it may be useful to explore hybrid classification methods that take advantage of the complementary strengths of different methods, e.g., neural network and L p -norm methods.
Similar content being viewed by others
References
P.L. Abad and W.J. Banks, New LP based heuristics for the classification problem, European Journal of Operational Research 67(1993)88–100.
N.P. Archer and S. Wang, Application of the back propagation neural network algorithm with monotonicity constraints for two-group classification problems, Computers and Operations Research 24(1993)60–75.
O.K. Asparoukhov, Microprocessor system for investigation of thromboembolic complications, Unpublished Ph.D. Dissertation, Technical University of Sofia, Bulgaria (in Bulgarian), 1985.
O.K. Asparoukhov and A. Stam, Mathematical programming formulations for two-group classification with binary variables, Working Paper 96-92, International Institute for Applied Systems Analysis, Laxenburg, Austria, 1996.
S.M. Bajgier and A. Hill, An experimental comparison of statistical and linear programming approaches to the discriminant problem, Decision Sciences 13(1982)604–618.
W.J. Banks and P.L. Abad, An efficient optimal solution algorithm for the classification problem, Decision Sciences 22(1991)1008–1023.
W.J. Banks and P.L. Abad, On the performance of linear programming heuristics applied on a quadratic transformation in the classification problem, European Journal of Operational Research 74(1994)23–28.
I. Barrodale and F.D.K. Roberts, Applications of mathematical programming to L p approximations, in: Nonlinear Programming, J.B. Rosen, O.L. Mangasarian and K. Ritter, eds., Academic Press, New York, 1970, pp. 447–464.
I. Barrodale and F.D.K. Roberts, An improved algorithm for discrete L 1 approximation, SIAM Journal on Numerical Analysis 10(1973)839–848.
G. Bassett, Jr. and R. Koenker, The asymptotic distribution of the least absolute error estimator, Journal of the American Statistical Association 73(1978)618–622.
J.A. Benediktsson, P.H. Swain and O.K. Ersory, Neural network approaches versus statistical methods in classification of multisource remote sensing data, IEEE Transactions on Geoscience and Remote Sensing 28(1990)540–551.
K.P. Bennett and O.L. Mangasarian, Neural network training via linear programming, in: Advances in Optimization and Parallel Computing, P.M. Pardalos, ed., North-Holland, Amsterdam, 1992, pp. 56–67.
K.P. Bennett and O.L. Mangasarian, Robust linear programming discrimination of two linearly separable sets, Optimization Methods and Software 1(1992)23–34.
BMDP, BMDP Statistical Software Release 7, Volumes 1, 2, 1990.
F.B. Bryant, Analyze your data optimally using ODA 1.0, Decision Line (May 1994)16–19.
J.D. Camm, A.S. Raturi and S. Tsubakitani, Cutting Big M down to size, Interfaces 20(1990) 61–66.
T.M. Cavalier, J.P. Ignizio and A.L. Soyster, Discriminant analysis via mathematical programming: Certain problems and their causes, Computers and Operations Research 16(1989)353–362.
E.U. Choo and W.C. Wedley, Optimal criterion weights in repetitive multicriteria decision-making, Journal of the Operational Research Society 36(1985)983–992.
G. Cybenko, Approximations by superpositions of a sigmoidal function, Mathematics of Control, Signals, and Systems 2(1989)303–314.
W.R. Dillon and M. Goldstein, On the performance of some multinomial classification rules, Journal of the American Statistical Association 73(1978)305–313.
Y. Dodge (ed.), Statistical Data Analysis Based on the L 1-Norm and Related Methods, North-Holland, Amsterdam, 1987.
A.P. Duarte Silva, Minimizing misclassification costs in two-group classification analysis, Unpublished Ph.D. Dissertation, The University of Georgia, 1995.
A.P. Duarte Silva and A. Stam, BestClass: A SAS-based software package of nonparametric methods for two-group classification, Working Paper 94-396, Terry College of Business, The University of Georgia, Athens, GA, 1994.
A.P. Duarte Silva and A. Stam, Second-order mathematical programming formulations for discriminant analysis, European Journal of Operational Research 74(1994)4–22.
A.P. Duarte Silva and A. Stam, An efficient mixed integer programming algorithm for minimizing the training sample misclassification cost in two-group classification, Working Paper 96-93, International Institute for Applied Systems Analysis, Laxenburg, Austria, 1996.
a]_R.A. Eisenbeis, Pitfalls in the application of discriminant analysis, Journal of Finance 32(1977) 875–900.
S.S. Erenguc and G.J. Koehler, Survey of mathematical programming models and experimental results for linear discriminant analysis, Managerial and Decision Economics 11(1990)215–226.
B. Farrell, A better way to test for significance, Marketing News (January 1995)1.
L.P. Fatti, D.M. Hawkins and E.L. Raath, Discriminant analysis, in: Topics in Applied Multivariate Analysis, D.M. Hawkins, ed., Cambridge University Press, Cambridge, England, 1982, pp. 1–71.
M. Fischer, S. Gopal, P. Staufer and K. Steinocher, Evaluation of neural pattern classifiers for a remote sensing application, Working Paper, Department of Geography, Wirtschaftsuniversität Wien, Vienna, Austria, 1994.
R.A. Fisher, The use of multiple measurements in taxonomy problems, Annals of Eugenics 7(1936) 179–188.
N. Freed and F. Glover, A linear programming approach to the discriminant problem, Decision Sciences 12(1981)68–74.
N. Freed and F. Glover, Simple but powerful goal programming formulations for the discriminant problem, European Journal of Operational Research 7(1981)44–60.
N. Freed and F. Glover, Evaluating alternative linear programming models to solve the two-group discriminant problem, Decision Sciences 17(1986)151–162.
N. Freed and F. Glover, Resolving certain difficulties and improving the classification power of LP discriminant analysis formulations, Decision Sciences 17(1986)589–595.
W.V. Gehrlein, General mathematical programming formulations for the statistical classification problem, Operations Research Letters 5(1986)299–304.
F.R. Glahe and J.G. Hunt, The small sample properties of simultaneous equation least absolute estimators vis-à-vis least squares estimators, Econometrica 38(1970)742–753.
N. Glick, Sample-based classification procedures related to empiric distributions, IEEE Transactions on Information Theory 22(1976)454–461.
L.W. Glorfeld, A robust methodology for discriminant analysis based on least-absolute-value estimation, Managerial and Decision Economics, 11(1990)267–277.
L.W. Glorfeld and D.L. Olson, Variable selection for robust discriminant analysis based on the L 1 metric, Proceedings of the American Institute of the Decision Sciences (1982)745–747.
F. Glover, Improved linear programming models for discriminant analysis, Decision Sciences 21 (1990)771–785.
F. Glover, S. Keene and B. Duea, A new class of models for the discriminant problem, Decision Sciences 19(1988)269–280.
W. Gochet, A. Stam, V. Srinivasan and S. Chen, Multigroup discriminant analysis using linear programming, Operations Research 45(1997)213–225.
M. Goldstein and W.R. Dillon, Discrete Discriminant Analysis, Wiley, New York, 1978.
R. Gonin and A.H. Money (eds.), Nonlinear L p -Norm Estimation, Marcel Dekker, New York, 1989.
D.J. Hand, Discrimination and Classification, Wiley, New York, 1981.
D.J. Hand, Kernel Discriminant Analysis, Wiley, New York, 1982.
D.J. Hand, Discriminant analysis for categorical data, in: Lecture Notes and Program of the 4th European Courses in Advanced Statistics Program: Analysis of Categorical Data Theory and Application, Leiden, The Netherlands, 1993, pp. 135–174.
J. Hertz, A. Krogh and R.G. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, Redwood City, CA, 1991.
F.S. Hillier and G.J. Lieberman, Introduction to Operations Research, 5th ed., McGraw-Hill, New York, 1990.
J.H. Hosseini and R.L. Armacost, The two-group discriminant problem with equal group mean vectors: An experimental evaluation of six linear/nonlinear programming formulations, European Journal of Operational Research 77(1994)241–252.
C.J. Huberty, Issues in the use and interpretation of discriminant analysis, Psychological Bulletin 95(1984)156–171.
C.J. Huberty, Applied Discriminant Analysis, Wiley, New York, 1994.
T. Ibaraki and S. Muroga, Adaptive linear classifier by linear programming, IEEE Transactions on Systems, Science and Cybernetics SSC-6(1970)53–62.
J.P. Ignizio, Linear Programming in Single and Multiple Objective Systems, Prentice-Hall, Englewood Cliffs, NJ, 1982.
B.A. Jain, and B.N. Nag, Artificial neural network models for pricing initial public offerings, Decision Sciences 26(1995)283–302.
E.A. Joachimsthaler and A. Stam, Four approaches to the classification problem in discriminant analysis: An experimental study, Decision Sciences 19(1988)322–333.
E.A. Joachimsthaler and A. Stam, Mathematical programming approaches for the classification problem in two-group discriminant analysis, Multivariate Behavioral Research 25(1990)427–454.
M.W. Kattan and J.R. Beck, Artificial neural networks for medical classification decisions, Archives of Pathology and Laboratory Medicine 119(1995)672–677.
G.J. Koehler, Characterization of unacceptable solutions in LP discriminant analysis, Decision Sciences 20(1989)239–257.
G.J. Koehler, Unacceptable solutions and the hybrid discriminant model, Decision Sciences 20 (1989)844–848.
G.J. Koehler, Considerations for mathematical programming models in discriminant analysis, Managerial and Decision Economics 11(1990)227–234.
G.J. Koehler, Improper linear discriminant classifiers, European Journal of Operational Research 50(1991)188–198.
G.J. Koehler, Linear discriminant functions determined by genetic search, ORSA Journal on Computing 3(1991)345–357.
G.J. Koehler, A response to Xiao's “Necessary and sufficient conditions of unacceptable solutions in LP discriminant analysis”: Something is amiss, Decision Sciences 25(1994)331–333.
G.J. Koehler and S.S. Erenguc, Minimizing misclassifications in linear discriminant analysis, Decision Sciences 21(1990)63–85.
R. Koenker and S. Portnoy, L-estimation for linear models, Journal of the American Statistical Association 82(1987)851–857.
J.S. Koford and G.F. Groner, The use of an adaptive threshold element to design a linear optimal pattern classifier, IEEE Transactions on Information Theory IT-12(1966)42–50.
W.J. Krzanowski, Principles of Multivariate Analysis, Clarendon Press, Oxford, England, 1988.
P.A. Lachenbruch, C. Sneeringer and L.T. Revo, Robustness of the linear and quadratic discriminant function to certain types of non-normality, Communications in Statistics 1(1973)39–57.
K.F. Lam and E.U. Choo, Software package for linear programming in classification problems, Faculty of Business Administration, Simon Fraser University, Burnaby, BC, Canada, 1991.
K.F. Lam and E.U. Choo, A linear goal programming model for classification with non-monotonic attributes, Computers and Operations Research 20(1993)403–408.
K.F. Lam, E.U. Choo and W.C. Wedley, Linear goal programming in estimation of classification probability, European Journal of Operational Research 67(1993)101–110.
K.F. Lam, E.U. Choo and J.W. Moy, Minimizing deviations from group mean: A new linear programming approach for the classification problem, European Journal of Operational Research 88(1996)358–367.
K.-N. Lau and G.V. Post, A note on discriminant analysis using LAD, Decision Sciences 23(1992) 260–265.
K.D. Lawrence and J.L. Arthur (eds.), Robust Regression: Analysis and Application, Marcel Dekker, New York, 1990.
C.K. Lee and J.K. Ord, Discriminant analysis using least absolute deviations, Decision Sciences 21(1990)86–176.
J.M. Liitschwager and C. Wang, Integer programming solution of a classification problem, Management Science 24(1978)1515–1525.
R.P. Lippmann, Pattern classification using neural networks, IEEE Communication Magazine 27 (1989)47–64.
M.A. Mahmood and E.C. Lawrence, A performance analysis of parametric and nonparametric discriminant approaches to business decision making, Decision Sciences 18(1987)308–326.
O.L. Mangasarian, Linear and nonlinear separation of patterns by linear programming, Operations Research 13(1965)444–452.
O.L. Mangasarian, Multi-surface method of pattern separation, IEEE Transactions on Information Theory IT-14(1968)801–807.
O.L. Mangasarian, Mathematical programming in neural networks, ORSA Journal on Computing 5(1993)349–360.
P. Marcotte, G. Marquis and G. Savard, A new implicit enumeration scheme for the discriminant analysis problem, Computers and Operations Research 22(1995)625–639.
P. Marcotte and G. Savard, Novel approaches to the discriminant problem, Zeitschrift für Operations Research 36(1991)517–545.
I.S. Markham and C.T. Ragsdale, Combining neural networks and statistical predictions to solve the classification problem in discriminant analysis, Decision Sciences 26(1995)229–242.
C.A. Markowski, On the balancing of error rates for LP discriminant methods, Managerial and Decision Economics 11(1990)235–241.
C.A. Markowski and E.P. Markowski, An experimental comparison of the discriminant problem with both qualitative and quantitative variables, European Journal of Operational Research 28 (1987)74–78.
C.A. Markowski, An adaptive statistical method for the discriminant problem, European Journal of Operational Research 73(1994)480–486.
E.P. Markowski and C.A. Markowski, Some difficulties and improvements in applying linear programming formulations to the discriminant problem, Decision Sciences 16(1985)237–247.
G.J. McLachlan, Discriminant Analysis and Statistical Pattern Recognition, Wiley, New York, 1992.
R.C. Minnick, Linear-input logic, IEEE Transactions on Electronics and Computers EC-10(1961) 6–16.
D.F. Morrison, Multivariate Statistical Methods, 3rd ed., McGraw-Hill, New York, 1990.
S. Mukhopadhyay, A. Roy, L.S. Kim and S. Govil, A polynomial time algorithm for generating neural networks for pattern classification — its stability properties and some test results, Neural Computing 5(1993)225–238.
S.C. Narula and J.F. Wellington, The minimum sum of absolute error regressions: A state of the art review, International Statistical Review 50(1982)317–326.
S.C. Narula and J.F. Wellington, On the robustness of the simple linear minimum sum of absolute errors regression, in: Robust Regression: Analysis and Application, K.D. Lawrence and J.L. Arthur, eds., Marcel Dekker, New York, 1990, pp. 129–141.
R. Nath, Estimation of misclassification probabilities in the linear programming approaches to the two-group discriminant problem, Decision Sciences 15(1984)248–252.
R. Nath, W.M. Jackson and T.W. Jones, A comparison of the classical and the linear programming approaches to the classification problem in discriminant analysis, Journal of Statistical Computation and Simulation 41(1992)73–93.
R. Nath and T.W. Jones, A variable selection criterion in the linear programming approaches to discriminant analysis, Decision Sciences 19(1988)554–563.
E. Patuwo, M.Y. Hu and M.S. Hung, Two-group classification using neural networks, Decision Sciences 24(1993)825–845.
R. Pavur and C. Loucopoulos, Examining optimal criterion weights in mixed integer programming approaches to the multiple-group classification problem, Journal of the Operational Research Society 46(1995)626–640.
S.J. Press and S. Wilson, Choosing between logistic regression and discriminant analysis, Journal of the American Statistical Association 73(1978)699–705.
R.R. Rawlings, V.B. Faden, B.I. Graubhard and M.J. Eckardt, A study of discriminant analysis techniques applied to multivariate lognormal data, Journal of Statistical Computing and Simulation 26(1986)79–100.
C.T. Ragsdale and A. Stam, Mathematical programming formulations for the discriminant problem: An old dog does new tricks, Decision Sciences 22(1991)296–307.
J. Remme, J.D.F. Habbema and J. Hermans, A simulative comparison of linear, quadratic and kernel discrimination, Journal of Statistical Computing and Simulation 11(1980)87–106.
A. Roy, S. Govil and R. Miranda, An algorithm to generate radial basis function (RBF)-like nets for classification problems, Neural Networks 8(1995)179–202.
A. Roy and S. Mukhopadhyay, Pattern classification using linear programming, ORSA Journal on Computing 3(1990)66–80.
P.A. Rubin, Separation failure in linear programming discriminant models, Decision Sciences 22 (1989)519–535.
P.A. Rubin, Evaluating the maximize minimum distance formulation of the linear discriminant problem, European Journal of Operational Research 41(1989)240–248.
P.A. Rubin, Heuristic solution procedures for a mixed-integer programming discriminant model, Managerial and Decision Economics 11(1990)255–266.
P.A. Rubin, A comparison of linear programming and parametric approaches to the two-group discriminant problem, Decision Sciences 21(1990)373–386.
P.A. Rubin, A comment regarding polynomial discriminant analysis, European Journal of Operational Research 72(1994)29–31.
D.E. Rumelhart, G.E. Hinton and R.J. Williams, Learning internal representations by error propagation, in: Parallel Distributed Processing, Vol. 1: Foundations, D.E. Rumelhart, J.L. McClelland and the PDP Group, eds., MIT Press, Cambridge, MA, 1986, pp. 318–362.
D.E. Rumelhart, J.L. McClelland and the PDP Group (eds.), Parallel Distributed Processing, Vol. 1: Foundations, MIT Press, Cambridge, MA, 1986.
B. Rypley, Neural networks and related methods for classification, Journal of the Royal Statistical Society Series B 56(1994)409–456.
SAS Institute, SAS/STAT User's Guide, Version 6, 1st ed., SAS Institute, Inc., Cary, NC, 1990.
C.A.B. Smith, Some examples of discrimination, Annals of Eugenics 13(1947)272–282.
F.W. Smith, Pattern classifier design by linear programming, IEEE Transactions on Computers C-17(1968)367–372.
R.C. Soltysik and P.R. Yarnold, ODA 1.0: Optimal Discriminant Analysis for DOS, Optimal Data Analysis, Chicago, IL, 1993.
R.C. Soltysik and P.R. Yarnold, The Warmack—Gonzalez algorithm for linear two-category multivariate optimal discriminant analysis, Computers and Operations Research 21(1994)735–745.
D.J. Spiegelhalter and R.P. Knill-Jones, Statistical and knowledge-based approaches to clinical decision-support systems, with an application to gastroenterology, Journal of the Royal Statistical Society Series A 147, Part 1 (1984)35–77.
V.A. Sposito, Some properties of L p -estimators, in: Robust Regression: Analysis and Application, K.D. Lawrence and J.L. Arthur, eds., Marcel Dekker, New York, 1990, pp. 23–58.
SPSS, Statistical Package for the Social Sciences, SPSS 6.1 Base System, Users Guide, Parts 1 and 2, Prentice-Hall, Englewood Cliffs, NJ, 1990.
V. Srinivasan and Y.H. Kim, Credit granting: A comparative analysis of classification procedures, Journal of Finance 42(1987)665–683.
V. Srinivasan and A. Shocker, Linear programming techniques for multi-dimensional analysis of preferences, Psychometrica 38(1973)337–369.
A. Stam, Extensions of mathematical programming-based classification rules: A multicriteria approach, European Journal of Operational Research 48(1990)351–361.
A. Stam and E.A. Joachimsthaler, Solving the classification problem in discriminant analysis via linear and nonlinear programming methods, Decision Sciences 20(1989)285–293.
A. Stam and E.A. Joachimsthaler, A comparison of a robust mixed-integer approach to existing methods for establishing classification rules for the discriminant problem, European Journal of Operational Research 46(1990)113–120.
A. Stam and D.G. Jones, Classification performance of mathematical programming techniques in discriminant analysis: Results for small and medium sample sizes, Managerial and Decision Economics 11(1990)243–253.
A. Stam and C.T. Ragsdale, On the classification gap in MP-based approaches to the discriminant problem, Naval Research Logistics 39(1992)545–559.
A. Stam and D.R. Ungar, RAGNU: A microcomputer package for two-group mathematical programming-based nonparametric classification, European Journal of Operational Research 86 (1995)374–388.
M.J. Strube, Optimal data analysis (ODA 1.0), OR/MS Today (October 1994)45–46.
V. Subramanian, M.S. Hung and M.Y. Hu, An experimental evaluation of neural networks for classification, Computers and Operations Research 20(1993)769–782.
K.Y. Tam and M.Y. Kiang, Managerial applications of neural networks: The case of bank failure predictions, Management Science 38(1992)926–947.
A. Wald, Contributions to the theory of statistical estimation and testing hypothesis, Annals of Mathematical Statistics 10(1939)299–326.
A. Wald, Statistical decision functions, Annals of Mathematical Statistics 20(1949)165–205.
P. Wanarat and R. Pavur, Examining the effect of second-order terms in mathematical programming approaches to the classification problem, European Journal of Operational Research, to appear.
S. Wang, Self-organizing approach to managerial nonlinear discriminant analysis: A hybrid method of linear discriminant analysis and neural networks, INFORMS Journal of Computing 8(1996) 118–124.
R.E. Warmack and R.C. Gonzalez, An algorithm for the optimal solution of linear inequalities and its application to pattern recognition, IEEE Transactions on Computers C-22(1973)1065–1075.
P.D. Wasserman, Neural Computing, Theory and Practice, Van Nostrand Reinhold, New York, 1989.
B. Xiao, Necessary and sufficient conditions of unacceptable solutions in LP discriminant analysis, Decision Sciences 24(1993)699–712.
B. Xiao, Necessary and sufficient conditions for unacceptable solutions in NLP discriminant analysis, European Journal of Operational Research 77(1994)404–412.
B. Xiao, Decision power and solutions of LP discriminant models: Rejoinder, Decision Sciences 25(1994)335–336.
P.R. Yarnold, Discriminating geriatric and non-geriatric patients using functional status information: An example of classification tree analysis via UniODA, Educational and Psychological Measurement 66(1996)656–667.
P.R. Yarnold, L.A. Hart and R.C. Soltysik, Optimizing the classification performance of logistic regression and Fisher's discriminant analysis, Educational and Psychological Measurement 54 (1994)73–85.
P.R. Yarnold and R.C. Soltysik, Theoretical distributions of optima for univariate discrimination of random data, Decision Sciences 22(1991)739–752.
P.R. Yarnold and R.C. Soltysik, Refining two-group multivariable classification models using univariate optimal discriminant analysis, Decision Sciences 22(1991)1158–1164.
P.R. Yarnold, R.C. Soltysik and G.J. Martin, Heart rate variability and susceptibility for sudden cardiac death: An example of multivariable optimal discriminant analysis, Statistics in Medicine 13(1994)1015–1021.
P.R. Yarnold, R.C. Soltysik, W.C. McCormick, R. Burns, E.H.B. Lin, T. Bush and G.J. Martin, Application of multivariable optimal discriminant analysis in general internal medicine, Journal of General Internal Medicine 10(1995)601–606.
Y. Yoon, G. Swales and T.M. Margavio, A comparison of discriminant analysis versus artificial neural networks, Journal of the Operational Research Society 44(1993)51–60.
Rights and permissions
About this article
Cite this article
Stam, A. Nontraditional approaches to statistical classification: Some perspectives on L_p-norm methods. Annals of Operations Research 74, 1–36 (1997). https://doi.org/10.1023/A:1018958001886
Issue Date:
DOI: https://doi.org/10.1023/A:1018958001886