Article

Data mining with sparse grids using simplicial basis functions

Authors:
Jochen Garcke

Universität Bonn, Wegelerstraβe 6, D-53115 Bonn, Germany

Universität Bonn, Wegelerstraβe 6, D-53115 Bonn, Germany
View Profile

,
Michael Griebel

Universität Bonn, Wegelerstraβe 6, D-53115 Bonn, Germany

Universität Bonn, Wegelerstraβe 6, D-53115 Bonn, Germany
View Profile

KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data miningAugust 2001Pages 87–96https://doi.org/10.1145/502512.502528

Published:26 August 2001Publication History

KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 87–96

ABSTRACT

Recently we presented a new approach [18] to the classification problem arising in data mining. It is based on the regularization network approach but, in contrast to other methods which employ ansatz functions associated to data points, we use a grid in the usually high-dimensional feature space for the minimization process. To cope with the curse of dimensionality, we employ sparse grids [49]. Thus, only O(h_n^-1n^d-1) instead of O(h_n^-d) grid points and unknowns are involved. Here d denotes the dimension of the feature space and h_n = 2^-n gives the mesh size. We use the sparse grid combination technique [28] where the classification problem is discretized and solved on a sequence of conventional grids with uniform mesh sizes in each dimension. The sparse grid solution is then obtained by linear combination. In contrast to our former work, where d-linear functions were used, we now apply linear basis functions based on a simplicial discretization. This allows to handle more dimensions and the algorithm needs less operations per data point.We describe the sparse grid combination technique for the classification problem, give implementational details and discuss the complexity of the algorithm. It turns out that the method scales linearly with the number of given data points. Finally we report on the quality of the classifier built by our new method on data sets with up to 10 dimensions. It turns out that our new method achieves correctness rates which are competitive to that of the best existing methods.

References

1.E. Arge, M. Dmhlen, and A. Tveito. Approximation of scattered data using smooth grid functions. J. Comput. Appl. Math, 59:191-205, 1995. Google ScholarDigital Library
2.R. Balder. Adaptive Verfahren fiat elliptische und parabolische Differentialgleichungen auf diinnen Gittern. Dissertation, Technische Universitait Miinchen, 1994.Google Scholar
3.G. Baszenski. N -t h order polynomial spline blending. In W. Schempp and K. Zeller, editors, Multivariate Approximation Theory III, ISNM 75, pages 35-46. Birkhaiuser, Basel, 1985.Google Scholar
4.S. D. Bay. The UCI KDD archive. http://kdd.ics.uci.edu, 1999.Google Scholar
5.M. J. A. Berry and G. S. Linoff. Mastering Data Mining. Wiley, 2000.Google Scholar
6.C. L. Blake and C. J. Merz. UCI repository of machine learning databases, 1998. http://www.ics.uci.edu/amlearn/MLRepository.html.Google Scholar
7.D. Botkin, J. Janak, and J. Wallis. Some ecological consequences of a computer model of forest growth. J. Ecology, 60:849-872, 1972.Google ScholarCross Ref
8.H.-J. Bungartz. Diinne Gitter und deren Anwendun 9 bei der adaptiven LSsung der dreidimensionalen Poisson-Gleichung. Dissertation, Institut ffir Informatik, Technische Universitait Miinchen, 1992.Google Scholar
9.H.-J. Bungartz, T. Dornseifer, and C. Zenger. Tensor product approximation spaces for the efficient numerical solution of partial differential equations. In Proc. Int. Workshop on Scientific Computations, Konya, 1996. Nova Science Publishers, 1997.Google Scholar
10.H.-J. Bungartz, M. Griebel, D. RSschke, and C. Zenger. Pointwise convergence of the combination technique for the Laplace equation. East-West J. Numer. Math., 2:21-45, 1994. also as SFB-Bericht 342/16/93A, Institut fiir Informatik, TU Miinchen, 1993.Google Scholar
11.V. Cherkassky and F. Mulier. Learning from Data - Concepts, Theory and Methods. John Wiley & Sons, 1998. Google ScholarDigital Library
12.K. Cios, W. Pedrycz, and R. Swiniarski. Data Mining Methods for Knowledge Discovery. Kluwer, 1998. Google ScholarDigital Library
13.T. G. Dietterich. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10(7):1895-1924, 1998. Google ScholarDigital Library
14.K. Frank, S. Heinrich, and S. Pereverzev. Information Complexity of Multivariate Fredholm Integral Equations in Sobolev Classes. J. of Complexity, 12:17-34, 1996. Google ScholarDigital Library
15.H. Freudenthal. Simplizialzerlegungen yon beschra.nkter Flachheit. Annals of Mathematics, 43:580-582, 1942.Google ScholarCross Ref
16.J. Garcke and M. Griebel. On the computation of the eigenproblems of hydrogen and helium in strong magnetic and electric fields with the sparse grid combination technique. Journal of Computational Physics, 165(2):694-716, 2000. also as SFB 256 Preprint 670, Institut fiir Angewandte Mathematik, Universitait Bonn, 2000. Google ScholarDigital Library
17.J. Garcke and M. Griebel. On the parallelization of the sparse grid approach for data mining. SFB 256 Preprint 721, Universitgt Bonn, 2001. http://wissrech.iam.unibonn.de/research/pub/garcke/psm.pdf.Google Scholar
18.J. Garcke, M. Griebel, and M. Thess. Data mining with sparse grids. 2000. Submitted, also as SFB 256 Preprint 675, Institut fiir Angewandte Mathematik, Universitait Bonn, 2000.Google Scholar
19.T. Gerstner and M. Griebel. Numerical Integration using Sparse Grids. Numer. Algorithms, 18:209-232, 1998. (also as SFB 256 preprint 553, Univ. Bonn, 1998).Google ScholarCross Ref
20.F. Girosi. An equivalence between sparse approximation and support vector machines. Neural Computation, 10(6):1455-1480a 1998. Google ScholarDigital Library
21.F. Girosi, M. Jones, and T. Poggio. Regularization theory and neural networks architectures. Neural Computation, 7:219-265, 1995. Google ScholarDigital Library
22.G. Golub, M. Heath, and G. Wahba. Generalized cross validation as a method for choosing a good ridge parameter. Technometrics, 21:215-224, 1979.Google ScholarCross Ref
23.M. Griebel. The combination technique for the sparse grid solution of PDEs on multiprocessor machines. Parallel Processing Letters, 2(1):61-70, 1992. also as SFB Bericht 342/14/91 A, Institut fiir Informatik, TU Mfinchen, 1991.Google ScholarCross Ref
24.M. Griebel. Adaptive sparse grid multilevel methods for elliptic PDEs based on finite differences. Computing, 61(2):151-179, 1998. also as Proceedings Large-Scale Scientific Computations of Engineering and Environmental Problems, 7. June - 11. June, 1997, Varna, Bulgaria, Notes on Numerical Fluid Mechanics 62, Vieweg-Verlag, Braunschweig, M. Griebel, 0. Iliev, S. Margenov and P. Vassilevski (editors). Google ScholarDigital Library
25.M. Griebel, W. Huber, T. Stortkuhl, and C. Zenger. On the parallel solution of 3D PDEs on a network of workstations and on vector computers. In A. Bode and M. D. Cin, editors, Parallel Computer Architectures: Theory, Hardware, Software, Applications, volume 732 of Lecture Notes in Computer Science, pages 276-291. Springer Verlag, 1993. Google ScholarDigital Library
26.M. Griebel and S. Knapek. Optimized tensor-product approximation spaces. Constructive Approximation, 16(4):525-540, 2000.Google ScholarCross Ref
27.M. Griebel, P. Oswald, and T. Schiekofer. Sparse grids for boundary integral equations. Numer. Mathematik, 83(2):279-312, 1999. also as SFB 256 report 554, Universitat Bonn.Google ScholarCross Ref
28.M. Griebel, M. Schneider, and C. Zenger. A combination technique for the solution of sparse grid problems. In P. de Groen and R. Beauwens, editors, Iterative Methods in Linear Algebra, pages 263-281. IMACS, Elsevier, North Holland, 1992. also as SFB Bericht, 342/19/90 A, Institut fiir Informatik, TU Miinchen, 1990.Google Scholar
29.M. Griebel and V. Thurner. The efficient solution of fluid dynamics problems by the combination technique. Int. J. Num. Meth. for Heat and Fluid Flow, 5(3):251-269, 1995. also as SFB Bericht 342/1/93 A, Institut fiir Informatik. TU Miinchen, 1993.Google ScholarCross Ref
30.M. Hegland, O. M. Nielsen, and Z. Shen. High dimensional smoothing based on multilevel analysis. Technical report, Data Mining Group, The Australian National University, Canberra, November 2000. Submitted to SIAM J. Scientific Computing.Google Scholar
31.J. Hoschek and D. Lasser. Grundlagen der goemetrischen Datenverarbeitung, chapter 9. Teubner, 1992.Google Scholar
32.H. W. Kuhn. Some combinatorial lemmas in topology. IBM J. Res. Develop., 4:518-524, 1960.Google ScholarDigital Library
33.Y. J. Lee and O. L. Mangasarian. SSVM: A smooth support vector machine for classification. Computational Optimization and Applications, 20(1), 2001. to appear. Google ScholarDigital Library
34.G. Melli. Datgen: A program that creates structured data. Website. http://www.datasetgenerator.com.Google Scholar
35.W. D. Penny and S. J. Roberts. Bayesian neural networks for classification: how useful is the evidence framework ? Neural Networks, 12:877-892, 1999. Google ScholarDigital Library
36.B. D. Ripley. Neural networks and related methods for classification. Journal of the Royal Statistical Society B, 56(3):409-456, 1994.Google Scholar
37.S. L. Salzberg. On comparing classifiers: Pitfalls to avoid and a recommended approach. Data Mining and Knowledge Discovery, 1:317-327, 1997. Google ScholarDigital Library
38.T. Schiekofer. Die Methode der Finiten Differenzen auf diinnen Gittern zur L5sung elliptischer und parabolischer partieller Differentialgleichungen. Doktorarbeit, Institut fiir Angewandte Mathematik, Universitait Bonn, 1999.Google Scholar
39.W. Sickel and F. Sprengel. Interpolation on sparse grids and Nikol'skij-Besov spaces of dominating mixed smoothness. J. Comput. Anal. Appl., 1:263-288, 1999.Google Scholar
40.S. Singh. 2d spiral pattern recognition with possibilistic measures. Pattern Recognition Letters, 19(2):141-147, 1998. Google ScholarDigital Library
41.S. A. Smolyak. Quadrature and interpolation formulas for tensor products of certain classes of functions. Dokl. Akad. Nauk SSSR, 148:1042-1043, 1963. Russian, Engl. Transl.: Soviet Math. Dokl. 4:240-243, 1963.Google Scholar
42.V. N. Temlyakov. Approximation of functions with bounded mixed derivative. Proc. Steklov Inst. Math,, 1, 1989.Google Scholar
43.A. N. Tikhonov and V. A. Arsenin. Solutios of ill-posed problems. W.H. Winston, Washington D.C., 1977.Google Scholar
44.F. Utreras. Cross-validation techniques for smoothing spline functions in one or two dimensions. In T. Gasser and M. Rosenblatt, editors, Smoothing techniques for curve estimation, pages 196-231. Springer-Verlag, Heidelberg, 1979.Google Scholar
45.V. N. Vapnik. Estimation of dependences based on empirical data. Springer-Verlag, Berlin, 1982. Google ScholarDigital Library
46.V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995. Google ScholarDigital Library
47.G. Wahba. Spline models for observational data, volume 59 of Series in Applied Mathematics. SIAM, Philadelphia, 1990.Google Scholar
48.A. Wieland. Spiral data set. http://www.cs.cmu.edu/afs/cs.cmu.edu/project/airepository/ai/areas/neural/bench/cmu/0.html.Google Scholar
49.C. Zenger. Sparse grids. In W. Hackbusch, editor, Parallel Algorithms for Partial Differential Equations, Proceedings of the Sixth GAMM-Seminar, Kiel, 1990, volume 31 of Notes on Num. Fluid Mech. Vieweg-Vcrlag, 1991.Google Scholar

Index Terms

Data mining with sparse grids using simplicial basis functions

Recommendations

Classification with sparse grids using simplicial basis functions

Recently we presented a new approach [20] to the classification problem arising in data mining. It is based on the regularization network approach but in contrast to other methods, which employ ansatz functions associated to data points, we use a grid in ...
Read More
On the Parallelization of the Sparse Grid Approach for Data Mining
LSSC '01: Proceedings of the Third International Conference on Large-Scale Scientific Computing-Revised Papers

Recently we presented a new approach [5, 6] to the classification problem arising in data mining. It is based on the regularization network approach, but in contrast to other methods which employ ansatz functions associated to data points, we use basis ...
Read More
Data mining with sparse grids
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
August 2001
493 pages
ISBN:158113391X
DOI:10.1145/502512
Conference Chair:
Doheon Lee
Chonnam National University, Korea
,
General Chair:
Mario Schkolnick
SGI
,
Program Chairs:
Foster Provost
New York University
,
Ramakrishnan Srikant
IBM Almaden Research Center
Copyright © 2001 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 26 August 2001
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
approximation
classification
combination technique
data mining
simplicial discretization
sparse grids
Qualifiers
- Article
Conference

Acceptance Rates
KDD '01 Paper Acceptance Rate31of237submissions,13%Overall Acceptance Rate1,133of8,635submissions,13%
More
Upcoming Conference
KDD '24

Sponsor:

sigkdd

sigkdd

The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 25 - 29, 2024

Barcelona , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 10
  Total Citations
  View Citations
- 820
  Total Downloads
- Downloads (Last 12 months)4
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Data mining with sparse grids using simplicial basis functions

KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

Classification with sparse grids using simplicial basis functions

On the Parallelization of the Sparse Grid Approach for Data Mining

Data mining with sparse grids