DOI:
10.1145/502512
KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
ACM 2001 Proceeding
Publisher:
  • Association for Computing Machinery
  • New York
  • NY
  • United States
Conference:
KDD01: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California, August 26-29, 2001
ISBN:
978-1-58113-391-2
Published:
26 August 2001
Sponsors:
SIGMOD, SIGKDD, AAAI
Abstract

The end of the 20th century saw an explosion of investment in connecting computer systems--within organizations, between organizations, and between organizations and individuals--and a corresponding explosion in the on-line collection of data. Now that we have entered the 21st century we face the problem of extracting useful knowledge from these data, which is becoming increasingly difficult as volume and complexity push traditional analysis methods beyond their limits. Knowledge Discovery and Data Mining (KDD) techniques address this problem. The annual ACM SIGKDD International Conference on Knowledge Discovery and Data Mining brings together researchers and practitioners focusing on new developments and challenges in KDD. KDD-2001, the seventh conference in the series, was held in San Francisco, on August 26-29, 2001.

We received 203 research-paper submissions from twenty-three countries. Each submitted research paper was reviewed by at least three members of the program committee. This period of independent review was followed by discussion among the reviewers, and when necessary we requested additional reviews from other experts. Twenty papers were selected to appear in the program as full papers (10%), and another thirty-two were selected to appear in the program as poster papers (16%). The Industry Track received thirty-four submissions, from which eleven (32%) were selected. In addition, the Program Committee referred three research papers to the Industry Track, of which one was selected. For the Industry Track, papers were selected because they presented useful knowledge for practitioners, or because they bridged a gap between industry and research.

The program for KDD-2001 also included three keynote lectures, five invited talks by well-known practitioners (as part of the Industry Track), and three panel discussions on topics of current interest. There were six tutorials, geared both for novices and for experts, plus six specialized workshops on cutting-edge research issues. The 2001 KDDCUP competition focused on problems of bioinformatics and drug design. And, finally, the program included dozens of exhibits of products from vendors and from research projects.

Article
Challenges for knowledge discovery in biology

Bioinformatics is the study of information flow in biology. Interest in the field has exploded in the last 10 years with the emergence of techniques for large scale experimental data collection--including genome sequencing, gene expression analysis, ...

Article
Extracting targeted data from the web

Tom M. Mitchell is author of the textbook "Machine Learning" (McGraw Hill, 1997), President of the American Association for Artificial Intelligence and a member of the National Research Council's Computer Science and Telecommunications Board. He is Vice ...

Article
Mass collaboration and data mining

Mass Collaboration is a new "P2P"-style approach to large-scale knowledge sharing, with applications in customer support, focused community development, and capturing knowledge distributed within large organizations. Effectively supporting this paradigm ...

Article
Applications of generalized support vector machines to predictive modeling

The work of the Russian mathematician Vladimir Vapnik (AT&T Labs) enables us to go back to the roots of theoretical statistics, leaving behind Fisher's parameters in favor of the general approaches started in the 1930s by Glivenko-Cantelli-Kolmogorov. ...

Article
Data mining: are we there yet?

Data mining started its move out of the statistics and machine learning ghettos and into the mainstream almost 10 years ago. With great fanfare and a large influx of venture capital, data mining was going to change the very nature of business. Yet data ...

Article
Mining e-commerce data: the good, the bad, and the ugly

Organizations conducting Electronic Commerce (e-commerce) can greatly benefit from the insight that data mining of transactional and clickstream data provides. Such insight helps not only to improve the electronic channel (e.g., a web site), but it is ...

Article
Data mining platform for database developers
Article
Recommender systems in commerce and community

Recommender systems have been revolutionizing the way shoppers and information seekers find what they want. We will study some of the tremendous successes and spectacular failures of recommenders in E-commerce to understand the causes of the success or ...

Article
The "DGX" distribution for mining massive, skewed data

Skewed distributions appear very often in practice. Unfortunately, the traditional Zipf distribution often fails to model them well. In this paper, we propose a new probability distribution, the Discrete Gaussian Exponential (DGX), to achieve excellent ...

Article
Data mining criteria for tree-based regression and classification

This paper is concerned with the construction of regression and classification trees that are more adapted to data mining applications than conventional trees. To this end, we propose new splitting criteria for growing trees. Conventional splitting ...

Article
Probabilistic modeling of transaction data with applications to profiling, visualization, and prediction

Transaction data is ubiquitous in data mining applications. Examples include market basket data in retail commerce, telephone call records in telecommunications, and Web logs of individual page-requests at Web sites. Profiling consists of using ...

Article
GESS: a scalable similarity-join algorithm for mining large data sets in high dimensional spaces

The similarity join is an important operation for mining high-dimensional feature spaces. Given two data sets, the similarity join computes all tuples (x, y) that are within a distance ε. One of the most efficient algorithms for processing similarity-...

Article
Mining the network value of customers

One of the major applications of data mining is in helping companies determine which potential customers to market to. If the expected profit from a customer is greater than the cost of marketing to her, the marketing action for that customer is ...

Article
Empirical Bayes screening for multi-item associations

This paper considers the framework of the so-called "market basket problem", in which a database of transactions is mined for the occurrence of unusually frequent item sets. In our case, "unusually frequent" involves estimates of the frequency of each ...

Article
Proximal support vector machine classifiers

Instead of a standard support vector machine (SVM) that classifies points by assigning them to one of two disjoint half-spaces, points are classified by assigning them to the closest of two parallel planes (in input or feature space) that are pushed ...

Article
Data mining with sparse grids using simplicial basis functions

Recently we presented a new approach [18] to the classification problem arising in data mining. It is based on the regularization network approach but, in contrast to other methods which employ ansatz functions associated to data points, we use a grid ...

Article
Mining time-changing data streams

Most statistical and machine-learning algorithms assume that the data is a random sample drawn from a stationary distribution. Unfortunately, most of the large databases available for mining today violate this assumption. They were gathered over months ...

Article
Visualizing multi-dimensional clusters, trends, and outliers using star coordinates

Interactive visualizations are effective tools in mining scientific, engineering, and business data to support decision-making activities. Star Coordinates is proposed as a new multi-dimensional visualization technique, which supports various ...

Article
Ensemble-index: a new approach to indexing large databases

The problem of similarity search (query-by-content) has attracted much research interest. It is a difficult problem because of the inherently high dimensionality of the data. The most promising solutions involve performing dimensionality reduction on ...

Article
Robust space transformations for distance-based operations

For many KDD operations, such as nearest neighbor search, distance-based clustering, and outlier detection, there is an underlying κ-D data space in which each tuple/object is represented as a point in the space. In the presence of differing scales, ...

Article
Molecular feature mining in HIV data

We present the application of Feature Mining techniques to the Developmental Therapeutics Program's AIDS antiviral screen database. The database consists of 43576 compounds, which were measured for their capability to protect human cells from HIV-1 ...

Article
Discovering unexpected information from your competitors' web sites

Ever since the beginning of the Web, finding useful information from the Web has been an important problem. Existing approaches include keyword-based search, wrapper-based information extraction, Web query and user preferences. These approaches ...

Article
Personalization from incomplete data: what you don't know can hurt

Clickstream data collected at any web site (site-centric data) is inherently incomplete, since it does not capture users' browsing behavior across sites (user-centric data). Hence, models learned from such data may be subject to limitations, the nature ...

Article
Probabilistic query models for transaction data

We investigate the application of Bayesian networks, Markov random fields, and mixture models to the problem of query answering for transaction data sets. We formulate two versions of the querying problem: the query selectivity estimation (i.e., finding ...

Article
Extracting collective probabilistic forecasts from web games

Game sites on the World Wide Web draw people from around the world with specialized interests, skills, and knowledge. Data from the games often reflects the players' expertise and will to win. We extract probabilistic forecasts from data obtained from ...

Article
Tri-plots: scalable tools for multidimensional data mining

We focus on the problem of finding patterns across two large, multidimensional datasets. For example, given feature vectors of healthy and of non-healthy patients, we want to answer the following questions: Are the two clouds of points separable? What ...

Article
Efficient discovery of error-tolerant frequent itemsets in high dimensions

We present a generalization of frequent itemsets allowing for the notion of errors in the itemset definition. We motivate the problem and present an efficient algorithm that identifies error-tolerant frequent clusters of items in transactional data (...

Article
Learning and making decisions when costs and probabilities are both unknown

In many data mining domains, misclassification costs are different for different examples, in the same way that class membership probabilities are example-dependent. In these domains, both costs and probabilities are unknown for test examples, so both ...

Article
Data mining case study: modeling the behavior of offenders who commit serious sexual assaults

This paper looks at the use of a Self Organizing Map (SOM) to link records of crimes of serious sexual attacks. Once linked, a profile can be derived of the offender(s) responsible. The data was drawn from the major crimes database at the National ...

Article
A human-computer cooperative system for effective high dimensional clustering

High dimensional data has always been a challenge for clustering algorithms because of the inherent sparsity of the points. Therefore, techniques have recently been proposed to find clusters in hidden subspaces of the data. However, since the behavior ...

Contributors
  • IBM Research - Almaden
  • Leonard N. Stern School of Business
  • Google LLC

Acceptance Rates

KDD '01 Paper Acceptance Rate: 31 of 237 submissions, 13%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%

Year      Submitted  Accepted  Rate
KDD '19       1,200       110    9%
KDD '18         983       107   11%
KDD '17         748        64    9%
KDD '16       1,115        66    6%
KDD '15         819       160   20%
KDD '14       1,036       151   15%
KDD '13         726       125   17%
KDD '08         593       118   20%
KDD '07         573       111   19%
KDD '03         298        46   15%
KDD '02         307        44   14%
KDD '01         237        31   13%
Overall       8,635     1,133   13%