
1999 | Book

Discovery Science

Second International Conference, DS’99 Tokyo, Japan, December 6–8, 1999 Proceedings

Edited by: Setsuo Arikawa, Koichi Furukawa

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


Table of Contents

Frontmatter

Invited Papers

The Melting Pot of Automated Discovery: Principles for a New Science

After two decades of research on automated discovery, many principles are shaping up as a foundation of discovery science. In this paper we view discovery science as the automation of discovery by systems that autonomously discover knowledge, together with a theory for such systems. We start by clarifying the notion of discovery by an automated agent. Then we present a number of principles and discuss the ways in which different principles can be used together. Further augmented, this set of principles should become a theory of discovery that can explain discovery systems and guide their construction. We draw links between the principles of automated discovery and disciplines closely related to discovery science, such as the natural sciences, logic, philosophy of science and theory of knowledge, artificial intelligence, statistics, and machine learning.

Jan M. Żytkow
Expressive Probability Models in Science

The paper is a brief summary of an invited talk given at the Discovery Science conference. The principal points are as follows: first, that probability theory forms the basis for connecting hypotheses and data; second, that the expressive power of the probability models used in scientific theory formation has expanded significantly; and finally, that still further expansion is required to tackle many problems of interest. This further expansion should combine probability theory with the expressive power of first-order logical languages. The paper sketches an approximate inference method for representation systems of this kind.

Stuart Russell

Contributed Papers

Weighted Majority Decision among Several Region Rules for Scientific Discovery

We consider the classification problem of predicting the values of a categorical attribute of interest using the other numerical attributes in a given set of tuples. Decision by voting, such as bagging and boosting, attempts to enhance existing classification techniques like decision trees by taking a majority decision among them. However, a high prediction accuracy sometimes requires complicated predictors, making it hard to understand the simple laws affecting the values of the attribute of interest. We instead consider an approach that uses at most several fairly simple voters that can compete with complex prediction tools. We pursue this idea for numeric datasets and employ region splitting rules as relatively simple voters. The results of empirical tests show that the accuracy of decision by several voters is comparable to that of decision trees, while the computational cost is low.

Akihiro Nakaya, Hideharu Furukawa, Shinichi Morishita
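The weighted-majority idea described above can be sketched in a few lines. This is a minimal illustration only: the rule representation (a region test, a class, a weight) and the example rules are our assumptions, not the paper's region-splitting construction.

```python
# Toy sketch of weighted majority decision among simple region rules.
# Each rule is an (illustrative) triple: region test, class label, weight.

def vote(point, rules):
    """Return the class with the largest total weight among rules
    whose region contains the point, or None if no rule fires."""
    scores = {}
    for test, cls, weight in rules:
        if test(point):
            scores[cls] = scores.get(cls, 0.0) + weight
    return max(scores, key=scores.get) if scores else None

# three simple axis-parallel region rules over 2-D numeric points
rules = [
    (lambda p: p[0] < 5, "low", 1.0),
    (lambda p: p[1] < 5, "low", 0.5),
    (lambda p: p[0] >= 5, "high", 1.0),
]
```

A point is classified by summing the weights of the few rules whose regions contain it, keeping each individual voter easy to interpret.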
CAEP: Classification by Aggregating Emerging Patterns

Emerging patterns (EPs) are itemsets whose supports change significantly from one dataset to another; they were recently proposed to capture multi-attribute contrasts between data classes, or trends over time. In this paper we propose a new classifier, CAEP, built on the following main ideas: (i) Each EP can sharply differentiate the class membership of a (possibly small) fraction of instances containing the EP, due to the big difference between its supports in the opposing classes; we define the differentiating power of the EP in terms of the supports and their ratio, on instances containing the EP. (ii) For each instance t, by aggregating the differentiating power of a fixed, automatically selected set of EPs, a score is obtained for each class. The scores for all classes are normalized and the largest score determines t’s class. CAEP is suitable for many applications, even those with large volumes of high (e.g. 45) dimensional data; it does not depend on dimension reduction of the data; and it is usually equally accurate on all classes even if their populations are unbalanced. Experiments show that CAEP has consistently good predictive accuracy, and it almost always outperforms C4.5 and CBA. By using efficient, border-based algorithms (developed elsewhere) to discover EPs, CAEP scales up on data volume and dimensionality. Observing that accuracy on the whole dataset is too coarse a description of classifiers, we also used a more accurate measure, sensitivity and precision, to better characterize classifier performance. CAEP also performs very well under this measure.

Guozhu Dong, Xiuzhen Zhang, Limsoon Wong, Jinyan Li
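The aggregation idea in (i) and (ii) can be sketched as follows. The weighting formula below (growth rate mapped to growth/(growth+1), times support) is an illustrative assumption in the spirit of the description, not the paper's exact definition, and the function names are ours.

```python
# Sketch of EP-based scoring: each emerging pattern contained in the
# instance contributes weight according to how sharply its supports
# differ between the target class and the other classes.

def support(itemset, dataset):
    """Fraction of instances (sets of items) containing the itemset."""
    return sum(1 for t in dataset if itemset <= t) / len(dataset)

def caep_style_score(instance, eps, target_class, data_by_class):
    """Aggregate the differentiating power of EPs contained in the instance."""
    score = 0.0
    for ep in eps:
        if ep <= instance:
            s_target = support(ep, data_by_class[target_class])
            s_other = max(support(ep, d)
                          for c, d in data_by_class.items() if c != target_class)
            growth = s_target / s_other if s_other > 0 else float('inf')
            if growth > 1:  # EP favours the target class
                # weight approaches 1 for very sharp EPs
                w = 1.0 if growth == float('inf') else growth / (growth + 1)
                score += w * s_target
    return score
```

The per-class scores would then be normalized and the largest one decides the class.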
An Appropriate Abstraction for an Attribute-Oriented Induction

Attribute-oriented induction is a useful data mining method that generalizes databases under an appropriate abstraction hierarchy to extract meaningful knowledge. The hierarchy is designed carefully so as to exclude meaningless rules from a particular point of view. However, there may exist several ways of generalizing databases according to the user’s intention. It is therefore important to provide a multi-layered abstraction hierarchy under which several generalizations are possible and well controlled. In fact, too-general or too-specific databases are inappropriate for mining algorithms to extract significant rules from. From this viewpoint, this paper proposes a generalization method based on an information-theoretical measure to select an appropriate abstraction hierarchy. Furthermore, we present a system, called ITA (Information Theoretical Abstraction), based on our method and attribute-oriented induction. We perform practical experiments in which ITA discovers meaningful rules from a census database of the US Census Bureau, and discuss the validity of ITA based on the experimental results.

Yoshimitsu Kudoh, Makoto Haraguchi
Collaborative Hypothesis Testing Processes by Interactive Production Systems

We have developed an interactive production system architecture to simulate collaborative hypothesis testing processes, using Wason’s 2-4-6 task. In interactive situations, two systems find a target by conducting experiments alternately; in independent situations, each of the two systems finds a target without interaction. If the performance in the former situations exceeds that in the latter, we recognize “emergence”. The primary results obtained from computer simulations in which hypothesis testing strategies were controlled are: (1) generally speaking, collaboration neither provided the benefits of interaction nor caused emergence when only the experimental space was shared; (2) as the difference between the systems’ strategies grew larger, the benefits of interaction increased; (3) the benefits came from complementary effects of interaction, that is, the disadvantage of one system using an ineffective strategy was compensated by the other system using an advantageous strategy. In the few cases where we recognized emergence, the complementary interaction of the two systems yielded an additional ability of disconfirmation.

Kazuhisa Miwa
Computer Aided Discovery of User’s Hidden Interest for Query Restructuring

Most Internet users use search engines to get information on the WWW. However, users are often dissatisfied with the output of search engines because it is hard to express one’s own interest as a search query. We therefore propose a system that provides users with keywords related to their interest, so that they can recall keywords expressing their hidden interest (interest not described in the search query) as search keywords. Previous refinements are simple modifications of the search queries, for example adding narrowing or relational keywords; the restructuring described in this paper instead means replacing the search keywords to express user interest concretely.

Wataru Sunayama, Yukio Ohsawa, Masahiko Yachida
Iterative Naive Bayes

Naive Bayes is a well-known and studied algorithm both in statistics and machine learning. Bayesian learning algorithms represent each concept with a single probabilistic summary. In this paper we present an iterative approach to naive Bayes. Iterative Bayes begins with the distribution tables built by naive Bayes; those tables are then iteratively updated in order to improve the probability class distribution associated with each training example. Experimental evaluation of Iterative Bayes on 25 benchmark datasets shows consistent gains in accuracy. An interesting side effect of our algorithm is that it proves robust to attribute dependencies.

João Gama
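The "start from naive Bayes tables, then update them iteratively" idea can be sketched as below. The update rule (a fixed increment for the true class on misclassified examples) is a simplified assumption for illustration, not Gama's exact increment, and all names are ours.

```python
from collections import defaultdict

# Sketch of the iterative idea: build naive Bayes frequency tables,
# then nudge them so misclassified training examples shift toward
# their true class.

def train_tables(examples):
    """examples: list of (attribute_tuple, class_label)."""
    tables = defaultdict(lambda: defaultdict(float))  # (attr_idx, value) -> class counts
    priors = defaultdict(float)
    for attrs, cls in examples:
        priors[cls] += 1
        for i, v in enumerate(attrs):
            tables[(i, v)][cls] += 1
    return tables, priors

def predict(attrs, tables, priors):
    classes = list(priors)
    def score(cls):
        p = priors[cls]
        for i, v in enumerate(attrs):
            # Laplace-smoothed conditional probability
            p *= (tables[(i, v)][cls] + 1) / (priors[cls] + len(classes))
        return p
    return max(classes, key=score)

def iterate(examples, tables, priors, rounds=5, step=0.5):
    """Reinforce the true class's counts wherever prediction fails."""
    for _ in range(rounds):
        for attrs, cls in examples:
            if predict(attrs, tables, priors) != cls:
                for i, v in enumerate(attrs):
                    tables[(i, v)][cls] += step
    return tables
```

The update couples the attributes through the shared prediction, which hints at why such a scheme can compensate for attribute dependencies.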
Schema Design for Causal Law Mining from Incomplete Database

The paper describes causal law mining from an incomplete database. First we extend the definition of association rules in order to deal with uncertain attribute values in records. As Agrawal’s well-known algorithm generates too many irrelevant association rules, a filtering technique based on the minimum AIC principle is applied. The graphical representation of the association rules validated by the filter may have directed cycles, so we propose a method to exclude useless rules with a stochastic test and to construct Bayesian networks from the remaining rules. Finally, a scheme for causal law mining is proposed as an integration of the techniques described in the paper.

Kazunori Matsumoto, Kazuo Hashimoto
Design and Evaluation of an Environment to Automate the Construction of Inductive Applications

This paper presents CAMLET, a platform for the automatic composition of inductive applications using ontologies that specify inductive learning methods. CAMLET constructs inductive applications using process and object ontologies. After instantiating, compiling and executing a basic design specification, CAMLET refines the specification based on the following refinement strategies: crossover of control structures, random generation, and process replacement by heuristics. Using fourteen different data sets from the UCI repository of ML databases and a database on meningoencephalitis with a human expert’s evaluation, experimental results have shown that CAMLET supports a user in constructing inductive applications with better competence.

Akihiro Suyama, Naoya Negishi, Takahira Yamaguchi
Designing Views in HypothesisCreator: System for Assisting in Discovery

We discuss the significance of designing views on data in a computational system that assists scientists in the process of discovery. A view on data is a particular way to interpret the data; in the scientific literature, devising a new view capturing the essence of data is a key to discovery. HypothesisCreator, a system we have been developing to assist scientists in the process of discovery, supports users in designing views on data and can search for good views on the data. In this paper we report a series of computational experiments on scientific data with HypothesisCreator, together with analyses of the produced hypotheses, some of which select several views good for explaining the given data, searched and selected from over ten million designed views. Through these experiments we have become convinced that views are an important factor in the discovery process, and that discovery systems should be able to design and select views on data in a systematic way, so that experts on the data can employ their knowledge and thoughts efficiently for their purposes.

Osamu Maruyama, Tomoyuki Uchida, Kim Lan Sim, Satoru Miyano
Discovering Poetic Allusion in Anthologies of Classical Japanese Poems

Waka is a form of traditional Japanese poetry with a 1300-year history. In this paper we attempt to semi-automatically discover instances of poetic allusion, or more generally, to find similar poems in anthologies of waka poems. The key to success is how to define the similarity measure on poems. We first examine the existing similarity measures on strings, and then give a unifying framework that captures the essences of these measures. This framework makes it easy to design new measures appropriate for finding similar poems. Using such measures, we report successful results in finding poetic allusion between the two anthologies Kokinshū and Shinkokinshū. Most interestingly, we have found an instance of poetic allusion that has never been pointed out in the long history of waka research.

Kouichi Tamari, Mayumi Yamasaki, Takuya Kida, Masayuki Takeda, Tomoko Fukuda, Ichirō Nanri
Characteristic Sets of Strings Common to Semi-structured Documents

We consider the Maximum Agreement Problem: given positive and negative documents, find a characteristic set that matches many of the positive documents but rejects many of the negative ones. A characteristic set is a sequence (x1,...,xd) of strings such that each xi is a suffix of xi+1 and all xi’s appear in a document without overlaps. A characteristic set matches semi-structured documents with primitives or user-defined macros. For example, (“set”, “characteristic set”, “</title> characteristic set”) is a characteristic set extracted from an HTML file. An algorithm that solves the Maximum Agreement Problem does not output useless characteristic sets, such as those made only of HTML tags, since such characteristic sets may match most of the positive documents but also most of the negative ones. We present an algorithm that, given an integer d specifying the number of strings in a characteristic set, solves the Maximum Agreement Problem in O(n²hd) time, where n is the total length of the documents and h is the height of the suffix tree of the documents.

Daisuke Ikeda
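The structural conditions on a characteristic set can be checked directly: a suffix chain plus non-overlapping occurrences. The greedy longest-first placement below is our illustrative choice for the occurrence test, not the paper's algorithm, and the function names are ours.

```python
# Sketch of the matching test for a characteristic set (x1,...,xd):
# each x_i must be a suffix of x_{i+1}, and all x_i must occur in the
# document without overlapping one another.

def is_suffix_chain(xs):
    """True iff each string is a suffix of the next one."""
    return all(xs[i + 1].endswith(xs[i]) for i in range(len(xs) - 1))

def matches(xs, doc):
    """Greedy check that all strings occur in doc without overlaps."""
    if not is_suffix_chain(xs):
        return False
    pos = 0
    # place longer strings first so the greedy scan is more likely to fit
    for x in sorted(xs, key=len, reverse=True):
        i = doc.find(x, pos)
        if i < 0:
            return False
        pos = i + len(x)  # later occurrences must start after this one
    return True
```

This greedy scan can miss valid placements in corner cases; it is meant only to make the definition concrete.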
Approximation of Optimal Two-Dimensional Association Rules for Categorical Attributes Using Semidefinite Programming

We consider the problem of finding two-dimensional association rules for categorical attributes. Suppose we have two conditional attributes A and B, both of whose domains are categorical, and one binary target attribute whose domain is {“positive”, “negative”}. We want to split the Cartesian product of the domains of A and B into two subsets so that a certain objective function is optimized, i.e., we want to find a good segmentation of the domains of A and B. We consider the objective function that maximizes the confidence under an upper-bound constraint on the support size. We first prove that the problem is NP-hard, and then propose an approximation algorithm based on semidefinite programming. In order to evaluate the effectiveness and efficiency of the proposed algorithm, we carry out computational experiments on problem instances generated from real sales data consisting of attributes whose domain size is a few hundred at maximum. Approximation ratios of the obtained solutions, measured against the semidefinite programming relaxation, range from 76% to 95%. The performance of the generated association rules is observed to be significantly superior to that of one-dimensional rules.

Katsuki Fujisawa, Yukinobu Hamuro, Naoki Katoh, Takeshi Tokuyama, Katsutoshi Yada
Data Mining of Generalized Association Rules Using a Method of Partial-Match Retrieval

This paper proposes an efficient method for data mining of generalized association rules on the basis of partial-match retrieval. A generalized association rule is derived from regularities of data patterns that are found in the database, under a given data hierarchy, with sufficient frequency. The pattern search is the central part of data mining of this type and occupies most of the running time. In this paper we regard a data pattern as a partial-match query; the pattern search then becomes the problem of finding partial-match queries whose answers include a sufficient number of database records. The proposed method consists of a selective enumeration of candidate queries and an efficient partial-match retrieval using signatures. A signature, a bit sequence of fixed length, is associated with each data item, record, and query, and the answer to a query is computed quickly by bit operations on the signatures. The proposed data mining method is realized on the basis of an extended signature method that can deal with a data hierarchy. We also discuss design issues and mathematical properties of the method.

Kazunori Matsumoto, Takeo Hayase, Nobuyuki Ikeda
Adaptive Sampling Methods for Scaling Up Knowledge Discovery Algorithms

Scalability is a key requirement for any KDD and data mining algorithm, and one of the biggest research challenges is to develop methods that allow the use of large amounts of data. One possible approach for dealing with huge amounts of data is to take a random sample and do data mining on it, since for many data mining applications approximate answers are acceptable. However, as argued by several researchers, random sampling is difficult to use due to the difficulty of determining an appropriate sample size. In this paper, we take a sequential sampling approach to this difficulty, and propose an adaptive sampling algorithm that solves a general problem covering many problems arising in applications of discovery science. The algorithm obtains examples sequentially in an on-line fashion, and determines from the obtained examples whether it has already seen a large enough number of them. Thus, the sample size is not fixed a priori; instead, it adapts to the situation. Due to this adaptiveness, if we are not in a worst-case situation, as fortunately happens in many practical applications, we can solve the problem with a number of examples much smaller than that required in the worst case. To illustrate the generality of our approach, we also describe how different instantiations of it can be applied to scale up knowledge discovery problems that appear in several areas.

Carlos Domingo, Ricard Gavaldà, Osamu Watanabe
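The generic sequential-sampling idea can be sketched with a Hoeffding-bound stopping rule: draw examples one at a time and stop as soon as the running estimate is provably on one side of a threshold. This is only the textbook version of the idea under our own assumptions (a 0/1 `draw` oracle, a union bound over stopping times), not the paper's refined algorithm.

```python
import math

# Sequential-sampling sketch: decide whether an unknown proportion
# exceeds a threshold theta, stopping as soon as a Hoeffding
# confidence interval separates the running mean from theta.

def decide_above(draw, theta, delta=0.05, max_n=100000):
    """draw() -> 0.0 or 1.0; returns (decision, examples_used)."""
    total, n = 0.0, 0
    while n < max_n:
        total += draw()
        n += 1
        mean = total / n
        # Hoeffding radius, with a union bound over all stopping times
        radius = math.sqrt(math.log(2 * n * (n + 1) / delta) / (2 * n))
        if mean - radius > theta:
            return True, n
        if mean + radius < theta:
            return False, n
    return total / n > theta, n
```

When the true proportion is far from theta the interval separates quickly and few examples are drawn; only near-threshold cases approach the worst-case sample size, which is exactly the adaptiveness described above.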
Scheduled Discovery of Exception Rules

This paper presents an algorithm for discovering pairs of an exception rule and a common sense rule under a prespecified schedule. An exception rule, which represents a regularity of exceptions to a common sense rule, often exhibits interestingness. Discovery of such pairs has been successful in various domains. In this method, however, both the number of discovered rules and the time needed for discovery depend on the values of thresholds, and an appropriate choice of them requires expertise on the data set and on the discovery algorithm. In order to circumvent this problem, we propose two scheduling policies for updating the values of these thresholds based on a novel data structure. The data structure consists of multiple balanced search trees, and efficiently manages discovered patterns with multiple indices. One of the policies represents a full specification of updating by the user; the other iteratively improves a threshold value by deleting the worst pattern with respect to its corresponding index. Preliminary results on four real-world data sets are highly promising. Our algorithm settled threshold values appropriately, and discovered interesting exception rules from all these data sets.

Einoshin Suzuki
Learning in Constraint Databases

For several years, Inductive Logic Programming (ILP) has developed in two main directions: on one hand, the classical symbolic framework of ILP has been extended to deal with numeric values, and a few works have emerged stating that an interesting domain for modeling symbolic and numeric values in ILP is Constraint Logic Programming; on the other hand, applications of ILP in the context of Data Mining have been developed, with the benefit that ILP systems are able to deal with databases composed of several relations. In this paper, we propose a new framework for learning, expressed in terms of Constraint Databases. From the point of view of ILP, it gives a uniform way to deal with symbolic and numeric values, and it extends the classical framework by allowing the representation of infinite sets of positive/negative examples; from the point of view of Data Mining, it can be applied not only to relational databases, but also to spatial databases. A prototype has been implemented and experiments are currently in progress.

Teddy Turmeaux, Christel Vrain
Discover Risky Active Faults by Indexing an Earthquake Sequence

A method for finding the areas at highest risk of near-future earthquakes, from data on observed past earthquakes, has long been desired. The Fatal Fault Finder (F3) presented here finds risky active faults by applying KeyGraph, originally presented as a document indexing algorithm, to a sequence of focal faults of earthquakes instead of a document. This strategy is supported by analogies between a document and an earthquake sequence: the occurrences of words in a document and of earthquakes in a sequence have common causal structures, and KeyGraph previously indexed documents by taking advantage of the causal structure in a document. As a result, risky faults are obtained from an earthquake sequence by KeyGraph in a similar manner as keywords are obtained from a document. The risky faults obtained empirically by F3 corresponded well with real earthquake occurrences and seismologists’ risk estimations.

Yukio Ohsawa, Masahiko Yachida
Machine Discovery Based on the Co-occurrence of References in a Search Engine

This paper describes a new method of discovering clusters of related Web pages. By clustering Web pages and visualizing them in the form of a graph, users can easily access related pages. Since related Web pages are often referred to from the same Web page, the number of co-occurrences of references in a search engine is used for discovering relations among pages. Two URLs are given to a search engine as keywords, and the number of pages found for both URLs divided by the number of pages found for either URL, called the Jaccard coefficient, is calculated as the criterion for evaluating the relation between the two URLs. The value is used for deciding the length of an edge in a graph, so that vertices of related pages are located close to each other. Our system based on this method succeeds in discovering clusters of various genres, although it does not interpret the contents of the pages. The Jaccard coefficient is easily computed, and it is suitable for discovery from data acquired through the Internet.

Tsuyoshi Murata
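The co-occurrence measure reduces to three hit counts. The sketch below makes that concrete with a toy stand-in for the search engine: `pages` and `hits` are our inventions for illustration, not part of the paper's system.

```python
# Jaccard coefficient from search-engine-style hit counts:
# |pages referring to both| / |pages referring to either|.

def jaccard(url_a, url_b, hits):
    """hits(terms) -> number of pages matching all given terms."""
    both = hits((url_a, url_b))
    either = hits((url_a,)) + hits((url_b,)) - both
    return both / either if either else 0.0

# toy "index": each page is the set of URLs it refers to
pages = [{"A", "B"}, {"A"}, {"B"}, {"A", "B"}, {"C"}]

def hits(terms):
    return sum(1 for page in pages if set(terms) <= page)
```

With a real engine, `hits` would issue one query per term combination; the measure needs only the result counts, which is why the method requires no interpretation of page contents.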
Smoothness Prior Approach to Explore the Mean Structure in Large Time Series Data

This article addresses the problem of modeling and exploring the mean value structure of large-scale time series and time-space data. A smoothness priors modeling approach [11] is taken and applied to POS and GPS data. In this approach, the observed series are decomposed into several components, each of which is expressed by a smoothness priors model. In the analysis of the POS and GPS data, much useful information was extracted by this decomposition, resulting in some discoveries in these areas.

Genshiro Kitagawa, Tomoyuki Higuchi, Fumiyo N. Kondo
Automatic Detection of Geomagnetic Sudden Commencement Using Lifting Wavelet Filters

This paper proposes a method for detecting geomagnetic sudden commencement (SC) from the geomagnetic horizontal (H) component by using lifting wavelet filters. Lifting wavelet filters are biorthogonal wavelet filters containing free parameters. Our method learns these free parameters from training signals that contain an SC, so the learned wavelet filters capture the features of the training signals. Applying such wavelet filters to test signals, we can detect the time when an SC occurred.

Shigeru Takano, Teruya Minamoto, Hiroki Arimura, Koichi Niijima, Toshihiko Iyemori, Tohru Araki
A Noise Resistant Model Inference System

Within the empirical ILP setting we propose a method of inducing definite programs from examples, even when those examples are incomplete and occasionally incorrect. The system, named NRMIS, is a top-down batch learner that can make use of intensional background knowledge and learn programs involving multiple target predicates. It consists of three components: a generalization of Shapiro’s contradiction backtracing algorithm, a heuristic-guided search of refinement graphs, and a LIME-like theory evaluator. Although similar in spirit to MIS, NRMIS avoids its dependence on an oracle while retaining the expressiveness of a hypothesis language that allows recursive clauses and function symbols. NRMIS is tested on domains involving noisy and sparse data; the results illustrate its ability to induce accurate theories in all of these situations.

Eric McCreath, Mark Reid
A Graphical Method for Parameter Learning of Symbolic-Statistical Models

We present an efficient method for statistical parameter learning of a certain class of symbolic-statistical models (called PRISM programs) that includes hidden Markov models (HMMs). To learn the parameters, we adopt the EM algorithm, an iterative method for maximum likelihood estimation. For efficient parameter learning, we first introduce a specialized data structure for the explanations of each observation, and then apply a graph-based EM algorithm. The algorithm can be seen as a generalization of the Baum-Welch algorithm, an EM algorithm specialized for HMMs. We show that, given an appropriate data structure, the Baum-Welch algorithm can be simulated by our graph-based EM algorithm.

Yoshitaka Kameya, Nobuhisa Ueda, Taisuke Sato
Parallel Execution for Speeding Up Inductive Logic Programming Systems

This paper describes a parallel algorithm and its implementation for hypothesis space search in Inductive Logic Programming (ILP). A typical ILP system, Progol, regards induction as a search problem for finding a hypothesis, and an efficient search algorithm is used to find the optimal hypothesis. In this paper, we formalize the ILP task as a generalized branch-and-bound search and propose three methods of parallel execution for the optimal search. These methods are implemented in KL1, a parallel logic programming language, and are analyzed for execution speed and load balancing. An experiment on a benchmark test set was conducted on a shared-memory parallel machine to evaluate the performance of the hypothesis search as the number of processors varies. The result demonstrates that the statistics obtained coincide with the expected degree of parallelism.

Hayato Ohwada, Fumio Mizoguchi
Discovery of a Set of Nominally Conditioned Polynomials

This paper shows that a connectionist law discovery method called RF6 can discover a law in the form of a set of nominally conditioned polynomials, from data containing both nominal and numeric values. RF6 learns a compound of nominally conditioned polynomials using neural networks, selects the best among the candidate networks, and decomposes the selected network into a set of rules, where a rule means a nominally conditioned polynomial. Experiments showed that the proposed method works well in discovering such a law even from data containing irrelevant variables and a small amount of noise.

Ryohei Nakano, Kazumi Saito
H-Map: A Dimension Reduction Mapping for Approximate Retrieval of Multi-dimensional Data

We propose a projection mapping, H-Map, to reduce the dimensionality of multi-dimensional data, which can be applied to any metric space, such as an L1 or L∞ metric space, as well as Euclidean space. We investigate properties of H-Map and show its usefulness for spatial indexing by comparison with the traditional Karhunen-Loève (K-L) transformation, which can be applied only to Euclidean space. H-Map does not require coordinates of data, unlike the K-L transformation. H-Map also has an advantage for spatial indexing such as R-trees, because it is a continuous mapping from a metric space to an L∞ metric space, where a hyper-sphere is a hyper-cube in the usual sense.

Takeshi Shinohara, Jianping Chen, Hiroki Ishizaka
Normal Form Transformation for Object Recognition Based on Support Vector Machines

This paper proposes Normal Form Transformation (NFT) as a preprocessing step for Support Vector Machines (SVMs). Object recognition from images can be regarded as a fundamental technique in discovery science. Aspect-based recognition with SVMs is effective under constrained situations; however, object recognition from rotated, shifted, magnified or reduced images is a difficult task for simple SVMs. In order to circumvent this problem, we propose NFT, which rotates an image based on a low-luminance directed vector and shifts, magnifies or reduces the image based on the object’s maximum horizontal and vertical distances. We have applied SVMs with NFT to a database of 7200 images of 100 different objects. The recognition rates were over 97% in these experiments, except for cases of extreme reduction. These results clearly demonstrate the effectiveness of the proposed approach in aspect-based recognition.

Shinsuke Sugaya, Einoshin Suzuki

Posters

A Definition of Discovery in Terms of Generalized Descriptional Complexity
Hideo Bannai, Satoru Miyano
Feature Selection Using Consistency Measure

Feature selection methods search for an “optimal” subset of features, and many such methods exist. We evaluate the consistency measure along with different search techniques applied in the literature, and suggest guidelines for its use.

Manoranjan Dash, Huan Liu, Hiroshi Motoda
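A common form of the consistency measure is the inconsistency rate of a feature subset: project the data onto the subset, and wherever identical projections carry different class labels, count all but the majority label as inconsistencies. The sketch below illustrates that standard formulation under our own naming; the paper's exact measure and search strategy may differ.

```python
from collections import defaultdict

# Inconsistency rate of a feature subset: fraction of examples that
# would be misclassified if each projected pattern were assigned its
# majority class.

def inconsistency_rate(rows, labels, subset):
    groups = defaultdict(lambda: defaultdict(int))
    for row, y in zip(rows, labels):
        key = tuple(row[i] for i in subset)  # projection onto the subset
        groups[key][y] += 1
    inconsistent = sum(sum(counts.values()) - max(counts.values())
                       for counts in groups.values())
    return inconsistent / len(rows)
```

A subset is judged good when its rate is no worse than that of the full feature set; a search technique (exhaustive, heuristic, random) then looks for the smallest such subset.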
A Model of Children’s Vocabulary Acquisition Using Inductive Logic Programming
Koichi Furukawa, Ikuo Kobayashi, Tomonobu Ozaki, Mutsumi Imai
Automatic Acquisition of Image Processing Procedures from Sample Sets of Classified Images Based on Requirement of Misclassification Rate

We propose a new image processing expert system for knowledge discovery from an image database. The system can construct an image recognition procedure that satisfies a given condition on misclassification rates. It was applied to a defect-part extraction problem on LSI packages with promising results.

Toshihiro Hamada, Akinobu Shimizu, Toyofumi Saito, Jun-ichi Hasegawa, Jun-ichiro Toriwaki
“Thermodynamics” from Time Series Data Analysis
Hiroshi H. Hasegawa, Takashi Washio, Yukari Ishimiya
Developing a Knowledge Network of URLs

We present the system KN, which visualizes the Web graph. The graph is personal and progressive: each user develops his or her knowledge interactively with the system. URLs obtained by a search engine have almost no connections to each other, but with adequate background knowledge we obtain a Web graph in which some URLs are connected. Moreover, when we get a URL from our system, we can easily see the structure around it. Enlarging the background knowledge and the contents of the URL database is important future work, as is building a system that enables a user to add information to a URL; for example, the keywords used in a search are an important and very personal property of the URLs.

Daisuke Ikeda, Tsuyoshi Taguchi, Sachio Hirokawa
Derivation of the Topology Structure from Massive Graph Data
Akihiro Inokuchi, Takashi Washio, Hiroshi Motoda
Mining Association Algorithm Based on ROC Convex Hull Method in Bibliographic Navigation System
Minoru Kawahara, Hiroyuki Kawano
Regularization of Linear Regression Models in Various Metric Spaces
Vladimir V. Krepets
Argument-Based Agent Systems
Software Demonstration
Shinya Maeda, Yuichi Umeda, Cyunyang Guan, Hajime Sawamura
Graph-Based Induction for General Graph Structured Data
Takashi Matsuda, Tadashi Horiuchi, Hiroshi Motoda, Takashi Washio, Kohei Kumazawa, Naohide Arai
Rules Extraction by Constructive Learning of Neural Networks and Hidden-Unit Clustering
Marghny H. Mohamed, Koichi Niijima
Weighted Majority Decision among Region Rules for a Categorical Dataset
Akihiro Nakaya, Shinichi Morishita
Rule Discovery Technique Using GP with Crossover to Maintain Variety

Many GP learning methods have been proposed that decrease node combinations in order to keep them from increasing explosively. We propose a technique that takes the opposite approach: it tests a greater number of combinations in order to decrease the chance of the search being ‘trapped’ in a local optimum. In the proposed technique, how ‘different’ an individual’s structure is serves as an index for selecting individuals for genetic operations. Variety in the GP population is therefore strongly maintained, and GP learning is expected always to explore new combinations.

Ayahiko Niimi, Eiichiro Tazaki
From Visualization to Interactive Animation of Database Records

Unlike other information visualization systems, ours provides a generic framework for developing various types of information visualization. Its visual objects materialize records as directly manipulable objects in an interactive VR environment, and users can make copies of them for reuse in a different VR environment.

Makoto Ohigashi, Yuzuru Tanaka
Extraction of Primitive Motion for Human Motion Recognition
Ryuta Osaki, Mitsuomi Shimada, Kuniaki Uehara
Finding Meaningful Regions Containing Given Keywords from Large Text Collections
Kunihiko Sadakane, Hiroshi Imai
Mining Adaptation Rules from Cases in CBR Systems
Shadan Saniepour, Behrouz H. Far
An Automatic Acquisition of Acoustical Units for Speech Recognition Based on Hidden Markov Network

We propose a method for acquiring speech recognition units automatically. In the experiments, vowel-consonant type acoustic segments were obtained.

Motoyuki Suzuki, Takafumi Hayashi, Hiroki Mori, Shozo Makino, Hirotomo Aso
Knowledge Discovery from Health Data Using Weighted Aggregation Classifiers
Toru Takae, Minoru Chikamune, Hiroki Arimura, Ayumi Shinohara, Hitoshi Inoue, Shun-ichi Takeya, Keiko Uezono, Terukazu Kawasaki
Search for New Methods for Assignment of Complex Molecular Spectra
Takehiko Tanaka, Takashi Imajo
Automatic Discovery of Definition Patterns Based on the MDL Principle
Masatoshi Tsuchiya, Sadao Kurohashi
Detection of the Structure of Particle Velocity Distribution by Finite Mixture Distribution Model
Genta Ueno, Shinobu Machida, Nagatomo Nakamura, Tomoyuki Higuchi, Tohru Araki
Mutagenes Discovery Using PC GUHA Software System
Premysl Zak, Jaroslava Halova
Discovering the Primary Factors of Cancer from Health and Living Habit Questionnaires

We have successfully obtained mining results from the questionnaire data. This application is significant because the results apply directly to cancer prevention and cancer control. The next step is to collect more data (questionnaire data from other districts) and mine it further to generate more general rules.

Xiaolong Zhang, Tetsuo Narita
Backmatter
Metadata
Title
Discovery Science
Edited by
Setsuo Arikawa
Koichi Furukawa
Copyright year
1999
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-46846-2
Print ISBN
978-3-540-66713-1
DOI
https://doi.org/10.1007/3-540-46846-3