
1999 | Book

Classification in the Information Age

Proceedings of the 22nd Annual GfKl Conference, Dresden, March 4–6, 1998

Editors: Prof. Dr. Wolfgang Gaul, Prof. Dr. Hermann Locarek-Junge

Publisher: Springer Berlin Heidelberg

Book Series: Studies in Classification, Data Analysis, and Knowledge Organization


Table of Contents

Frontmatter

Plenary and Semi Plenary Presentations

Frontmatter

Classification and Information

Scientific Information Systems and Metadata

This article begins with a short survey of the history of the classification of knowledge. It briefly discusses the traditional means of keeping track of scientific progress, i.e., collecting, classifying, abstracting, and reviewing all publications in a field. The focus of the article, however, is on modern electronic information and communication systems that try to provide high-quality information by automatic document retrieval or by using metadata, a new tool to guide search engines. We report, in particular, on efforts of this type made jointly by a number of German scientific societies. A full version of this paper, including all hypertext references, links to online papers and references to the literature, can be found at the URL: http://elib.zib.de/math.org.softinf.pub

M. Grötschel, J. Lügger
Multiple Expert Fusion

The problem of classifier combination is considered in the context of the two main fusion scenarios: fusion of opinions based on identical and on distinct representations. We show that in both cases (distinct and shared representations), the expert fusion involves the computation of a linear or nonlinear function of the a posteriori class probabilities estimated by the individual experts. Classifier combination can therefore be viewed in a unified way as a multistage classification process whereby the a posteriori class probabilities generated by the individual classifiers are considered as features for a second stage classification scheme.

J. Kittler
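
As a quick illustration of the posterior-level combination described in the abstract above, the following Python sketch applies a sum (linear) and a product (nonlinear) rule to made-up expert outputs. It is an illustrative toy under assumed inputs, not code from the paper.

```python
# Minimal sketch of combining classifier outputs at the posterior level
# (sum and product rules); the expert outputs below are invented.
import numpy as np

def fuse_posteriors(posteriors, rule="sum"):
    """posteriors: array of shape (n_experts, n_classes), rows summing to 1."""
    posteriors = np.asarray(posteriors, dtype=float)
    if rule == "sum":
        combined = posteriors.mean(axis=0)          # linear combination
    elif rule == "product":
        combined = posteriors.prod(axis=0)          # nonlinear combination
        combined /= combined.sum()                  # renormalize
    else:
        raise ValueError("unknown rule")
    return combined, int(np.argmax(combined))

# Three hypothetical experts, three classes
experts = [[0.6, 0.3, 0.1],
           [0.5, 0.2, 0.3],
           [0.2, 0.5, 0.3]]
print(fuse_posteriors(experts, "sum"))      # averaged posteriors and predicted class
print(fuse_posteriors(experts, "product"))  # product rule and predicted class
```
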
How To Make a Multimedia Textbook and How to Use It

Taking the example of an innovative textbook on algorithm design, we illustrate the problems to be faced in scientific electronic publishing within the context of open educational environments. The issues to be considered particularly relate to design and choice of media types and document types. In reviewing the textbook’s production process, we show that the problems to be faced are mostly due to a lack of tools to support the authors.

Thomas Ottmann, Matthias Will
Clustering and Neural Network Approaches

This paper describes how clustering problems can be resolved by neural network (NN) approaches such as Hopfield nets, multi-layer perceptrons, and Kohonen’s ‘self-organizing maps’ (SOMs). We emphasize the close relationship between the NN approach and classical clustering methods. In particular, we show how SOMs are derived by stochastic approximation from a new continuous version (K-criterion) of a finite-sample clustering criterion proposed by Anouar et al. (1997). In this framework we determine the asymptotic behaviour of Kohonen’s method, design a new finite-sample version of the SOM approach of the k-means type, and propose various generalizations along the lines of classical ‘regression clustering’, ‘principal component clustering’, and ‘maximum-likelihood clustering’.

Hans-Hermann Bock
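
To make the SOM/k-means relationship mentioned above concrete, here is a minimal stochastic SOM update on a one-dimensional chain of prototypes with synthetic data; when the neighborhood function shrinks to an indicator of the winning prototype, the update reduces to online k-means. This is a generic textbook sketch, not the paper's derivation.

```python
# Minimal 1-D self-organizing map on toy 2-D data (stochastic-approximation view).
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 2))                # toy observations
K = 10                                           # prototypes on a 1-D chain
W = rng.normal(size=(K, 2))                      # initial prototype positions

for t, x in enumerate(data):
    eta = 0.5 * (1 - t / len(data))              # decreasing learning rate
    sigma = max(0.1, 2.0 * (1 - t / len(data)))  # shrinking neighborhood width
    winner = np.argmin(((W - x) ** 2).sum(axis=1))
    h = np.exp(-((np.arange(K) - winner) ** 2) / (2 * sigma ** 2))
    W += eta * h[:, None] * (x - W)              # Kohonen update; online k-means if h is an indicator

print(np.round(W, 2))
```
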
Data Model and Classification by Trees

Let D be an ultrametric (or tree) distance, T its tree representation, and Δ a dissimilarity matrix that is an estimate of D. Our aim is to reconstruct T from Δ. This problem is encountered, for example, in Biology and Archaeology, where T represents the history of some living species or relics from the past, and where Δ estimates the pairwise divergence times between these species or relics. Moreover, we assume that the variance-covariance matrix of the elements in Δ is available. This matrix may be a consequence of the experimental process used to collect the data, or induced by the data model at hand. We propose a way of benefiting from this additional knowledge, by modifying the usual agglomerative (or ascending) algorithm. At each step of the algorithm, this involves reducing the dissimilarity matrix Δ so that the variance of its elements is minimized. In this way, we obtain better estimates for selecting the pair of objects to be agglomerated and estimating the edge lengths. The method we propose applies to both ultrametric and tree distances, and it has a low computational complexity. This method has been used to deal with data derived from biological sequences, which implies a rather complex, non-diagonal variance-covariance matrix. Very good results have been obtained, especially concerning the ability to recover the structure of the true tree T.

Olivier Gascuel
A Framework for the Design, Development and Evaluation of Multimedia Based Learning Environments: “ISTOS: An Application in the Computer Networking Domain”

Nowadays we are experiencing a major shift in the way we understand the design and development of educational environments in our institutions. New technologies find their way into traditional educational settings, although a multitude of problems has to be dealt with. In this paper the authors first comment on general aspects of multimedia technology and multimedia product delivery, along with the problems arising when integrating such products into educational environments. Moreover, the authors present an instructional framework based on Cognitive Flexibility Theory and Cognitive Apprenticeship prescriptions in an attempt to address these problems and propose a solution.

A. Pombortsis, S. Demetriadis, A. Karoulis
Natural and Constrained Classification of Data by Fuzzy Clustering Techniques

Assigning objects to some similarity classes is fundamental to the process of scientific discovery and even to daily life. This universal process, which started some millennia ago with the giving of generic names to objects, has been the subject of automatic data processing procedures for about half a century. Nowadays, a broad choice of techniques for the classification of objects described by a set of multidimensional variables is fully operational. These techniques are all based on the principle that similar objects should be gathered in the same cluster, whereas dissimilar objects belong to different clusters. The paper situates the question of ‘natural’ classes in a broader perspective before proposing a conceptual frame within which a more pragmatic approach is developed, in line with the different classification algorithms and the concept of anisotropic parameter space. Against this background, a number of well-known methods are analysed and compared.

E. Trauwaert

Finance and Risk

From Variance to Value at Risk: A Unified Perspective on Standardized Risk Measures

Risk is a concept which matters to many issues in economics and finance. The range of risk measures proposed extends from classics like variance to modern approaches like Value-at-Risk (VaR). In this paper, after a short characterization of managers’ intuitive notion of risk, an overview is given of those risk measures which try to measure risk in a standardized way, independent of individually varying perception. Then it is shown that all these measures, including Value-at-Risk, are basically special cases of a certain well-known family of risk measures. From this point of view, the most critical features of each measure, particularly of VaR, become immediately evident.

Hans Wolfgang Brachinger
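
To illustrate the contrast between a classical dispersion measure and Value-at-Risk as a quantile of the loss distribution, here is a small Python sketch on simulated returns. The numbers, the normal assumption, and the 99% level are assumptions of the sketch, not results from the paper.

```python
# Volatility vs. Value-at-Risk on simulated daily returns.
import numpy as np

rng = np.random.default_rng(1)
returns = rng.normal(loc=0.0005, scale=0.01, size=5000)   # hypothetical daily returns

sigma = returns.std(ddof=1)                                # classical volatility
alpha = 0.01
var_historical = -np.quantile(returns, alpha)              # empirical 99% VaR
var_normal = -(returns.mean() - 2.326 * sigma)             # normal-approximation VaR (z_0.01 = -2.326)

print(f"volatility     : {sigma:.4%}")
print(f"historical VaR : {var_historical:.4%}")
print(f"normal VaR     : {var_normal:.4%}")
```
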
Does the Planning Horizon Affect the Portfolio Structure?

Does the composition of the optimal portfolio depend on the planning horizon? According to popular opinion there exists a planning horizon effect if initial wealth has to be allocated between shares and the risk-free asset: the percentage invested into shares should increase if the planning horizon is extended. The paper reviews the theoretical underpinnings of the statement. In the framework of expected utility the findings are mixed. Some results can be derived which contradict the popular opinion. But one can also find results which support the popular opinion. The conclusions depend on the class of utility functions under consideration and on the alternatives to be compared. However, from the analysis of shortfall models strong evidence in favor of the popular opinion can be inferred. In addition, the optimal percentage invested in the stock market can easily be quantified. The well-known shortfall criteria of Roy, Kataoka, and Telser are studied in some detail.

G. Bamberg, G. Dorfleitner, R. Lasch
Statistical Approach in Financial Risk Analysis and Management — Review of Concepts

The paper presents a short overview of general ideas in the analysis and management of financial risk. The emphasis is put on the application of quantitative methods. First, some historical remarks are given. Then the different concepts of understanding risk are discussed. A review of commonly used statistical measures is given, where risk is understood as volatility and as sensitivity. Finally, the general method of risk management is presented. This method aims at keeping the sensitivity of the portfolio at a desired level.

Krzysztof Jajuga

Classification and Related Aspects of Data Analysis and Learning

Frontmatter

Classification, Data Analysis, and Statistics

On Data-Based Checking of Hypotheses in the Presence of Uncertain Knowledge

Interval-probability (IP) is a substantial generalization of classical probability. It allows one to adequately model different aspects of uncertainty without losing the neat connection to the methodology of classical statistics. Therefore it provides a well-founded basis for data-based reasoning in the presence of uncertain knowledge. The paper supports that claim by outlining the generalization of Neyman-Pearson tests to IP. After introducing some basics of the theory of IP according to Weichselberger (1995, 1998), the fundamental concepts for tests are extended to IP; then the Huber-Strassen theory is briefly reviewed in this context and related theorems for general IP are given. Finally, further results are sketched.

Th. Augustin
Multivariate Directional Tests with Incomplete Data

Test procedures for the analysis of multivariate problems are introduced which are more powerful than the conventional Hotelling’s T² for detecting alternatives where the (treatment) effect has the same direction for all observed variables. They are able to analyse incomplete data; versions exist which do not require normally distributed variables; and, for complete data, the number of dependent variables can arbitrarily exceed the number of independent subjects.

Thomas Bregenzer
Classification and Positioning of Data Mining Tools

Various models for the KDD (Knowledge Discovery in Databases) process are known, which mainly differ with respect to the number and description of process activities. We present a process unification by assigning the single steps of these models to five main stages and concentrate on data mining aspects. An overview concerning data mining software tools with focus on inbuilt algorithms and additional support provided for the main stages of the KDD process is given within a classification and positioning framework. Finally, an application of a modification of an association rule algorithm is used as empirical example to demonstrate what can be expected when data mining tools are used to handle large data sets.

W. Gaul, F. Säuberlich
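
Since the abstract above refers to association rule algorithms as a data mining example, the following tiny Python sketch shows the support/confidence computation that such algorithms are built on. The transactions and thresholds are invented; this is not the modified algorithm discussed in the paper.

```python
# Support and confidence of simple association rules on toy transactions.
from itertools import combinations

transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"milk", "butter"}, {"bread", "milk", "butter"},
                {"bread", "milk"}]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    return support(antecedent | consequent) / support(antecedent)

items = sorted(set().union(*transactions))
for a, b in combinations(items, 2):
    rule_supp = support({a, b})
    if rule_supp >= 0.4:                                   # hypothetical minimum-support threshold
        print(f"{a} -> {b}: support={rule_supp:.2f}, "
              f"confidence={confidence({a}, {b}):.2f}")
```
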
Hazard Rate Estimation from Censored Data

The hazard rate has become an important statistical tool in the methodological repertoire of modern failure time analysis. Hazard rate estimation is increasingly being employed in a variety of practical applications. In this paper, a brief review of nonparametric kernel methods for estimating the hazard rate from censored data is provided and the current software situation regarding implementations of this methodology is described.

O. Gefeller
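
One common kernel approach of the kind reviewed above smooths the Nelson-Aalen increments of right-censored data. The Python sketch below illustrates that idea on simulated exponential survival times; the bandwidth, kernel choice, and data are assumptions for illustration, not the implementations surveyed in the paper.

```python
# Kernel hazard-rate estimate from right-censored data (smoothed Nelson-Aalen increments).
import numpy as np

rng = np.random.default_rng(2)
n = 300
true_times = rng.exponential(scale=2.0, size=n)   # true hazard = 0.5
censoring = rng.exponential(scale=3.0, size=n)
times = np.minimum(true_times, censoring)
events = (true_times <= censoring).astype(int)    # 1 = event, 0 = censored

order = np.argsort(times)
times, events = times[order], events[order]
at_risk = n - np.arange(n)                        # number at risk just before each ordered time

def hazard(t, bandwidth=0.5):
    """Gaussian-kernel smoothing of the Nelson-Aalen increments d_i / Y_i."""
    kern = np.exp(-0.5 * ((t - times) / bandwidth) ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return np.sum(kern * events / at_risk)

grid = np.linspace(0.2, 4.0, 8)
print([round(hazard(t), 3) for t in grid])        # values should hover near 0.5
```
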
A Sequential Modification of EM Algorithm

In the framework of estimating finite mixture distributions we consider a sequential learning scheme which is equivalent to the EM algorithm in the case of a repeatedly applied finite set of observations. A typical feature of the sequential version of the EM algorithm is the periodic substitution of the estimated parameters. The different computational aspects of the considered scheme are illustrated by means of artificial data randomly generated from a multivariate Bernoulli distribution.

J. Grim
Analysis of the Stability of Clusters of Variables via Bootstrap

Many cluster algorithms only allow the calculation of a partition, without any possibility of evaluating the stability or variability of the solution due to the randomness of the sample. Resampling methods such as the bootstrap provide a general framework within which one can analyse the stability of the results of a cluster analysis. We use it in the context of investigating psychological concepts based on variables of a questionnaire. We propose several measures to evaluate the variability of the clustering and exemplify the approach with a study on belief-attitudes of adults.

U. Halekoh, K. Schweizer
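
A minimal sketch of the bootstrap idea above: resample observations, re-cluster the variables, and record how often pairs of variables land in the same cluster. The data, the correlation-based dissimilarity, the hierarchical method, and the co-assignment proportion are assumptions of this sketch (it requires numpy and scipy), not the specific stability measures proposed in the paper.

```python
# Bootstrap co-assignment proportions for a clustering of variables.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)
n, p = 200, 6
base = rng.normal(size=(n, 2))
X = np.column_stack([base[:, 0] + 0.3 * rng.normal(size=n) for _ in range(3)] +
                    [base[:, 1] + 0.3 * rng.normal(size=n) for _ in range(3)])

def cluster_variables(data, k=2):
    dist = 1 - np.abs(np.corrcoef(data, rowvar=False))      # dissimilarity between variables
    condensed = dist[np.triu_indices(dist.shape[0], 1)]
    return fcluster(linkage(condensed, "average"), k, "maxclust")

co_assign = np.zeros((p, p))
B = 200
for _ in range(B):
    sample = X[rng.integers(0, n, n)]                       # bootstrap resample of rows
    labels = cluster_variables(sample)
    co_assign += (labels[:, None] == labels[None, :])

print(np.round(co_assign / B, 2))   # proportion of resamples in which two variables co-cluster
```
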
Models and Methods for Clusterwise Linear Regression

Three models for linear regression clustering are given, and corresponding methods for classification and parameter estimation are developed and discussed: the mixture model with fixed regressors (ML estimation), the fixed partition model with fixed regressors (ML estimation), and the mixture model with random regressors (Fixed Point Clustering). The number of clusters is treated as unknown. The approaches are compared via an application to Fisher’s Iris data. In passing, a widely ignored feature of these data is discovered.

C. Hennig
Statistical Clustering Under Distortions: Optimality and Robustness

Statistical clustering of multivariate observations is considered under three types of distortions: small-sample effects, presence of outliers, and Markov dependence of class indices. Asymptotic expansions of risk are constructed and used for analysis of robustness characteristics and also for synthesis of new clustering algorithms under distortions.

Yu. Kharin
Discrete Scan Statistics for Detecting Change-points in Binomial Sequences

A finite sequence of independent binomial variables is considered for which the null hypothesis of homogeneity is to be tested against the alternative hypotheses of one or two change-points, respectively. The tests are based on the likelihood ratio statistic or on an approximate linearization of the log likelihood ratio statistic. This yields scan type or cusum statistics in a discrete situation. At the same time maximum likelihood or approximate maximum likelihood estimates of the locations of the change-points are derived. For the problem with two change-points — corresponding to the detection of a cluster in time — we distinguish between the cases with a fixed and a variable distance of the two points. Under the null hypothesis exact upper bounds for the upper tails are derived. In the special case of Bernoulli variables these bounds are considerably simplified.

J. Krauth
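
To illustrate the likelihood-ratio scan described above in its simplest form, the Python sketch below searches for a single change-point in a Bernoulli sequence by maximizing the split log-likelihood. The simulated data, the minimum segment length, and the absence of the two-change-point case are simplifying assumptions of the sketch.

```python
# Likelihood-ratio scan for one change-point in a Bernoulli sequence.
import numpy as np

def loglik(x):
    n, s = len(x), x.sum()
    p = s / n
    if p in (0.0, 1.0):
        return 0.0                                   # degenerate segment contributes 0
    return s * np.log(p) + (n - s) * np.log(1 - p)

rng = np.random.default_rng(4)
x = np.concatenate([rng.binomial(1, 0.2, 60), rng.binomial(1, 0.6, 40)])  # change at 60

l0 = loglik(x)                                       # homogeneous model
stats = [loglik(x[:tau]) + loglik(x[tau:]) - l0      # log-LR for each candidate split
         for tau in range(5, len(x) - 5)]
tau_hat = 5 + int(np.argmax(stats))
print("estimated change-point:", tau_hat, "max log-LR:", round(max(stats), 2))
```
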
Dynamic Confinement, Classification, and Imaging

The problem of matching two images of the same objects after movements or slight deformations arises in medical imaging, but also in the microscopic analysis of physical or biological structures. We present a new matching strategy consisting of two steps. We consider the grey level function (modulo a normalization) as a probability density function. First, we apply a density-based clustering method in order to obtain a tree or, more generally, a hierarchy which classifies the points on which the grey level function is defined. Secondly, we use the identification of the hierarchical representations of the two images to guide the image matching or to define a distance between the images for object recognition. The transformation invariance properties of the representations, which we will demonstrate, make it possible to extract invariant image points. In addition, using the identification of the hierarchical structures, they also make it possible to find the correspondence between invariant points even if these have moved locally. Finally, we mention possibilities for constructing hierarchies which integrate more geometrical information. The method’s results on real images will be discussed.

J. Mattes, J. Demongeot
Approximation of Distributions by Sets

The well-known ‘k-means’ clustering can be regarded as an approximation of a given distribution (which can be a sample) by a set of optimally chosen k points. However, in many cases approximating sets of different types are of interest. For example, approximation of a distribution by circles is important in allocating communication stations, the circles being interpreted as working areas of the stations. The paper covers two related topics. First we propose a heuristic algorithm to find k circles of a given radius r that fit the planar data set. Then we analyse the problem of consistency: does a sequence of sample-based sets of optimal circles converge to the class of optimal circles for the population? The positive answer is given for arbitrary finite-dimensional normed linear spaces.

K. Pärna, J. Lember, A. Viiart
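
In the spirit of the circle-fitting problem above, here is a k-means-style alternation in Python: assign points to the nearest circle by ring distance, then re-centre each circle with a fixed-point step. The data, the initial centres, and the update rule are assumptions of the sketch; it is not the authors' heuristic.

```python
# Heuristic fit of k circles of fixed radius r to planar data.
import numpy as np

rng = np.random.default_rng(5)
r, k = 1.0, 2
# synthetic points scattered around two unit circles centred at (0,0) and (4,0)
angles = rng.uniform(0, 2 * np.pi, 400)
true_centres = np.array([[0.0, 0.0], [4.0, 0.0]])[rng.integers(0, 2, 400)]
X = true_centres + np.column_stack([np.cos(angles), np.sin(angles)]) + 0.05 * rng.normal(size=(400, 2))

C = np.array([[0.5, 0.3], [3.5, -0.3]])          # rough initial centres (assumed)
for _ in range(30):
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)   # point-to-centre distances
    assign = np.argmin(np.abs(d - r), axis=1)                   # nearest circle by ring distance
    for j in range(k):
        P = X[assign == j]
        if len(P) == 0:
            continue
        u = (P - C[j]) / np.linalg.norm(P - C[j], axis=1, keepdims=True)
        C[j] = (P - r * u).mean(axis=0)          # fixed-point update of the circle centre

print(np.round(C, 2))                            # should end up near (0, 0) and (4, 0)
```
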
Computation of the Minimum Covariance Determinant Estimator

Robust estimation of location and scale in the presence of outliers is an important task in classification. Outlier-sensitive estimation will lead to a large number of misclassifications. Rousseeuw introduced two estimators with high breakdown point, namely the minimum-volume-ellipsoid estimator (MVE) and the minimum-covariance-determinant estimator (MCD). While the MCD estimator has better theoretical properties than the MVE, the latter appears to be used more widely. This may be due to the lack, up to now, of fast algorithms for computing the MCD. In this paper two branch-and-bound algorithms for the exact computation of the MCD are presented. The results of their application to simulated samples are compared with a new heuristic algorithm, “multistart iterative trimming”, and the steepest descent method suggested by Hawkins. The results show that multistart iterative trimming is a good and very fast heuristic for the MCD which can be applied to samples of large size.

Christoph Pesch
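
To give an impression of the iterative-trimming idea mentioned above, the Python sketch below runs concentration steps (keep the h most central points, re-estimate location and scatter, repeat) from several random starts. This is a generic illustration of the trimming heuristic on simulated data, not the exact branch-and-bound computation or the specific multistart algorithm of the paper.

```python
# Concentration-step heuristic for an approximate minimum covariance determinant.
import numpy as np

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0, 1, size=(90, 2)),
               rng.normal(6, 0.5, size=(10, 2))])     # 10% simulated outliers
n, p = X.shape
h = (n + p + 1) // 2                                  # subset size

def concentrate(subset, steps=10):
    for _ in range(steps):
        mu = X[subset].mean(axis=0)
        S = np.cov(X[subset], rowvar=False)
        d = np.einsum("ij,jk,ik->i", X - mu, np.linalg.inv(S), X - mu)  # squared Mahalanobis distances
        subset = np.argsort(d)[:h]                    # keep the h most central points
    return subset, np.linalg.det(np.cov(X[subset], rowvar=False))

best = min((concentrate(rng.choice(n, p + 1, replace=False)) for _ in range(20)),
           key=lambda result: result[1])
print("smallest determinant found:", round(best[1], 4), "subset size:", len(best[0]))
```
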
Graphical Tools for the Detection of Multiple Outliers in Spatial Statistics Models

Cerioli and Riani (1998) recently suggested a novel approach to the exploratory analysis of spatially autocorrelated data, which is based on a forward search. In this paper we suggest a modification of the original technique, namely a block-wise forward search algorithm. Furthermore, we show the effectiveness of our approach in two examples which may be claimed to be ‘difficult’ to analyse in practice. In this respect we also show that our method can provide useful guidance to the identification of nonstationary trends over the observation area. Throughout, the emphasis is on exploratory methods and joint display of cogent graphical plots for the visualization of relevant spatial features of the data.

Marco Riani, Andrea Cerioli
Classification for Repeated Measurements in Gaussian and Related Populations

The present paper deals with likelihood classification rules for jointly assigning n₃ individuals, or one individual measured n₃ times, from a population Π₃ to one of two populations Π₁ and Π₂. Several cases of completely or only partly known parameters of Gaussian or elliptically contoured overall sample distributions are considered. Special emphasis is placed on geometric representation formulae for the different classification rules. Based upon such formulae it is possible to evaluate probabilities of correct classification explicitly, as has been proved in Krause and Richter (1994a,b).

W.-D. Richter
Optimal vs. Classical Linear Dimension Reduction

We describe a computer-intensive method for linear dimension reduction which minimizes the classification error directly. Simulated annealing (Bohachevsky et al. (1986)) is used to solve this problem. The classification error is determined by an exact integration. We avoid distance or scatter measures which are only surrogates used to circumvent the classification error. Simulations (in two dimensions) and analytical approximations demonstrate the superiority of optimal classification over the classical procedures. We compare our procedure to the well-known canonical discriminant analysis (homoscedastic case) as described in McLachlan (1992) and to a method by Young et al. (1987) for the heteroscedastic case. Special emphasis is put on the case when the distance-based methods collapse. The computer-intensive algorithm always achieves minimal classification error.

Michael C. Röhl, Claus Weihs
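
A toy Python version of the search strategy described above: simulated annealing over a one-dimensional projection direction, scoring each candidate by the training error of a simple threshold classifier on the projected data. The data, the classifier, and the cooling schedule are assumptions of the sketch; the paper uses an exact integration of the classification error instead.

```python
# Simulated annealing for a 1-D projection that minimizes a simple classification error.
import numpy as np

rng = np.random.default_rng(7)
n = 200
X = np.vstack([rng.normal([0, 0], [1, 3], size=(n, 2)),
               rng.normal([2, 0], [1, 3], size=(n, 2))])
y = np.repeat([0, 1], n)

def error(direction):
    z = X @ direction
    cut = 0.5 * (z[y == 0].mean() + z[y == 1].mean())   # midpoint threshold on the projection
    pred = (z > cut).astype(int) if z[y == 1].mean() > cut else (z <= cut).astype(int)
    return np.mean(pred != y)

w = rng.normal(size=2); w /= np.linalg.norm(w)
best_w, best_err, temp = w, error(w), 1.0
for _ in range(2000):
    cand = w + 0.2 * rng.normal(size=2); cand /= np.linalg.norm(cand)
    delta = error(cand) - error(w)
    if delta < 0 or rng.random() < np.exp(-delta / temp):   # accept worse moves with prob e^(-delta/T)
        w = cand
        if error(w) < best_err:
            best_w, best_err = w, error(w)
    temp *= 0.995                                           # geometric cooling

print("best direction:", np.round(best_w, 3), "training error:", best_err)
```
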
Testing for the Number of States in Hidden Markov Models with Application to Ion Channel Data

Noisy data recorded from ion channels can be adequately modelled by hidden Markov models with a finite number of states. We address the problem of testing for the number of hidden states by means of the likelihood ratio test. Under the null hypothesis some parameters are on the boundary of the parameter space or some parameters are only identifiable under the alternative, and therefore the likelihood ratio tests have to be applied under nonstandard conditions. The exact asymptotic distribution of the likelihood ratio statistic cannot be derived analytically. Thus, we investigate its asymptotic distribution by simulation studies. We apply these tests to data recorded from potassium channels.

M. Wagner, S. Michalek, J. Timmer
ClustanGraphics3: Interactive Graphics for Cluster Analysis

ClustanGraphics3 is a new interactive program for hierarchical cluster analysis. It can display shaded representations of proximity matrices, dendrograms and scatterplots for 11 clustering methods, with an intuitive user interface and new optimization features. Algorithms are proposed which optimize the rank correlation of the proximity matrix by seriation, compute cluster exemplars and truncate a large dendrogram and proximity matrix. ClustanGraphics3 is illustrated by a market segmentation study for automobiles and a taxonomy of 20 species based on the amino acids in their protein cytochrome-c molecules. The paper concludes with an overview.

David Wishart

Conceptual Analysis and Learning

Conceptual Meaning of Clusters

The interpretation of cluster analysis solutions in the case of object-attribute data can be supported by methods of Formal Concept Analysis leading to a conceptual understanding of the “meaning” of clusters, partitions and dendrograms. The central idea is the embedding of a given cluster in a conceptual scale which represents the user’s granularity with respect to the values of attributes in the original data. This method is demonstrated using data from ALLBUS 1996.

P. Bittner, C. Eckes, K. E. Wolff
Group Theoretical Structures for Representing Data Contexts

Data contexts, which describe the relation between objects, attributes, and attribute values, can often be described advantageously by algebraic structures. In this contribution, a representation of data contexts by finite abelian groups will be discussed. For this representation, a framework is given by contexts which have the elements of a group as objects (which label the rows of the data table or context), the elements of its character group as attributes (which label the columns), and the elements of the complex unit circle as attribute values (these label the entries in the cells of the data table). The non-empty extents of the appropriately scaled context are exactly the subgroups and their cosets. For the analysis of data, it is important to examine which data contexts are isomorphic to or can be embedded into such group contexts. This will be explained by some examples, in particular taken from the field of experimental designs.

A. Großkopf
The Comparative Efficacy of Some Combinatorial Tests for Detection of Clusters and Mixtures of Probability Distributions

Assume that n q-dimensional data points have been obtained and subjected to a cluster analysis algorithm. A potential concern is whether the resulting clusters have a “causal” interpretation or whether they are merely consequences of “random” fluctuation. In previous reports, the asymptotic properties of a number of potentially useful combinatorial tests based on the theory of random interval graphs were described. In the present work, comparisons of the asymptotic efficacy of a class of these tests are provided. As a particular illustration of potential applications, we discuss the detection of mixtures of probability distributions and provide some numerical illustrations.

B. Harris, E. Godehardt
Efficient State-Space Representation by Neural Maps for Reinforcement Learning

For some reinforcement learning algorithms the optimality of the generated strategies can be proven. In practice, however, restrictions in the number of training examples and computational resources corrupt optimality. The efficiency of the algorithms depends strikingly on the formulation of the task, including the choice of the learning parameters and the representation of the system states. We propose here to improve the learning efficiency by an adaptive classification of the system states which tends to group together states if they are similar and acquire the same action during learning. The approach is illustrated by two simple examples. Two further applications serve as a test of the proposed algorithm.

Michael Herrmann, Ralf Der
Match-Graphs of Random Digraphs

We present some results devoted to the existence, numbers and orders of strongly connected clusters and cliques in random digraphs and their match-graphs. These results may provide an interesting sociological interpretation.

Jerzy Jaworski, Zbigniew Palka
An Improved Training Method for Feed-Forward Neural Networks

In many fields of signal processing, feed-forward neural networks, especially multilayer perceptrons, are used as approximators. We suggest stating the weight adaptation process (training) as an optimization procedure solving a conventional nonlinear regression problem. Thus the presented theory can easily be adapted to any similar problem. Recently it has been shown that second-order methods yield a fast decrease of the training error for small and medium-scale neural networks. In particular, Marquardt’s algorithm is well known for its simplicity and high robustness. In this paper we show an innovative approach to minimizing the training error. We demonstrate that an extension of Marquardt’s algorithm, i.e., the adaptation of the increasing/decreasing factor, leads to much better convergence properties than the original formula. Simulation results illustrate excellent robustness with respect to the initial values of the weights and lower overall computational costs.

M. Lendl, R. Unbehauen
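
For readers unfamiliar with the increasing/decreasing factor mentioned above, the Python sketch below shows a basic Levenberg-Marquardt step with the standard multiply-or-divide adaptation of the damping factor, applied to a tiny curve-fitting problem rather than a neural network. The factor of 10 and the example problem are generic textbook assumptions, not the paper's proposed extension.

```python
# Basic Levenberg-Marquardt iteration with adaptive damping on a toy least-squares problem.
import numpy as np

rng = np.random.default_rng(8)
t = np.linspace(0, 4, 50)
y = 2.0 * np.exp(-1.3 * t) + 0.02 * rng.normal(size=t.size)   # data from y = a * exp(-b * t)

def residuals(p):
    a, b = p
    return a * np.exp(-b * t) - y

def jacobian(p):
    a, b = p
    return np.column_stack([np.exp(-b * t), -a * t * np.exp(-b * t)])

p, lam = np.array([1.0, 1.0]), 1e-2
for _ in range(50):
    r, J = residuals(p), jacobian(p)
    step = np.linalg.solve(J.T @ J + lam * np.eye(2), -J.T @ r)
    if np.sum(residuals(p + step) ** 2) < np.sum(r ** 2):
        p, lam = p + step, lam / 10      # success: accept the step, decrease damping
    else:
        lam *= 10                        # failure: increase damping toward gradient descent

print("estimated parameters:", np.round(p, 3))   # should approach (2.0, 1.3)
```
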
Decision Support By Order Diagrams

Formal Concept Analysis is a mathematical method of qualitative data analysis. The present experiment was aimed at finding out if and how far graphical representation tools of Formal Concept Analysis can be used to support choice decisions. This new approach is mainly descriptive and intends both to increase the large number of practical applications of Formal Concept Analysis and to enrich the cognitive-psychological field of Decision Analysis by mathematical-algebraic aspects.

Torsten Rönsch, Guido Weißhahn
Neural Network Classification in Exponential Models with Unknown Statistics

This contribution demonstrates the possibility of achieving a Bayesian (or nearly Bayesian) classification of exponentially distributed data by perceptrons with at most two hidden layers. The number of hidden layers depends on how much is known about the sufficient statistics figuring in the corresponding exponential distributions. Practical applicability is illustrated by the classification of normally distributed data. Experiments with such data proved that, in learning based on correct classification information, the error backpropagation rule is able to create in the hidden layers surprisingly good approximations of a priori unknown sufficient statistics. This enables the trained network to imitate Bayesian classifiers and to achieve minimum classification errors.

I. Vajda
Conceptual Landscapes of Knowledge: A Pragmatic Paradigm for Knowledge Processing

Knowledge understood on the basis of Peirce’s Pragmatism can be activated by using the metaphor of landscape. This approach is outlined by discussing conceptual landscapes of knowledge within the development of Formal Concept Analysis. Various tasks of knowledge processing are considered such as exploring, searching, recognizing, identifying, analyzing, investigating, deciding, improving, restructuring, and memorizing. For all these tasks examples of concrete applications are given which show the fruitfulness of the landscape paradigm of knowledge; in most of those applications, the conceptual structures are implemented by using the management system TOSCANA.

R. Wille

Usage of New Media and the Internet

Frontmatter

Information Systems, Multimedia, and WWW

Remote Data Analysis Using Java

In this paper we examine the use of Java’s networking capabilities in a client/server application for data analysis. An applet runs on the client side and collects analysis requests. The applet sends the requests to the server, where the analysis actually takes place. The server sends the analysis results back to the client, which displays them graphically. We demonstrate the use of Java’s RMI by means of a simple classification example.

S. Kuhlins, M. Schader
Metrics for World Wide Web Information Systems

The number of information systems in the World Wide Web is growing continuously. However, the development process of web information systems is not yet sufficiently supported by adequate methods during all phases. In particular, methods for cost estimation are missing. Thus, the development of web information systems bears not only the risk of unforeseen high implementation costs, but also of uncontrollable maintenance costs. In this paper we present measures for World Wide Web information systems based on a conceptual model. Existing cost estimation methods in software engineering are transferred to the development of web information systems. Furthermore, the computation of the size of an information system allows its classification and helps to find similar web information systems as references.

K. Lenz, A. Oberweis, A. v. Poblotzki
Modeling Concepts for Flexible Workflow Support

In order to completely support business processes using a workflow management system, all the knowledge inherent in these processes has to be captured by the workflow modeling language. To meet the requirements for flexibility and expressiveness, we developed the workflow language EventFlow L, based on an event model, and integrated the Unified Modeling Language (UML) for data modeling and the Object Constraint Language (OCL) as a connecting element. The events generated during execution of a workflow can be analyzed to learn from completed processes.

R. Schätzle, W. Stucky

Navigation and Classification on the Internet and Virtual Universities

Visualization and Categorization of Cached Hypermedia Data

We argue that categorization and visualization of the explored information space (e.g. the web) are desirable in order to improve the re-access of information. Traditional browsers support only mechanisms such as bookmarks and/or history lists in order to revisit and reload documents. This paper presents an innovative tool (CineCat) that uses the client-side cache to categorize and visualize the explored web space. CineCat supports filtering techniques based on categories to provide simplified views of the information space.

F. Dridi, T. Hülsbusch, G. Neumann
Virtualization of Course Structures Through Adaptive Internet Techniques

Currently, most university curricula are based on independent courses and do not take into account the existing knowledge and learning preferences of students. We provide a model which tries to deal with these requirements and provides more flexibility. The model is based on the division of courses into self-contained concepts and user-specific navigation structures.

E. Köppen, G. Neumann
Navigation in Cyberspace Using Multi-Dimensional Scaling to Create Three-dimensional Navigational Maps

This paper presents results regarding the performance of multidimensional scaling (MDS) when used to create three-dimensional navigation maps. MDS aims at reducing high-dimensional space into low-dimensional landscapes. Combined with browsers which are capable of visualizing three-dimensional object information by applying the conceptual basis of Virtual Reality Modeling Language (VRML), MDS opens new possibilities for cognitive receptive navigation. Web3D, a prototype implementation using MDS, reveals the potential of this idea.

D. Schoder
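
The following Python sketch shows classical (Torgerson) multidimensional scaling reducing a dissimilarity matrix to three coordinates, the kind of embedding that could feed a VRML-style 3-D navigation map. The synthetic dissimilarities and the classical-MDS variant are assumptions of the sketch; the paper's Web3D prototype is not reproduced here.

```python
# Classical MDS: embed a dissimilarity matrix into three dimensions.
import numpy as np

rng = np.random.default_rng(9)
points = rng.normal(size=(30, 10))                         # high-dimensional "documents"
D = np.linalg.norm(points[:, None] - points[None, :], axis=2)

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n                        # centring matrix
B = -0.5 * J @ (D ** 2) @ J                                # double-centred squared distances
eigval, eigvec = np.linalg.eigh(B)
top = np.argsort(eigval)[::-1][:3]                         # three largest eigenvalues
coords = eigvec[:, top] * np.sqrt(np.maximum(eigval[top], 0))

print(coords.shape)        # (30, 3) coordinates for a three-dimensional navigation map
```
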
Structured Documentation. An Approach to Identify and Model Knowledge Components for Learning Purposes

Global markets and worldwide collaborative networks demand a new approach to knowledge acquisition, distribution and use. Teaching and learning are no longer restricted to school and university alone but become a lifelong challenge. Organizational knowledge management is evolving into a strong success factor in competitive markets. As large parts of organizational knowledge exist in the form of documents rather than structured data in database systems, the methodical analysis and structuring of documents becomes an important issue for corporate information management. The potential of semantically structured documents as a platform for the cross-media organization of corporate knowledge components is demonstrated by the research project ELBE.

E. Schoop

Applications in Economics

Frontmatter

Finance, Capital, and Risk Management

Measurement of the Probability of Insolvency with Mixture-of-Experts Networks

Information on how probable it is that a given company will become insolvent is important for owners, creditors and other financiers of this company. Especially investors need this information to calculate and control the risk they take with an investment decision. We show in this paper how the probability of corporate failure can be measured with artificial neural networks (ANN), namely mixture-of-experts networks. With the help of 8,660 financial statements of 3,125 industrial companies we developed a mixture-of-experts network that is able to correctly classify 90% of all companies which became insolvent within the next three years; the corresponding misclassification rate for actually solvent firms is only 29% (Jerschensky (1998)).

J. Baetge, A. Jerschensky
An Application of Methods of Multivariate Data Analysis to Compare Different Approaches to Exchange Rate Forecasting

The aim of this paper is to compare different methods of forecasting exchange rates using methods of multivariate data analysis. Traditional forecasting models like the random walk, prominent structural models and approaches using the forward rate to forecast the spot rate are evaluated. Time series analysis is conducted employing univariate time series models as well as multivariate time series models and error correction models. Model identification, estimation and forecasting are exemplified using the DM/US-Dollar exchange rate. Forecasting performance is measured by different criteria; within the scope of the investigation, methods of multivariate data analysis are efficiently employed.

U. Bankhofer, C. Rennhak
Intelligent Software Agents: Current Technology and Potential Impact on Electronic Markets and Intermediaries

The unprecedented potential of intelligent software agents for reducing work and information overload offers significantly decreased transaction costs in internet-based business throughout the whole value chain, in particular on the part of the end customers. Changed cost structures will transform intermediation and enable new market forms. The current state of agent technology and its application in the design of retail markets is explored and categorized. The ability of agents to negotiate on behalf of their owners is shown to be of crucial importance. The economic theory of transaction costs is applied to question widely held beliefs about the future development and organization of electronic markets in the light of the new technology.

Th. Burkhardt
Clustering of Stocks in Risk Context

This paper describes the selection of risk measures from a set of statistical and fundamental risk measures. Two concepts of risk are analyzed: volatility of returns and sensitivity of returns. The k-centroids method (ISODATA) is used to group the stocks. A statistical verification of this method is presented. The paper illustrates the use of the chosen risk measures in analyzing stocks listed on the Warsaw Stock Exchange.

M. Czekala, K. Kuziak
Measuring Risk in Value-at-Risk Based on Student’s t-Distribution

Distributional assumptions of financial return data are an important issue for asset-pricing and portfolio management as well as risk controlling. In order to capture the departure of empirical observations of financial return data from normality the Student’s t-distribution has been proposed as an alternative fat-tailed distribution in the literature. In this paper we (i) briefly summarize the Student’s t-distribution; (ii) compare the tail behavior of the Student’s t-distribution with empirical data; and (iii) discuss some implications of the empirical results on the risk management based on Value-at-Risk. We also suggest a simple statistic as a measure of tail-thickness based on the sample quantile and the first absolute moment.

S. Huschens, J.-R. Kim
Using ANN to Estimate VaR

There are various statistical techniques to estimate the market risk of a portfolio by identifying market risk factors and modeling the impact of factor changes on portfolio value. This paper shows how Value-at-Risk (VaR) estimates for market risk are obtained using artificial neural networks (ANN).

H. Locarek-Junge, R. Prinzler
Forecasting Discretized Daily USD/DEM Exchange Rate Movements with Quantitative Models

This paper takes a new look at the old theme of forecasting daily USD/DEM changes. Fundamental data with daily availability are used to build up quantitative models. The purpose of this paper is twofold: the first contribution is to analyse the influence of discretization on financial data. Second, it examines the capability of a neural network for forecasting daily exchange rate movements and compares its predictive power with that of linear regression and discriminant analysis in the case of discretized data. Thus the objective of this study is to address the issues faced by users of quantitative forecasting systems in terms of appropriate data transformations and model selection.

E. Steurer, M. Rothenhäusler, Y. Yeo
Study Major Choice — Factor Preference Measurement

Multidimensional statistical analysis uses techniques of variable selection and aggregation. Numerous economic phenomena belong to the application field of multivariate analysis. In general, all phenomena described by a large number of variables may be analysed within this framework. Such phenomena require descriptive characteristics that make clear understanding and unequivocal evaluation difficult. Bank evaluation is one among them that undoubtedly demands multidimensional analysis. A bank, as an institution of public trust, is evaluated both from outside (clients, stockholders, investors) and inside (bank management). This calls for better methodology: a transborder reference system might be regarded as one of the interesting proposals.

Danuta Strahl, Józef Dziechciarz
What is “Incentive Compatibility”?

In spite of its overwhelming success in theory and practice the shortcomings of the neo-classical approach in finance are well-known. It cannot explain many important phenomena, so that serious doubts on the generated positive results arise. Consequently interest shifts towards the neo-institutional framework, but it is quite difficult to judge the relevance of its results, since they are very sensitive to assumptions and details of the models. For instance, solution mechanisms for (financial) problems should be “incentive compatible”. This describes an idea and not an exact definition. There exists a variety of operationalizations, and due to the mentioned lack of robustness some sort of classification of models turns out to be an important task for future research.

R. Trost

Marketing and Market Research

An Integrative Approach for Product Development and Customer Satisfaction Measurement

When the automotive industry in the Western world recognized that the outstanding performance of Japanese manufacturers is the result of a customer-oriented understanding of quality, the QFD approach became a popular instrument for achieving the product quality demanded by customers. Nevertheless, this concept has its limits. In this article we suggest an extension of the QFD approach on the basis of customer values and benefits.

A. Herrmann, C. Seilheimer, I. Vetter, M. Wricke
Market Segmentation and Profiling Using Artificial Neural Networks

An increasing number of applications of Artificial Neural Networks (ANN) in the field of Management Science demonstrates the growing relevance of the modeling techniques summarized under this headline. This situation is also reflected in marketing, where several papers published within the last few years have focused on ANN. To evaluate their potential for market segmentation and consumer profiling, a comparison of different approaches, including connectionist and traditional models, is performed in several independent experiments on a typical set of marketing data.

Karl A. Krycha
A Classification Approach for Competitive Pricing

Pricing strategies in marketing suffer from the problem that it is difficult to model interdependencies with respect to price decisions of competing enterprises. We present an approach which tries to tackle these shortcomings, allows for additional insights into the pricing structure of a market, enables a classification of different types of competitive pricing schemes and can be incorporated into a profit optimization framework.

M. Löffler, W. Gaul
Panel-Data Based Competitive Market Structure and Segmentation Analysis Using Self-Organizing Feature Maps

The “Self-Organizing (Feature) Map” methodology as proposed by Kohonen (1982) is employed in the context of simultaneous competitive market structure and segmentation analysis. In a demonstration study using brand preferences derived from household panel data, the adaptive algorithm results in a mapping of topologically ordered prototypes of brand choice patterns at the segment level. Furthermore, validity aspects are discussed and the results are compared with those derived from a more traditional method.

Thomas Reutterer
Analysis of A Priori Defined Groups in Applied Market Research

The “analysis of a priori defined groups” (Aaker et al. (1995)) is an important task of applied market research, e.g., with respect to consumer segmentation. Powerful tools for dealing with this kind of problem are provided by methods of discriminant analysis. A main objective of this paper is the investigation of both the past and future importance of this “traditional” approach for analyzing a priori defined groups in applied market research.

T. Temme, R. Decker
Modeling and Measuring of Competitive Reactions to Changes of Marketing Mix at the Retail Level

Since the methods of measuring consumer response to changes in the marketing mix have been improved successively in recent years, the problem of analyzing and forecasting competitive reactions has become one of the most challenging topics for model-based scanner data analysis. In this paper a stochastic model for the evaluation of competitive reactions is proposed. The adequacy of this approach with respect to the analysis of data reflecting market competition is demonstrated using point-of-sale scanner data.

R. Wagner

Applications in Archeology, Bioinformatics, Environment, and Health

Frontmatter
Bayesian Statistics for Archaeology

Statistical methods now form an important part of the interpretative tool kit of archaeologists. Of these the most common are descriptive statistical methods such as: means and standard deviations, medians and modes, histograms, pie charts, line graphs, etc. It is increasingly common, however, for archaeologists and their co-workers (such as physicists, chemists, geologists and environmental scientists) to adopt model-based statistical tools. Such tools use mathematical representations of the processes which gave rise to the data we observe today and help us to begin to understand them. Statistical models are particularly relevant for aiding in archaeological interpretation as they allow us to include sources of uncertainty that are so often present in our understanding of the archaeological record.

Caitlin E. Buck
Investigating Information Retrieval with the Environmental Thesaurus of the Umweltbundesamt

Information retrieval on environmental issues is becoming increasingly important, as can be seen from the claims of Agenda 21 and the activities of the European Topic Centre on Catalogue of Data Sources (ETC/CDS). Since thesauri are often used for indexing and retrieval, they play an essential role in information systems. One of the most significant examples in Europe is the Environmental Thesaurus (Umwelt-Thesaurus) of the Umweltbundesamt. We have examined the thesaurus structure and its interplay with the environmental database ULIDAT. The results are presented together with suggestions for improvement.

S. Dörflein, S. Strahringer
Data Mining with a Neuro-Fuzzy Model for the Ozone Prognosis

The prediction of ground-level ozone is needed daily in order to inform the population and to allow measures to be taken for reducing the ozone concentration when the one-hour ozone value of 180 µg/m³ is exceeded. After presenting and comparing applied methods of ozone prognosis, we develop a fuzzy approach in which the process of rule formation is supported by a special neural network. Using ground measurements of ozone and nitrogen oxides and the meteorological series of temperature, humidity, cloud cover, and wind from three Saxon measuring sites, optimal knowledge bases are generated and tested with the developed fuzzy multilayer perceptron.

K. Gärtner, R. Schulze
Attributing Shares of Risk to Grouped or Hierarchically Ordered Factors

The attributable risk has been introduced as an epidemiological parameter that quantifies the proportion of disease events in a population that can be assigned to the adverse effects of certain risk factors. Recently, this concept has been generalised to the problem of simultaneously assessing n proportions of disease events, called partial attributable risks, that can be ascribed to the individual effects of n factors. The paper describes hierarchical and grouped variants of these parameters that appropriately combine the properties of partial attributable risks with the additional possibility of grouping and/or hierarchically ordering multiple factors to allow for more realistic multifactorial frameworks.

M. Land, O. Gefeller
New Hypotheses and Tests for the Multivariate Bioequivalence Problem

Testing for the bioequivalence alternative H_1: |θ| ≤ Δ in a normal univariate setting is usually performed with the well-known “two one-sided t-tests” procedure, which is an intersection-union test. Based on the intersection-union principle, tests for the multivariate rectangular alternative H_d: max_{i=1,...,d} |θ_i| ≤ Δ have recently been constructed. In this paper we show that rectangular hypotheses are not always a suitable generalization of H_1 to the multivariate setting. In order to overcome the drawbacks encountered with rectangular hypotheses we suggest ellipsoids as alternatives. Finally, an asymptotic test for the new hypotheses is suggested and compared with existing methods.

R. Pflüger, A. Munk
Diagnostic Support in Allergology Using Interactive Learning Classifier Systems

We present enhancements to learning classifier systems which make them capable of generating rules from case-based allergy test data. Iterated classification and the integration of experts’ suggestions mark important steps on the way to a system assisting allergologists in determining substances which are likely to be allergens for a patient whose history is provided. After describing the requirements of this special classification problem we introduce the interactive learning classifier system as a powerful tool for these tasks and show some first practical results.

J. Schoof
On Nonparametric Repeated Measures Analysis of Variance

In medicine, there are often experiments characterized by repeated measurements on the same object. A well-known application is dose-finding studies, where we try to prove a treatment effect at different doses. In clinical trials with a cross-over design, we have to prove both a dose and a group effect of the treatments as well as possible interactions between them. In principle, multivariate repeated measures analysis of variance (RM-ANOVA) can be used for the analysis of such a data structure (N.H. Timm (1980)). Difficulties arise if only a few observations are available. In this case a parametric analysis of variance can no longer be applied and we have to look for alternatives. As a first attempt, and for reasons of easy practicability, we applied the method of data alignment (Hildebrand (1980)), which consists of an adjustment for those factors not considered in the current analysis, followed by a ranked analysis of variance.

K.-D. Wernecke, G. Kalb
Backmatter
Metadata
Title
Classification in the Information Age
Editors
Prof. Dr. Wolfgang Gaul
Prof. Dr. Hermann Locarek-Junge
Copyright Year
1999
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-60187-3
Print ISBN
978-3-540-65855-9
DOI
https://doi.org/10.1007/978-3-642-60187-3