
1991 | Book

Classification, Data Analysis, and Knowledge Organization

Models and Methods with Applications

Edited by: Professor Dr. Hans-Hermann Bock, Professor Dr. Peter Ihm

Publisher: Springer Berlin Heidelberg

Book series: Studies in Classification, Data Analysis, and Knowledge Organization


About this book

In science, industry, public administration and documentation centers, large amounts of data and information are collected which must be analyzed, ordered, visualized, classified and stored efficiently in order to be useful for practical applications. This volume contains 50 selected theoretical and applied papers presenting a wealth of new and innovative ideas, methods, models and systems which can be used for this purpose. It combines papers and strategies from two main streams of research in an interdisciplinary, dynamic and exciting way: On the one hand, mathematical and statistical methods are described which allow a quantitative analysis of data, provide strategies for classifying objects or making exploratory searches for interesting structures, and give ways to make comprehensive graphical displays of large arrays of data. On the other hand, papers related to information sciences, informatics and data bank systems provide powerful tools for representing, modelling, storing and retrieving facts, data and knowledge characterized by qualitative descriptors, semantic relations, or linguistic concepts. The integration of both fields and a special part on applied problems from biology, medicine, archeology, industry and administration assure that this volume will be informative and useful for theory and practice.

Table of Contents

Frontmatter

Mathematical and Statistical Methods for Classification and Data Analysis

Frontmatter

Classification and clustering methods

An Agglomerative Method for Two-Mode Hierarchical Clustering

A new agglomerative method is proposed for the simultaneous hierarchical clustering of row and column elements of a two-mode data matrix. The procedure yields a nested sequence of partitions of the union of two sets of entities (modes). A two-mode cluster (bi-cluster) is defined as the union of subsets of the respective modes. At each step of the agglomerative process, the algorithm merges the two bi-clusters whose fusion results in the minimum increase in an internal heterogeneity measure. This measure takes into account both the variance within a bi-cluster and its elevation, defined as the squared deviation of its mean from the maximum entry in the original matrix. Two applications concerning brand-switching data and gender subtype-situation matching data are discussed.

Thomas Eckes, Peter Orlik
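
To make the merge criterion concrete, here is a minimal sketch of a greedy two-mode agglomeration. It is an illustration only, not the authors' algorithm: the initialization from row and column singletons, the function names, and the treatment of empty blocks are assumptions.

```python
import numpy as np
from itertools import combinations

def heterogeneity(X, rows, cols, global_max):
    """Internal heterogeneity of a bi-cluster: within-block variance plus the
    'elevation', i.e. the squared deviation of the block mean from the maximum
    entry of the original matrix.  Blocks with an empty mode contribute 0 (assumption)."""
    if not rows or not cols:
        return 0.0
    block = X[np.ix_(sorted(rows), sorted(cols))]
    return block.var() + (block.mean() - global_max) ** 2

def two_mode_agglomerative(X):
    """Greedy agglomeration over the union of row and column entities.
    Returns the merge history as (cluster_a, cluster_b, increase) triples."""
    X = np.asarray(X, dtype=float)
    gmax = X.max()
    # start from singletons: one cluster per row entity and one per column entity
    clusters = ([({i}, set()) for i in range(X.shape[0])]
                + [(set(), {j}) for j in range(X.shape[1])])
    history = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ra, ca = clusters[a]
            rb, cb = clusters[b]
            merged = (ra | rb, ca | cb)
            increase = (heterogeneity(X, merged[0], merged[1], gmax)
                        - heterogeneity(X, ra, ca, gmax)
                        - heterogeneity(X, rb, cb, gmax))
            if best is None or increase < best[0]:
                best = (increase, a, b, merged)
        increase, a, b, merged = best
        history.append((clusters[a], clusters[b], increase))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return history
```
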
Selection from Overlapping Classifications

Semantic classification utilizes structural and semantic properties of data rather than purely their numerical values for constructing classes of objects. In the process of semantic interpretation of data sets, we arrive in our project EXPLORA at a collection of possible descriptions of a given goal set. We propose here a procedure for selecting certain classes from this collection. The procedure chooses them by means of their quality and of a kind of similarity, usually asymmetric, which we call affinity. The idea is to suppress a class if it is sufficiently similar to, but also inferior to, another class that is itself retained. Some examples illustrate the method and its effect on the results.

F. Gebhardt
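
A toy version of the selection idea, under the assumption that quality is a numeric score and affinity an asymmetric similarity in [0, 1]; the names and the fixed threshold are illustrative and not part of the EXPLORA procedure itself.

```python
def select_classes(classes, quality, affinity, threshold=0.8):
    """Process candidate classes in decreasing quality and suppress a class
    whenever its affinity to an already retained (hence better) class exceeds
    the threshold; otherwise retain it."""
    retained = []
    for c in sorted(classes, key=quality, reverse=True):
        if all(affinity(c, r) < threshold for r in retained):
            retained.append(c)
    return retained

# illustrative use: classes are frozensets of object ids,
# affinity measures how much of c is covered by an already retained class r
classes = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3}), frozenset({7, 8})]
quality = {classes[0]: 0.9, classes[1]: 0.7, classes[2]: 0.6}
affinity = lambda c, r: len(c & r) / len(c)   # asymmetric by construction
print(select_classes(classes, quality.get, affinity))
```
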
On Cluster Methods for Qualitative Data II

The aim of this note is twofold. First, we present a uniquely determined best goodness criterion G for qualitative data. Then we propose an algorithm for finally obtaining an appropriate classification with respect to G. This algorithm applies in particular to the case where a user is interested in classifications which satisfy some overlapping criterion.

G. Herden
A Regression Analytic Modification of Ward’s Method: A Contribution to the Relation between Cluster Analysis and Factor Analysis

A regression analytic modification of the minimum variance method (Ward’s method) is outlined. In the proposed method the within-cluster sums of squares are partitioned into the proportion accounted for by the cluster centers and the residual variation. The procedure consists of fusing the two clusters that minimize the residual variation not predicted by the centers. The method allows for a combination of clustering and factor analysis in order to determine the kind of properties that govern the relationships between the clusters.

S. Krolak-Schwerdt, P. Orlik, A. Kohler
The “Partition with a Structure” Concept in Biological Data Analysis

Let i = 1,..., N be some units (objects) and R = {R1,..., Rp} be an arbitrary partition of the set of units into classes R1,..., Rp. Let K ⊂ {1,..., p}² be an association graph on the set of classes, i.e. two classes Rs, Rt are said to be associated if (s, t) ∈ K. The pair M = (R, K) will be called a partition with a structure, or a macrostructure.

B. G. Mirkin
Classification with neural networks

Neural networks have recently received a great deal of attention in the fields of artificial intelligence, cognitive psychology, neurophysiology, and informatics. Some general properties of these systems are discussed and exemplified in applications. The models used are a HOPFIELD network and the BACKPROPAGATION learning algorithm. The latter is applied to the otological classification of persons with respect to evoked otoacoustic emissions of normal or diseased ears. The results show that up to 71.1% are correctly classified. Classificatory abilities of neural networks, problems of preprocessing spectral data, and their analysis by backpropagation are discussed. Finally, a short comparison is made between (higher-order) associative memories and discriminant analysis.

A. Müller, J. Neumann

Statistical and probabilistic aspects of clustering and classifications

Multigraphs for the Uncovering and Testing of Structures

The main difficulty in deriving test statistics for testing hypotheses about the structure of a data set lies in finding a suitable mathematical definition of the term "homogeneity", or conversely in defining a mathematical model which "fits" a real, but homogeneous, world. This model should be both realistic and mathematically tractable. Graph-theoretic cluster analysis provides the analyst with probability models from which tests for the hypothesis of homogeneity within a data set can be derived for many environments. Because of variations in the scale levels between the different attributes of the objects of a sample, it is better not to compute one single similarity between any pair of vertices but several — say t — similarities. The structure of a set of mixed data can then be described more appropriately by a superposition of t graphs, a so-called "completely labelled multigraph". This multigraph model also provides researchers with more sophisticated and flexible probability models to formulate and test different hypotheses of homogeneity within sets of mixed data. Three different probability models for completely labelled random multigraphs are developed, their asymptotic equivalence is shown, and their advantages when applied to testing the "randomness" of clusters found by single-linkage classification algorithms are discussed.

E. Godehardt
Estimators and Relative Efficiencies in Models of Overlapping Samples

In a model describing the situation of overlapping samples, four unbiased estimators of the expectation of the underlying random variables are examined, based on different amounts of information about the problem. Mallows and Vardi (1982) state an inequality for the variances of three estimators leading to a bound for relative efficiencies. In order to compare the estimators and to appraise the disadvantage that arises if an auxiliary estimator is used instead of the optimal one, a similar inequality is given with respect to another triplet of estimators, and equality of any two estimators is characterized by means of column sums of certain matrices. Several examples are shown, the results are applied in special models of overlapping samples, and relative efficiencies are plotted as functions of problem parameters.

U. Kamps
Lower Bounds for the Tail Probabilities of the Scan Statistic

The scan statistic is used for testing the null hypothesis of uniformity against clustering alternatives. Berman & Eagleson (1985) derived an upper bound for the tail probabilities which was improved by Krauth (1988) and Glaz (1989). Glaz (1989) also derived a lower bound, based on a result of Kwerel (1975). This article presents lower bounds for the scan statistic that are easier to compute. They are proved by the method of indicators or by a linear programming approach.

J. Krauth
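
For orientation, the scan statistic itself is simply the maximum number of observations captured by a window of fixed width w sliding over the observation interval. A small sketch follows; the window width and the data are arbitrary, and the bounds discussed in the paper are not reproduced here.

```python
import numpy as np

def scan_statistic(points, w):
    """Maximum number of observations contained in any half-open window
    [t, t + w) as the window slides over the observation interval."""
    x = np.sort(np.asarray(points, dtype=float))
    # the maximum is attained with the window's left edge on an observation
    counts = [np.searchsorted(x, xi + w, side="left") - i for i, xi in enumerate(x)]
    return max(counts)

# illustrative use: 50 uniform observations on [0, 1], window width 0.1
rng = np.random.default_rng(0)
print(scan_statistic(rng.uniform(size=50), w=0.1))
```
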
Poisson Approximation of Image Processes in Computer Tomography

We present estimates and asymptotic expansions for the total variation distance between the superposition of independent Bernoulli point processes and the Poisson point process with the same intensity measure. Special emphasis is given to the lattice case, which arises in connection with image reconstruction in computer tomography.

D. Pfeifer

Statistical, geometrical and algebraic methods for data analysis

Some Recent Developments in Linear Models: A Short Survey

The linear model is — in conjunction with the OLS estimation method — one of the most popular models for statistical analysis. First, the linear model is considered as a model generator for more realistic models such as generalized linear models and threshold models. Second, different kinds of misspecification of the linear model such as non-normal errors, general heteroscedasticity and errors correlated with the regressors are considered and some guidance to deal with such misspecifications is given. Third, the consistency of the parameter estimates is considered if the true dependent variable has been transformed in some unknown non-linear way or if the wrong error distribution has been chosen in limited dependent variable models. The results are illustrated with some limited Monte Carlo studies. Fourth, some implications of the results for sample design are discussed.

Gerhard Arminger
Causal Analysis in Marketing Research with LISREL or a Combination of Traditional Multivariate Methods?

There are two fundamentally different ways of performing a causal analysis: simultaneous methods (e.g. LISREL) or the successive use of the "traditional" methods of factor analysis and regression analysis. In this paper, the investigation of both alternatives in a Monte Carlo study shows that each method has advantages and disadvantages. However, when applied simultaneously to the same data, they may complement each other in an efficient way. Furthermore, the study compares the estimation methods in both cases with regard to the criteria of robustness and quality of estimation.

Jochen Benz
Analysis of Data Measured on a Lattice

A data vector z is additively decomposed into several components by minimizing a sum of quadratic forms in each component. The data may be measured on an arbitrary lattice, but the quadratic forms have to be constructed according to the chosen lattice (here mainly in IR+). The decomposition method can be regarded as a spline approximation. The transformation of z to each component is linear, diagonalizable, and its eigenvalues are contained in the unit interval. First we introduce the concept of detecting a structural change with this method. Then we concentrate on the analysis of two interdependent time series by the simultaneous decomposition of the two (coupled) series. For the application of the method the two series are best regarded as one data field measured on a lattice with two directions, one direction representing time and the other specifying the series. The concepts are explained with data from a monitoring study in NRW on the influence of air pollution on vulnerable persons.

Ulrich Halekoh, Paul O. Degens
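
In the simplest one-dimensional case, "decomposition by minimizing a sum of quadratic forms" amounts to a penalized least-squares split of the data into a smooth component and a residual. The sketch below is a generic Whittaker-type smoother chosen purely for illustration, not the coupled two-series method of the paper; as in the abstract, the resulting transformation is linear with eigenvalues in the unit interval.

```python
import numpy as np

def smooth_decompose(z, lam=10.0):
    """Split z into a smooth component s and a rough residual z - s by
    minimizing ||z - s||^2 + lam * ||D2 s||^2, where D2 is the second
    difference operator on a regular one-dimensional lattice."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    D2 = np.diff(np.eye(n), n=2, axis=0)              # (n-2) x n second differences
    s = np.linalg.solve(np.eye(n) + lam * D2.T @ D2, z)
    return s, z - s

# illustrative use on a noisy sine curve
t = np.linspace(0, 4 * np.pi, 200)
smooth, rough = smooth_decompose(np.sin(t) + 0.3 * np.random.default_rng(0).normal(size=200))
```
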
Dual Algorithms in Multidimensional Scaling

A basic problem in Multidimensional Scaling is to minimize the weighted sum of squared differences between given dissimilarities and distances over all Euclidean distance matrices. Existing algorithms solve this problem in a not entirely satisfactory way. The present paper aims at the development of dual algorithms which are able to find the global minimum with a sufficient speed of convergence.

Rudolf Mathar
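
The objective in question, weighted raw stress, can be attacked in many ways; the sketch below minimizes it by plain gradient descent on the configuration, purely to show the primal problem the paper refers to. It is not one of the dual algorithms developed there, and the step size and iteration count are arbitrary assumptions.

```python
import numpy as np

def mds_gradient(delta, weights, dim=2, steps=500, lr=0.01, seed=0):
    """Minimize weighted raw stress  sum_{i<j} w_ij * (delta_ij - d_ij(X))^2
    over n x dim configurations X by gradient descent."""
    n = delta.shape[0]
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, dim))
    iu = np.triu_indices(n, k=1)
    for _ in range(steps):
        diff = X[:, None, :] - X[None, :, :]          # pairwise coordinate differences
        d = np.sqrt((diff ** 2).sum(-1)) + 1e-12      # pairwise distances
        g = weights * (d - delta) / d                 # partial derivative w.r.t. d_ij (factor 2 absorbed in lr)
        np.fill_diagonal(g, 0.0)
        grad = (g[:, :, None] * diff).sum(axis=1)     # chain rule back to coordinates
        X -= lr * grad
    stress = float((weights[iu] * (delta[iu] - d[iu]) ** 2).sum())
    return X, stress

# illustrative use: recover a noisy 2-D configuration from its distance matrix
rng = np.random.default_rng(1)
Y = rng.normal(size=(10, 2))
delta = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1))
X_hat, stress = mds_gradient(delta, np.ones_like(delta))
```
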
Comparison of Biplot Analysis and Formal Concept Analysis in the case of a Repertory Grid

We give a first comparison between Principal Component Analysis (PCA, "one of the oldest and best known techniques of multivariate analysis", cf. JOLLIFFE [6]) and the analysis using biplots (GABRIEL [1], [2]) on the one hand, and an algebraic technique for the visualization of data, namely Formal Concept Analysis (FCA, WILLE [11]), on the other, by applying both methods to matrices called Repertory Grids, which are the usual data form in many psychological investigations (SLATER [8]).

Norbert Spangenberg, Karl Erich Wolff
Convexity in Ordinal Data

Convexity is a leading idea in data analysis, although it is mostly invoked on an informal level; in particular, convexity in ordinal data has not been elaborated as a well-defined tool. This paper presents a first discussion of convexity definitions in connection with examples of ordinal data. One result is that there is more than one definition of ordinal convexity which is meaningful for data analysis. Convexity in multi-valued ordinal data is analysed by methods of formal concept analysis. Some relation to Euclidean convexity is outlined.

Selma Strahringer, Rudolf Wille
Classification and Seriation by Iterative Reordering of a Data Matrix

A heuristic algorithm is presented which searches for a reordering of the rows and columns of a symmetric similarity matrix in order to fulfil, at least approximately, the Robinson condition. The algorithm uses pairwise interchanges in constructive and iterative strategies. A rectangular m×n matrix of two different sets of parameters can be treated by first converting or preprocessing the data into two square similarity matrices, one each for rows and columns, before applying the above-mentioned technique. The resulting orderings of rows and columns in the m×n matrix yield a pattern whose underlying structure can be interpreted by inspection. An agglomerative hierarchical classification can be obtained after the rearrangement using only neighbouring objects (rows, columns). A computer program has been implemented with a fast reordering algorithm and a graphical dendrogram presentation.

Richard Streng
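
The Robinson condition asks that, in the reordered similarity matrix, entries within each row never increase as one moves away from the diagonal. A toy version of the pairwise-interchange idea is sketched below; the violation count used as a loss and the neighbour-swap sweep are assumptions made for brevity, not the author's constructive and iterative strategies.

```python
import numpy as np

def anti_robinson_loss(S, order):
    """Count violations of the Robinson condition in the reordered similarity
    matrix: entries should be non-increasing when moving away from the diagonal."""
    M = S[np.ix_(order, order)]
    n = len(order)
    loss = 0
    for i in range(n):
        row = M[i]
        left, right = row[:i + 1], row[i:]
        loss += np.sum(np.diff(left) < 0)    # should be non-decreasing up to the diagonal
        loss += np.sum(np.diff(right) > 0)   # and non-increasing after it
    return int(loss)

def reorder(S, sweeps=20):
    """Repeatedly try interchanges of neighbouring positions and keep a swap
    whenever it reduces the number of Robinson violations."""
    order = list(range(S.shape[0]))
    for _ in range(sweeps):
        improved = False
        for k in range(len(order) - 1):
            trial = order.copy()
            trial[k], trial[k + 1] = trial[k + 1], trial[k]
            if anti_robinson_loss(S, trial) < anti_robinson_loss(S, order):
                order, improved = trial, True
        if not improved:
            break
    return order
```
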
Data Analysis Based on a Conceptual File

The notion of a conceptual file is introduced as a new tool in data analysis; it allows an interactive procedure of conceptual analysis supporting in particular a flexible exploration of the data. The idea of a conceptual file is explained by an example from the science of international relations. A mathematical definition of a conceptual file is given in the frame of formal concept analysis. Finally, it is outlined how conceptual files might be implemented.

Frank Vogt, Cornelia Wachter, Rudolf Wille

Knowledge Organization, Data Bases, and Information Retrieval

Frontmatter

Modelling, representation and organization of conceptual knowledge

Decentralized Modelling of Data and Relationships in Enterprises

Modelling of data in enterprises is based on concepts like customer, supplier, order etc., and their fundamental relationships. Content and structure of these concepts must be standardised in order to support the information flow between the different departments of a firm. Standardization is a very cumbersome process and results in a data model which does not reflect the special needs of decentralized business units. Decentralized data modelling therefore will lead to similar, but ultimately different, concepts. In order to adapt the different contents and structures automatically, the differences must be known by the machine. This is possible if one restricts the modelling of data to a limited number of well-known constructors, i.e. identification, generalization and specification.

Hans Czap
A Contribution to the Examination of Semantic Relations between Lexemes

The problem of how to examine the structure of the lexical system (i.e. the word stock of a language) has often been the subject of linguistic investigation. Different properties of lexical units have been used in rather different methods to structure mostly relatively small lexical subdomains (cf. AGRICOLA, FLEISCHER, PROTZE (1969)); the properties considered — if they were considered as quantitative properties at all — were then examined at most on an ordinal scale.

R. Hammerl
A Mathematical Model for Conceptual Knowledge Systems

Objects, attributes, and concepts are basic notions of conceptual knowledge; they are linked by the following four basic relations: an object has an attribute, an object belongs to a concept, an attribute abstracts from a concept, and a concept is a subconcept of another concept. These structural elements are well mathematized in formal concept analysis. Therefore, conceptual knowledge systems can be mathematically modelled in the frame of formal concept analysis. How such modelling may be performed is indicated by an example of a conceptual knowledge system. The formal definition of the model finally clarifies in which ways representation, inference, acquisition, and communication of conceptual knowledge can be mathematically treated.

Peter Luksch, Rudolf Wille
Compositional Semantics and Concept Representation

Concept systems are not only used in the sciences, but also in secondary supporting fields, e. g. in libraries, in documentation, in terminology and increasingly also in knowledge representation. It is suggested that the development of concept systems be based on semantic analysis. Methodical steps are described. The principle of morpho-syntactic composition in semantics will serve as a theoretical basis for the suggested method. The implications and limitations of this principle will be demonstrated.

G. Rahmstorf

Data bases, expert systems, information retrieval, and library systems

Small and Beautiful?
Some remarks on evaluating microcomputer based library systems

The paper serves as an introduction to the presentation of different microcomputer-based library systems and suggests selected criteria for their evaluation by librarians working with rather small amounts of data and not being actively integrated in the structures of regional bibliographic networks or of complex local applications. The issue of different system environments for library use is discussed in order to indicate the limits of this undertaking. The criteria developed are: generating capacity (1), concepts of data maintenance, data access and procedures providing the homogeneity of data (2), functions related to cataloguing and subject indexing of bibliographic material (3), retrieval-oriented functions (4), aspects of ergonomics and documentation (5), functions and interfaces for data exchange (6). The aspects of generating capacity and of functions related to data exchange are given particular attention.

S. Gradmann
A Tool for Validating PROLOG Programs

We analyze program data of a knowledge base encoded in Prolog. The aim is to classify the program clauses into rules and facts that are inconsistent, unfireable or unnecessary and those that pass without objections. Three complementary analysis steps are described and illustrated by small examples. We also discuss aspects of implementation and show how the original clauses have to be normalized in order to simplify processing. In the outlook it is indicated how the scope of the existing version will be expanded in the future.

R. Kiel, M. Schader
On the Database Component in the Knowledge-based System WIMDAS

The knowledge-based system WIMDAS uses a relational database (Oracle) for storing data and internal information. We describe its organization and how it is fitted into the system. In particular, we discuss the implementation of the interfaces between database and the other system components.

S. Marx, M. Schader
Information Retrieval Techniques in Rule-based Expert Systems

In rule-based expert systems knowledge is represented in an IF-THEN form: IF <set of conditions> THEN <decision>. A limited subset of natural language — supplemented by specified relations and operators — is used to formulate the rules. Rule syntax is simple. This makes it easy to acquire knowledge from an expert and permits plausibility checks on the knowledge base without the expert having knowledge of the implementation language or details of the system. A number of steps are used to select suitable rules during the rule-matching process. It is noteworthy that rules are well-structured documents for an information retrieval system, particularly since the number of rules in a rule-based system remains manageable. In this paper it is shown that this permits automatic processing of the rule set by methods of information retrieval (i.e. automatic indexing and automatic classification of rules, and automatic thesaurus construction for the knowledge base). A knowledge base which is processed and structured in this fashion allows the use of a complex application-specific search strategy and hence an efficient and effective realization of reasoning mechanisms.

Jiri Panyr
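
As a minimal picture of what "automatic indexing" of such rules can mean, the sketch below builds an inverted index over the condition parts of a few invented IF-THEN rules and ranks rules by term overlap with a set of facts. The tokenization, the stop words and the example rules are all assumptions; the paper's further steps (automatic classification, thesaurus construction) are not shown.

```python
import re
from collections import defaultdict

# invented example rules: (rule text, decision)
rules = [
    ("IF temperature is high AND pressure is rising THEN open valve", "open valve"),
    ("IF pressure is falling THEN close valve", "close valve"),
]

def tokens(text):
    """Lowercase word tokens minus a tiny stop-word list (an assumption)."""
    return set(re.findall(r"[a-z]+", text.lower())) - {"if", "then", "and", "is"}

# inverted index: term -> set of rule ids whose condition part contains it
index = defaultdict(set)
for rid, (rule, _) in enumerate(rules):
    condition = rule.split("THEN")[0]
    for t in tokens(condition):
        index[t].add(rid)

def match(facts):
    """Rank rules by the number of condition terms shared with the given facts."""
    scores = defaultdict(int)
    for t in tokens(facts):
        for rid in index.get(t, ()):
            scores[rid] += 1
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(match("pressure is rising and temperature high"))
```
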
Object Databases and Thesauri for Small Museums

Twenty years of experience with computers in museums has led from the original “purely scientific” objectives to a more down-to-earth approach aimed at rationalizing repetitive procedures. Dealing with fragmentary or unclean data is of primary concern. In the “Small Museums” project useful techniques from the field of thesaurus applications are customized in daily practice for this purpose.

Christof Wolters

Terminology and classification

The Structure and Role of Specialized Information in Scientific and Technical Terminologies

In view of the increasing interdependence of Information and Documentation on the one hand and terminology work on the other, the author analyzes the relationship between information science and terminology science, starting with the concept of "information" itself. From the semantic conception of information the concept of specialized information is derived, which can be directly used in terminology science, where this type of information should be the starting point of any investigation or theoretical consideration. A terminological view of information and knowledge management would be a practical example.

G. Budin
Terminology Work in the World Health Organization EUROTERM Abbreviations

EUROTERM abbreviations is a database which includes abbreviations of organizations, institutes and universities as well as country codes. Full forms or abbreviations of titles are given in English, French, German and Russian, to the extent possible.

Sonja Hvalkof
HyperTerm
A Proposal for a User-friendly Termbank

How can termbanks be expanded towards a tool which supports the user in his/her daily work? Our idea is to combine some approaches which are quite common within hypertext systems with traditional termbank interfaces. Therefore we will introduce a multilingual terminological database, which was developed at the Fraunhofer Institute. To facilitate a discussion on the possibilities of implementing aspects of hypertext and termbanks, the basic ideas of hypertext will be explained in this paper. It is useful to investigate how users behave and proceed in retrieving information before developing a termbank interface, because this may lead to some new ideas for the termbank design. Taking existing termbank systems, the hypertext approach and human retrieval behaviour into account, I will show how the interface of a termbank could be expanded into a user-friendly termbank which I will call HyperTerm.

Renate Mayer
The Role of Classification in Terminology Documentation

Due to the rapid increase of terminological literature all over the world, it became necessary to create appropriate tools for the bibliographic control of this relatively new subject field in its own right. Within the framework of content analysis of terminological literature, classification activities play a predominant role. In order to facilitate and unify the classification of theoretical works in terminology, Infoterm established a new classification scheme called TCL (Terminology Classification), which also covers the neighbouring fields of this domain. Furthermore, the classification of terminological vocabularies has different requirements, such as the availability of an abridged and condensed version of a universal classification scheme. Infoterm has also prepared a draft proposal for this purpose.

W. Nedobity

Applications and Methods for Special Subject Fields

Frontmatter

Classification, systematics, and evolution in biology

The Hierarchy of Organisms: Systematics and Classification in Biology

Traditionally, classifications in biology were based on typological judgement. Current classifications frequently maintain many groups and assume relationships which are still based on simple typology. Approaches of ‘transformed cladistics’ are close to advanced methods of numerical phenetics; decisions are not primarily based on evolutionary thinking. For such reasons, the method of phylogenetic systematics is regarded as the valid approach in biological classification. It includes tools for the reconstruction of the historical process of phylogenetic branching. In a secondary step, the results obtained are directly transposed into a classification (system) which qualifies as a scientific hypothesis. Phylogenetic systematics permits only one ‘correct’ classification as there has been only one course in phylogeny.

O. Kraus
Estimating Phylogenies with Invariant Functions of Data

Estimating phylogenies, or evolutionary trees, is a complex task even under the best of circumstances, and it encounters particular difficulties when using molecular data to investigate distantly related species. In recent years researchers have studied how methods to infer phylogenetic relations, such as those based on parsimony, behave for simple models of nucleic acid evolution. The results are not entirely encouraging: HENDY AND PENNY (1989), for example, illustrated simple cases under which parsimony will converge to an incorrect phylogenetic tree, even for equal rates of evolution. What is encouraging, however, is that researchers are beginning to develop methods of estimating phylogenies which may be robust under conditions where parsimony is not. A strategy shared by some of these methods (CAVENDER AND FELSENSTEIN (1987), LAKE (1987a)) is to use invariant functions of the data to identify the correct topology of the corresponding phylogeny. But which invariants, and how? What assumptions underlie these approaches? I discuss these issues and indicate the direction this research seems to be taking.

William H. E. Day
Statistical Analysis of Genetic Distance Data

Homology between biological objects (DNA sequences, species, etc.) can be measured by genetic distance data. A genetic distance may be computed from aligned genetic sequence data, e.g. DNA sequences. We discuss the dot-matrix plot as a possible graphical check of the goodness of the alignment. The assumption of identical distributions along the sequence positions is often inappropriate. Therefore, we discuss aspects of a heuristic which allows the combined exploration of the genetic distance between the sequences and of different positional variation. A tree structure is not assumed for such an exploration. Having computed a genetic distance, phylogenetic relations may be analysed by three- and four-object methods. The approach is illustrated by a set of tRNA sequences.

B. Lausen
Variance Estimation in the Additive Tree Model

By the use of stochastic models it is possible to judge procedures for fitting additive trees to dissimilarity data. We use the simple additive error model (Degens 1983) to analyse the accuracy of an estimated additive tree by estimating its variance, too. Analogously to the three-object variance estimator in the ultrametric case (cf. Lausen 1987 or Lausen & Degens 1986) we propose a four-object variance estimator based on the simple maximum-likelihood (ML-) variance estimation for all subsets consisting of any four objects of an additive tree. In contrast to variance estimation using the residual sum of squares this new estimator is not based on the assumed i.e. estimated structure of the given dissimilarity data. In the framework of a Monte-Carlo study we analyse the four-object variance estimator and compare it to variance estimators based on linear models in the case of local solutions of the underlying approximation problem (cf. Vach 1988).

K. Wolf, P. O. Degens

Classification and documentation in medicine

Semi-automated Classification of Medical Phrases, using a Personal Computer

In a basic documentation for the Children's Hospital of the University of Münster, medical phrases to be coded are mapped onto a subset of items of a thesaurus. The most similar items are then presented to the encoding person, who decides which item, if any, is semantically equivalent to the new medical phrase. New phrases lead to the insertion of the new item and its code into the thesaurus. The mapping algorithm is based on formal criteria (common substrings), allowing for spelling errors and various wordings. It is independent of the thesaurus and may thus be applied to other thesauri and code tables. The generation of the needed help files is easy and fast to perform.

Rudolf-Josef Fischer
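
The mapping step can be pictured with a simple formal similarity. The sketch below scores thesaurus items by shared character trigrams, which tolerates spelling variants in roughly the spirit of the common-substring criteria mentioned above; the trigram measure, the example phrases and the ranking are assumptions, not the actual algorithm.

```python
def trigrams(phrase):
    """Set of character trigrams of a phrase, padded with spaces."""
    s = " " + phrase.lower() + " "
    return {s[i:i + 3] for i in range(len(s) - 2)}

def similarity(a, b):
    """Dice coefficient on character trigram sets, robust to small spelling errors."""
    ta, tb = trigrams(a), trigrams(b)
    return 2 * len(ta & tb) / (len(ta) + len(tb))

def best_matches(phrase, thesaurus, k=3):
    """Return the k thesaurus items most similar to the new phrase,
    for presentation to the person doing the encoding."""
    return sorted(thesaurus, key=lambda item: similarity(phrase, item), reverse=True)[:k]

# illustrative use with invented phrases (note the misspelled thesaurus entry)
print(best_matches("acute bronchitis", ["acute bronhitis", "chronic bronchitis", "asthma"]))
```
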
Structure of Informations on Medical Trials

Structures of concepts can be described by Boolean lattices, factorial structures of attributes by splitting lattices. Generally the splitting lattices are neither distributive nor complementary. A generalization of the calculus in lattices of quantity systems enables the introduction of hierarchical structures. The relation between factorial and hierarchical attribute structures is shown. Formal hierarchical structures can be represented by semantic structures and vice versa. Often semantic structures only seem to be factorial or hierarchical. So it is necessary to differentiate strictly between formal and semantic structures.

Ekhard Hultsch
Recent Problems and Longterm Activities in the Classification of Medical Concepts

Since 1986 all of the 3000 West German hospitals have had to encode the main diagnoses of their 11 million inpatient cases per year according to the ICD-9, giving rise to severe problems concerning the standardization of the medical nomenclature and the correctness of the encoding. These recent problems and some fundamental tasks of standardization should be solved by implementing an official classification center for German medical concepts. This center should be responsible for the coordination of semantic classifications, updating classifications and thesauri, computer-assisted encoding, and especially for conversion from one medical classification to another. For other languages such centers already exist, and a worldwide cooperation of these centers is to be established. Modern methods of medical informatics such as knowledge engineering and computer linguistics are to be utilized in order to solve the problems of classification of medical concepts.

R. Klar
Exploring Three-Dimensional Image Data with Classification Methods

An optical 3-D measurement method (moiré) was applied to investigate posture and postural changes of the human body. Surface coordinates were recorded in 14 individuals during singing, at anatomical landmarks and for given x, y, or z values. Hierarchical classification was performed with sets of coordinates from different body regions and for combined data sets: with Euclidean distances calculated from difference matrices between extreme positions, and complete linkage, classifications could be obtained that fitted well to an independent grading of vocal performance. The procedure can be extended to other tasks of medical image interpretation and can be recommended for exploratory analysis of 3-D data.

H. Kurz, O. Leder
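
The clustering step named in the abstract, Euclidean distances followed by complete linkage, can be reproduced generically with standard tools; the feature matrix below is a random placeholder, not the moiré coordinate data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# placeholder: 14 "individuals", each described by a vector of surface-coordinate
# differences between two extreme postures (the real data come from moiré imaging)
rng = np.random.default_rng(1)
features = rng.normal(size=(14, 30))

distances = pdist(features, metric="euclidean")     # condensed distance matrix
tree = linkage(distances, method="complete")        # complete-linkage hierarchy
groups = fcluster(tree, t=3, criterion="maxclust")  # cut the dendrogram into three groups
print(groups)
```
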

Data analysis in the archeological and historical sciences

The Reconstruction of “Genetic Kinship” in Prehistoric Burial Complexes — Problems and Statistics

Establishing kinship in burial complexes has been a desideratum for decades. In recent years, paleoanthropologists have attempted to conduct kinship analyses primarily by means of epigenetic variants of the skull. As with other anthropological questions, teeth and maxillary bones seem well suited for the reconstruction of genetic kinship in skeletal material. The authors analyse the incidence and distribution of hypodontia in a burial complex and, from the distribution of the trait, deduce possible genetic relations of its bearers. Statistical analysis was applied to test whether the distribution of the trait was random or whether it supports the hypothesis of burial in family groups.

Kurt W. Alt, Werner Vach
An Approach to a Formal Statistical Analysis of Historical Data based on the Town of Bamberg

The following article presents an example of the application of statistical methods in the analysis of historical sources. In the course of research into the labour migration of craftsmen in the late Middle Ages and the early modern period, the Bamberg borough council's "Handwerksgesellenregister" (journeymen's register) was included. Among other things, one purpose of the research was to determine the connection between the duration of employment and the number of persons employed within the given period of time. The Yassouridis & Hansert concordance coefficient is intended to measure the degree of coherence of the above-mentioned characteristics.

R. S. Elkar, R. Huthsteiner, R. Ostermann
Automatic Syntax Analysis of Meroitic Funeral Inscriptions

This paper discusses some problems of automatic sentence analysis (parsing), applied to the Meroitic language, as well as the development of an appropriate algorithm and its programming in BASIC and LOGO. The usefulness of this approach for linguistic work is examined.

F. Hintze
Application of Computers in Historical-Topographical Research: A Database for Travel Reports on Greece (18th and 19th Century)

This paper deals with the application of computers in historical-topographical research. In particular we will present the project "Historische Landeskunde des antiken Griechenland" (HiLanG). This project, financially supported by the German Research Foundation (DFG), is undertaken by the Departments of Ancient History at the Universities of Freiburg and Münster. We will develop our paper in two steps: first we will explain the scientific aim of the project, then we will give an example of our research.

Matthias Kopp, Daniel Strauch, Christian Wacker
The Use of Multivariate Statistics in Scandinavian Archaeology

The use of statistics in Scandinavian archaeology started only after the Second World War, and it has been slow to penetrate beyond a very basic level. Today, when complex multivariate techniques are within easy reach of all who care, the proper use of statistics in archaeology has become a pertinent problem. How does the use of statistics fit into the archaeological research process, and what are the demands on the archaeologists who apply complex multivariate techniques in their research? Experience with multivariate statistics in Scandinavian archaeology in recent years has helped to clarify the archaeological research process, and has been very informative on the use of multivariate analyses in archaeology.

Torsten Madsen
The Application of Correspondence Analysis: Some Examples in Archaeology

The utility of Correspondence Analysis (CA) for chronological purposes is discussed on the basis of archaeological material, i.e. female graves with jewellery, and jewellery with cast ornaments. In connection with the female graves, theoretical and interpretational problems are also discussed.

Karen Høilund Nielsen
An Analysis of Beads Found in the Merovingian Cemetery of Weingarten

Correspondence Analysis was applied to beads of 42 different types in 101 women's graves of the Merovingian cemetery of Weingarten, Southwest Germany. It was possible to subdivide the sample into six type groups and seven type combination groups. The results are in accordance with those for other finds.

Claudia Theune-Vogt
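
For readers who want to see the mechanics behind such an analysis, a compact SVD-based correspondence analysis is sketched below; the graves-by-types table used here is randomly generated, not the Weingarten counts.

```python
import numpy as np

def correspondence_analysis(N, dim=2):
    """Classical CA: SVD of the standardized residuals of the correspondence
    table N; returns principal coordinates for rows (graves) and columns
    (bead types) together with the principal inertias."""
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U[:, :dim] * sv[:dim]) / np.sqrt(r)[:, None]
    cols = (Vt.T[:, :dim] * sv[:dim]) / np.sqrt(c)[:, None]
    return rows, cols, sv[:dim] ** 2

# random stand-in for a 101 graves x 42 bead-types count table
rng = np.random.default_rng(2)
N = rng.poisson(0.4, size=(101, 42)).astype(float) + 1e-9   # avoid empty margins
row_coords, col_coords, inertias = correspondence_analysis(N)
print(inertias)
```
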

Classification in industry: Coding systems and commodity description

Bank Code Numbers as Defining Arguments and Controlling Tools in Automated Payments

The bank code number is a numbering device for offices of credit institutions which are domiciled in the Federal Republic of Germany and Berlin (West) and which effect payment transactions. The legal basis for the introduction of the bank code number is an agreement, concluded in 1970 after many years of discussion, between the Deutsche Bundesbank and the central associations of the banking industry.

H.-J. Friederich, J. Rieck
From Commodity Description to Expert Systems

In recent years the working group on Commodity Classification of the German Society of Classification (GfKl) has thoroughly investigated the conditions and problems of commodity description and classification and examined these subjects under the most diverse aspects and requirements. This has led to a widening of our horizons, so that the limitations seem to disappear gradually.

Josef Hölzl
Tabular Layouts of Article Characteristics and Formal Concept Analysis

The German terms "Sachmerkmal" (article characteristic) and "Sachmerkmalleiste" (SML, tabular layout of article characteristics) became well known through the activities of the DIN Standardization Committee. The related systematics are mainly designed for the retrieval of single objects in a large set of "technical items" of many different types. In the framework of document retrieval, these terms correspond to "descriptor" and "thesaurus", respectively. To date, SMLs have been developed for about 300 groups of articles in the standard DIN 4000. An investigation of the structural principles can be found in [3], where the relationship to formal concept analysis (developed by R. Wille in [1]) is discussed. Along the same lines, this investigation is continued in the following paper in order to interpret the SML method as an application of formal concept analysis.

Franz Meinl
The Postcode, a Local and Routing Code for the Transport of Mail Items

As early as the last century, code numbers were used for the purpose of identifying post towns. In 1853 the Taxis Postal Administration introduced, for the first time, circular stamps with code numbers which were based on a specific system of numbering the individual post towns [1]. Organizational measures and the addition of new postal establishments soon called for the extension of the stamp code numbers so that, as a result, the original concept was no longer recognizable after a while. However, this concept began to fall into oblivion when the Confederation of the Rhine was established, which meant the end of the Reichspost (postal administration of the German Reich), and it was forgotten once and for all when Prussia took over the Taxis Postal System on 1 July 1867.

Helmut Oppermann
Backmatter
Metadata
Title
Classification, Data Analysis, and Knowledge Organization
Edited by
Professor Dr. Hans-Hermann Bock
Professor Dr. Peter Ihm
Copyright year
1991
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-76307-6
Print ISBN
978-3-540-53483-9
DOI
https://doi.org/10.1007/978-3-642-76307-6