main-content

## Über dieses Buch

Numerical linear algebra, digital signal processing, and parallel algorithms are three disciplines with a great deal of activity in the last few years. The interaction between them has been growing to a level that merits an Advanced Study Institute dedicated to the three areas together. This volume gives an account of the main results in this interdisciplinary field. The following topics emerged as major themes of the meeting: - Singular value and eigenvalue decompositions, including applications, - Toeplitz matrices, including special algorithms and architectures, - Recursive least squares in linear algebra, digital signal processing and control, - Updating and downdating techniques in linear algebra and signal processing, - Stability and sensitivity analysis of special recursive least squares problems, - Special architectures for linear algebra and signal processing. This book contains tutorials on these topics given by leading scientists in each of the three areas. A consider- able number of new research results are presented in contributed papers. The tutorials and papers will be of value to anyone interested in the three disciplines.

## Inhaltsverzeichnis

### The Family of Fast Least Squares Algorithms for Adaptive Filtering

Three representative algorithms from the family of Fast Least Squares (FLS) algorithms are reviewed. They correspond to the transversal FIR filter, the lattice-ladder structure and the rotation approach respectively. They all solve the least squares problem recursively and are convenient for adaptive filtering.They lead to three different levels of computational complexity and robustness to round-off errors. Each of them also has special implementation aspects and provides a particular set of signal parameters.Overall, these three types of algorithms offer a wide and welcome choice to the adaptive filter designer.

M. G. Bellanger

The thinking behind adaptive control is to combine the processes of system identification, to gain a system model for an unknown plant, and of feedback control design for a known plant model. This yields a design-while-identify control strategy otherwise known as self-tuning or self-adjusting. To be naive one would suppose that a good adaptive control procedure is able to be manufactured by cobbling together a good parameter identifier and a good control law, and all could be expected to perform. This is partially true in that a successful adaptive control law must incorporate both of these features, but this may not be sufficient for good performance. A third feature needs to be taken into account which addresses the marrying of the identification objective with the control objective through the design or selection of adaptation speed, model structure, signal conditioning, control design rule, reference signal properties, speed of plant variation etc.The aim of this work is to present and develop a view of the salient aspects of adaptive control algorithms with the intent of encapsulating their dominant features and explaining the reasons for their specific numerically desirable properties. We shall concentrate upon the development of a coherent standpoint from which the practical aspects of adaptive controller design emerge.

### Error Analysis of Least Squares Algorithms

A finite algebraic algorithm starts with a set of data d1,...,d r , from which it computes via fundamental arithmetic operations a solution fl,...,f t . In forward error analysis one attempts to bound $$\left| {{{\bar f}_j} - {f_j}} \right|$$ , where $${\bar f_j}$$ denotes the computed element In backward error analysis, pioneered by J.H. Wilkinson in the late fifties, one attempts to determine a modified set of data $${\bar d_i}$$ such that the computed solution $${\bar f_j}$$ is the exact solution. When it applies it tends to be very markedly superior to forward analysis. To yield error bounds for the solution, the backward error analysis has to be complemented with a perturbation analysis, which naturally leads to the concept of condition number of a problem.There are several possible definitions of the stability of an algorithm related to different types of error analysis. The concepts of forward and backward stability and of weak and strong stability are discussed.Many of the common problems in signal processing can be formulated as solutions to (a sequence of) linear least squares problems of the form min x ∥Xw − y∥2. We review the perturbation theory of such problems and discuss methods for the estimation of the corresponding condition numbers. We survey stability results for the method of normal equations and methods based on orthogonal reductions.Very often it is required to recursively recalculate the solution x when equations are successively added to and/or deleted from the least squares problem. Many different algorithms have been proposed to effectuate this. Most of these involve updating or downdating the Cholesky factor R of ATA. This can be achieved using orthogonal and hyperbolic transformations. The numerical stability of such recursive algorithms is not yet completely analyzed. A new method, using iterative refinement, is suggested as a means of increasing the reliability of downdating algorithms

Åke Björck

### Parallel Algorithms for Toeplitz Systems

We describe some parallel algorithms for the solution of Toeplitz linear systems and Toeplitz least squares problems. First we consider the parallel implementation of the Bareiss algorithm (which is based on the classical Schur algorithm). The alternative Levinson algorithm is less suited to parallel implementation because it involves inner products.The Bareiss algorithm computes the LU factorization of the Toeplitz matrix T without pivoting, so can be unstable. For this reason, and also for the application to least squares problems, it is natural to consider algorithms for the QR factorization of T. The first O(n2) serial algorithm for this problem was given by Sweet, but Sweet’s algorithm seems difficult to implement in parallel. Also, despite the fact that it computes an orthogonal factorization of T, Sweet’s algorithm can be numerically unstable.We describe an algorithm of Bojanczyk, Brent and de bog for the QR factorization problem, and show that it is suitable for parallel implementation. This algorithm overcomes some (but not all) of the numerical difficulties of Sweet’s algorithm. We briefly compare some other algorithms, such as the “lattice” algorithm of Cybenko and the “generalized Schur” algorithm of Chun, Kailath and Lev-Ari.

Richard P. Brent

### Parallel Algorithms for Digital Signal Processing

This paper provides an introduction to some parallel algorithms relevant to digital signal processing. First we introduce some basic concepts such as speedup and efficiency of parallel algorithms We also outline some practical parallel computer architectures — pipelined, SIMD and MIMD machines, hypercubes and systolic arrays.

Richard P. Brent

### An Introduction to the Class of Split Levinson Algorithms

The split Levinson algorithms constitute a new class of efficient procedures to solve a linear system of equations exhibiting the positive definite Toeplitz structure. They can be derived from the classical Levinson algorithm by a kind of splitting operation, which results in a more efficient algorithm processing some well-defined symmetric polynomials (with complex coefficients). The algorithm thus obtained is based on a simple two-step recurrence relation satisfied by these polynomials. The paper provides a self-contained introduction to the whole class of split Levinson algorithms, including a detailed technical derivation. This class is shown to depend on two unit modulus parameters, which can be chosen at will by the user.

P. Delsarte, Y. Genin

### On the Split Approach Based Algorithms for DSP Problems

This paper shows that the “split approach” produces new efficient algorithms to solve not only the Toeplitz linear prediction problem, as explained in a companion paper, but also a collection of other important DSP problems belonging to the Toeplitz environment. It emphasizes both the algorithmic procedures themselves, and their connections with standard mathematical subjects such as the theory of orthogonal polynomials and the theory of positive functions.

P. Delsarte, Y. Genin

### Updating and Downdating of Orthogonal Polynomials with Data Fitting Applications

We derive and assess new methods for updating and downdating least squares polynomial fits to discrete data using polynomials orthogonal on all the data points being used. Rather than fixing on one basis throughout, the methods adaptively update and downdate both the least squares fit and the polynomial basis. This is achieved by performing similarity transformations on the tridiagonal Jacobi matrices representing the basis. Although downdating is potentially unstable, experimental results show that the methods give satisfactory results for low degree fits. The most economical of the algorithms implementing the methods needs 14n+ O(1) flops and 2n square roots to update a fit of order n.

Sylvan Elhay, Gene H. Golub, Jaroslav Kautsky

### Parallel Algorithms for Singular Value Problems

We look at algorithms for computing the singular value decomposition (SVD) and related problems, that are suitable for parallel environments.

Sven Hammarling

### Some Remarks on the Generalised Bareiss and Levinson Algorithms

The Bareiss (or Schur) and Levinson algorithms are the most popular algorithms for solving linear systems with dense n × n Toeplitz coefficient matrix in O(n2) arithmetic operations. Both algorithms have been generalised to solve linear systems whose n × n coefficient matrices A are not necessarily Toeplitz (in O(n3) operations). We show in this paper that the generalised Levinson algorithm is a direct consequence of the generalised Bareiss algorithm, thereby considerably simplifying its presentation in comparison to previous work.

Ilse Ipsen

### Generalized Displacement Structure for Block-Toeplitz, Toeplitz-block, and Toeplitz-derived Matrices

The concept of displacement structure has been used to solve several problems connected with Toeplitz matrices and with matrices obtained in some way from Toeplitz matrices (e.g. by combinations of multiplication, inversion and factorization). Matrices of the latter type will be called Toeplitz-derived (or Toeplitz-like). In this paper we shall introduce a generalized definition of displacement for block-Toeplitz and Toeplitz-block matrices. It will turn out that Toeplitz-derived matrices are perhaps best regarded as particular Schur complements obtained from suitably defined block matrices. The displacement structure will be used to obtain a generalized Schur algorithm for the fast triangular and orthogonal factorizations of all such matrices, and well structured fast solutions of the corresponding exact and overdetermined systems of linear equations.

J. Chun, T. Kailath

### Fault Tolerant Recursive Least Squares Minimization

Existing fault tolerance schemes have often been ignored by systolic array designers because they are too costly and unwieldy to implement. With this in mind, we have developed a new technique specially tailored for recursive least squares minimization that emphasizes simplicity. We propose a new decoding scheme that allows for error detection while wasting no precious processor cycles and preserving the basic structure of the systolic array. We will show that errors can be detected by examining a single scalar The technique can be implemented with negligible algorithmic modification and little additional hardware. The simplicity of our method invites its use in future systolic arrays.

Franklin T. Luk

### A Systolic Array for Recursive Least Squares Minimisation and Adaptive Beamforming

A systolic/wavefront array for linearly constrained recursive least squares minimisation is described in the context of adaptive antenna array beamforming. The architectural and numerical advantages of this signal processing technique are discussed.

J. G. McWhirter

### Parallel Algorithms for Supercomputers

In this paper, we describe some of today supercomputers and we give examples of scientific applications that can be efficiently solved on these powerful machines. First, we give a classification of different architectures and we illustrate this description with some specific machines. Then, we discuss the basic issues about vector and parallel computation with examples of performance evaluations. The second part of the paper gives examples of vector and parallel algorithms. We concentrate on solving large linear systems with a parallel fast solver for the Poisson equation and the incomplete Choleski decomposition for the conjugate gradient method.

Gérard Meurant

### Updating Techniques in Parallel Computation

A survey is given of recent techniques for updating matrix factorizations. A review is given for updating several standard matrix factorizations and eigen-decompositionsparallel decomposition of linear systems based on lowrank modificationsparallel Algorithms for eigen-value problems based on divide and conquer strategies.

D. C. Sorensen

### On the Use of the Singular Value Decomposition in Identification and Signal Processing

In recent years, the ‘ordinary’ singular value decomposition (OSVD) and its generalizations, have become extremely valuable instruments in the analysis and the solution of problems in mathematical engineering. In most applications, the OSVD provides a unifying framework, in which the conceptual formulation of the problem, the practical application and an explicit solution that is guaranteed to be numerically robust, are derived at once. In this way, the OSVD has become a fundamental tool for the formulation and derivation of new concepts such as angles between subspaces, oriented signal-to-signal ratio’s, canonical correlation analysis,... and for the reliable computation of the solutions to problems such as total linear least squares, realization and identification of linear state space models, source separation by subspace methods etc.

Joos Vandewalle, Bart De Moor

### Structured Linear Algebra Problems in Digital Signal Processing

In this paper we give a survey of a number of linear algebra problems occurring in digital signal processing, where the structure of the matrices involved is crucial. Although the problems one wants to solve for these matrices are rather classical, one can not make use anymore here of standard linear algebra tools, since the structure of the matrices has to be taken into account. We discuss in this paper some of these problems and show how structure affects the sensitivity of the problem at hand and how algorithms should be adapted in order to cope with the structure constraint.

Paul M. Van Dooren

### Constructing a Unitary Hessenberg Matrix from Spectral Data

We consider the numerical construction of a unitary Hessenberg matrix from spectral data using an inverse QR algorithm. Any upper unitary Hessenberg matrix H with nonnegative subdiagonal elements can be represented by 2n − 1 real parameters. This representation, which we refer to as the Schur parametrization of H, facilitates the development of efficient algorithms for this class of matrices. We show that a unitary upper Hessenberg matrix H with positive subdiagonal elements is determined by its eigenvalues and the eigenvalues of a rank-one unitary perturbation of H. The eigenvalues of the perturbation strictly interlace the eigenvalues of H on the unit circle.

Gregory Ammar, William Gragg, Lothar Reichel

### Parallel Computation of the Generalized Singular Value Decomposition

The generalized singular value decomposition (GSVD) is the simultaneous reduction of any two matrices having the same number of columns to diagonal forms by premultiplying by two different orthogonal matrices and postmultiplying by the same nonsingular matrix. The recent advent of real time signal processing and other problems have given impetus to the development of the parallel computation of the GSVD [3,7]. Brent et al. [2] have developed a parallel algorithm on systolic array. However, their approach requires the use of different configurations of systolic arrays for singular value decomposition (SVD), QR decomposition and matrix-matrix multiplication. In addition to some difficulties in numerical treatment, how to avoid costly interfacing between these arrays and devise a single array that is capable of performing all these basic matrix computations is still an open problem.

Zhaojun Bai

### Correcting Interface Errors Arising from Segmentation in a Parallel Iterative Algorithm

Several parallel iterative strategies based on domain segmentation were previously developed by the authors. While each is effective, the error distribution after every global iteration is not uniform within each segment, the error being greater at the extremities. A correction to the solution is developed at the interfaces between the disjoint segments which accelerates convergence.

J. M. Barry, J. P. Pollard, E. L. Wachspress

### Multidimensional Internally Lossless Filters of Fully Recursive Half-Plane Type

Recently, motivated by needs for parallel processing of 2 — D signals a scheme known as the fully recursive half-plane scheme has been proposed, and a method of designing transfer functions of filters having this recursive structure has been outlined in [1].

Sankar Basu

### Rank-One Extensions of the Generalized Hermitian Eigenvalue Problem for Adaptive High Resolution Array Processing

The determination of number, location, and movement of radiating sources using passive arrays of sensors is an important problem in radar imaging, medical imaging, and seismology. High resolution array processing, methods utilize spectral information contained in the observed signal correlation matrix in order to decompose incoming signals into signal and noise components. This allows extraction of various characteristic parameters such as source intensity and direction of arrival. Treatment of nonstationary signals in a colored noise environment necessitate the, analysis of generalized Hermitian pencils. In order to conserve computational resources and permit real time processing of information, one may wish to process these matrix pencils adaptively only up to the smallest order necessary for full estimation of the desired signal parameters.

C. Beattie, A. Beex, M. Fargues

### A Hybrid Scheme for the Singular Value Decomposition on a Multiprocessor

We present a hybrid multiprocessor scheme for determining clusters of singular values and corresponding singular vectors of rectangular matrices in which the number of rows is substantially larger or smaller than the number of columns. In this scheme, we perform an initial orthogonal factorization on the tall matrix (either A or AT) using a multiprocessor block Householder algorithm. A parallel one-sided Jacobi method is then applied to the resulting upper triangular R to reduce the matrix S = RTR to block diagonal form. This is followed by applying either a one-or two-sided Jacobi method to obtain the eigenvalues of these diagonal submatrices (which are the clusters of singular values of A) in parallel. These hybrid methods are well suited for the hierarchical memory architectures of the Alliant FX/8, CEDAR, and CRAY 2. Some initial results on the Alliant FX/8, CRAY X-MP/416, and CRAY 2 are presented along with a comparison in the performance of the classical bi-diagonalization technique used in EISPACK and UNPACK.

Michael Berry, Ahmed Sameh

### The Weak Stability of Algorithms for Matrix Computations

The weak stability of an algorithm (a small relative error in the computed solution for every well-conditioned system) is shown to be equivalent to a small relative residual for every well-conditioned system or to the solution of a nearby problem (in the relative sense) for every well-conditioned system.

James R. Bunch

### Parallelism in Dynamic Programming and Control

Dynamic programming using parallel processing techniques is discussed for optimal control. An implementation on a CRAY-2 for optimizing the station acquisition of geostationary satellites is mentioned.

J. L. Calvet, J. Dantas de Melo, J. M. Garcia

### Design Issues for Parallel Matrix Algorithms

It is well known that parallelism by itself does not lead to higher speeds. For example, communication or I/O overheads can lead to situations where adding processors actually increases total execution time. It is important, therefore, to understand and to classify properly the factors that affect the optimal design and implementation of parallel algorithms In order to make our discussion meaningful and specific, we illustrate these factors and trade-offs for: a given class of problems (matrix algorithms)a particular architecture type (distributed-memory, message-passing MIMD hypercube)a particular machine (NCUBE)We conclude that analytic performance modeling and optimal design of algorithms are usually possible only for simple matrix algorithms and under simplifying assumptions concerning implementation and machine dependent factors. It does not seem feasible or practical to develop exact (analytic) performance models for algorithms that take into account all problem-dependent, architecture-dependent and implementation (machine) — dependent factors. Therefore, for practical purposes, it is important to develop approximate performance models for specific parallel machines(s), and to verify them experimentally. INDEX TERMS: communication overhead, hypercube, load balancing, mapping, NCUBE, parallel matrix algorithms, partitioning, performance evaluation.

### Fast Computation of a Restricted Subset of Eigenpairs of a Varying Hermitian Matrix

Standard algorithms require an order of dm2 flops to compute d eigenpairs of a m × m semi-definite positive hermitian matrix, R. A larger amount of computations (of same order) is required to find d left singular pairs of a square matrix, D, such that R = DDH; this is known to be numerically more stable. Note that D is not unique, the best choice in the sequel turns out not to be the Cholesky factor of R. The algorithm proposed in this contribution will compute d left singular pairs of D in a faster way, such that it requires an order of αmd2 flops. This algorithm takes advantage of the fact that the matrix considered is nothing else but a matrix whose d eigenpairs are known, perturbed by a rank-one modification. It is pointed out in the sequel why the technique of Bunch et al [1] is not relevant for O(md2) algorithms, neither for square-root implementations.

Pierre Comon

### Parallelization of Toeplitz Solvers

We shall describe parallel algorithms for solving Toeplitz and block Toeplitz systems, based on the Levinson type methods implemented in serial in the Toeplitz Package (by Arushanian et al. [1]). A matrix A, with blocks A i , j , has block Toeplitz structure if the blocks satisfy A i , j = A j i for all i, j such that 0 ≤ i, j ≤ n − 1 and each block is a general matrix. The implementation and performance of these algorithms on MIMD shared memory machines will be discussed. It will be shown that the contribution of the inner product calculations to the time complexity is negligible for typical applications.

E. de Doncker, J. Kapenga

### Parallel Gaussian Elimination, iPSC/2 Hypercube versus a Transputer Network

Gaussian elimination with partial pivoting can be implemented in several ways on a multitude of parallel hardware. In this note we give a comparison of implementations of the algorithm for banded matrices on an iPSC/2 hypercube and on a square grid of Transputers. For both implementations a scattered square allocation scheme is used. The comparison shows that machine characteristics strongly influence the attainable efficiency.

L. Beernaert, D. Roose, S. Van Praet, P. de Groen

### Snapshots of Mobile Jacobi

We wish to document the short but dramatic life of a special matrix undergoing the Jacobi diagonalization procedure to compute its eigenvalues. The combination of a massively parallel supercomputer with a real time graphics display allows us to understand algorithms in entirely new ways, giving insight that can be surprising as well as beautiful. In this note, we would like to give one illustration. We regret that the printed medium only allows for snapshots; the video is far more dramatic.

Alan Edelman

### Computing Generalized Canonical Correlations

In this paper we consider the computation of generalized canonical correlations (gcc), a useful tool for linear prediction. The problem is reduced to a singular value decomposition of a product of three rank deficient matrices, for which a new algorithm is presented. It is then shown how the gcc may be computed directly from the data matrices.

L. Magnus Ewerbring, Franklin T. Luk

### Introduction to High Resolution Array Spectrum Estimation

The basic principles of high resolution spectrum estimation using the SVD is described. The paper starts with a simple signal model for a linear array and explains why simpler signal processing tools are not powerful enough for many of the tasks which are required from systems. One particular SVD technique, the MUSIC algorithm, is described and some of the limitations on performance are described.

D. R. Farrier

### Modifications of the Normal Equations Method that are Numerically Stable

A method of solving min ∥ b − Ax ∥ where A is an m × n matrix is presented which appears to be as stable as methods based on orthogonal factorization but in certain cases is nearly as efficient as methods based on factoring the normal equations. Iterative refinement is used in the new method. Also a class of methods is described that in terms of efficiency (for m ≥ 2n) and, apparently, numerical stability is intermediate between orthogonalization methods and normal equations methods. The class can be considered to be a block implementation of the modified Gram-Schmidt algorithm.

Leslie Foster

### Block Elimination with Iterative Refinement for Bordered Linear Systems

We describe (without proofs) some recent theory, confirmed by numerical experiments, which shows that Block Elimination with Iterative Refinement is an easily programmed and usually robust method for solving bordered linear systems with a border of width greater than 1. We compare it for accuracy, efficiency and convenience with the Generalized Deflated Block Elimination methods of Chan and Resasco.

W. Govaerts, J. D. Pryce

### The Efficient Calculation of the Eigenvalues, Eigenvectors and Inverses of a Special Class of Brownian Matrices

A Brownian matrix is defined by having all the elements in any row, to the right of the principle diagonal, equal, and all elements in any column, below the principal diagonal, equal. These matrices occur in signal processing problems, and it has been shown that the solution of Brownian systems of linear equations can be solved in 0(n) flops. The inverse of a nonsingular Brownian matrix can be found in 0(n2) flops.In this paper we consider a special class of Brownian matrices, which form a ring, and show that both the normal inverse, when nonsingularity applies, and some generalized inverses can be found in 0(n) flops. In addition, explicit formulae for the determinant, the eigenvalues and eigenvectors are given, also requiring 0(n) flops.

M. J. C. Gover

### Improvements of Stepsize Control in Numerical Integration

Stepsize control in numerical integration of ODE is an essential tool to improve the precision of the produced solution. In addition, a well behaved stepsize sequence shortens the computing time needed to produce the solution. Normally an estimate of the solution error is fed back to adjust the stepsize, so that the error of the solution is kept below some user-specified level. In this paper stepsize adjustment is viewed as a feedback control problem. This gives insight as well as an improved algorithm.

Kjell Gustafsson

### A Statistical Evaluation of Inverse Iteration

The eigenvalues of a symmetric tridiagonal matrix can be computed by bisection using Sturm sequences and their corresponding eigenvectors found by inverse iteration. The independent computing tasks comprising this combination are especially appropriate for implementation on a distributed memory multiprocessor, but the parallel efficiency of bisection and inverse iteration is not enough to recommend their use in general. Bisection produces eigenvalues to high accuracy. Inverse iteration results in eigenvectors with quality dependent on the spacing of the eigenvalues and on the starting vector. This paper examines the factors influencing the accuracy of inverse iteration. The effects of random starting vectors are discussed, and some uses of statistical analysis in the design of an inverse iteration algorithm are presented.

E. R. Jessup

### On Underdetermined Systems

We summarize the results of a study [4] whose objective was to obtain some understanding of the nature of certain solutions of underdetermined systems of linear equations with a view to their role in the analysis of various practical inverse problems, specifically those associated with image restoration and computed tomography.

### A chart of numerical methods for structured eigenvalue problems

We consider eigenvalue problems for real and complex matrices with two of the following algebraic properties: symmetric, Hermitian, skew symmetric, skew Hermitian, symplectic, conjugate symplectic, J-symmetric, J-Hermitian, J-skew symmetric, J-skew Hermitian. In the complex case we found numerically stable algorithms that preserve and exploit both structures in 24 out of the 44 nontrivial cases with such a twofold structure. Of the remaining 20, we found algorithms that preserve part of the structure of 9 pairs. In the real case we found algorithms for all pairs studied. The algorithms are constructed from a small set of numerical tools, including orthogonal reduction to Hessenberg form, simultaneous diagonalization of commuting normal matrices, Francis’ QR algorithm, the Quaternion QR-algorithm and structure revealing, symplectic, unitary similarity transformations.

Angelika Bunse-Gerstner, Volker Mehrmann, Ralph Byers

### Updating Choleski Factors in Parallel

In this paper we extend Bennett’s algorithm for modifying the Choleski factors of a banded matrix which undergoes a rank m change, where m is small compared with n, the dimension of the matrix. We compare and contrast the implementation on the DAP and the T-RACK, respectively exhibiting features of SIMD and MIMD systems. The parallel update algorithm based on this technique is used for solving banded linear systems and has been incorporated in the finite element package for stress analysis problems, where it is found to be particularly effective for interactive analysis, when a sub-region of the mesh is repeatedly updated.

J. J. Modi

### Approximate Inversion of Partially Specified Positive Definite Matrices

A fast algorithm is presented that can be used to compute an approximate inverse of a positive definite matrix that is specified only on a multiple band. The approximate inverse is the inverse of a matrix that closely matches the partially specified matrix. It has zeros in the positions that correspond to unspecified entries in the partially specified matrix. It is closely related to the inverse of the so-called maximum-entropy extension of the partially specified matrix. The algorithm is very well suited for implementation on an array processor.

H. Nelis, E. Deprettere, P. Dewilde

### Nearest “Unstable” Matrix Pencil to a Given Pencil

We aim to generalize the concept of “distance to instability” for matrices to matrix pencils. Difficulties arise since infinitesimal perturbations can destroy the Kronecker canonical form of the pencil. If perturbations are constrained so that the structure of the pencil is retained, then there exist maximal finite perturbations such that the finite eigenvalues of the pencil remain in the left half plane.A computable bound on the maximal perturbations is derived in terms of singular values of a particular matrix.

N. K. Nichols

### Parallel Recursive Least Squares on a Hypercube Multiprocessor

We have developed an efficient parallel implementation of an algorithm for recursive least squares computations based upon the covariance updating method. The target architecture is a distributed-memory multiprocessor, and test results on an Intel iPSC/2 hypercube demonstrate the parallel efficiency of the algorithm. A 64-node system is measured to execute the algorithm over 48 times as fast as a single processor for the largest problem that fits on a single node (fixed size speedup). Moreover, the computation times increase only slightly with an increase in the number of processors when the problem size per processor remains constant. Applications include robust regression in statistics and modification of the Hessian matrix in optimization, but the primary motivation for this work is the need for near real-time recursive least squares computations in signal processing.

Charles S. Henkel, Robert J. Plemmons

### A Divide and Conquer Method for the Unitary Eigenproblem and Applications

In divide and conquer methods for eigenproblems, the original eigenproblem is split into several smaller subproblems by using rank one modifications. Many of the subproblems can be solved in parallel, and from their solutions the solution of the original eigenproblem can be determined. This approach has for symmetric matrices been developed by Cuppen and refined by Dongarra and Sorensen. We describe a divide and conquer method for the computation of eigenvalues and eigenvectors of unitary matrices. The unitary matrices are represented in Schur parametric form. We also discuss the application of the unitary eigenproblem to the computation of Pisarenko frequency estimates in signal processing.

Gregory Ammar, William Gragg, Lothar Reichel

### Utilization of the Matrix Pencil to Extract Poles of a Linear Time-Invariant System

For a (discrete) linear time-invariant system, its poles can be obtained by solving for rank reducing numbers of a matrix pencil. Three algorithms are presented for utilizing the matrix pencil to obtain the poles of the system.

Yingbo Hua, Tapan K. Sarkar, P. Catalano, G. Casalegno, B. Audone

### Efficient Solution of Minimum Eigen-Problem of Hankel Systems by Conjugate Gradient Algorithm and FFT

The advantages of the conjugate gradient (CG) algorithm for the solution of the eigenvector corresponding to the minimum/maximum eigenvalue of a symmetric matrix over conventional methods, such as those found in IMSL, EISPACK or UNPACK library, are its efficiency and flexibility. Preliminary results for the solution of minimum eigen-problem of a Hankel system indicates the efficiency of the CG algorithm. The introduction of the FFT into computation demonstrates the flexibility of the CG algorithm, and further improves the efficiency of the algorithm.

Tapan K. Sarkar, Xiaopu Yang

### Progress Towards a Systolic SVD Array Implementation

The past five years have seen a great deal of interest in parallel architectures for the computation of the singular value decomposition. We present a linear array approach which is unique in that there is no angle computation within the array, I/O time is balanced with computation time, communication is nearest neighbor, and simple multiply and accumulate processors are utilized. In addition, special properties of the SVD algorithm are exploited in order to obtain a VLSI design with reduced area. We are currently designing a chip using two micron CMOS design rules. When completed we estimate that our array will compute the SVD at a rate of between 5n and 10n MFlops/second, where n is the dimension of the array.

David E. Schimmel

### On the Convergence of Cyclic Jacobi Methods

Jacobi’s method is an iterative algorithm for diagonalizing a symmetric matrix A = A(1), in which at iteration k a plane rotation J k =J k (i k ,j k k ) is chosen so as to annihilate a pair of off diagonal elements in the matrix A(k)$$\left( {\begin{array}{*{20}{c}} c&{ - s} \\ s&c \end{array}} \right)\left( {\begin{array}{*{20}{c}} {a_{ii}^{(k)}}&{a_{ij}^{(k)}} \\ {a_{ji}^{(k)}}&{a_{jj}^{(k)}} \end{array}} \right)\left( {\begin{array}{*{20}{c}} c&s \\ { - s}&c \end{array}} \right) = \left( {\begin{array}{*{20}{c}} {a_{ii}^{(k + 1)}}&0 \\ 0&{a_{jj}^{(k + 1)}} \end{array}} \right)$$ where c = cos θ k and s=sin θ k .

Gautam Shroff, Robert Schreiber

### Numerically Stable Fast Transversal Filters for Recursive Least-Squares Adaptive Filtering

In this paper, a solution is proposed to the long-standing problem of the numerical instability of Fast Recursive Least-Squares Transversal Filter (FTF) algorithms with exponential weighting, which is an important class of algorithms for adaptive filtering. A framework for the analysis of the error propagation in FTF algorithms is first developed; within this framework, we show that the computationally most efficient 7N form is exponentially unstable. However, by introducing redundancy into this algorithm, feedback of numerical errors becomes possible; a judicious choice of the feedback gains then leads to a numerically stable FTF algorithm with complexity 9N. The results are presented for the complex multichannel joint-process filtering problem.

Dirk T. M. Slock, Thomas Kailath

### Applications of Analytic Centers

In this paper we present some applications of a recently introduced concept, the ”analytical center” of a convex inequality system. Let F = {f a (x, p)} denote a family of functions depending on parameters p ∈ P and elements a of an index set A, which are concave in x ∈ X, on the sets S a,p where they are nonnegative. Here X is a Hilbert space fixed together with the sets A and P. Below we shall consider the specific case when for each (a, p) the function f a (x, p) is linear or concave quadratic in x or when it is the determinant of a symmetric, positive semidefinite matrix, whose elements depend linearly on x.

György Sonnevend

### Solving Triangular System in Parallel is Accurate

An error complexity analysis of two algorithms for solving a unit diagonal triangular system is given. The results show that the usual sequential algorithm is optimal in terms of having the minimal maximum and cumulative error complexity measures. The parallel algorithm described by Sameh and Brent is shown to be essentially equivalent to the optimal sequential one.

Nai-kuan Tsao

### Storage Schemes for Parallel Eigenvalue Algorithms

In this paper, we examine storage schemes for parallel implementation on distributed memory MIMD multiprocessors of algorithms for solving the algebraic eigenvalue problem. We show that a novel storage scheme, which we call block Hankel-wrapped storage, allows better utilization of the processors than column-wrapped storage when implementing Jacobi’s method or the nonsymmetric QR algorithm.

Robert A. van de Geijn

### Iterative Solution Methods for Large, Sparse Systems of Linear Equations Arising from Tomographic Image Reconstruction

The problem of reconstructing an object from its projections can be solved using two different approaches: analytic methods and algebraic methods. Analytic methods, based on integral transforms such as the inverse Radon transform, are considerably more efficient than algebraic methods. In many practical cases, however, such as limited angle tomography or problems where refraction plays an important role, algebraic techniques have to be used.

M. C. A. Van Dijke, H. A. van der Vorst, M. A. Viergever

### The Generalized Total Least Squares Problem : Formulation, Algorithm and Properties

The Total Least Squares (TLS) method has been devised as a more global fitting technique than the ordinary least squares technique for solving overdetermined sets of linear equations AX ≈ B when errors occur in all data. If the errors on the measurements A and B are uncorrelated with zero mean and equal variance, TLS is able to compute a strongly consistent estimate of the true solution of the corresponding unperturbed set A 0X = B0. In this paper the TLS computations are generalized in order to maintain consistency of the solution in the following cases: first of all, some columns of A may be error-free and secondly, the errors on the remaining data may be correlated and not equally sized. Hereto, a numerically reliable Generalized TLS algorithm GTLS, based on the Generalized Singular Value Decomposition (GSVD), is developed. Additionally, the equivalence between the GTLS solution and alternative expressions of consistent estimators, described in literature, is proven. These relations allow to deduce the main statistical properties of the GTLS solution.

Sabine Van Huffel

### Understanding Old Deficiencies in the Conventional Recursive Least-Squares (CRLS) Scheme

This paper extends the well-known fact, that the normal equations approach in solving least-squares problems operates “well” when these problems are “well” conditioned, to the recursive formulation of this approach. The latter scheme is indicated in this paper as the conventional RLS scheme. In particular this paper highlights that for those well-conditioned problems the conventional RLS scheme does not suffer from the divergence phenomenon, the loss of symmetry and positive definiteness of the parameter covariance matrix.

M. H. Verhaegen

### Algorithms for Optical Computing and their Sensitivity

Fiber optics and guided wave optics have played an increasing role in communications, in recent years, and are expected to continue to grow. This activity, which was largely oriented to switching networks, will have direct impact on signal and data processing as well.

Erik I. Verriest

### A Constrained Eigenvalue Problem

In this paper we consider the following mathematical and computational problem. Given the quantities A: (n + m)-by-(n + m) matrix, symmetric, n > 0N: (n + m)-by-m matrix with full rankt: vector of dimension m with ∥(NT)+t∥ < 1 Determine an x such that $${\operatorname{x} ^T}Ax = \min$$ subject to the constraints i$${N^T}x = t$$ii$${x^T}x = 1.$$ Variants of this problem occur in many applications [1,5,7,8,11]. The problem has been studied previously when t = 0, the null vector, (cf. [4,6]). When t ≠ 0, then the problem becomes more complicated.We show how to eliminate the linear constraint (i). Then three different methods are presented for the solution of the resulting Lagrange equations.

Walter Gander, Gene H. Golub, Urs von Matt

### On GR Algorithms for the Eigenvalue Problem

This paper outlines a general theory of GR algorithms which subsumes the theory of QR, LR, and similar algorithms. A generic GR algorithm is described, and its convergence properties are discussed. A generic chasing algorithm, which performs a generic GR step implicitly, is also described. The more general point of view introduced here gives new insights which may lead to more efficient algorithms for the general eigenvalue problem.

David S. Watkins

### Numerical Analysis of Nonlinear Equations in Computer Vision and Robotics

The need to apply sophisticated numerical analysis algorithms to computer vision and robotics problems is urgent, and these fields provide unique challenges different from the engineering problems normally encountered by numerical analysts. The opportunities for crossfertilization are vast; within computer vision, facet modelling, surface approximation, three-dimensional object recognition, and range data analysis require splines, generalized polynomials, classical approximation theory, approximation in various norms, robust statistics, and fixed point theory. Fundamental vision problems such as shape from shading, structure from motion, consistent labelling, and surface segmentation involve nonlinear equations, nonlinear optimization, quasi-Newton and homotopy algorithms Robot control, kinematics, and planning problems involve. modern differential and algebraic geometry, modern control theory, computational geometry, and homotopy theory.

Layne T. Watson

### A Linear Systolic Array for the Adaptive MVDR Beamformer

We propose an efficient algorithm and a linear systolic array for the adaptive MVDR (Minimum Variance Distortionless Response) beam-former. We show that the beamformer output can be computed in a fully pipelined fashion without determining the optimum weight vector. For N antenna elements and M look directions to which the beamformer should be steered, we require O(N2/2+MN) operations1 per sample time.

B. Yang, J. F. Böhme

### Solving Band Systems of Linear Equations by Iterative Methods on Vector Machines

The use of iterative methods in the solution of systems of linear algebraic equations whose coefficient matrices are band is an important problem, because such systems do appear in many fields of engineering and science. Band matrices, symmetric and positive definite as well as unsymmetric, can efficiently be treated on many vector machines. The possibility of exploiting zero diagonals within the band in order to reduce the storage and the computing time will be discussed (also such matrices appear very often when large-scale problems are to be handled). The treatment of some special matrices, as tri-diagonal and five-diagonal matrices, will be sketched. It will be shown that on vector machines iterative methods may perform better than the direct methods even in the case where the direct methods are better than the iterative methods on scalar machines. The reason for this phenomenon is the fact that many iterative methods can be designed so that one works with long and contiguous arrays during the whole computational process. This can not be achieved when systems of linear algebraic equations with band matrices are solved by direct methods.

Zahari Zlatev

### Backmatter

Weitere Informationen