About this book

The combination of fast, low-latency networks and high-performance distributed tools for mathematical software has resulted in widespread, affordable scientific computing facilities. Practitioners working in the fields of computer communication networks, distributed computing, computational algebra and numerical analysis have been brought together to contribute to this volume and to explore the emerging distributed and parallel technology in a scientific environment. The collection includes surveys and original research on both the software infrastructure for parallel applications and the hardware and architecture infrastructure. Among the topics covered are switch-based high-speed networks, ATM over local and wide area networks, network performance, application support, finite element methods, eigenvalue problems, invariant subspace decomposition, QR factorization and Todd-Coxeter coset enumeration.



Introduction: High Performance Computing and the Technology of Switch-based Computer Networks

Many important research problems in mathematics, science and engineering can only be solved by means of very demanding computations on supercomputers. Since these machines are very expensive, they are rare, and access to them is difficult. The recent introduction of new high-speed network technologies may change this situation. In the near future, the combined computational power of a cluster of workstations interconnected by a high-speed local area network is likely to be comparable to that of a current stand-alone supercomputer. Such cost-effective and scalable facilities promise to become the workhorse of future large-scale scientific and engineering applications. Access to such facilities is particularly important for universities, as a more cost-effective substitute for expensive, traditional supercomputers.
Gene Cooperman, G. Michler, H. Vinck

High Performance Computing and Communication Research at the National Science Foundation

This paper describes the High Performance Computing and Communications (HPCC) program of the National Science Foundation (NSF). The rationale and organization of the US-level HPCC program are outlined to provide context. The NSF's HPCC-related activities are then presented in more detail.
S. Kamal Abdali

Practical parallel coset enumeration

Coset enumeration is one of the most important procedures for investigating finitely presented groups. We present a practical parallel procedure for coset enumeration on shared-memory processors. The shared-memory architecture is particularly interesting because such parallel computation is both faster and cheaper. The lower cost comes when the program requires large amounts of memory: additional CPUs allow us to reduce the time during which the expensive memory is in use.
Rather than report on a suite of test cases, we take a single, typical case, and analyze the performance factors in-depth. The parallelization is achieved through a master-slave architecture. This results in an interesting phenomenon, whereby the CPU time is divided into a sequential and a parallel portion, and the parallel part demonstrates a speedup that is linear in the number of processors. We describe an early version for which only 40% of the program was parallelized, and we describe how this was modified to achieve 90% parallelization while using 15 slave processors and a master. In the latter case, a sequential time of 158 seconds was reduced to 29 seconds using 15 slaves.
Gene Cooperman, George Havas
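The reported figures invite a quick consistency check against Amdahl's law. The sketch below is an illustration using the numbers quoted in the abstract, not code from the paper; with a 90% parallel fraction and 15 slaves it predicts a wall time in the neighbourhood of the reported 29 seconds.

```python
# Amdahl's-law check of the quoted figures (illustrative only).

def amdahl_speedup(parallel_fraction, n_procs):
    """Predicted speedup when parallel_fraction of the work runs on
    n_procs processors and the remainder stays sequential."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_procs)

# 40% parallelized: speedup stays below 1.7 even with 15 slaves.
print(round(amdahl_speedup(0.40, 15), 2))          # 1.6
# 90% parallelized with 15 slaves.
print(round(amdahl_speedup(0.90, 15), 2))          # 6.25
# Predicted wall time from the 158 s sequential run (reported: 29 s).
print(round(158 / amdahl_speedup(0.90, 15), 1))    # 25.3
```

The gap between the predicted 25 s and the measured 29 s is consistent with master-slave coordination overhead that the simple model does not capture.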

High Performance Computing over Switch-Based High-Speed Networks

Communication between processors has long been the bottleneck of distributed network computing. However, recent progress in switch-based high-speed networks, including ATM, HIPPI and Fibre Channel, may be changing this situation. To utilize the aggregate computational power of clusters of workstations, we have concentrated our effort on maximizing the achievable throughput and minimizing the delay for end-users. In order to achieve better performance, we have investigated the following issues: (i) ways to improve the performance of I/O subsystems, and (ii) porting PVM over high-speed network APIs. Typical communication between processors passes through a stack of network protocols; (i) represents our effort to improve lower-layer communication performance. However, application (user-level) performance may not improve much even if the lower-layer performance does. Therefore, to improve user-level performance it is necessary to bypass most of the network protocols in the communication stack, and hence the overheads they generate, by porting applications (like PVM) directly onto lower-layer network protocols.
For (i) we discuss in detail how the I/O subsystems relate to network operations, and several possible approaches for improving the maximum achievable bandwidth and reducing end-to-end communication latency. For (ii) we show some performance results of PVM over high-speed local area networks. Although the available bandwidth of high-speed networks is much higher than that of traditional LANs, application-level performance still lags behind the capabilities of the physical medium.
Jenwei Hsieh, David H. C. Du

ATM High Performance Distributed Computing Laboratory

The New York State Center for Advanced Technology in Computer Applications and Software Engineering (CASE) at Syracuse University has established an ATM High Performance Distributed Computing (HPDC) Laboratory within the center that provides ATM connections to several other units both on and off campus. The HPDC Laboratory has been established to provide a research testbed for multimedia and ATM-related studies and a controlled environment where enabling technologies and applications can be developed and evaluated. In this paper, we describe the computing environment of the HPDC Laboratory and give an overview of some of the ongoing projects.
Salim Hariri

Switched Virtual Networking

The Asynchronous Transfer Mode (ATM) technology is frequently seen only in terms of bandwidth gains and its suitability for multimedia communication. An essential advantage of ATM, however, is its potential for new concepts of connection management in complex networks. ATM's ability to create virtual subnets makes it possible to separate the logical network topology from the physical location of the subnet participants. In conjunction with other switching LAN technologies, such as switched Ethernet and switched Token Ring, extremely flexible and dynamic communication structures can be built which reflect the organisational requirements of today's business processes.
Detlef Straeten, Martin Mähler

Parallel Large Scale Finite Element Computations

By Amdahl’s Law, the efficient use of parallel computers cannot mean parallelizing only some individual steps of a larger calculation if, at the same time, a relatively large amount of sequential work remains, or if specially arranged data structures for such a step have to be produced by means of expensive communication between the processors. For this reason, our basic work on the parallel solution of partial differential equations has been directed at investigating and developing a naturally, fully parallel run of a finite element computation: from the parallel distribution and generation of the mesh, through a parallel generate/assembly step, to the parallel solution of the resulting large linear systems of equations and post-processing.
Arnd Meyer

Employing Deterministic and Stochastic Petri Nets for the Analysis of Usage Parameter Control in ATM-Networks

Traffic flow control mechanisms play an important role in the design and operation of future high-speed networks. Here we employ the class of Deterministic and Stochastic Petri Nets (DSPN) for the specification and evaluation of Usage Parameter Control at the User Network Interface in ATM networks. After an overview of performance issues in traffic management, DSPNs are applied to the specification of the functional and quantitative behaviour of traffic sources and control mechanisms. The main contribution of this paper is to demonstrate the applicability of DSPNs as a concise and unifying technique for the specification and analysis of Usage Parameter Control in ATM networks.
Bruno Müller-Clostermann
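Usage Parameter Control at the UNI is commonly realized as a leaky-bucket policer such as the Generic Cell Rate Algorithm (GCRA). The following minimal sketch of the GCRA's "virtual scheduling" form is an illustration of the mechanism being modelled, not the paper's DSPN specification.

```python
# Minimal sketch of Usage Parameter Control via the Generic Cell Rate
# Algorithm (GCRA, virtual-scheduling form): each arriving cell is
# checked against a theoretical arrival time (TAT).

def gcra(arrival_times, increment, limit):
    """Classify each cell arrival as conforming (True) or not (False).
    increment I: nominal inter-cell spacing; limit L: tolerance."""
    tat = 0.0                 # theoretical arrival time of the next cell
    verdicts = []
    for t in arrival_times:
        if t < tat - limit:
            verdicts.append(False)     # arrived too early: non-conforming
        else:
            verdicts.append(True)      # conforming; reschedule the TAT
            tat = max(t, tat) + increment
    return verdicts

# Cells spaced at the nominal rate conform; a back-to-back burst
# exceeds the tolerance and is tagged as non-conforming.
print(gcra([0, 10, 20, 30], increment=10, limit=2))
print(gcra([0, 1, 2, 3], increment=10, limit=2))
```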

Parallel Algorithms for Computing Rank-Revealing QR Factorizations

The solution to many scientific and engineering problems requires the determination of the numerical rank of matrices. We present new parallel algorithms for computing rank-revealing QR (RRQR) factorizations of dense matrices on multicomputers, based on a serial approach developed by C. H. Bischof and G. Quintana-Ortí. The parallel implementations include the usual QR factorization with column pivoting, and a new faster approach that consists of two stages: a QR factorization with local column pivoting and a reliable rank-revealing algorithm appropriate for triangular matrices. Our parallel implementations include the BLAS-2 and BLAS-3 QR factorizations without pivoting since they are a good reference point, though they are not appropriate for rank-revealing purposes.
Experimental comparison shows considerable performance improvements of our new approach over classical rank-revealing algorithms on the platforms we used: an IBM SP2 platform and a cluster of SGI workstations.
We study the effect of the computer communication network and the processor computational power on the performance of the algorithms. In this case, as well as in many other parallel and distributed applications, the latency and bandwidth of the network are much more important than the processor computational power and, thus, these are the key factors impacting performance.
Gregorio Quintana-Ortí, Enrique S. Quintana-Ortí
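As a rough illustration of the rank-revealing idea (a pure-Python sketch via modified Gram-Schmidt, not the authors' Householder-based BLAS-3 implementation), the following routine performs QR with column pivoting and reads the numerical rank off the point where the pivoted R diagonal collapses.

```python
# QR with column pivoting via modified Gram-Schmidt (illustrative).
# The decreasing diagonal of R exposes the numerical rank.

def rrqr_rank(cols, tol=1e-10):
    """cols: matrix given as a list of columns (lists of floats).
    Returns the numerical rank estimated from the pivoted R diagonal."""
    cols = [c[:] for c in cols]              # work on a copy
    n = len(cols)
    rank = 0
    for k in range(n):
        # pivot: bring the remaining column of largest norm to front
        p = max(range(k, n), key=lambda j: sum(x * x for x in cols[j]))
        cols[k], cols[p] = cols[p], cols[k]
        norm = sum(x * x for x in cols[k]) ** 0.5    # R[k][k]
        if norm <= tol:
            break                            # remaining columns negligible
        rank += 1
        q = [x / norm for x in cols[k]]      # next orthonormal direction
        for j in range(k + 1, n):            # eliminate q from the rest
            r = sum(qi * xj for qi, xj in zip(q, cols[j]))
            cols[j] = [xj - r * qi for qi, xj in zip(q, cols[j])]
        cols[k] = q
    return rank

# Three columns; the third is the sum of the first two, so rank 2.
a = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
print(rrqr_rank(a))   # 2
```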

A distributed algorithm for the construction of invariant subspaces

We describe our implementation of a distributed algorithm for the construction of subspaces of vector spaces over finite fields which are invariant under the action of one or more matrices.
Jens Rosenboom
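A minimal serial sketch of the underlying closure ("spin-up") computation may clarify the task; the paper's distributed algorithm is not reproduced here. Over GF(2) a vector can be stored as a bitmask and a matrix as a list of row bitmasks, so the vector-matrix product is an XOR of selected rows.

```python
# Smallest subspace of GF(2)^n containing a seed vector and invariant
# under the action of the given matrices (serial illustration only).

def act(v, mat):
    """Row vector v (bitmask) times matrix mat (list of row bitmasks):
    XOR of the rows of mat selected by the set bits of v."""
    out = 0
    for i, row in enumerate(mat):
        if (v >> i) & 1:
            out ^= row
    return out

def spin_up(seed, matrices):
    """Return a GF(2) basis (echelonized bitmasks) of the smallest
    subspace containing seed and closed under every matrix."""
    basis = {}            # pivot bit position -> basis vector
    queue = [seed]
    while queue:
        v = queue.pop()
        for pos in sorted(basis, reverse=True):
            if (v >> pos) & 1:
                v ^= basis[pos]            # reduce against the basis
        if v:                              # new direction found
            basis[v.bit_length() - 1] = v
            queue.extend(act(v, m) for m in matrices)
    return sorted(basis.values())

# Cyclic shift on GF(2)^3: starting from e0 = 0b001 the whole space
# is generated, so the invariant subspace has dimension 3.
shift = [0b010, 0b100, 0b001]    # rows: e0 -> e1, e1 -> e2, e2 -> e0
print(len(spin_up(0b001, [shift])))   # 3
```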

ATM in Practice: Experiences in LAN and WAN Environments

In recent months ATM technology has been praised as a solution to the ever-rising need for bandwidth in both LAN and WAN environments, since it not only offers higher speeds at moderate cost but also the ability to assign resources according to need. However, it seems that, apart from experiments in small laboratory environments, little experience is available on how ATM (or rather the available ATM equipment) performs under heavy load.
Gerhard J. A. Schneider

Measurements, Proper Scaling and Application of Communication Parameters

Measurements of the communication parameters for different parallel computers are presented. These parameters are scaled to equivalent lost operations for the startup and for the transfer of one operand. Latency hiding is then discussed and measured. For the example of parallel matrix multiplication on p processors we present the theory for three different types of scaling: same problem, same storage per processor, and same computation per processor. The theory is compared to measurements on the CRAY T3D, the Intel Paragon, and the IBM RS/6000 SP with ATM switch and with High Performance Switch 3. Additional system overhead, compared to the basic measurements, scales by a negative overlap factor.
Willi Schönauer, Hartmut Häfner
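The scaling to "equivalent lost operations" can be sketched in a few lines: count how many floating-point operations a node could have performed during message startup and during the transfer of one operand. The machine numbers below are assumptions for illustration, not measurements from the paper.

```python
# Scaling communication parameters to equivalent lost operations
# (hypothetical sample numbers, not the paper's measurements).

def lost_operations(flops_per_sec, startup_sec, bytes_per_sec, operand_bytes=8):
    """Return (ops lost per message startup, ops lost per transferred operand)."""
    lost_startup = flops_per_sec * startup_sec
    lost_per_operand = flops_per_sec * operand_bytes / bytes_per_sec
    return lost_startup, lost_per_operand

# Assumed node: 100 Mflop/s, 50 us message startup, 40 MB/s link.
s, w = lost_operations(100e6, 50e-6, 40e6)
print(round(s), round(w))   # 5000 20
```

On such a node a single message startup costs as much as 5000 floating-point operations, which is why latency hiding and message aggregation matter so much in practice.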

Matrix Multiplication over Small Finite Fields on MIMD Architectures

Many problems in the computational representation theory of finite groups require calculations with huge dense matrices over very small finite fields (cf. [5] and [7]). In this paper, we concentrate on the multiplication of matrices over GF(2), though most of the ideas can be generalized in some way to other matrix algorithms and to other small finite fields.
In the first chapter, we present some ideas on how to accelerate earlier programs and considerations on how to choose the best corresponding tuning parameters. We show that these considerations agree well with the running times of an implementation on one thin node of an IBM SP/2.
In the second chapter, we discuss different approaches to the parallelization of the algorithms, all of which use the message-passing paradigm. We study the interplay between the tuning parameters and different communication patterns, illustrated by several examples run on the SP/2.
Reiner Staszewski
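The core bit-packing idea behind fast GF(2) matrix multiplication can be sketched briefly: store each row as a machine word, so that row i of C = A*B is the XOR of the rows of B selected by the set bits of row i of A. Real implementations add blocking and table lookups as tuning parameters; this shows only the basic trick.

```python
# GF(2) matrix multiplication with bit-packed rows (basic idea only).

def gf2_matmul(a_rows, b_rows):
    """Multiply GF(2) matrices given as lists of row bitmasks.
    Bit j of a row bitmask is the entry in column j."""
    c_rows = []
    for arow in a_rows:
        acc = 0
        k = 0
        while arow:
            if arow & 1:
                acc ^= b_rows[k]    # add (XOR) row k of B
            arow >>= 1
            k += 1
        c_rows.append(acc)
    return c_rows

# A = [[1,1],[0,1]] as row bitmasks [0b11, 0b10]; over GF(2), A*A = I.
print(gf2_matmul([0b11, 0b10], [0b11, 0b10]))   # [1, 2]
```

With 64-bit words this processes 64 field entries per XOR, which is the source of the large speedups over entry-wise arithmetic.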

Parallel Algorithms for Dense Eigenvalue Problems

Parallel solutions of dense eigenvalue problems have been an active research topic since the implementation of the first parallel eigenvalue algorithm in 1971. In this paper we review Jacobi methods and spectral division methods in their respective frameworks, introduce the issues arising in high-performance implementation on contemporary computers, especially parallel computers, and address problems that remain open.
Xiaobai Sun
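As a minimal serial illustration of the Jacobi family (an unblocked sketch with plane rotations, not one of the parallel algorithms surveyed), the following routine diagonalizes a small symmetric matrix by cyclic sweeps:

```python
import math

def jacobi_eigenvalues(a, sweeps=10):
    """Cyclic Jacobi method: annihilate each off-diagonal pair of the
    symmetric matrix a (list of lists) with a plane rotation, sweep by
    sweep, until the diagonal carries the eigenvalues."""
    n = len(a)
    a = [row[:] for row in a]                # work on a copy
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue                 # already (numerically) zero
                # rotation angle that zeroes a[p][q]
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):           # apply rotation to columns p, q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):           # apply rotation to rows p, q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

# Tridiagonal test matrix with eigenvalues 2 - sqrt(2), 2, 2 + sqrt(2).
m = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
print([round(v, 3) for v in jacobi_eigenvalues(m)])   # [0.586, 2.0, 3.414]
```

Each rotation touches only two rows and two columns, which is exactly the structure that parallel Jacobi orderings exploit to apply many independent rotations at once.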

Enhanced Communication Support for Networked Applications

Today communication support forms a key issue for almost all computer-driven applications. Enhanced networks as well as communication protocols and services are required in order to appropriately serve the emerging variety of applications. This holds for networked multimedia applications as well as for scientific applications. If scientific applications are implemented on workstation clusters, they typically need to deal with the same protocol stack as multimedia applications, i.e., with the Internet protocols. Although both types of applications may have different flavours, many commonalities exist with respect to their communication requirements. Examples include low communication overhead as well as scalable and efficient support of group communication.
The paper gives a survey of enhanced communication support that is mainly targeted towards services and protocols for networked multimedia applications. Since efficiency of communication systems is very important, implementation techniques are also considered within this survey.
Martina Zitterbart

