ABSTRACT
We discuss the parallelization of arithmetic operations on polynomials modulo a triangular set. We focus on parallel normal form computations since this is a core subroutine in many high-level algorithms, such as triangular decompositions of polynomial systems.
When computing modulo a triangular set, multivariate polynomials are regarded recursively as univariate ones, which leads to several implementation challenges when one targets highly efficient code. We rely on an algorithm proposed in [17] which addresses some of these issues.
We propose a two-level parallel scheme. First, we make use of parallel multidimensional Fast Fourier Transform in order to perform multivariate polynomial multiplication. Secondly, we extract parallelism from the structure of the sequential normal form algorithm of [17]. We have realized a multithreaded implementation. We report on different strategies for the management of tasks and threads.
- R. AlNa'Mneh, W. D. Pan, and R. Adhami Communication efficient adaptive matrix transpose algorithm for FFT on symmetric multiprocessors. In Proc. SSST'05 the Thirty-Seventh Southeastern Symposium on System Theor, pages 312--315. IEEE Computer Society, 2005.Google ScholarCross Ref
- J.-C. Bajard and G. A. Jullien. Parallel Montgomery multiplication in GF (2k): using trinomial residue arithmetic. In Proc. ARITH'05, the 17th IEEE Symposium on Computer Arithmetic, pages 164--171. IEEE Computer Society, 2005. Google ScholarDigital Library
- R. D. Blumofe, M. Frigo, C. F. Joerg, C. E. Leiserson, and K. H. Randall. An analysis of dag-consistent distributed shared-memory algorithms. In Proc. SPAA'96, the eighth annual ACM symposium on Parallel algorithms and architectures, pages 297--308, Padua, Italy, ACM Press, 1996. Google Scholar
- S. Cook. On the minimum computation time of functions. PhD thesis, Harvard University, 1966.Google Scholar
- X. Dahan, M. Moreno Maza, É. Schost, W. Wu, and Y. Xie. Lifting techniques for triangular decompositions. In Proc. ISSAC'05, pages 108--115. ACM Press, 2005. Google ScholarDigital Library
- A. Filatei, X. Li, M. Moreno Maza, and ÉSchost. Implementation techniques for fast polynomial arithmetic in a high-level programming environment. In Proc. ISSAC'06, pages 93--100, ACM Press, 2006. Google ScholarDigital Library
- M. V. Foursov and M. Moreno Maza. On computer-assisted classification of coupled integrable equations. J. Symb. Comp., 33:647--660, 2002. Google ScholarDigital Library
- J. von zur Gathen and J. Gerhard. Modern Computer Algebra. Cambridge University Press, 1999. Google ScholarDigital Library
- A. Halbutogullari and C. K. Koc. Parallel multiplication in GF (2k): using polynomial residue arithmetic. Designs, Codes and Cryptography, 20(2): :155--173, 2000. Google ScholarDigital Library
- J. van der Hoeven. The truncated Fourier transform and applications. In J. Gutierrez, editor, Proc. ISSAC 2004, pages 290--296, ACM Press, 2004. Google ScholarDigital Library
- H. Hong and H. W. Loidl. Parallel computation of modular multivariate polynomial resultants on a shared memory machine. In B. Buchberger and J. Volkert, editors, Proc. of CONPAR 94, Springer LNCS 854., pages 325--336. Springer Verlag, 1994. Google ScholarDigital Library
- J. R. Johnson, W. Krandick, and A. D. Ruslanov. Architecture-aware classical Taylor shift by 1. In ISSAC'05, pages 200--207. ACM Press, 2005. Google ScholarDigital Library
- Hyun-Sung Kim, Hee-Joo Park, and Sung-Ho Hwang. Parallel Modular Multiplication Algorithm in Residue Number System. Chapter in Parallel Processing and Applied Mathematics, volume 3019/2004 of Lecture Notes in Computer Science, Springer, 2004.Google Scholar
- W. Küchlin. On the multi-threaded computation of integral polynomial greatest common divisors. In Proc. of ISSAC'91, pages 333--342. ACM Press, 1991. Google ScholarDigital Library
- H. T. Kung. On computing reciprocals of power series. Numerische Mathematik, 22:341--348, 1974.Google ScholarDigital Library
- X. Li and M. Moreno Maza. Efficient implementation of polynomial arithmetic in a multiple-level programming environment. In A. Iglesias and N. Takayama, editors, Proc. International Congress of Mathematical Software-ICMS 2006, pages 12--23. Springer, 2006. Google ScholarDigital Library
- X. Li, M. Moreno Maza, and ÉSchost. Fast arithmetic for triangular sets:From theory to practice. In Proc. ISSAC'07, ACM Press, 2007. Google ScholarDigital Library
- X. Li, M. Moreno Maza, and É. Schost. On the virtues of generic programming for symbolic computation. In Proc. CASA'07, Lecture Notes in Computer Science vol. 4488, pages. 251258, Springer-Verlag Berlin Heidelberg 2007. Google ScholarDigital Library
- M. Moreno Maza, B. Stephenson, S. M. Watt, and Y. Xie. Multiprocessed parallelism support in Aldor on SMPs and multicores. In Proc. PASCO'07, ACM Press, 2007. Google ScholarDigital Library
- M. Moreno Maza and Y. Xie. Component-level parallelization of triangular decompositions. In Proc. PASCO'07, ACM Press, 2007. Google ScholarDigital Library
- K. Morita, A. Morihata, K. Matsuzaki, Z. Hu and M. Takeichi. Automatic Inversion Generates Divide-and-Conquer Parallel Programs In Proc. PLDI'07, ACM Press, 2007. Google ScholarDigital Library
- M. Sieveking. An algorithm for division of power series. Computing, 10:153--156, 1972.Google ScholarCross Ref
- S. Varette and J.-L. Roch. Probabilistic Certification of Divide & Conquer Algorithms on Global Computing Platforms. Application to Fault-Tolerant Exact Matrix-Vector Product In Proc. PASCO'07, ACM Press, 2007. Google ScholarDigital Library
Index Terms
- Multithreaded parallel implementation of arithmetic operations modulo a triangular set
Recommendations
Fast arithmetic for triangular sets: from theory to practice
ISSAC '07: Proceedings of the 2007 international symposium on Symbolic and algebraic computationWe study arithmetic operations for triangular families of polynomials, concentrating on multiplication in dimension zero. By a suitable extension of fast univariate Euclidean division, we obtain theoretical and practical improvements over a direct ...
Interlacing and spacing properties of zeros of polynomials, in particular of orthogonal and Lq-minimal polynomials, q∈[1,∞]
Let (P"2"n^*(z)) be a sequence of polynomials with real coefficients such that lim"nP"2"n^*(e^i^@f)=G(e^i^@f) uniformly for @f@__ __[@a-@d,@b+@d] with G(e^i^@f)<>0 on [@a,@b], where 0=<@a<@b=<@p and @d>0. First it is shown that the zeros of p"n(cos@f)=...
Component-level parallelization of triangular decompositions
PASCO '07: Proceedings of the 2007 international workshop on Parallel symbolic computationWe discuss the parallelization of algorithms for solving poly-nomial systems symbolically by way of triangular decompositions. We introduce a component-level parallelism for which the number of processors in use depends on the geometry of the solution ...
Comments