
2023 | Book

Principal Component Analysis and Randomness Test for Big Data Analysis

Practical Applications of RMT-Based Technique


About this book

This book presents a novel approach to analyzing large, rectangular numerical data sets (so-called big data). The essence of this approach is to grasp the "meaning" of the data instantly, without getting into the details of individual data points. Unlike conventional principal component analysis, randomness tests, and visualization methods, the authors' approach is universal and simple, regardless of data type, structure, or field of science.

First, the mathematical preparation is described. The RMT-PCA and the RMT-test use the cross-correlation matrix of time series, $C = XX^{T}$, where $X$ is a rectangular matrix with $N$ rows and $L$ columns and $X^{T}$ is the transpose of $X$. Because $C$ is symmetric ($C = C^{T}$), it can be converted to a diagonal matrix of eigenvalues by the similarity transformation $SCS^{-1} = SCS^{T}$ using an orthogonal matrix $S$. When $N$ is sufficiently large, the histogram of the eigenvalue distribution can be compared with the theoretical formula derived in random matrix theory (RMT).
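The preparation above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: each row of X is standardized to zero mean and unit variance, C is normalized as XX^T / L, and the "theoretical formula" is taken to be the Marchenko-Pastur density, the standard RMT result for N uncorrelated series of length L with Q = L/N ≥ 1 and edges $\lambda_{\pm} = (1 \pm 1/\sqrt{Q})^2$. The function names are hypothetical.

```python
import numpy as np


def correlation_matrix(X):
    """Standardize each row of X (N x L) and return C = Z Z^T / L (assumed normalization)."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    return Z @ Z.T / Z.shape[1]


def mp_bounds(N, L):
    """Edges lambda_-, lambda_+ of the Marchenko-Pastur spectrum (Q = L/N >= 1)."""
    Q = L / N
    return (1 - 1 / np.sqrt(Q)) ** 2, (1 + 1 / np.sqrt(Q)) ** 2


def mp_density(lam, N, L):
    """Marchenko-Pastur eigenvalue density for N uncorrelated rows of length L."""
    Q = L / N
    lo, hi = mp_bounds(N, L)
    lam = np.asarray(lam, dtype=float)
    dens = np.zeros_like(lam)
    inside = (lam > lo) & (lam < hi)
    x = lam[inside]
    dens[inside] = Q / (2 * np.pi) * np.sqrt((hi - x) * (x - lo)) / x
    return dens


rng = np.random.default_rng(0)
N, L = 200, 1000
X = rng.standard_normal((N, L))      # purely random data: no signal
C = correlation_matrix(X)

# eigh returns ascending eigenvalues and an orthogonal matrix whose COLUMNS
# are eigenvectors, so S.T @ C @ S is diagonal. Here S.T plays the role of
# the text's S, and S.T is also the inverse of S.
eigvals, S = np.linalg.eigh(C)
D = S.T @ C @ S

lo, hi = mp_bounds(N, L)
hist, edges = np.histogram(eigvals, bins=20, range=(0.0, hi * 1.2), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
for c, h, p in zip(centers, hist, mp_density(centers, N, L)):
    print(f"{c:5.2f}  empirical {h:5.2f}  RMT {p:5.2f}")
print("eigenvalues outside [lambda-, lambda+]:", np.sum((eigvals < lo) | (eigvals > hi)))
```

For purely random input, the histogram should track the RMT curve and almost no eigenvalue should leave the interval; eigenvalues far outside it are the non-random part that the methods in this book set out to isolate.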

The book then applies RMT-PCA to high-frequency stock prices in the Japanese and American markets. This approach proves effective in extracting "trendy" business sectors of the financial market over a prescribed time scale. In this case, $X$ consists of $N$ stock-price time series of length $L$, and the correlation matrix $C$ is an $N \times N$ square matrix whose element in the $i$-th row and $j$-th column is the inner product of the length-$L$ price time series of the $i$-th and $j$-th stocks.
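A minimal sketch of the RMT-PCA idea, under assumptions: rows are standardized before taking inner products, C is divided by L, and only eigenvectors whose eigenvalues exceed the random-matrix upper edge $\lambda_{+} = (1 + \sqrt{N/L})^2$ are kept as "signal". The market data are synthetic, with two hypothetical sectors driven by shared factors; real stock data and the book's exact thresholds are not reproduced here.

```python
import numpy as np


def standardize_rows(R):
    """Shift and scale every row (one stock) to zero mean and unit variance."""
    R = np.asarray(R, dtype=float)
    return (R - R.mean(axis=1, keepdims=True)) / R.std(axis=1, keepdims=True)


def rmt_pca(R):
    """Keep only the eigenpairs of C whose eigenvalues exceed the RMT upper edge."""
    Z = standardize_rows(R)
    N, L = Z.shape
    C = Z @ Z.T / L                         # C[i, j] = <z_i, z_j> / L
    lam_plus = (1 + np.sqrt(N / L)) ** 2    # upper edge for purely random data
    eigvals, eigvecs = np.linalg.eigh(C)    # ascending order
    order = np.argsort(eigvals)[::-1]       # re-sort, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > lam_plus
    return C, lam_plus, eigvals[keep], eigvecs[:, keep]


# Synthetic returns: 60 stocks, 500 time steps, two hypothetical sectors.
rng = np.random.default_rng(1)
N, L = 60, 500
R = rng.standard_normal((N, L))
R[:25] += 0.8 * rng.standard_normal(L)     # sector A shares one common factor
R[25:40] += 0.5 * rng.standard_normal(L)   # sector B shares a weaker factor

C, lam_plus, signal_vals, signal_vecs = rmt_pca(R)
print("lambda+ =", round(lam_plus, 3))
print("eigenvalues above lambda+:", np.round(signal_vals, 2))
top = signal_vecs[:, 0]
print("share of top eigenvector on sector A:", round(float(np.sum(top[:25] ** 2)), 3))
```

The squared components of each retained eigenvector show which stocks move together; in this toy setup the largest eigenvector concentrates on sector A and the next one on sector B, which is the kind of sector extraction the text describes.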

Next, the RMT-test is applied to measure the randomness of various random number generators, including algorithmically generated and physically generated random numbers.
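The general shape of such a test can be sketched as follows: cut a long sequence into N rows of length L, build C, and check how far its eigenvalues stray from the RMT prediction. The score used here (the fraction of eigenvalues outside the Marchenko-Pastur interval) is a hypothetical stand-in chosen for simplicity and is not necessarily the measure defined in the book. The `sticky_generator` is a deliberately flawed toy generator, invented for illustration.

```python
import numpy as np


def rmt_outside_fraction(values, N, L):
    """Hypothetical RMT-style randomness score for a 1-D sequence.

    The first N*L values are cut into N rows of length L, each row is
    standardized, and the eigenvalues of C = Z Z^T / L are compared with the
    Marchenko-Pastur interval. Returns the fraction of eigenvalues that fall
    outside it: close to 0 for random input, larger for structured input.
    """
    X = np.asarray(values[: N * L], dtype=float).reshape(N, L)
    Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    eig = np.linalg.eigvalsh(Z @ Z.T / L)
    q = N / L
    lo, hi = (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2
    return float(np.mean((eig < lo) | (eig > hi)))


def sticky_generator(n, phi=0.9, seed=0):
    """Deliberately flawed toy generator: each output leans on the previous one."""
    u = np.random.default_rng(seed).random(n)
    x = np.empty(n)
    x[0] = u[0]
    for k in range(1, n):
        x[k] = phi * x[k - 1] + (1 - phi) * u[k]
    return x


N, L = 200, 2000
good = np.random.default_rng(42).random(N * L)   # NumPy's default PCG64 generator
bad = sticky_generator(N * L)

score_good = rmt_outside_fraction(good, N, L)
score_bad = rmt_outside_fraction(bad, N, L)
print("PCG64 :", score_good)
print("sticky:", score_bad)
```

The serial dependence in the toy generator widens the eigenvalue spectrum well beyond the interval, so its score is far higher than that of a well-behaved generator.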

The book concludes by demonstrating two applications of the RMT-test: (1) a comparison of hash functions, and (2) stock prediction by means of randomness, including a new index of off-randomness related to market decline.


Chapter 1. Big Data Analysis with RMT
The purpose of this book is to introduce the basic concepts of RMT-oriented methods and their practical applications in big data analysis, focusing on two topics, RMT-PCA and RMT-test. Both are methodologies for analyzing large numerical data via computer programming. The essence of these methodologies is to use RMT to subtract random portions from the data under analysis and extract a small number of useful elements.
Mieko Tanaka-Yamawaki, Yumihiko Ikura
Chapter 2. Formulation of RMT-PCA
Big data are stored in computers as digital data organized according to some regularity. A familiar example is stock prices: the prices of each company are usually arranged as a time series. A vector is a mathematical concept that lets us handle such a long array of stock prices as a single symbol.
Chapter 3. RMT-PCA for the Stock Markets
We examine how to extract a set of correlated stock prices from a large and complex network consisting of hundreds or thousands of stocks. Besides correlations between stocks in the same industry, there are also correlations and anti-correlations between stocks in different industries. To compare price time series of different magnitudes, profits (returns) are often used instead of prices.
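Why profits make series of different magnitudes comparable can be shown with a small sketch. It assumes that "profit" means the log return $r_t = \ln(p_{t+1}/p_t)$, a common choice in this field; the book may define it differently. The two price series are invented: the second has exactly the same relative moves at 100 times the price level.

```python
import numpy as np


def log_returns(prices):
    """Row-wise log returns r_t = ln(p_{t+1} / p_t) of an N x (L+1) price array."""
    P = np.asarray(prices, dtype=float)
    return np.diff(np.log(P), axis=1)


def standardize_rows(R):
    """Zero mean and unit variance for every row."""
    return (R - R.mean(axis=1, keepdims=True)) / R.std(axis=1, keepdims=True)


cheap = np.array([100.0, 101.0, 99.5, 102.0, 101.2])
pricey = 100.0 * cheap                  # same relative moves, 100x the price level
prices = np.vstack([cheap, pricey])

print(prices @ prices.T)                # raw inner products: dominated by the pricey stock
R = log_returns(prices)                 # shape (2, 4): one fewer column than prices
print(R)                                # the two rows coincide up to rounding
Z = standardize_rows(R)
print(Z @ Z.T / Z.shape[1])             # correlation matrix with all entries close to 1
```

The raw inner products differ by orders of magnitude purely because of price level, while the returns of the two stocks coincide, so after standardization they are recognized as perfectly correlated.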
Chapter 4. The RMT-Tests
We created the RMT-test as a new tool for measuring the randomness of sufficiently long data strings, and we present the results of applying it to various situations. The goal of the RMT-test is to find good labels for a given data string. Good labels can greatly reduce the burden of big data analysis, especially for huge financial data sets.
Chapter 5. Applications of the RMT-Test
Next, we discuss three applications of the RMT-test that we have tried. The first application measures the randomness of two hash functions. Naturally, one expects the newer algorithm to produce more random hash values than the older one, but let's check. The second application is selecting stocks to invest in, and the third uses the randomness measured by the RMT-test to predict possible market disruptions in advance.
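The hash-function comparison can be sketched by hashing a counter, concatenating the digests into a byte stream, and scoring that stream. The abstract does not name the two hash functions, so MD5 and SHA-256 from Python's `hashlib` are used here only as stand-ins, together with a deliberately broken toy "hash" (the counter's own bytes) as a negative control. The score is the same hypothetical fraction-outside-the-interval measure used for illustration, not the book's exact statistic.

```python
import hashlib

import numpy as np


def rmt_spectrum(values, N, L):
    """Eigenvalues of C for N standardized rows of length L, plus the MP edges."""
    X = np.asarray(values[: N * L], dtype=float).reshape(N, L)
    Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    q = N / L
    return np.linalg.eigvalsh(Z @ Z.T / L), (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2


def outside_fraction(values, N, L):
    """Hypothetical score: share of eigenvalues outside [lambda-, lambda+]."""
    eig, lo, hi = rmt_spectrum(values, N, L)
    return float(np.mean((eig < lo) | (eig > hi)))


def hash_stream(name, n_bytes):
    """Concatenate digests of the counters 0, 1, 2, ... (8-byte little-endian)."""
    chunks, total, k = [], 0, 0
    while total < n_bytes:
        digest = hashlib.new(name, k.to_bytes(8, "little")).digest()
        chunks.append(digest)
        total += len(digest)
        k += 1
    return np.frombuffer(b"".join(chunks)[:n_bytes], dtype=np.uint8)


def identity_stream(n_bytes):
    """Deliberately broken toy 'hash': each counter's own 4 little-endian bytes."""
    counters = np.arange(n_bytes // 4 + 1, dtype="<u4")
    return counters.view(np.uint8)[:n_bytes]


N, L = 100, 2000
streams = {
    "md5": hash_stream("md5", N * L),
    "sha256": hash_stream("sha256", N * L),
    "identity": identity_stream(N * L),
}
for name, data in streams.items():
    print(f"{name:8s} outside fraction = {outside_fraction(data, N, L):.3f}")
```

A crude score like this one separates a broken hash from good ones, but both MD5 and SHA-256 outputs are expected to look random to it, so finer distinctions between real hash functions would need the more careful measure developed in the book.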
Chapter 6. Conclusion
We have explored the world of numerical research armed with RMT-oriented methodologies for big data analysis. At first glance, the methodology may appear highly mathematical. However, in the course of our research, we learned that once the process is fully formulated and the algorithms are established, it can be a tool that anyone can use. That is how we enjoyed working together while struggling with huge data sets. Such simplicity is at the heart of a technology suitable for big data analysis.
Authors
Mieko Tanaka-Yamawaki
Yumihiko Ikura
Springer Nature Singapore