4.2.1 Critical Values
Table 2 shows critical values \(\lambda _{LR}^{(\alpha )}\) for each combination of data size and number of clusters, at a nominal significance level \(\alpha = 0.05\). These critical values were obtained by applying the three-step Monte Carlo scheme described in Section 3.2.
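The full three-step scheme is given in Section 3.2; as a minimal illustration of its final step only, the sketch below (a generic placeholder statistic, not the paper's \(\lambda _{LR}\)) estimates an upper-\(\alpha\) critical value as an empirical quantile of Monte Carlo null draws:

```python
import math
import random

def mc_critical_value(null_stats, alpha=0.05):
    """Upper-alpha critical value: the ceil((1 - alpha) * B)-th order
    statistic of B simulated null test statistics."""
    s = sorted(null_stats)
    k = math.ceil((1 - alpha) * len(s))  # 1-based order-statistic index
    return s[k - 1]

# Placeholder null draws (NOT the paper's lambda_LR null distribution):
# a sum of squared standard normals stands in for a generic test statistic.
random.seed(1)
draws = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(5))
         for _ in range(2000)]
crit = mc_critical_value(draws, alpha=0.05)
```

With 10,000 or more null draws per condition, this empirical quantile stabilizes; the values in Table 2 are condition-specific critical values of this kind.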
Inspection of Table 2 shows that, for each data size, the null distribution of \(\lambda _{LR}\) is shifted towards the right for an increasing number of column clusters. This is an expected result: more parameters are estimated from the observed data, leading to more capitalization on chance and thus to higher \(\lambda _{LR}\) values. In contrast, increasing the number of row clusters appears to result in null distributions that are shifted towards the left, despite implying a larger number of estimated parameters. This shift must be attributed to the penalty term in (12), which is affected by the number of estimated row clusters. Furthermore, if the number of columns \(J\) is fixed, then for each number of clusters, the null distribution of \(\lambda _{LR}\) is shifted to the left as the number of rows \(I\) increases. This is because the penalty term decreases faster, as \(I\) increases, than the difference in log-squared-residuals (see (12)), leading to smaller test statistic values. Finally, comparing \(50 \times 30\) to \(20 \times 20\) and \(40 \times 20\) shows that increasing the number of columns \(J\) shifts the null distribution of \(\lambda _{LR}\) towards the right, despite the larger number of rows \(I\), which, as we have seen, shifts the null distribution to the left. This may be explained by the fact that \(J\) does not affect the penalty term in (12) but only the difference in log-squared-residuals.
Table 2: Critical values \(\lambda _{LR}^{(\alpha )}\) for \(\alpha =0.05\) for each combination of size and number of clusters

| \({20\times 20}\) | 15.029 | 14.542 | 12.844 | 25.043 | 34.636 |
| \({40\times 20}\) | 13.875 | 12.932 | 10.476 | 22.349 | 29.991 |
| \({50\times 30}\) | 18.562 | 17.975 | 15.288 | 28.986 | 39.144 |
| \({30\times 50}\) | 29.511 | 29.213 | 27.725 | 47.065 | 62.392 |
| \({100\times 20}\) | 13.101 | 11.655 | 8.157 | 20.167 | 27.509 |
| \({200\times 30}\) | 17.206 | 15.738 | 11.909 | 26.005 | 35.052 |
4.2.3 Power and Parameter Recovery
Power
Figure 1 shows empirical power as a function of method (x-axis) and data size (curves), for data generated with unequal expected row cluster sizes. Subfigures in the first, second and third column refer to a true number of clusters of (2, 2), (3, 3) and (4, 4), respectively. Subfigures in the first row are obtained when the number of clusters for the analysis coincides with the true number of clusters (i.e., no misspecification), while the second and third rows correspond to misspecification of the number of clusters for the analysis (see subfigure headings for details). Note that the methods Boik, Piepho and Malik are not based on a two-mode clustering model, and hence their power depends only on the generated data and not on the number of clusters for the analysis. The results of these methods therefore do not change across the different rows of Fig. 1, but only across columns. Overall, empirical power decreases if the number of observations (i.e., \(I\) and/or \(J\)) decreases (comparison between curves within a subfigure) and/or the true number of clusters (\(P\), \(Q\)) increases (comparison across subfigures of the first row). There is a decrease in power, though not a dramatic one, for E-ReMI and REMAXINT when the number of clusters for the analysis does not match the true number of clusters (comparison across rows of the subfigures within each column). This suggests that, when using E-ReMI or REMAXINT as a test for interaction, mild under- or over-fitting does not have serious consequences for power (see Section 6 for a discussion on setting the number of clusters for the analysis). Comparing the different methods, it stands out that Malik performs the worst, followed by Piepho, which shows the second worst performance in terms of power. Comparing the remaining three methods when the true number of clusters is set to (2, 2), REMAXINT seems to perform slightly better than E-ReMI, which, in turn, tends to perform better than Boik. When the true number of clusters is set to (3, 3), however, E-ReMI has the best overall performance, followed by REMAXINT and then Boik. Given the choice of data generation mechanism (with one row cluster comprising 70% of cases in expectation), this is not surprising, as increasing the number of row clusters leads to a higher level of inequality of the expected row cluster sizes. Lastly, Boik becomes the best method, followed by E-ReMI, once the true number of clusters increases to (4, 4).
Figure 2 is similar to Fig. 1 and shows empirical power as a function of method (x-axis) and data size (curves), but for data generated with equal expected row cluster sizes (same subfigure structure). As in the previous scenario, empirical power decreases if the number of observations decreases (comparison between curves within a subfigure) and/or the true number of clusters (\(P\), \(Q\)) increases (comparison across subfigures of the first row). In this case, however, increasing the number of columns seems to have a stronger effect than increasing the number of rows, as results for size \(30 \times 50\) are equal to or better than those for size \(50 \times 30\). A possible explanation is that E-ReMI, being very flexible, requires more columns so as not to overfit the data with respect to the row clusters. REMAXINT, on the other hand, correctly assumes equal row cluster sizes and is thus less affected by the number of columns. As in the unequal row cluster sizes case, there is a small decrease in power for E-ReMI and REMAXINT when the number of clusters for the analysis does not match the true number of clusters (comparison across rows of the subfigures within each column), with E-ReMI being less robust to this misspecification. Comparing the different methods, Malik and Piepho clearly have the worst performance. Comparing the remaining three methods reveals that when the true number of clusters is set to (2, 2) or (3, 3), REMAXINT seems to perform better than E-ReMI and Boik, which have a similar performance. When the true number of clusters increases to (4, 4), Boik is clearly the best performing method in terms of power.
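Empirical power in these figures is the rejection rate over the simulated data sets in a condition; a minimal sketch (assuming a vector of test statistics computed on data generated under the alternative, together with the condition's Monte Carlo critical value) could be:

```python
def empirical_power(alt_stats, critical_value):
    """Fraction of simulated test statistics exceeding the critical value,
    i.e., the empirical rejection rate under the alternative."""
    return sum(s > critical_value for s in alt_stats) / len(alt_stats)

# Hypothetical statistics from four simulated data sets, critical value 2.5:
power = empirical_power([1.8, 2.7, 3.1, 4.0], 2.5)  # 3 of 4 rejections
```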
Parameter recovery
Figures 3–6 present the results in terms of means, across all 1000 data sets per condition, of ARI for rows, ARI for columns and NSE. As in the figures for power, different columns refer to different true numbers of clusters, while different rows refer to different numbers of clusters for the analysis (see subfigure headings for details). For studying parameter recovery, a comparison is possible only between E-ReMI and REMAXINT, as they are the only two methods that yield estimated row and column partitions with corresponding bicluster interaction effect parameters.
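The ARI (adjusted Rand index) compares an estimated partition with the true one while correcting for chance agreement; a self-contained sketch of the standard Hubert–Arabie formula, illustrated on hypothetical row-cluster labels, is:

```python
from collections import Counter
from math import comb

def adjusted_rand_index(true_labels, est_labels):
    """Hubert-Arabie adjusted Rand index between two partitions,
    computed from the pairwise contingency counts."""
    n = len(true_labels)
    contingency = Counter(zip(true_labels, est_labels))
    a = Counter(true_labels)  # cluster sizes in the true partition
    b = Counter(est_labels)   # cluster sizes in the estimated partition
    index = sum(comb(c, 2) for c in contingency.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (index - expected) / (max_index - expected)

# Hypothetical example: a perfectly recovered partition (up to label
# switching) scores 1; unrelated partitions score near or below 0.
perfect = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])  # 1.0
```

An ARI of 1 thus indicates perfect recovery of a partition regardless of how the cluster labels are numbered, which is why it is a natural recovery measure for the row and column partitions here.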
Figure 3 presents the means, across all 1000 data sets per condition, of ARI for row clusters as a function of method (x-axis) and data size (curves), for data generated with unequal expected row cluster sizes. Overall, mean ARI is higher (i.e., better performance) for larger data sizes. Specifically, it is the highest when the data size is set to \(200 \times 30\), followed by \(30 \times 50\) and then by \(50 \times 30\). Moreover, for a fixed number of columns (i.e., comparing \(100 \times 20\), \(40 \times 20\) and \(20 \times 20\)), performance increases as the number of rows increases. Lastly, comparing the cases \(100 \times 20\), \(50 \times 30\) and \(30 \times 50\), the latter has the overall best performance, while \(100 \times 20\) has the worst, despite having a larger total number of observations (i.e., 2000 as compared to 1500). This is as expected, since a larger number of columns implies more information to estimate the row clusters. Subfigures in the top row show the results when there is no misspecification of the number of clusters for the analysis, that is, when it coincides with the true number of clusters. Focusing on these subfigures, it can be seen that the two methods perform equally well in the scenario with the true number of clusters set to (2, 2), while E-ReMI performs better in the (3, 3) and (4, 4) cases. In case of misspecification of the number of clusters for the analysis, E-ReMI always performs better than REMAXINT. This result is particularly interesting in the (2, 2) case, as the performance of the two methods was similar under correct specification of the number of clusters. This comparative advantage can be explained by the fact that the misspecified models in this case imply an overfitting of the number of row clusters. It is likely that E-ReMI is capable of classifying most of the observations correctly, by creating additional small clusters for observations that (randomly) deviate from the two main clusters. REMAXINT, on the other hand, because of its implicit assumption of equal row cluster sizes, is more strongly encouraged to yield surplus row clusters containing a substantial number of rows.
Figure 4 presents the means of ARI for row clusters, for data generated with equal expected row cluster sizes. Overall, we see a slightly better performance of REMAXINT in all scenarios except those where the true number of clusters is (2, 2) and the number of clusters for the analysis is misspecified, where E-ReMI clearly performs better. These results can again be explained by the flexibility of E-ReMI with respect to yielding row clusters of unequal sizes, which may (partially) make up for the fact that an excess number of row clusters is fitted to these data.
Figure 5 presents the means of ARI for column clusters, for data generated with unequal expected row cluster sizes. Overall, mean ARI is higher (i.e., better performance) for larger data sizes. Specifically, it is the highest when the data size is set to \(200 \times 30\), followed by \(100 \times 20\) and then by \(50 \times 30\). As opposed to ARI for rows, increasing the number of rows has a stronger effect on ARI for columns than increasing the number of columns. This can be seen from \(50 \times 30\) performing better than \(30 \times 50\) (it was the opposite for ARI for rows) and from \(100 \times 20\) performing better than both (which was not the case for ARI for rows). This is as expected, since a larger number of rows implies more information to estimate the column clusters. The performance of the methods decreases for higher true numbers of clusters (comparison across columns), and it is negatively affected by misspecification of the number of clusters for the analysis when the true number of clusters is set to (3, 3) or (4, 4) (comparison across rows of the middle and rightmost columns). Note that when the true number of clusters is set to (2, 2), both misspecifications imply overfitting in terms of the number of row clusters, whereas when it is set to (4, 4), both misspecifications imply underfitting in terms of the number of column clusters. When the true number of clusters is set to (2, 2), the two methods perform equally well; when it is set to (3, 3) or (4, 4), the two methods perform equally well in most cases, except for data sizes with a large number of rows, for which E-ReMI performs better. This suggests that estimation of the column partitions benefits substantially from a correct specification of the model if the sample is sufficiently large (i.e., 100 rows or more). Results when data are generated with equal row cluster sizes are very similar, but now the two methods always perform equally well (results not shown). Note that in this case, the sampling mechanism for the rows as assumed by REMAXINT is correct.
Figure 6 presents the means of NSE, for data generated with unequal expected row cluster sizes. The outcome measure NSE is the most general parameter recovery measure of the three considered in this study, since it takes into account the quality of the estimated row clustering, of the estimated column clustering, and of the estimated interaction effect parameters. Bearing in mind that lower values of NSE imply better performance, mean NSE is the lowest (and thus the best) for size \(200 \times 30\), i.e., the largest data size, and is very similar for sizes \(100 \times 20\), \(50 \times 30\) and \(30 \times 50\). Since \(100 \times 20\) is larger than the other two cases but also the most asymmetrical, this suggests that the methods tend to perform better in terms of NSE on more symmetrical data sets. Also in this case, increasing the true number of clusters has a detrimental effect on the performance of the methods under study. Interestingly, misspecification of the number of clusters for the analysis does not lead to a clear trend: results are sometimes unaffected, sometimes positively affected (better performance of the methods) and sometimes negatively affected (worse performance of the methods). Lastly, E-ReMI tends to perform at least as well as REMAXINT, and very often better. Results when data are generated with equal row cluster sizes are very similar, but the two methods always perform equally well (results not shown).
Summarizing the results in this subsection, increasing the size of the data leads to better performance, with some performance criteria more affected by an increase in the number of rows and others by an increase in the number of columns. Increasing the complexity of the clustering structure, that is, increasing the true number of clusters, leads to worse performance. Misspecification of the number of clusters for the analysis generally has a detrimental effect on the performance of the methods, but for most performance criteria (in most scenarios) this effect is minimal. As could be expected, in terms of power, E-ReMI tends to perform better than REMAXINT when data are generated with unequal expected row cluster sizes whereas, in a few cases, REMAXINT performs better than E-ReMI when data are generated with equal expected row cluster sizes. In terms of parameter recovery, E-ReMI overall performs better than REMAXINT when data are generated with unequal expected row cluster sizes, and both methods perform equally well when data are generated with equal expected row cluster sizes. Lastly, when it comes to power, Boik's method is a good choice, as its performance is always very good and it is more robust to increased complexity of the data. However, this method was not designed to facilitate interpretation of the row by column interaction in a data set at hand. When interest is in understanding that interaction, REMAXINT/E-ReMI are recommended because they yield an estimated two-mode clustering structure and corresponding interaction effect parameter estimates.