2015 | OriginalPaper | Book chapter
Towards Order-Preserving SubMatrix Search and Indexing
Authors: Tao Jiang, Zhanhuai Li, Qun Chen, Kaiwen Li, Zhong Wang, Wei Pan
Published in: Database Systems for Advanced Applications
Order-Preserving SubMatrix (OPSM) has proved important in modelling biologically meaningful subspace clusters, capturing the general tendency of gene expressions across a subset of conditions. Given an OPSM query based on row or column keywords, it is desirable to retrieve OPSMs quickly from a large gene expression dataset or from OPSM data via indices. However, mining OPSMs from a gene expression dataset takes a long time, and the volume of OPSM data is huge. In this paper, we investigate the issues of indexing these two kinds of data and first present a naive solution, pfTree, which applies a prefix tree. Because searching this tree is not efficient, we give an optimized indexing method, pIndex. Unlike pfTree, pIndex employs row and column header tables to traverse related branches in a bottom-up manner. Further, two pruning rules based on the number and order of keywords are introduced. To reduce the number of column keyword candidates on fuzzy queries, we introduce a First Item of keywords roTation method, FIT, which reduces it from $n!$ to $n$. We conduct extensive experiments with real datasets on a single machine, Hadoop and Hama, and the experimental results show the efficiency and scalability of the proposed techniques.
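
To make the two ideas in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that each OPSM is stored as an ordered sequence of column identifiers together with a set of row identifiers, and it indexes those sequences in a plain prefix tree, in the spirit of the naive pfTree. The abstract does not spell out how FIT chooses its candidates; the `rotations` helper assumes one plausible reading, namely that each keyword takes a turn as the first item through a cyclic rotation, which yields $n$ candidates instead of the $n!$ full permutations. All names (`PrefixTreeNode`, `insert_opsm`, `search_prefix`, `rotations`) are hypothetical.

```python
from itertools import permutations


class PrefixTreeNode:
    """One node of a plain prefix tree keyed by column identifiers (hypothetical layout)."""

    def __init__(self):
        self.children = {}  # column id -> PrefixTreeNode
        self.rows = set()   # rows of every OPSM whose column sequence passes through this node


def insert_opsm(root, columns, rows):
    """Insert one OPSM, given as an ordered column sequence plus its row set."""
    node = root
    for col in columns:
        node = node.children.setdefault(col, PrefixTreeNode())
        node.rows.update(rows)


def search_prefix(root, columns):
    """Return the rows of all indexed OPSMs whose column sequence starts with `columns`."""
    node = root
    for col in columns:
        node = node.children.get(col)
        if node is None:
            return set()  # no indexed OPSM starts with this column order
    return set(node.rows)


def rotations(keywords):
    """Return the n cyclic rotations of `keywords` (assumed reading of FIT, see lead-in)."""
    kw = list(keywords)
    return [tuple(kw[i:] + kw[:i]) for i in range(len(kw))]


if __name__ == "__main__":
    root = PrefixTreeNode()
    insert_opsm(root, ["c2", "c0", "c3"], {"g1", "g4"})
    insert_opsm(root, ["c2", "c0", "c1"], {"g2"})

    # Exact query: the column order must match a path starting at the root.
    print(sorted(search_prefix(root, ["c2", "c0"])))

    # Fuzzy query: try n rotations rather than all n! orderings.
    query = ["c0", "c3", "c2"]
    print(len(rotations(query)), "rotations vs", len(list(permutations(query))), "permutations")
    for candidate in rotations(query):
        print(candidate, sorted(search_prefix(root, candidate)))
```

For three keywords, `rotations(["a", "b", "c"])` returns `[("a", "b", "c"), ("b", "c", "a"), ("c", "a", "b")]`, whereas `itertools.permutations` would produce six orderings. Note that this top-down search only matches sequences that begin at the root; the header tables and bottom-up traversal that the abstract attributes to pIndex are not modelled here.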