Article

Distance-function design and fusion for sequence data

Authors:
Yi Wu, University of California, Santa Barbara, CA
Edward Y. Chang, University of California, Santa Barbara, CA

CIKM '04: Proceedings of the thirteenth ACM international conference on Information and knowledge management, November 2004, Pages 324–333, https://doi.org/10.1145/1031171.1031238

Published: 13 November 2004

ABSTRACT

Sequence-data mining plays a key role in many scientific studies and real-world applications, such as bioinformatics, data streams, and sensor networks, where sequence data are processed and their semantics interpreted. In this paper we address two related issues: sequence-data representation and representation-to-semantics mapping. For representation, because the best choice depends on the application and even on the type of query, we propose representing sequence data in multiple views. For each representation, we propose methods to construct a valid kernel as the distance function that measures similarity between sequences. For mapping, we find the best combination of the individual distance functions, each of which measures similarity in a different view, to depict the target semantics. We propose a super-kernel function-fusion scheme to achieve the optimal mapping. Through theoretical analysis and empirical studies on UCI and real-world datasets, we show that our approach of multi-view representation and fusion is mathematically valid and very effective in practice.
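
To make the idea of per-view kernels and their combination concrete, the following is a minimal Python sketch. It is not the paper's super-kernel fusion scheme, and the abstract does not specify the views or the combination rule. As assumptions for illustration only, it uses two hypothetical views of a symbolic sequence (a k-mer spectrum kernel and a Gaussian RBF kernel on symbol frequencies) and fuses them with a fixed non-negative weighted sum. The function names, `k`, `gamma`, and `weights` are all invented for this example. Sequences are assumed to be non-empty and at least `k` symbols long.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count every length-k substring of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=2):
    """View 1 (assumed): inner product of k-mer count vectors."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return float(sum(ca[m] * cb[m] for m in ca))


def composition_rbf_kernel(a, b, gamma=1.0):
    """View 2 (assumed): Gaussian RBF kernel on symbol-frequency vectors."""
    fa, fb = Counter(a), Counter(b)
    symbols = set(fa) | set(fb)
    sq_dist = sum((fa[s] / len(a) - fb[s] / len(b)) ** 2 for s in symbols)
    return math.exp(-gamma * sq_dist)


def normalized(kernel):
    """Cosine-normalize a kernel so that k(x, x) == 1."""
    def k_norm(a, b):
        return kernel(a, b) / math.sqrt(kernel(a, a) * kernel(b, b))
    return k_norm


def fused_kernel(a, b, weights=(0.5, 0.5)):
    """Non-negative weighted sum of the per-view kernels.

    A non-negative combination of valid (positive semidefinite)
    kernels is itself a valid kernel.
    """
    views = (normalized(spectrum_kernel), composition_rbf_kernel)
    return sum(w * k(a, b) for w, k in zip(weights, views))


if __name__ == "__main__":
    query = "ACGTACGT"
    for other in ("ACGTACGT", "ACGTACGA", "TTTTTTTT"):
        print(other, round(fused_kernel(query, other), 3))
```

In this sketch the weights are fixed by hand. A learned fusion would instead choose the combination from labeled data, which is the part the paper's scheme addresses.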

Index Terms

Computing methodologies > Machine learning

Published in
CIKM '04: Proceedings of the thirteenth ACM international conference on Information and knowledge management
November 2004, 678 pages
ISBN: 1581138741
DOI: 10.1145/1031171
General Chair: David Grossman (Illinois Institute of Technology)
Program Chairs: Luis Gravano (Columbia University), ChengXiang Zhai (University of Illinois at Urbana-Champaign), Otthein Herzog (University of Bremen, Germany), David A. Evans (Clairvoyance Corporation)
Copyright © 2004 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
multiple-view representation
representation-to-semantics mapping
sequence-data mining
sequence-data representation
super-kernel fusion
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%