Article

Molecular feature mining in HIV data

Authors:
Stefan Kramer, Luc De Raedt, Christoph Helma

Institute for Computer Science, Machine Learning Lab, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee Geb. 79, D-79110 Freiburg/Br., Germany

KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, August 2001, Pages 136–143
https://doi.org/10.1145/502512.502533

Published: 26 August 2001

ABSTRACT

We present the application of feature mining techniques to the Developmental Therapeutics Program's AIDS antiviral screen database. The database consists of 43576 compounds, which were measured for their capability to protect human cells from HIV-1 infection. According to these measurements, the compounds were classified as either active, moderately active, or inactive. The distribution of classes is extremely skewed: only 1.3% of the molecules are known to be active, and 2.7% are known to be moderately active.

Given this database, we were interested in molecular substructures (i.e., features) that are frequent in the active molecules and infrequent in the inactives. In data mining terms, we focused on features with a minimum support in active compounds and a maximum support in inactive compounds. We analyzed the database using the levelwise version space algorithm, which forms the basis of the inductive query and database system MOLFEA (Molecular Feature Miner). Within this framework, it is possible to declaratively specify the features of interest through constraints on their frequency on (possibly different) datasets as well as on their generality and syntax. Assuming that the detected substructures are causally related to biochemical mechanisms, it should be possible to facilitate the development of new pharmaceuticals with improved activities.
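The pair of support constraints described in the abstract can be illustrated with a minimal sketch. This is a brute-force filter over a fixed candidate list, not the levelwise version space algorithm and not MOLFEA itself. The function names (`support`, `select_features`), the toy molecule strings, and the thresholds are all hypothetical, and matching fragments by plain substring containment is a simplifying assumption made only for illustration.

```python
from typing import Iterable, List


def support(fragment: str, molecules: Iterable[str]) -> int:
    """Count the molecules whose string form contains the fragment.

    Assumption: substring containment stands in for real substructure matching.
    """
    return sum(1 for molecule in molecules if fragment in molecule)


def select_features(
    candidates: Iterable[str],
    actives: List[str],
    inactives: List[str],
    min_active: int,
    max_inactive: int,
) -> List[str]:
    """Keep fragments that are frequent in actives and infrequent in inactives."""
    return [
        fragment
        for fragment in candidates
        if support(fragment, actives) >= min_active
        and support(fragment, inactives) <= max_inactive
    ]


# Invented toy data; these strings are not taken from the screen database.
actives = ["N-C-C-O", "N-C-O", "C-N-C-C-O"]
inactives = ["C-C-C", "C-C-O", "O-C-C"]
candidates = ["N-C", "C-C", "C-O", "C-C-O"]

selected = select_features(
    candidates, actives, inactives, min_active=2, max_inactive=1
)
print(selected)
```

In this toy run the fragment `C-C` is rejected because it occurs in all three inactive strings, while the other candidates satisfy both thresholds. A real system would avoid enumerating every candidate and would instead exploit the generality ordering among fragments to prune the search.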


Index Terms

Molecular feature mining in HIV data
1. Applied computing
  1. Life and medical sciences
  2. Physical sciences and engineering
    1. Mathematics and statistics
2. Information systems
  1. Information systems applications
    1. Decision support systems
      1. Data analytics


Published in
KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
August 2001
493 pages
ISBN: 158113391X
DOI: 10.1145/502512
Conference Chair: Doheon Lee (Chonnam National University, Korea)
General Chair: Mario Schkolnick (SGI)
Program Chairs: Foster Provost (New York University), Ramakrishnan Srikant (IBM Almaden Research Center)
Copyright © 2001 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
KDD '01 Paper Acceptance Rate: 31 of 237 submissions, 13%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%