ABSTRACT
We present the application of feature mining techniques to the Developmental Therapeutics Program's AIDS antiviral screen database. The database consists of 43576 compounds, which were measured for their capability to protect human cells from HIV-1 infection. According to these measurements, the compounds were classified as active, moderately active, or inactive. The distribution of classes is extremely skewed: only 1.3 % of the molecules are known to be active, and 2.7 % are known to be moderately active.

Given this database, we were interested in molecular substructures (i.e., features) that are frequent in the active molecules and infrequent in the inactive ones. In data mining terms, we focused on features with a minimum support in active compounds and a maximum support in inactive compounds. We analyzed the database using the levelwise version space algorithm that forms the basis of the inductive query and database system MOLFEA (Molecular Feature Miner). Within this framework, it is possible to declaratively specify the features of interest through constraints on their frequency on (possibly different) datasets as well as on their generality and syntax. Assuming that the detected substructures are causally related to biochemical mechanisms, it should be possible to facilitate the development of new pharmaceuticals with improved activities.
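The combination of a minimum support on the active compounds and a maximum support on the inactive compounds can be illustrated with a minimal sketch. It is not the levelwise version space algorithm and does not reflect MOLFEA's actual interface: it enumerates candidates by brute force, approximates a feature as a plain substring of a SMILES-like string (not true substructure matching), and uses made-up toy molecules. The names `frequency`, `candidate_fragments`, and `solve_query` are hypothetical.

```python
from typing import Iterable, List, Set


def frequency(fragment: str, molecules: Iterable[str]) -> int:
    """Count the molecules whose string contains the fragment as a substring."""
    return sum(1 for mol in molecules if fragment in mol)


def candidate_fragments(molecules: Iterable[str], max_len: int) -> Set[str]:
    """Enumerate every substring of length 1..max_len occurring in the molecules."""
    fragments: Set[str] = set()
    for mol in molecules:
        for start in range(len(mol)):
            last = min(start + max_len, len(mol))
            for end in range(start + 1, last + 1):
                fragments.add(mol[start:end])
    return fragments


def solve_query(
    actives: List[str],
    inactives: List[str],
    min_active: int,
    max_inactive: int,
    max_len: int = 4,
) -> List[str]:
    """Return fragments with support >= min_active on the actives
    and support <= max_inactive on the inactives (brute force)."""
    # A fragment with positive support on the actives must occur in one of them,
    # so candidates are drawn from the active molecules only.
    candidates = candidate_fragments(actives, max_len)
    return sorted(
        f
        for f in candidates
        if frequency(f, actives) >= min_active
        and frequency(f, inactives) <= max_inactive
    )


# Hypothetical toy data, NOT taken from the AIDS antiviral screen database.
ACTIVES = ["CCN=O", "CN=OC", "N=OCC"]
INACTIVES = ["CCCC", "CCOC", "CCN"]

result = solve_query(ACTIVES, INACTIVES, min_active=3, max_inactive=0)
print(result)  # ['=', '=O', 'N=', 'N=O']
```

On this toy input, the fragment `C` is discarded because it also occurs in the inactive molecules, while `N=O` satisfies both constraints. The brute-force enumeration is only practical for tiny examples; it ignores the pruning that a levelwise search would obtain from the monotonicity of the frequency constraints.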
Index Terms
- Molecular feature mining in HIV data