DOI: 10.1145/502512.502533

Molecular feature mining in HIV data

Published: 26 August 2001

ABSTRACT

We present the application of feature mining techniques to the Developmental Therapeutics Program's AIDS antiviral screen database. The database consists of 43576 compounds, which were measured for their capability to protect human cells from HIV-1 infection. According to these measurements, the compounds were classified as active, moderately active, or inactive. The class distribution is extremely skewed: only 1.3% of the molecules are known to be active, and 2.7% are known to be moderately active.

Given this database, we were interested in molecular substructures (i.e., features) that are frequent in the active molecules and infrequent in the inactive ones. In data mining terms, we focused on features with a minimum support in active compounds and a maximum support in inactive compounds. We analyzed the database using the levelwise version space algorithm that forms the basis of the inductive query and database system MOLFEA (Molecular Feature Miner). Within this framework, it is possible to declaratively specify constraints on the features of interest, such as their frequency in (possibly different) datasets, their generality, and their syntax. Assuming that the detected substructures are causally related to biochemical mechanisms, it should be possible to facilitate the development of new pharmaceuticals with improved activity.
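The combination of a minimum support on actives and a maximum support on inactives can be illustrated with a small sketch. This is not the levelwise version space algorithm and not MOLFEA itself: it is a brute-force filter, it treats features as plain substrings of toy strings rather than as chemically valid substructures, and the function names, thresholds, and example data are all hypothetical.

```python
from typing import List, Set


def substrings(s: str, max_len: int) -> Set[str]:
    """All contiguous substrings of s with length 1..max_len (toy stand-in for fragments)."""
    return {
        s[i:j]
        for i in range(len(s))
        for j in range(i + 1, min(len(s), i + max_len) + 1)
    }


def support(feature: str, molecules: List[str]) -> float:
    """Fraction of molecules that contain the feature as a substring."""
    if not molecules:
        return 0.0
    return sum(feature in m for m in molecules) / len(molecules)


def mine_features(
    actives: List[str],
    inactives: List[str],
    min_active: float,
    max_inactive: float,
    max_len: int = 4,
) -> List[str]:
    """Keep features that are frequent in actives and infrequent in inactives."""
    # Only features occurring in at least one active can meet a positive minimum support.
    candidates: Set[str] = set()
    for m in actives:
        candidates |= substrings(m, max_len)
    return sorted(
        f
        for f in candidates
        if support(f, actives) >= min_active
        and support(f, inactives) <= max_inactive
    )


# Hypothetical toy data, not taken from the AIDS antiviral screen database.
actives = ["CCNO", "CNOC", "NOCC"]
inactives = ["CCCC", "CCN", "OCC"]

print(mine_features(actives, inactives, min_active=1.0, max_inactive=0.0, max_len=2))
# → ['NO']
```

In this toy run, "C", "N", and "O" each occur in every active string but also in some inactive string, so only "NO" satisfies both constraints. A realistic setting would replace substring tests with proper substructure matching and, given the skewed class distribution, would avoid enumerating all candidates exhaustively.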

              • Published in

                KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
                August 2001
                493 pages
                ISBN: 158113391X
                DOI: 10.1145/502512

                Copyright © 2001 ACM

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                Publisher

                Association for Computing Machinery

                New York, NY, United States


                Acceptance Rates

                KDD '01 paper acceptance rate: 31 of 237 submissions, 13%. Overall acceptance rate: 1,133 of 8,635 submissions, 13%.
