Introducing Machine Learning Concepts with WEKA

Part of the book series: Methods in Molecular Biology (MIMB, volume 1418)

Abstract

This chapter presents an introduction to data mining with machine learning. It gives an overview of various types of machine learning, along with some examples. It explains how to download, install, and run the WEKA data mining toolkit on a simple data set, then proceeds to explain how one might approach a bioinformatics problem. Finally, it includes a brief summary of machine learning algorithms for other types of data mining problems, and provides suggestions about where to find additional information.

Author information

Corresponding author

Correspondence to Tony C. Smith.

Copyright information

© 2016 Springer Science+Business Media New York

About this protocol

Cite this protocol

Smith, T.C., Frank, E. (2016). Introducing Machine Learning Concepts with WEKA. In: Mathé, E., Davis, S. (eds) Statistical Genomics. Methods in Molecular Biology, vol 1418. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3578-9_17

  • DOI: https://doi.org/10.1007/978-1-4939-3578-9_17

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-3576-5

  • Online ISBN: 978-1-4939-3578-9

  • eBook Packages: Springer Protocols
