Efficient Hyperparameter Tuning with Grid Search for Text Categorization using kNN Approach with BM25 Similarity

Raji Ghawi; Jürgen Pfeffer

doi:10.1515/comp-2019-0011

Open Access Published by De Gruyter Open Access August 8, 2019

Efficient Hyperparameter Tuning with Grid Search for Text Categorization using kNN Approach with BM25 Similarity

Raji Ghawi and Jürgen Pfeffer

From the journal Open Computer Science

https://doi.org/10.1515/comp-2019-0011

Abstract

In machine learning, hyperparameter tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. Several approaches have been widely adopted for hyperparameter tuning, which is typically a time consuming process. We propose an efficient technique to speed up the process of hyperparameter tuning with Grid Search. We applied this technique on text categorization using kNN algorithm with BM25 similarity, where three hyperparameters need to be tuned. Our experiments show that our proposed technique is at least an order of magnitude faster than conventional tuning.

Keywords: hyperparameter tuning; text categorization; grid search; kNN; BM25

References

[1] Claesen M., Moor B.D., Hyperparameter Search in Machine Learning, CoRR, abs/1502.02127, 2015Search in Google Scholar

[2] Bergstra J., Bengio Y., Random Search for Hyper-parameter Optimization, Journal of Machine Learning Research, 13(1), 2012, 281–305Search in Google Scholar

[3] Chapelle O., Vapnik V., Bousquet O., Mukherjee S., Choosing Multiple Parameters for Support Vector Machines, Machine Learning, 46(1-3), 2002, 131–159, 10.1023/A:101245032738710.1023/A:1012450327387Search in Google Scholar

[4] Do C.B., Foo C.S., Ng A.Y., Efficient Multiple Hyperparameter Learning for Log-linear Models, In Proceedings of the 20th International Conference on Neural Information Processing Systems, NIPS’07, Curran Associates Inc., USA, 2007, 377–384Search in Google Scholar

[5] Wang Z., Hutter F., Zoghi M., Matheson D., De Freitas N., Bayesian Optimization in a Billion Dimensions via Random Embeddings, Journal of Artificial Intelligence Research, 55(1), 2016, 361–38710.1613/jair.4806Search in Google Scholar

[6] Bergstra J., Bardenet R., Bengio Y., Kégl B., Algorithms for Hyper-parameter Optimization, In Proceedings of the 24th International Conference on Neural Information Processing Systems, NIPS’11, Curran Associates Inc., USA, 2011, 2546–2554Search in Google Scholar

[7] Snoek J., Larochelle H., Adams R.P., Practical Bayesian Optimization of Machine Learning Algorithms, In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 2, NIPS’12, USA, 2012, 2951–2959Search in Google Scholar

[8] Robertson S., Walker S., Some Simple Effective Approximations to the 2-Poisson Model for Probabilistic Weighted Retrieval, In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ‘94, Springer-Verlag New York, Inc., New York, NY, USA, 1994, 232–24110.1007/978-1-4471-2099-5_24Search in Google Scholar

[9] Robertson S., Zaragoza H., The Probabilistic Relevance Framework: BM25 and Beyond, Foundations and Trends in Information Retrieval, 3(4), 2009, 333–389, 10.1561/150000001910.1561/1500000019Search in Google Scholar

[10] Yang Y., Liu X., A Re-examination of Text Categorization Methods, in Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, ACM, 1999, 42–4910.1145/312624.312647Search in Google Scholar

[11] Masand B., Linoff G., Waltz D., Classifying News Stories Using Memory Based Reasoning, In Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’92, ACM, New York, NY, USA, 1992, 59–65, 10.1145/133160.13317710.1145/133160.133177Search in Google Scholar

[12] Yang Y., Expert Network: Effective and Efficient Learning from Human Decisions in Text Categorization and Retrieval, in B.W. Croft, C.J. van Rijsbergen, eds., SIGIR ’94, Springer London, London, 1994, 13–2210.1007/978-1-4471-2099-5_2Search in Google Scholar

[13] Iwayama M., Tokunaga T., Cluster-based Text Categorization: A Comparison of Category Search Strategies, In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’95, ACM, New York, NY, USA, 1995, 273–280, 10.1145/215206.21537110.1145/215206.215371Search in Google Scholar

[14] Domhan T., Springenberg J.T., Hutter F., Speeding Up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves, In Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI’15, AAAI Press, 2015, 3460–3468Search in Google Scholar

[15] Bengio Y., Gradient-Based Optimization of Hyperparameters, Neural Computation, 12, 2000, 1889–190010.1162/089976600300015187Search in Google Scholar PubMed

[16] Larsen J., Hansen L.K., Svarer C., Ohlsson M., Design and regularization of neural networks: the optimal use of a validation set, in Neural Networks for Signal Processing [1996] VI. Proceedings of the 1996 IEEE Signal Processing Society Workshop, IEEE, 1996, 62–71Search in Google Scholar

[17] Maclaurin D., Duvenaud D., Adams R., Gradient-based hyper-parameter optimization through reversible learning, in International Conference on Machine Learning, 2015, 2113–2122Search in Google Scholar

[18] Jones D.R., Schonlau M., Welch W.J., Efficient Global Optimization of Expensive Black-Box Functions, Journal of Global Optimization, 13(4), 1998, 455–492, 10.1023/A:100830643114710.1023/A:1008306431147Search in Google Scholar

[19] Brochu E., Cora V.M., de Freitas N., A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning, CoRR, abs/1012.2599, 2010Search in Google Scholar

[20] Hutter F., Hoos H.H., Leyton-Brown K., Sequential Model-Based Optimization for General Algorithm Configuration, in C.A.C. Coello, ed., Learning and Intelligent Optimization, Springer Berlin Heidelberg, Berlin, Heidelberg, 2011, 507–52310.1007/978-3-642-25566-3_40Search in Google Scholar

[21] Thornton C., Hutter F., Hoos H.H., Leyton-Brown K., Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms, In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2013, 847–85510.1145/2487575.2487629Search in Google Scholar

[22] Aggarwal C.C., Zhai C., A Survey of Text Classification Algorithms, Springer US, Boston, MA, 2012, 163–222, 10.1007/978-1-4614-3223-4_610.1007/978-1-4614-3223-4_6Search in Google Scholar

[23] Yang Y., Chute C.G., An Example-based Mapping Method for Text Categorization and Retrieval, ACM Trans. Inf. Syst., 12(3), 1994, 252–277, 10.1145/183422.18342410.1145/183422.183424Search in Google Scholar

[24] Cohen W.W., Hirsh H., Joins that Generalize: Text Classification Using WHIRL, In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98), New York City, New York, USA, August 27-31, 1998, 1998, 169–173Search in Google Scholar

[25] Han E.H., Karypis G., Kumar V., Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification, In Proceedings of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD ’01, Springer-Verlag, Berlin, Heidelberg, 2001, 53–6510.1007/3-540-45357-1_9Search in Google Scholar

[26] Guo G., Wang H., Bell D., Bi Y., Greer K., Using kNN Model for Automatic Text Categorization, Soft Computing, 10(5), 2006, 423–430, 10.1007/s00500-005-0503-y10.1007/s00500-005-0503-ySearch in Google Scholar

[27] Chakrabarti S., Dom B., Agrawal R., Raghavan P., Using Taxonomy, Discriminants, and Signatures for Navigating in Text Databases, In Proceedings of the 23rd International Conference on Very Large Data Bases, VLDB ’97, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1997, 446–455Search in Google Scholar

[28] Koller D., Sahami M., Hierarchically Classifying Documents Using Very Few Words, In Proceedings of the Fourteenth International Conference on Machine Learning, ICML ’97, Morgan Kauf-mann Publishers Inc., San Francisco, CA, USA, 1997, 170–178Search in Google Scholar

[29] Lewis D.D., Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval, In Proceedings of the 10th European Conference on Machine Learning, ECML’98, Springer-Verlag, Berlin, Heidelberg, 1998, 4–15, 10.1007/BFb002666610.1007/BFb0026666Search in Google Scholar

[30] Lewis D.D., Catlett J., Heterogeneous uncertainty sampling for supervised learning, in Machine Learning Proceedings, Elsevier, 1994, 148–15610.1016/B978-1-55860-335-6.50026-XSearch in Google Scholar

[31] Cohen W.W., Singer Y., Context-sensitive Learning Methods for Text Categorization, ACM Trans. Inf. Syst., 17(2), 1999, 141–173, 10.1145/306686.30668810.1145/306686.306688Search in Google Scholar

[32] Ng H.T., Goh W.B., Low K.L., Feature Selection, Perceptron Learning, and a Usability Case Study for Text Categorization, In Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’97, ACM, New York, NY, USA, 1997, 67–73, 10.1145/258525.25853710.1145/258525.258537Search in Google Scholar

[33] Wiener E., O. Pedersen J., S. Weigend A., A Neural Network Approach to Topic Spotting, SDAIR, 1995, 317–332Search in Google Scholar

[34] Ruiz M.E., Srinivasan P., Hierarchical Neural Networks for Text Categorization, In Proceedings of the 22Nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’99, ACM, New York, NY, USA, 1999, 281–282, 10.1145/312624.31270010.1145/312624.312700Search in Google Scholar

[35] Kowsari K., Brown D.E., Heidarysafa M., Meimandi K.J., Gerber M.S., Barnes L.E., HDLTex: Hierarchical Deep Learning for Text Classification, CoRR, abs/1709.08267, 201710.1109/ICMLA.2017.0-134Search in Google Scholar

[36] Joachims T., Text Categorization with Support Vector Machines: Learning with Many Relevant Features, In Proceedings of the 10th European Conference on Machine Learning, ECML’98, Springer-Verlag, Berlin, Heidelberg, 1998, 137–142, 10.1007/BFb002668310.1007/BFb0026683Search in Google Scholar

[37] Joachims T., A Statistical Learning Learning Model of Text Classification for Support Vector Machines, In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’01, ACM, New York, NY, USA, 2001, 128–136, 10.1145/383952.38397410.1145/383952.383974Search in Google Scholar

[38] Zhang W., Yoshida T., Tang X., Text Classification Based on Multi-word with Support Vector Machine, Know.-Based Syst., 21(8), 2008, 879–886, 10.1016/j.knosys.2008.03.04410.1016/j.knosys.2008.03.044Search in Google Scholar

[39] Yang Y., An Evaluation of Statistical Approaches to Text Categorization, Information Retrieval, 1(1), 1999, 10.1023/A:1009982220290Search in Google Scholar

[40] Dasarathy B.V., Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques, IEEE Computer Society Press, Los Alamitos, CA, 1991Search in Google Scholar

[41] Yang Y., A Study of Thresholding Strategies for Text Categorization, In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’01, New York, NY, USA, 2001, 137–145, 10.1145/383952.38397510.1145/383952.383975Search in Google Scholar

[42] Salton G., Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1989Search in Google Scholar

[43] Fang H., Tao T., Zhai C., A Formal Study of Information Retrieval Heuristics, In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’04, ACM, New York, NY, USA, 2004, 49–56, 10.1145/1008992.100900410.1145/1008992.1009004Search in Google Scholar

[44] Singhal A., Buckley C., Mitra M., Pivoted Document Length Normalization, In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’96, ACM, New York, NY, USA, 1996, 21–29, 10.1145/243199.24320610.1145/243199.243206Search in Google Scholar

[45] He B., Ounis I., Term Frequency Normalisation Tuning for BM25 and DFR Models, in D.E. Losada, J.M. Fernández-Luna, eds., Advances in Information Retrieval, Springer Berlin Heidelberg, 2005, 200–21410.1007/978-3-540-31865-1_15Search in Google Scholar

[46] Murata M., Kanamaru T., Shirado T., Isahara H., Using the K Nearest Neighbor Method and BM25 in the Patent Document Categorization Subtask at NTCIR-5, In Proceedings of the Fifth NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering and Cross-Lingual Information Access, NII, Tokyo, Japan, 2005, 324–331Search in Google Scholar

[47] Geng X., Liu T.Y., Qin T., Arnold A., Li H., Shum H.Y., Query Dependent Ranking Using K-nearest Neighbor, In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’08, ACM, New York, NY, USA, 2008, 115–122, 10.1145/1390334.139035610.1145/1390334.1390356Search in Google Scholar

[48] Hu H., Zhu R., Wang Y., Feng W., Tan X., Huang J.X., A Best Match KNN-based Approach for Large-scale Product Categorization, in ACM SIGIR Workshop on eCommerce (SIGIR 2018 eCom Data Challenge), ACM, New York, NY, USA, 2018Search in Google Scholar

[49] Wang X.l., Zhao H., Lu B.l., Enhanced K-Nearest Neighbour Algorithm for Large-scale Hierarchical Multi-label Classification, In Proceedings of the Joint ECML/PKDD PASCAL Workshop on Large Scale Hierarchical Classification, volume 5, Athens, Greece, 2011Search in Google Scholar

[50] He B., Ounis I., A Study of Parameter Tuning for Term Frequency Normalization, In Proceedings of the Twelfth International Conference on Information and Knowledge Management, CIKM ’03, ACM, New York, NY, USA, 2003, 10–16, 10.1145/956863.95686710.1145/956863.956867Search in Google Scholar

[51] Taylor M., Zaragoza H., Craswell N., Robertson S., Burges C., Optimisation Methods for Ranking Functions with Multiple Parameters, In Proceedings of the 15th ACM International Conference on Information and Knowledge Management, CIKM ’06, ACM, New York, NY, USA, 2006, 585–593, 10.1145/1183614.118369810.1145/1183614.1183698Search in Google Scholar

[52] Moniz N., Torgo L., Multi-Source Social Feedback of Online News Feeds, CoRR, 2018Search in Google Scholar

[53] Greene D., Cunningham P., Practical Solutions to the Problem of Diagonal Dominance in Kernel Document Clustering, In Proc. 23rd International Conference on Machine learning (ICML’06), ACM Press, 2006, 377–38410.1145/1143844.1143892Search in Google Scholar

Received: 2019-02-20

Accepted: 2019-07-18

Published Online: 2019-08-08

This work is licensed under the Creative Commons Attribution 4.0 Public License.

Efficient Hyperparameter Tuning with Grid Search for Text Categorization using kNN Approach with BM25 Similarity

Abstract

References

Journal and Issue

Articles in the same Issue