Elsevier

Knowledge-Based Systems

Volume 151, 1 July 2018, Pages 95-113

On the predictive analysis of behavioral massive job data using embedded clustering and deep recurrent neural networks

https://doi.org/10.1016/j.knosys.2018.03.025

Abstract

The recent proliferation of social networks as a main source of information and interaction has led to a huge expansion of automatic e-recruitment systems and, consequently, to a multiplication of web channels (job boards) dedicated to disseminating job offers. In a strategic and economic context where cost control is fundamental, it has become necessary to identify the relevant job board for a given new job offer. The purpose of this work is to present recent results that we have obtained on a new job board recommendation system, a decision-making tool intended to guide recruiters while they are posting a job on the Internet. First, the Doc2Vec embedded representation is used to analyse the textual content of the job offers; the job applicants' clickstream histories on various job boards are then stored in a large learning database and represented as time series. Second, a deep neural network architecture is used to predict future values of the clicks on the job boards. Third, and in parallel, dimensionality reduction techniques are used to transform the numerical click time series into temporal symbolic sequences. Forecasting algorithms are then used to predict future symbols for each sequence. Finally, a list of top-ranked job boards is kept by maximizing the clickstream forecasts in both representations. Our experiments are conducted on a real dataset coming from the job-posting database of an industrial partner. The promising results show that, using deep learning, the recommendation system outperforms standard multivariate models.

Introduction

This work concerns the recruitment market, which is composed of three main players: the recruiter, who wishes to find the most suitable candidate with a desired profile; the candidate, who is looking for a job adapted to her/his profile and professional perspectives; and the intermediaries, who mediate the relationship between the first two actors. Intermediaries in the labour market include recruitment agencies, temporary employment agencies, human resources (HR) communication agencies, the press, institutional networks, etc. Over the last two decades, another kind of intermediary has appeared: the job boards (or job search websites). More specifically, many job boards allow the dissemination of job offers on different Web platforms (university websites, job social networks, business career websites, etc.). Since the arrival of the Internet, the use of web job boards has increased drastically. Between 2006 and 2009, the proportion of managerial positions advertised on the Internet increased by 16%. By 2009, the Internet had proved to be an essential medium for recruitment, with 82% of employment offers published there [1]. The expansion of Internet media for recruitment has led to a multiplication of channels for finding candidates. Current e-recruitment systems consider only a part of the recruitment process, concentrating on matching job offers with CVs. However, selecting the most appropriate job board for an offer is also very important for optimizing this fully digital recruitment process. This is our main contribution within the SONAR research project (Sourcing and Automated Recruitment). At the moment, various questions arise concerning the criteria for judging the relevance of a job board. For example, is a job board relevant if the number of offers it carries is increasing? Or simply if the number of visits and/or the number of clicks to view the offers by potential candidates tends to grow compared to those observed in the past? Our main goal is to provide a tool that can help recruiters to (i) select the most relevant job boards for a new job offer, (ii) disseminate job offers more effectively, that is to say at the right place at the right time, and (iii) connect candidates and job offers automatically.

In this paper, we propose Deep4Job, a job offer recommendation system whose main contributions are: (a) the representation of the job offer textual documents in a new embedded space model that allows extracting latent topics and classifying business categories; (b) the consideration of contextual information, such as the job applicants' temporal behavior through their clicks on different dissemination links, as time series data; (c) a demonstration of the benefit of using deep neural networks instead of probabilistic models to predict future click values; and finally (d) the use of symbolic temporal sequences, obtained from the click time series using dimensionality reduction methods, to analyse the trajectories of the job applicants. These contributions were evaluated on a real job offers database provided by an industrial partner, as illustrated in Fig. 1. The results appear promising compared to state-of-the-art collaborative filtering analysis.

In the next section, we first give an overview of existing recommendation systems with their advantages and limits, and afterwards we introduce the general architecture of our proposed Deep4Job system.

Section snippets

Highlights on recommender systems

During the past decade, the variety and number of products and services provided by companies have increased dramatically. Companies produce a large number of products to meet the needs of customers. Although this gives customers more options, it makes it harder for them to process the large amount of information provided by companies. Recommender systems are designed to help customers by suggesting products or services that users are likely to prefer, based on

Project context and description of the big database

This work is the result of our participation in a FUI project called SONAR (Sourcing and Automated Recruitment), with an industrial partner (MultiPosting), a leader in the French job market, which provided us with a large archive of job offers that were disseminated on different job board websites, along with the relative quantity of their visits (clicks) by users (job applicants). The job

Embedded representation of job offers

As noted earlier, we need to represent the textual job offer documents numerically to make their manipulation with deep neural networks possible. One-hot encoding is the most common and most basic way to turn a token (word) into a vector. It consists of associating a unique integer index with every word, then turning this integer index i into a binary vector of size N, the size of the vocabulary, that is all zeros except for the i-th entry, which is 1.
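
The one-hot scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `build_vocabulary` and `one_hot` and the toy token list are hypothetical and do not come from the Deep4Job implementation.

```python
def build_vocabulary(tokens):
    """Assign a unique integer index to every distinct token, in order of first appearance."""
    index = {}
    for tok in tokens:
        if tok not in index:
            index[tok] = len(index)
    return index


def one_hot(token, vocabulary):
    """Return a binary vector of size N (the vocabulary size) with a single 1 at the token's index."""
    vec = [0] * len(vocabulary)
    vec[vocabulary[token]] = 1
    return vec


# Toy job-offer tokens (illustrative only, not taken from the paper's dataset).
tokens = "senior java developer junior java tester".split()
vocab = build_vocabulary(tokens)
encoded = [one_hot(t, vocab) for t in tokens]
```

Each vector has as many entries as there are distinct words, so its size grows with the vocabulary while carrying exactly one non-zero entry.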

Another popular and

Preliminaries

The estimation of future values in a time series is a topic of great interest in data mining and machine learning. It is commonly done using past values of the same data. Given a job board time series, forecasting here refers to the process of calculating one or several values ahead, $\hat{X}^{JBT}(N,h)$, using only the information given by the past values of the time series: $\hat{X}^{JBT}(N,h) = f(X_1^{JBT}, X_2^{JBT}, \ldots, X_N^{JBT})$. Time series prediction is a difficult type of predictive modelling problem. Unlike
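
To make the forecasting formulation concrete, the following minimal Python sketch predicts h values ahead from the past values of a series. It rests on assumptions: the window length `p`, the helper names, and the click counts are hypothetical, and the moving average is only a stand-in for the function f, not the deep recurrent model used in the paper.

```python
def make_windows(series, p):
    """Split a series into (window of p past values, next value) training pairs."""
    return [(series[i:i + p], series[i + p]) for i in range(len(series) - p)]


def forecast(series, h, f, p):
    """Predict h values ahead by applying f to the last p values,
    feeding each prediction back into the history (recursive strategy)."""
    history = list(series)
    predictions = []
    for _ in range(h):
        next_value = f(history[-p:])
        predictions.append(next_value)
        history.append(next_value)
    return predictions


def moving_average(window):
    """Naive placeholder for f: the mean of the window (an assumption, not the paper's model)."""
    return sum(window) / len(window)


# Hypothetical daily click counts for one job board.
clicks = [12, 15, 11, 14, 16, 13]
pairs = make_windows(clicks, p=3)
preds = forecast(clicks, h=2, f=moving_average, p=3)
```

Any learned model, for example a recurrent network trained on the pairs returned by `make_windows`, could be passed in place of `moving_average` as long as it maps a window of past values to one predicted value.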

Preliminaries

Predictive models with symbolic sequences generally concern four types of problems [47]: sequence prediction, sequence classification, sequence generation, and sequence-to-sequence prediction. These models differ from set-based machine learning problems since, in a sequence, the order of the observations is explicitly imposed.

Sequence prediction models, also known as sequence learning, involve the prediction of the next value for a given input sequence. They are still a big challenge in pattern
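
As an illustration of sequence prediction, the sketch below predicts the next symbol of a sequence from first-order transition counts. It is a deliberately simple baseline under stated assumptions: the function names and training strings are hypothetical, and this counting approach is not the forecasting algorithm evaluated in the paper.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Count, for every symbol, how often each other symbol immediately follows it."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return counts


def predict_next(sequence, counts):
    """Return the most frequent successor of the last symbol, or None if it was never seen."""
    followers = counts.get(sequence[-1])
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical symbolic sequences, e.g. discretized click levels.
training = ["abac", "abab", "cabb"]
model = fit_transitions(training)
```

Because the prediction depends on which symbol came last, reordering a sequence changes the result, which is exactly what distinguishes sequence problems from set-based ones.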

Evaluation and results discussion

In this section, we show the results of the evaluation of the different contributions made in this paper. The assessment protocol involves, as a first step, the evaluation of the Doc2Vec job offers clustering. As a second step, we evaluate the forecasting models on the numerical time series as well as on the symbolic sequences. Finally, we show the impact of each contribution on the recommendation system.

Conclusion and perspectives

In this work, we have presented Deep4Job, a big data recommendation system based on the temporal prediction of clickstreams with a time series representation. The system analyses the historical behavior of job applicants on the Internet. We have shown how it is possible to use the Doc2Vec embedding representation for extracting topics from large-scale job offer documents. Then, we have proposed several prediction algorithms using deep learning methods. We have implemented two complementary

Funding

This work was supported by the French government and the Île-de-France region under a grant for the FUI SONAR Project (FUI-AAP15-SONAR) for automatic recruitment tasks.

Availability

The sources and the additional materials are available at https://gitlab.com/opencver91/dl.

Compliance with ethical standards

The authors declare that there is no conflict of interest.

Ethical approval: This article does not contain any studies with human participants or animals performed by the author.

Acknowledgment

The authors would like to thank the Multiposting start-up for sharing the data.

Thanks to Dr. James Cheney for proofreading the article.

References (66)

  • M.N. Jelassi et al.

    Étude du profil utilisateur pour la recommandation dans les folksonomies

  • Recommender Systems, http://www.datasciencecentral.com/profiles/blogs/5-types-of-recommenders. Accessed:...
  • D. Cao et al.

    Cross-platform app recommendation by jointly modeling ratings and texts

    ACM Trans. Inf. Syst.

    (2017)
  • D. Cao et al.

    Embedding factorization models for jointly recommending items and user generated lists

    Proceedings of the Fortieth International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’17

    (2017)
  • F. Chollet

    Deep Learning with Python

    (2017)
  • Y. LeCun et al.

    Deep learning

    Nature

    (2015)
  • A. Krizhevsky et al.

    Imagenet classification with deep convolutional neural networks

    Proceedings of the Twenty-fifth International Conference on Neural Information Processing Systems, NIPS’12

    (2012)
  • K. Simonyan et al.

    Very deep convolutional networks for large-scale image recognition

    CoRR

    (2014)
  • A. Karpathy et al.

    Deep visual-semantic alignments for generating image descriptions

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2017)
  • M. Jaderberg et al.

    Spatial transformer networks

    Proceedings of the Twenty-eighth International Conference on Neural Information Processing Systems, NIPS’15

    (2015)
  • T. Mikolov et al.

    Linguistic regularities in continuous space word representations.

    Proceedings of the HLT-NAACL

    (2013)
  • T. Mikolov

    Recurrent neural network based language model.

    Proceedings of the Interspeech

    (2010)
  • T. Mikolov et al.

    Distributed representations of words and phrases and their compositionality

    Proceedings of the Advances in Neural Information Processing Systems

    (2013)
  • R. Pascanu et al.

    On the difficulty of training recurrent neural networks

    Proceedings of the International Conference on Machine Learning

    (2013)
  • A. Joulin et al.

    Bag of tricks for efficient text classification

    (2016)
  • C. Szegedy et al.

    Going deeper with convolutions

    Proceedings of the Computer Vision and Pattern Recognition (CVPR)

    (2015)
  • K. He et al.

    Deep residual learning for image recognition

    Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016

    (2016)
  • A. van den Oord et al.

    Wavenet: a generative model for raw audio

    CoRR

    (2016)
  • C. Li et al.

    Recursive deep learning for sentiment analysis over social data

    Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) – Volume 02, WI-IAT ’14

    (2014)
  • Y. Bengio et al.

    A neural probabilistic language model

    J. Mach. Learn. Res.

    (2003)
  • V. Nath et al.

    Autonomous Robotics and Deep Learning

    (2014)
  • D. Silver et al.

    Mastering the game of go with deep neural networks and tree search

    Nature

    (2016)
  • D. Kim et al.

    Convolutional matrix factorization for document context-aware recommendation

    Proceedings of the Tenth ACM Conference on Recommender Systems, RecSys ’16

    (2016)