A reinforcement learning methodology for a human resource planning problem considering knowledge-based promotion

https://doi.org/10.1016/j.simpat.2015.07.004

Abstract

This paper addresses a combined problem of human resource planning (HRP) and production-inventory control for a high-tech industry in which the human resource plays a critical role. The main characteristics of this resource are its levels of “knowledge” and its learning process. Learning occurs during production, through which a worker can be promoted to the next knowledge level; workers at higher levels are more productive. The objective is to maximize the expected profit by deciding on the optimal numbers of workers at the various knowledge levels to fulfill both production and training requirements. Because an action taken now affects later periods’ decisions, the core problem is to find the optimal hiring policy for non-skilled workers over a long time horizon. We therefore develop a reinforcement learning (RL) model to obtain the optimal hiring decisions under demand uncertainty. The interval-based policy produced by our RL model, which offers multiple choices for each state, makes the policy more flexible. We also embed managerial issues such as layoffs and overtime working hours into the model. To evaluate the proposed methodology, we compare it against stochastic dynamic programming (SDP) and a conservative method implemented in a real case study. We study all these methods in terms of four criteria: average profit, average cost, the number of newly hired workers, and the standard deviation of the hiring policies. The numerical results confirm that the developed method yields satisfactory results compared with the two other approaches.

Introduction

The main goal of a production plant or service supplier, in developed and developing countries alike, is to gain a larger share of domestic or foreign markets, especially where competitors exist. To reach this goal, a company should efficiently utilize all of its resources, such as its workforce and facilities, so that it can satisfy its customers. Because the level of satisfaction usually changes as technology advances, all operations need to be based on up-to-date knowledge; such operations are called knowledge-intensive operations. Among the resources employed in knowledge-intensive operations, the human resource is the most critical, because people and their knowledge are the most strategic resource for firms [2].

One of the main issues in human resource planning (HRP) is staffing and recruitment decision-making: providing enough qualified manpower to produce high-quality products or deliver superior services. Recruitment is usually a mid-term or even long-term decision that strongly affects the near future of the company and its success. Furthermore, human resources, as a strategic and valuable asset, possess the knowledge and skills necessary to move a company toward its goals. In other words, an important aspect of HRP is to determine the required number of workers at different knowledge levels (e.g., newly hired, semi-skilled, and skilled workers) to be deployed in the various parts of a company’s production process. This is, in fact, a way of improving the utilization of knowledge resources toward better efficiency.

Few quantitative approaches have been employed to cope with staffing problems in knowledge-intensive operations. One of the pioneering works on human resource planning in a knowledge-based setting is by Bordoloi and Matsuo [7]. They proposed a model that determines the required number of workers at each knowledge level, and they embedded employees’ learning and turnover rates into their optimization model to find better recruitment decisions under non-deterministic demand. Learning occurs during production, through which a worker can be transferred from a lower knowledge level to the next one (e.g., from the first level to the second) after some periods. Turnover is defined as the rate at which a company loses workers at each knowledge level (semi-skilled or skilled) at the end of each period [20]. When a company loses skilled workers at the upper levels, the loss cannot be compensated directly: the company can only satisfy demand by recruiting workers at the first level (newly hired workers). Bordoloi and Matsuo used the chance-constraint method to handle the high uncertainty of demand and the high volatility of knowledge workers at the last two levels. Their method, however, cannot address the production-inventory control problem. It also yields a static hiring policy (i.e., the hiring rate is constant across all periods of the real-time decision-making process), which is very conservative (i.e., the policy is obtained for a pessimistic situation). Furthermore, layoffs are not considered in their model.
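
As a minimal sketch of these worker-flow dynamics: each period a fraction of workers is promoted one level through learning, the upper levels lose workers to turnover, and hiring is possible only at level 1. The promotion and turnover rates below are hypothetical illustration values, not figures from the paper.

    # Minimal sketch of the worker-flow dynamics described above. The
    # learning (promotion) and turnover rates are hypothetical, not
    # figures from the paper.

    def step(workers, hires, learn_rate=0.2, turnover=(0.0, 0.10, 0.05)):
        """Advance worker counts (level 1, level 2, level 3) by one period."""
        l1, l2, l3 = workers
        promoted_1 = learn_rate * l1              # level 1 -> level 2
        promoted_2 = learn_rate * l2              # level 2 -> level 3
        l1 = l1 - promoted_1 + hires              # hiring only at level 1
        l2 = l2 + promoted_1 - promoted_2 - turnover[1] * l2
        l3 = l3 + promoted_2 - turnover[2] * l3   # upper-level losses cannot
        return (l1, l2, l3)                       # be replaced directly

    workers = (50.0, 30.0, 20.0)
    for t in range(5):
        workers = step(workers, hires=10.0)
        print(f"period {t + 1}: {workers[0]:.1f}, {workers[1]:.1f}, {workers[2]:.1f}")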

Given that demand is stochastic and unknown at the time of decision-making, there is a possibility of not satisfying demand (called slack or shortfall hereafter), which is assumed to result in lost sales in this paper. Another option is to build a physical buffer that stores surplus goods, so that when demand exceeds the production level the extra demand can be met from stock. It may also be possible to cover a stock-out with overtime working shifts using the existing workers. By incorporating these managerial issues (i.e., overtime working hours, slack/shortfall, surplus, and layoffs) into the mathematical optimization model, the knowledge-intensive workforce planning model becomes closer to what happens in reality, and the final hiring policy becomes more useful for managers and more beneficial for the company.
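
The per-period logic just described can be sketched as follows, with hypothetical prices and costs (not figures from the paper): production plus stock serves demand first, surplus is stored, and any shortfall is covered by overtime capacity where possible, otherwise lost.

    # Minimal sketch of the per-period inventory logic described above.
    # All prices and costs are hypothetical illustration values.

    def period_profit(production, demand, inventory, overtime_cap,
                      price=10.0, holding_cost=1.0, overtime_cost=4.0):
        available = production + inventory
        shortfall = max(demand - available, 0.0)
        overtime_units = min(shortfall, overtime_cap)  # recovered via overtime
        lost_sales = shortfall - overtime_units        # assumed lost, per above
        sold = demand - lost_sales
        inventory = max(available - demand, 0.0)       # surplus kept as stock
        profit = (price * sold - holding_cost * inventory
                  - overtime_cost * overtime_units)
        return profit, inventory

    profit, stock = period_profit(production=100, demand=120,
                                  inventory=10, overtime_cap=5)
    print(f"profit = {profit:.1f}, carried stock = {stock:.1f}")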

To address the aforementioned issues, this paper pursues three goals. First, it proposes a new optimization model in which the inventory level is also taken into account; this makes the model closer to reality and leads to more appropriate decisions. Second, to solve this model efficiently, we develop a reinforcement learning (RL) method. Third, to obtain a more applicable decision policy, we derive optimal decision intervals, rather than single actions, for every state, using a modified version of the value iteration technique, a well-known approach in stochastic dynamic programming (SDP). This makes the optimization model more flexible because it gives multiple choices to the decision-maker. It is worth mentioning that all available information about the demand distribution is used to find the optimal hiring policy, whereas the chance-constraint approach (i.e., the basic model proposed by Bordoloi and Matsuo [7]) uses only the mean and standard deviation of the stochastic demand. We refer to their model in more detail in the rest of the paper and compare the results of our two models and theirs using data obtained from a semiconductor equipment manufacturing company.
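
As a hedged sketch of what an interval decision could look like (the tolerance rule below is our reading of the interval-based-policy idea, not the paper's exact construction): for each state, instead of keeping only the single greedy action, every action whose value lies within a tolerance of the best is kept, giving the decision-maker a range of near-optimal hiring levels.

    import numpy as np

    # Hedged sketch: extract an interval of near-optimal actions per state
    # from a learned Q-table. The tolerance rule is an illustration of the
    # interval-based-policy idea, not the paper's exact construction.

    def interval_policy(Q, tol=0.05):
        """For each state, the smallest/largest action within tol of the best."""
        intervals = []
        for q in Q:                                # q: action values of a state
            best = q.max()
            ok = np.flatnonzero(q >= best - tol * abs(best))
            intervals.append((int(ok.min()), int(ok.max())))
        return intervals

    Q = np.array([[4.0, 4.9, 5.0, 3.0],            # hypothetical Q-values
                  [1.0, 2.0, 2.0, 1.9]])           # (2 states x 4 actions)
    print(interval_policy(Q))                      # [(1, 2), (1, 3)]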

This paper is organized as follows. Section 2 reviews the related literature. Section 3 gives basic definitions of SDP and RL. Section 4 describes the production-inventory control problem and our proposed methods. Sections 5 and 6 present the numerical results and the conclusion and future work, respectively.

Section snippets

Related work

Workforce planning determines the required level of workers through which the strategic goals of an organization can be achieved. Bulla and Scott [9] defined it as a process in which an organization's human resource requirements are identified and efficient plans for satisfying those requirements are designed. Khoong [27] described manpower planning as the core of HRP, supported by its other aspects.

Different mathematical approaches have been used in HRP

Stochastic dynamic programming and reinforcement learning

Stochastic dynamic programming, a well-known optimization methodology that can cope with uncertain situations, breaks complex problems (e.g., non-linear, non-convex, and non-continuous ones) into easier sub-problems [6]. However, it suffers from the twin curses of dimensionality and modeling in large-scale applications. Different techniques in SDP have been developed; among them, the value iteration (VI) approach is widely used in practical problems because it can approximate the true value function.
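
For concreteness, a minimal value iteration sketch on a generic finite Markov decision process follows; the states, actions, transitions, and rewards are placeholders, not the paper's actual production-inventory model.

    import numpy as np

    # Minimal value iteration on a generic finite MDP, illustrating the VI
    # technique mentioned above (not the paper's model). P[s][a] is a list
    # of (probability, next_state, reward) triples.

    def value_iteration(P, n_states, n_actions, gamma=0.95, tol=1e-6):
        V = np.zeros(n_states)
        while True:
            Q = np.zeros((n_states, n_actions))
            for s in range(n_states):
                for a in range(n_actions):
                    Q[s, a] = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
            V_new = Q.max(axis=1)                  # Bellman optimality backup
            if np.max(np.abs(V_new - V)) < tol:    # stop near the fixed point
                return V_new, Q.argmax(axis=1)
            V = V_new

    # Toy 2-state, 2-action example with made-up dynamics:
    P = {0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
         1: {0: [(1.0, 0, 2.0)], 1: [(1.0, 1, 1.0)]}}
    V, policy = value_iteration(P, n_states=2, n_actions=2)
    print(V, policy)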

Optimization model

Bordoloi and Matsuo [7] formulated a linear program modeling an assembly line in which the operations of the front-end stage, called stage A, must be completed before the operations of the back-end stage, called stage B. Workers at the first knowledge level (newly hired) and the second knowledge level (semi-experienced) perform the operations in stages A and B, respectively. Workers at the third knowledge level (fully experienced) are assigned to the production stages to train lower-level workers.
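
As a hedged sketch of the capacity side of such a model (the notation below is ours, not necessarily the paper's): let $n_k$ denote the number of workers at knowledge level $k$, $p_k$ the per-worker processing rate, $d_A$ and $d_B$ the workloads of stages A and B, and $\tau$ the number of lower-level workers that one fully experienced worker can train. The staffing constraints then take a form such as

\[
p_1 n_1 \ge d_A, \qquad p_2 n_2 \ge d_B, \qquad \tau\, n_3 \ge n_1 + n_2, \qquad n_k \ge 0, \quad k = 1, 2, 3.
\]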

Numerical results

To evaluate our proposed methods, we used a unique case study that is briefly explained in the next subsection.

Conclusion and future work

This paper focuses on a combined problem of human resource planning (HRP) and production-inventory control in a knowledge-intensive industry. The main characteristics of the human resource in this industry are the levels of “knowledge” and the learning process. The objective is therefore to maximize the expected profit by finding the optimal numbers of workers at the various knowledge levels to fulfill both production and training requirements. When a company loses skilled workers at the upper levels, the loss cannot be directly compensated; the company can only respond by recruiting workers at the first level.

References (34)

  • E.J. Pinker et al., Optimizing the use of contingent labour when demand is uncertain, Eur. J. Oper. Res. (2003)
  • S.J. Sadjadi et al., A new nonlinear stochastic staff scheduling model, Scientia Iranica E (2011)
  • H.S. Ahn et al., Staffing decisions for heterogeneous workers, with turnover, Math. Meth. Oper. Res. (2005)
  • M. Armstrong, A Handbook of Human Resource Management Practice, tenth ed., London, UK, ...
  • D.J. Bartholomew et al., Statistical Techniques for Manpower Planning, second ed. (1991)
  • R. Bellman, Dynamic Programming (1957)
  • S.K. Bordoloi, A control rule for recruitment planning in engineering consultancy, J. Prod. Anal. (2006)