
Chinese University Major Decision and Its Effect on Wages: Modeling Interaction Between Major Specificity and Education-Job Relevancy Using Machine Learning Approaches

  • Open Access
  • 10.01.2025
  • Research Article
Published in:


Abstract

The study examines how university major decisions and education-job relevance affect wages in China, using machine learning models to analyze a unique dataset of graduates from Nanjing. It explores the complex interplay between major specificity, university tier, and family background, and offers insights into social equity and class mobility in the Chinese labor market. The findings challenge conventional wisdom and provide a nuanced understanding of the wage determinants of Chinese university graduates.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Introduction

China has undergone dramatic demographic changes in recent years. First, contemporary China shares East Asia’s low fertility rate, a problem that has been exacerbated by the One Child Policy. This policy, implemented from the 1970s, resulted in the total number of newborn babies plummeting from 25.25 million in 1987 to 9.56 million in 2022 [43]. Second, the population’s level of human capital has considerably improved, as the number of bachelor’s degree holders increased substantially from 16.12 million in 1990 to 218.36 million in 2021 [42]. Consequently, Di, a Chinese political economy scholar, forecasts a drastic decrease in China’s marginal supply of low-educated labor in the next five to eight years, alongside a dramatic increase in the number of students enrolled in colleges [17]. In a recent interview, Di further stated: “I am not worried about the unemployment of the (low-educated) migrant workers. The real unemployment problem we are facing is the college graduate population” [18].
China has the world’s biggest labor market and produces around ten million fresh graduates annually [45], about twice as many as the United States (U.S.). Each year these graduates face fierce competition in the labor market, intensified by an increase in the number of graduates of about 200,000 each year since 2008 [42]. Besides having a huge labor force, China is an emerging market that has been transitioning from central planning to a market economy since the reforms of the late 1970s. Although most college graduates take jobs in the private sector, the public sector also offers a considerable number of jobs with more generous benefits.
Unlike most labor markets elsewhere, China’s retains some political and policy legacies from the old centrally planned economy, which are believed to play a role in social stratification, including through their influence on the job search process. One such legacy is hukou, China’s “household registration system”, which classifies residents by origin or birthplace and discriminates against individuals from rural villages compared with those from more urbanized townships, counties, prefecture-level cities and municipalities [39, 58]. This policy dates back to ancient China’s Zhou Dynasty (1046–256 BCE) and became a highly rigid system during the Mao era (1949–1976) after the founding of the People’s Republic of China [15]. To facilitate population and migration control, the hukou policy dispatched the urban unemployed from ghettos to the countryside in the 1950s. Hukou also confines people to the place where they are registered. During the Mao era, hukou status and location determined every citizen’s quality of life, including their housing, rations, job, and welfare. Residents with an urban hukou were assigned to the industrial sector and labeled as working class, employed in state-owned enterprises or other public sector units, while residents with a rural hukou worked on collective farms as agricultural workers. Since 1978, following Deng Xiaoping’s market-oriented reforms, the hukou policy has been gradually relaxed. Nowadays, voluntary migration is not restrained by hukou status, but the system and the local political-economic ecology still affect individuals’ medical care, education, and housing subsidies [12, 14, 57, 59]. Another unique feature of Chinese society in this context lies in the higher education sector, in particular its strong adherence to government strategies (such as the 211 Project and the 985 Project) and official ideologies [46, 60]. Admission to university depends on an individual’s score in the national entrance examination (gaokao).
In addition, tuition fees at flagship universities in China are lower than those at their counterparts elsewhere. Therefore, on the one hand, China’s unique education system reflects the country’s governance; on the other hand, it may facilitate social equity through educational opportunities [33, 60].
University rankings are also relevant to the study of wages. In general, enrollments are governed by college entrance examination scores, which therefore offer a very reliable measure of Chinese university rankings. Unlike their U.S. counterparts, China’s private universities and colleges have a relatively short history (the first private university in China was founded in 1987) and are mainly vocational institutions. Most students in China’s private colleges or universities come from the middle and lower levels of the academic hierarchy [37].
In terms of public schools, flagship universities or colleges are those listed in the official 211 Project, 985 Project or the more recent Double-First-Class Project. The 211 Project, announced by the Ministry of Education and initiated in 1995, aimed to raise the research standards of 100 universities; the 985 Project, launched in May 1998, aimed to realize world-class ambitions by investing more public resources in 39 flagship universities listed in the 211 Project [60]. The Double-First-Class Project (full name “World First Class University and First Class Academic Discipline Construction”) refers to a classification given to a small group of flagship public universities. This plan was announced in 2015 by the Central Party Committee and the State Council, with the aims of coordinating and promoting world-class universities and first-class subject-specific capabilities. It was seen as a strategic decision to strengthen China’s competitiveness for long-term development [46]. Not only do the flagship universities enjoy more government funding, they also have more private-sector resources and other social connections [53].
Fresh graduate employment also provides a window into China’s class mobility and social sustainability. Graduate employment is the first application of one’s education to the labor market, and may be influenced by family factors and public policies such as the above-mentioned hukou or college admission rules [11, 54]. In terms of intergenerational mobility, parents’ social and hukou status play important roles in both education and career pathways [9, 34, 54]. How do college major and other factors shape China’s new graduates’ experiences in the labor market? How is human capital priced in the labor market, and how does this pricing process reflect social equity? This study addresses these questions by developing models of wage determinants and investigating their interactions in China.
In this study, we employ a newly released dataset from a series of surveys of university graduates in Nanjing, China. This novel dataset includes variables relating to university education and employment, providing an important data source with which to investigate the relationship between the wages of fresh graduates and their educational and family background in contemporary China.

Literature Review

The Study of Wages

The study of wage determinants is a core topic in Labor Economics on both the supply and demand sides. Analyses of starting wages focus on human capital theory and family background, given that new employees do not yet have work experience or wage records. It is nonetheless generally understood that one’s first job and starting salary are extremely important: the former is a springboard to a future career, while the latter signals future earning levels [11].
The field of intergenerational mobility contains many studies of the family effect on college graduates’ wages. Family background is typically measured by parental income, job prestige or education. Parental background has been found to significantly influence students’ achievement and future incomes in OECD countries [9, 13, 35]. However, conclusions about causality are mixed owing to omitted variable bias, differences in operationalization, and other factors. For example, parents’ social class ranking has been found to be significant in determining female (but not male) MBA graduates’ starting salaries [23]. Using annual salaries (including starting salaries) as a measure has also been criticized, as it biases estimates of intergenerational elasticity and intergenerational correlation; permanent earnings or multi-year averages have been suggested as proxies [50]. A Chinese study estimates that the wage premium of having a cadre parent (a civil servant in at least county-level government) is 15% of total earnings [34]. Li claims that 58.1% of “cadre parents” are college graduates, compared with 19.1% of non-cadre parents, and that parents’ social class is not associated with their children’s choice of major.
As starting salary is free from the noise of experience and previous salary, some scholars focus on the effect of education on starting salaries, drawing either on human capital theory [7, 48] or on the alternative screening hypothesis [51]. The former treats education as a good measure of productivity that is reflected in future salary, while the latter supposes that education merely functions as a signal, since productivity is innate and unrelated to educational training. A major body of literature indicates a positive correlation and causal relationship between university tier and earnings. Mark Hoekstra uses SAT scores and GPA to estimate how close students are to the flagship university’s admission rule, and builds a counterfactual by comparing those just below the cut-off of the estimated admission rule with those expected to achieve flagship university enrollment [27]. Those predicted by the model to be enrolled earn around 20% more than those below the cut-off. Rodney Andrews compares different types of universities by estimating the unconditional quantile treatment effects of different university diplomas on earnings [3]. By analyzing the whole earnings distribution rather than only the mean, this approach finds significant heterogeneity in wage premiums across wage intervals among students of the University of Texas at Arlington (UTA), Texas A&M University, and community colleges in Texas.
Since Becker’s initial work, human capital theory has also continued to explore the relationship between education and occupation [6]. One’s major signals the content of one’s higher education curriculum. A few seminal studies have shown a remarkable wage gap between university majors [16, 31]. However, owing to data limitations, Chinese scholars have often grouped hundreds of sub-majors into very broad categories such as Science, Engineering, and Social Science [29, 33, 54]. This oversimplified grouping is problematic: according to the official discipline classification, for example, fields such as Journalism and Languages are grouped under Literature. In addition, most Chinese studies in this field label disciplines with dummy variables and hence focus on wage gaps across majors, without exploring how majors are priced in the labor market.

Major Specificity and Wages

Human capital can be conceptualized as general or specific [6, 7]. Specific human capital is useful only in one firm, while general human capital consists of skills that can easily be used in other firms; the difference between the two rests on the degree of skill transferability [6, 30]. Recent studies indicate that vocational majors pay more early in a career, while academic majors may pay less at the beginning but overtake vocational majors in later career stages; and that a general major has a similar payoff across a broad set of occupations, while a more specific major pays well only in a narrower set of occupations [10, 24, 30]. The Herfindahl-Hirschman Index (HHI), the Gini coefficient and the concentration ratio (CR) are used to measure major specificity [1, 10, 30]. According to Leighton, graduates with highly specific majors, as measured by occupational HHI, earn 8% more than average in the U.S. labor market, while Altonji, using the concentration ratio, finds a positive relationship between major specificity and earnings [1, 30]. Moreover, scholars in this field have been implementing new models and research designs, such as regression discontinuity, instrumental variables, and dynamic models, to strengthen causal inference [1, 5, 8].

Interaction Effects

Variable interaction here refers to bivariate interaction. An interaction between two independent variables means that the effect of one variable depends on the value of the other: a function f(x) exhibits an interaction between x1 and x2 if the change in f(x) caused by changing x1 depends on the value of x2 [22, 28]. A traditional way of modeling interaction is to add a multiplication term to the model. In machine learning, the partial dependence measure shows the marginal effect that one or two features have on the predicted outcome [21]; it measures the change in the average predicted value as the specified features vary over their marginal distribution [28]. Building on partial dependence, Friedman’s H-index measures the strength of interaction between two targeted variables. If two variables do not interact, their joint partial dependence function equals the sum of their individual partial dependence functions. Accordingly, for a pair of variables, the H-index is calculated as the sum of squared differences between their joint partial dependence function and the sum of their individual partial dependence functions, divided by the sum of squares of the joint partial dependence function; larger values indicate stronger interaction. However, some researchers argue that the original H-index is inefficient and gives spurious results when the denominator (the sum of squares of the joint partial dependence) is close to zero. As a remedy, the adjusted H-index drops the denominator of the original H-index [4, 28].
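As a worked illustration of this definition, consider a linear model with a multiplication term (the β coefficients are generic symbols, not estimates from this study):

$$f(x)=\beta_0+\beta_1x_1+\beta_2x_2+\beta_3x_1x_2,\qquad f(x_1+\Delta,\,x_2)-f(x_1,\,x_2)=(\beta_1+\beta_3x_2)\,\Delta$$

The change in f from moving x1 by Δ depends on x2 exactly when β3 ≠ 0. When β3 = 0 the model is additive, so its joint partial dependence function equals the sum of the individual ones and the H-index is zero.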

Data

The dataset covers annual bachelor’s degree graduate employment information from 2016 to 2018 in six universities located in Nanjing, Jiangsu Province, China. Nanjing, a former capital city of China, has more than 60 tertiary education institutions and produced over 720,000 college graduates in 2018. The data were collected through a provincial graduate service center investigation into graduate employment. The service center contacted every student through school email. Each individual received two surveys, one before graduation and one after. Survey One, conducted from April to June, focused on graduates’ demographics and university education. All fresh graduates in the six universities participated in Survey One, which had 45,661 respondents in the 2016–2018 period. Survey Two, conducted between September and November, focused on graduates’ employment information and job search process, with around 13,750 valid observations. By matching observations on student ID and removing inconsistent entries, the dataset combines the demographic and education data with the job data available for graduates who completed both surveys.
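The matching step can be sketched with pandas, assuming hypothetical column names (`student_id`, `grad_year`, `wage`) and a hypothetical consistency rule (graduation year must agree across the two surveys); the paper does not state its exact rule. A left join keeps every Survey One respondent, so non-respondents retain missing job variables, which is the structure a selection model needs.

```python
import pandas as pd

# Hypothetical toy records; column names and values are illustrative only.
survey1 = pd.DataFrame({
    "student_id": [101, 102, 103, 104],
    "grad_year": [2016, 2016, 2017, 2018],
    "hukou": [0, 2, 1, 0],
})
survey2 = pd.DataFrame({
    "student_id": [101, 103, 104],
    "grad_year": [2016, 2017, 2017],  # student 104 disagrees with Survey One
    "wage": [4200.0, 5100.0, 3900.0],
})

# Left join keeps every Survey One respondent; non-respondents get NaN job data.
merged = survey1.merge(
    survey2, on="student_id", how="left",
    suffixes=("_s1", "_s2"), validate="one_to_one", indicator=True,
)
merged["responded"] = (merged["_merge"] == "both").astype(int)

# Hypothetical consistency rule: a respondent's graduation year must match.
inconsistent = (merged["responded"] == 1) & (
    merged["grad_year_s1"] != merged["grad_year_s2"]
)
clean = merged.loc[~inconsistent].drop(columns="_merge")
print(clean[["student_id", "responded", "wage"]])
```

Student 102 stays in the data with a missing wage and `responded = 0`, while student 104 is dropped as inconsistent.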
Table 1
Frequency table for major sectors

| Disciplines | Dataset, n (percent) | National total 2015, n (percent) [44] |
|---|---|---|
| Economics | 4405 (9.653) | 206,239 (6.041) |
| Law | 616 (1.350) | 129,800 (3.802) |
| Education | 93 (0.204) | 112,424 (3.293) |
| Literature | 4171 (9.141) | 362,972 (10.63) |
| Science | 3130 (6.859) | 255,304 (7.479) |
| Engineering | 21,609 (47.35) | 1,132,226 (33.17) |
| Agriculture | 983 (2.154) | 59,796 (1.752) |
| Medicine | 1010 (2.231) | 209,748 (6.144) |
| Management | 9644 (21.12) | 633,878 (18.57) |
| Total | 45,661 (100) | 3,413,787 (100) |
Table 1 compares the number of graduates by discipline in this dataset with China’s total bachelor’s degree graduates in 2015 [44]. Engineering and Management are also the two largest disciplines in the 2015 national distribution, which partly explains why the dataset contains disproportionately large numbers of Engineering and Management graduates. Since all six institutions in the dataset are comprehensive universities without specialized colleges such as normal or medical colleges, there are comparatively fewer graduates with a Law, Education or Medicine background (see Table 1).
Table 2 displays the descriptive statistics of the study variables. Some variables merit detailed description, especially for readers unfamiliar with Chinese society. Hukou here refers to the hukou registration location at the time of the college entrance examination, as some university students move their hukou to their university’s city after enrollment. The hukou question therefore asks whether a graduate lived in a rural or urban area while in high school. There are three hukou categories, rural, county seat and metropolis, coded 0, 1 and 2 respectively. Parental education has two categories, coded 0 = high school or below and 1 = above high school. Together, hukou location and parental education measure family background.
Table 2
Variables summary

| Variable | Observations | Missing | Mean | Std. Dev. | Categories | Survey |
|---|---|---|---|---|---|---|
| University tier | 45,661 | 0 | - | - | 3 | 1 |
| Major | 45,661 | 29 | - | - | 206 | 1 |
| Hukou | 45,661 | 27 | - | - | 3 | 1 |
| Parentedu | 45,661 | 25 | - | - | 2 | 1 |
| Gender | 45,661 | 0 | - | - | 2 | 1 |
| Satisfaction | 45,661 | 1 | - | - | 4 | 1 |
| Wage | 45,661 | 31,911 | 4733.243 | 2077.493 | - | 2 |
| Offer | 45,661 | 32,335 | 3.372 | 3.337 | - | 2 |
| Apply | 45,661 | 32,288 | 11.854 | 15.653 | - | 2 |
| Major-job Relevance | 45,661 | 31,718 | - | - | 2 | 2 |
| Job | 45,661 | 31,731 | - | - | 18 | 2 |
| Industry | 45,661 | 31,722 | - | - | 18 | 2 |
| Time | 45,661 | 32,660 | 1.558 | 2.442 | - | 2 |

Parentedu refers to the parents’ education level; satisfaction refers to the subjective evaluation of college education; offer refers to the number of offers; apply refers to the number of applications in the job search; time refers to months spent on the job search.
The six universities in the Nanjing sample include three categories of university: prestigious 211 project universities (or 211 universities), middle-ranking public universities (excluding 211 universities), and the lowest-ranking private universities and colleges. In China, private universities and colleges require the lowest college entrance examination scores. The 211 universities are flagship universities with national prestige, requiring the highest entrance examination scores. There are 10,318, 19,719 and 15,162 cases respectively from the highest ranking to the lowest in the dataset. The private universities serve as the base. University tier (university) is coded as 0 = private universities, 1 = ordinary public universities, 2 = 211 universities.
China’s detailed bachelor’s degree majors are recorded by an eight-digit code. The first two digits denote the broad discipline, such as Philosophy, Economics, Law, Education, Language, Science, Engineering, Agriculture, Medicine and Management; the first four digits denote the “first-level sector”; and the most detailed major code requires all eight digits. If only the first two digits are used, then Economics and Finance, Sociology and Jurisprudence, Accounting and Marketing, and Language and Journalism are respectively paired together, mixing liberal arts and social science majors. Hence, to avoid over-simplification, it is better to use all eight digits. Unlike in U.S. universities, Chinese students’ majors and curricula are relatively fixed once students enroll. Each major in a given university has its own entrance examination score threshold, so transferring majors is uncommon owing to concerns about fairness; typically, permission to transfer requires an applicant to rank in the top 5% by GPA among their peers in the relevant department. As a result, curricular diversity among students with the same major is limited.
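The effect of grouping majors by code prefix can be illustrated with a short sketch. The codes below are hypothetical placeholders, not entries from the official catalogue, and the helper name is illustrative; the point is only that truncating to two digits pools majors that remain distinct at four or eight digits.

```python
from collections import defaultdict

def group_majors(codes, digits):
    """Group eight-digit major codes by their first `digits` digits
    (2 = broad discipline, 4 = first-level sector, 8 = detailed major)."""
    groups = defaultdict(list)
    for code in codes:
        groups[code[:digits]].append(code)
    return dict(groups)

# Hypothetical codes for illustration only.
codes = ["02010100", "02010200", "02030100", "03010100"]
for level in (2, 4, 8):
    print(level, "digits ->", len(group_majors(codes, level)), "groups")
```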

Methods

Measuring Major Specificity

Major specificity is measured by the Herfindahl-Hirschman Index (HHI), calculated by the following formula:
$$HHI_{m}=\sum_{o=1}^{N}S_{mo}^{2}$$
where m denotes the major, o indexes occupations (o = 1, …, N, with N the number of occupations), and S_mo denotes the share of graduates with major m who enter occupation o [10, 30]. The index ranges from zero to one (more precisely, from 1/N to one), and a larger HHI means higher major specificity. By this measure, a major linked to a small set of job roles is highly specific, while a major whose graduates are spread widely and evenly across job roles is a general major. For example, if 90% of the graduates of major x go to one job and 10% go to another, the HHI for major x is 0.9² + 0.1² = 0.81 + 0.01 = 0.82. The industry HHI applies the same formula to industries instead of occupations, measuring how diversified a major’s industry destinations are.
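The HHI calculation can be sketched directly from graduate-level records. This is a minimal illustration, assuming one (major, occupation) pair per employed graduate; the function name and toy data are hypothetical. Passing (major, industry) pairs instead gives the industry HHI.

```python
from collections import Counter, defaultdict

def major_hhi(records):
    """records: iterable of (major, occupation) pairs, one per graduate.
    Returns {major: HHI}, where HHI is the sum of squared occupation shares."""
    counts = defaultdict(Counter)
    for major, occupation in records:
        counts[major][occupation] += 1
    hhi = {}
    for major, occ_counts in counts.items():
        total = sum(occ_counts.values())
        hhi[major] = sum((c / total) ** 2 for c in occ_counts.values())
    return hhi

# Toy data: major "x" sends 90% of graduates to one job (the example in the text);
# major "y" spreads graduates evenly over four jobs.
records = [("x", "job A")] * 9 + [("x", "job B")]
records += [("y", "job A"), ("y", "job B"), ("y", "job C"), ("y", "job D")]
h = major_hhi(records)
print(h)
```

Major x reproduces the 0.82 from the worked example, while the evenly spread major y reaches the lower bound 1/N = 0.25.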
Table 3 shows the values of major specificity. The HHI is derived by calculating each major’s job or industry concentration; a major with a higher HHI value is one whose graduates have more concentrated career destinations. Owing to space limits, only the twenty most common majors are shown below.
Table 3
Major specificity measured by job-wise HHI and industry-wise HHI

| Major | Job HHI | Industry HHI | Average wage |
|---|---|---|---|
| Gardening | 0.10 | 0.10 | 3393 |
| Environment Design | 0.10 | 0.11 | 3817 |
| Logistics | 0.12 | 0.10 | 4530 |
| E Commerce | 0.12 | 0.13 | 4489 |
| Chemical Engineering | 0.15 | 0.21 | 4254 |
| Marketing | 0.16 | 0.10 | 4516 |
| English | 0.17 | 0.16 | 4358 |
| Statistical Economics | 0.18 | 0.17 | 4876 |
| International Trade | 0.19 | 0.12 | 4458 |
| Vehicle Engineering | 0.21 | 0.21 | 4365 |
| Human Resource | 0.21 | 0.10 | 4355 |
| Accounting | 0.22 | 0.11 | 4105 |
| Engineering Management | 0.23 | 0.24 | 4152 |
| Economics | 0.24 | 0.20 | 5033 |
| Electrical Information | 0.28 | 0.34 | 4860 |
| Internet of Things | 0.28 | 0.31 | 5257 |
| Communication Engineering | 0.32 | 0.53 | 5503 |
| Computer Science | 0.33 | 0.43 | 5476 |
| Mechanical Engineering | 0.48 | 0.41 | 5294 |
| Software Engineering | 0.50 | 0.62 | 6520 |
Generally, social science, arts, and humanities majors have low major specificity, science majors are in the middle, and engineering and computer science-related majors have the highest specificity. For example, E-commerce and Gardening are less specialized majors, while Mechanical Engineering and Computer Science are highly specialized. A higher industry or job HHI means that graduates of a given major have more concentrated industry or job choices. For example, in the highly specific major Mechanical Engineering, 70% of graduates take jobs in the manufacturing industry, while no other industry takes more than 7%. Another highly specific major is Computer Science, with 79% of graduates going to the IT industry. Among the low-specificity majors, Accounting’s most common destination is finance, which absorbs only 38% of its graduates. Moreover, majors with higher specificity tend to enjoy higher pay: the average wage of Software Engineering graduates (the most specific major) is nearly twice that of Gardening graduates (the least specific major).
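The claim that more specific majors pay more can be checked descriptively against Table 3. This is a minimal sketch using only the twenty majors listed there, unweighted by the number of graduates, so it is a rough descriptive summary rather than an estimate from the authors’ models.

```python
import numpy as np

# (major, job HHI, industry HHI, average wage), copied from Table 3
table3 = [
    ("Gardening", 0.10, 0.10, 3393), ("Environment Design", 0.10, 0.11, 3817),
    ("Logistics", 0.12, 0.10, 4530), ("E Commerce", 0.12, 0.13, 4489),
    ("Chemical Engineering", 0.15, 0.21, 4254), ("Marketing", 0.16, 0.10, 4516),
    ("English", 0.17, 0.16, 4358), ("Statistical Economics", 0.18, 0.17, 4876),
    ("International Trade", 0.19, 0.12, 4458), ("Vehicle Engineering", 0.21, 0.21, 4365),
    ("Human Resource", 0.21, 0.10, 4355), ("Accounting", 0.22, 0.11, 4105),
    ("Engineering Management", 0.23, 0.24, 4152), ("Economics", 0.24, 0.20, 5033),
    ("Electrical Information", 0.28, 0.34, 4860), ("Internet of Things", 0.28, 0.31, 5257),
    ("Communication Engineering", 0.32, 0.53, 5503), ("Computer Science", 0.33, 0.43, 5476),
    ("Mechanical Engineering", 0.48, 0.41, 5294), ("Software Engineering", 0.50, 0.62, 6520),
]
job_hhi = np.array([row[1] for row in table3])
ind_hhi = np.array([row[2] for row in table3])
wage = np.array([row[3] for row in table3], dtype=float)

# Pearson correlations between specificity and average wage across the 20 majors
r_job = np.corrcoef(job_hhi, wage)[0, 1]
r_ind = np.corrcoef(ind_hhi, wage)[0, 1]

# "Nearly twice": Software Engineering vs. Gardening average wage
wages = {row[0]: row[3] for row in table3}
ratio = wages["Software Engineering"] / wages["Gardening"]
print(round(r_job, 2), round(r_ind, 2), round(ratio, 2))
```

Both correlations are strongly positive and the wage ratio is about 1.92, consistent with the text; being bivariate and unweighted, these numbers say nothing about causality.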

The Wage Function

Social science researchers are often confronted with endogeneity and selection bias [32, 36]. The two-step Heckman selection model adjusts the estimator for potential non-response bias [25, 26], so Heckman models are fitted as the baseline. In step two, the dependent variable of interest is estimated while controlling for the inverse Mills ratio obtained from step one. Of the 45,661 respondents to Survey One, 31,911 (69.89%) did not answer Survey Two, which collected the wage and other job-related variables. The detailed formulations are in Subsection “Heckman model”.
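The two-step logic can be sketched on simulated data. This is a generic textbook version, not the authors’ specification: the variables, true parameters, and the exclusion-restriction variable `w` (which shifts the response probability but not the wage) are all hypothetical. Step one fits a probit for responding; step two adds the inverse Mills ratio as a regressor in the wage equation for respondents only.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def probit_fit(Z, d):
    """Step one: probit of the response indicator d (1 = responded) on Z."""
    q = 2.0 * d - 1.0  # +1 for respondents, -1 for non-respondents

    def neg_loglik(g):
        return -norm.logcdf(q * (Z @ g)).sum()

    def neg_grad(g):
        t = q * (Z @ g)
        ratio = np.exp(norm.logpdf(t) - norm.logcdf(t))  # phi(t)/Phi(t), stable
        return -Z.T @ (q * ratio)

    return minimize(neg_loglik, np.zeros(Z.shape[1]), jac=neg_grad, method="BFGS").x

def heckman_two_step(X, Z, d, y):
    """Step two: OLS of y on X plus the inverse Mills ratio, respondents only."""
    gamma = probit_fit(Z, d)
    zg = Z @ gamma
    imr = norm.pdf(zg) / norm.cdf(zg)  # inverse Mills ratio
    sel = d == 1
    X2 = np.column_stack([X[sel], imr[sel]])
    coef, *_ = np.linalg.lstsq(X2, y[sel], rcond=None)
    return gamma, coef[:-1], coef[-1]  # probit coefs, wage coefs, IMR coef

# Simulated example (all parameters hypothetical)
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)  # wage determinant
w = rng.normal(size=n)  # affects response only (exclusion restriction)
u, e = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n).T
d = (0.2 + 0.5 * x + 1.0 * w + u > 0).astype(float)
y = 1.0 + 2.0 * x + e  # wage, used only where d == 1
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), x, w])
gamma, beta, lam = heckman_two_step(X, Z, d, y)
print(gamma.round(2), beta.round(2), round(lam, 2))
```

With the simulated error correlation of 0.5, the coefficient on the inverse Mills ratio should come out near 0.5 and the wage coefficients near their true values (1.0 and 2.0). The second-step OLS standard errors are not valid as they stand; a full implementation corrects them for the estimated first step.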
For the machine learning approaches, KNN and Random Forest models are fitted. The pairwise interaction effects are then measured by the adjusted H-index based on the predictions of the fitted KNN and Random Forest models [22]. The heat map and network plot are produced with the R package vivid, developed by Inglis et al. [19, 28]. The original H-index for measuring interaction is:
$$H_{jk}^{2}=\frac{\sum_{i=1}^{n}\left[f_{jk}(x_{ij},x_{ik})-f_{j}(x_{ij})-f_{k}(x_{ik})\right]^{2}}{\sum_{i=1}^{n}f_{jk}^{2}(x_{ij},x_{ik})}$$
where f_j and f_k denote the partial dependence functions of variables j and k, and f_jk denotes their two-way (joint) partial dependence function [28]. The partial dependence function is estimated as:
$$f_{s}(x_{s})=\frac{1}{n}\sum_{i=1}^{n}g\left(x_{s},x_{c}^{i}\right)$$
where x_s denotes the variable(s) of interest, x_c denotes the remaining independent variables, x_c^i are the observed values of x_c for observation i (i = 1, …, n), g(·) gives the prediction of the fitted machine learning model, and f_s is the resulting partial dependence function. The H-index built from these functions estimates whether, and to what extent, two variables in the model interact with each other: the larger the H-index, the stronger the interaction. The adjusted H-index reduces the identification of spurious interactions by dropping the denominator of the original H-index [4, 28]:
$$H_{jk}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[f_{jk}(x_{ij},x_{ik})-f_{j}(x_{ij})-f_{k}(x_{ik})\right]^{2}}$$
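The partial dependence functions and both H-index variants can be computed by brute force for any prediction function. The sketch below is a minimal illustration, not the vivid implementation: it assumes, following Friedman and Popescu’s formulation, that the partial dependence functions are centered to mean zero, and the helper names and toy models are hypothetical. In practice `predict` could be the `.predict` method of a fitted KNN or random forest model.

```python
import numpy as np

def centered_pd(predict, X, cols):
    """Partial dependence of `predict` on the columns `cols`, evaluated at each
    observation's own values of those columns and averaged over the observed
    values of all remaining columns; centered to mean zero."""
    n = X.shape[0]
    values = np.empty(n)
    for i in range(n):
        X_mod = X.copy()
        X_mod[:, cols] = X[i, cols]        # set x_s to observation i's values
        values[i] = predict(X_mod).mean()  # (1/n) * sum over g(x_s, x_c^l)
    return values - values.mean()

def h_index(predict, X, j, k):
    """Return (original H^2, adjusted H) for the column pair (j, k)."""
    f_jk = centered_pd(predict, X, [j, k])
    f_j = centered_pd(predict, X, [j])
    f_k = centered_pd(predict, X, [k])
    resid = f_jk - f_j - f_k               # part not explained additively
    h2 = np.sum(resid ** 2) / np.sum(f_jk ** 2)
    h_adj = np.sqrt(np.mean(resid ** 2))   # denominator dropped
    return h2, h_adj

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
additive = lambda A: A[:, 0] + 2.0 * A[:, 1] + A[:, 2]
interacting = lambda A: A[:, 0] + 2.0 * A[:, 1] + 3.0 * A[:, 0] * A[:, 1] + A[:, 2]

h2_add, hadj_add = h_index(additive, X, 0, 1)
h2_int, hadj_int = h_index(interacting, X, 0, 1)
print(h2_add, hadj_add)
print(h2_int, hadj_int)
```

For the additive toy model both statistics are zero up to rounding, while the model with a multiplication term yields clearly positive values. The brute-force loop costs on the order of n² predictions, so evaluating it on a subsample of observations keeps it tractable for large datasets.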

Findings

Family Effect on University Tier and Major Choice

In studies of wages, there are two endogeneity concerns: the family-education relationship and sample selection bias. Both family background and education are significant wage determinants, but family background may affect education. In this study, parents’ education and hukou location are family-related variables, while university tier and major decision are education-related variables. The first potential endogeneity is between family and university tier. However, as shown in Table 4, university tier is only weakly correlated with family background (-0.012 with hukou and 0.066 with parental education).
Table 4
Correlation matrix

| | Hukou | Parentedu | University | Gender |
|---|---|---|---|---|
| Hukou | 1.000 | | | |
| Parentedu | 0.419 | 1.000 | | |
| University | -0.012 | 0.066 | 1.000 | |
| Gender | 0.013 | -0.019 | -0.108 | 1.000 |
Table 5
Multinomial logit model: family effect on university tier

| University tier | Variable | Model 1 |
|---|---|---|
| Private | Hukou = 1 | 0.233*** |
| | Hukou = 2 | 0.472** |
| | Parentedu | 0.273*** |
| | Year = 1 | 0.053* |
| | Year = 2 | 0.756*** |
| | Gender | -0.224*** |
| | Intercept | -1.206*** |
| Ordinary public | Base outcome | |
| 211 University | Hukou = 1 | -0.093*** |
| | Hukou = 2 | 0.280*** |
| | Parentedu | 0.472*** |
| | Year = 1 | 0.453*** |
| | Year = 2 | 1.181*** |
| | Gender | -0.753*** |
| | Intercept | -1.540*** |
| Chi-square | | 4898.50 |

***p < 0.01, **p < 0.05, *p < 0.1
The relationship between family and university tier is further analyzed with a multinomial logit model in Table 5. The dependent variable university is coded as 0 = private, 1 = ordinary public, 2 = 211 universities. The independent variables are hukou (0 = rural, 1 = county seat, 2 = metropolis), parentedu (0 = high school or below, 1 = above high school), gender (0 = male, 1 = female) and year (0 = 2016, 1 = 2017, 2 = 2018). As the variance inflation factors (VIF) of the independent variables are all below two, multicollinearity is not detected. Ordinary public university (middle-ranking) is set as the base outcome. After transforming the coefficients into relative risk ratios (RRR), ceteris paribus, educated parents are associated with a 31.4% higher relative risk of enrolling in a private university (low-ranking) rather than an ordinary public university (RRR = e^0.273 ≈ 1.314), and with a 60.3% higher relative risk of enrolling in a 211 university (high-ranking) rather than an ordinary public university (RRR = e^0.472 ≈ 1.603). In other words, graduates with educated parents are most likely to attend the best universities in the sample, followed by the lowest-ranking universities, and least likely to attend a middle-ranking university. Therefore, educated parents do not have a consistent positive association with university tier. Similarly, the association between hukou and university tier is not uniform. The relative risk of attending a private rather than an ordinary public university rises for graduates with a more urban hukou; for 211 universities, moving from a rural to a county-seat hukou is associated with a lower relative risk of enrollment, while moving from a rural to a metropolis hukou is associated with a higher one. Therefore, having an urban rather than a rural hukou has no clear positive or negative effect on university tier.
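The percentages above follow from exponentiating the Table 5 coefficients: in a multinomial logit, exp(b) is the relative risk ratio, the factor by which P(outcome)/P(base outcome) changes for a one-unit change in the regressor (for a dummy, a move from the reference category). A short check using coefficients copied from Table 5; the dictionary layout and helper name are illustrative.

```python
import math

# Coefficients copied from Table 5 (ordinary public universities = base outcome)
coefs = {
    ("Private", "Parentedu"): 0.273,
    ("211 University", "Parentedu"): 0.472,
    ("Private", "Hukou = 1"): 0.233,
    ("Private", "Hukou = 2"): 0.472,
    ("211 University", "Hukou = 1"): -0.093,
    ("211 University", "Hukou = 2"): 0.280,
}

def rrr(b):
    """Relative risk ratio implied by a multinomial logit coefficient b."""
    return math.exp(b)

for (outcome, var), b in coefs.items():
    change = (rrr(b) - 1.0) * 100.0  # percent change in relative risk
    print(f"{outcome:15s} {var:10s} RRR = {rrr(b):.3f} ({change:+.1f}%)")
```

The two parentedu rows reproduce the 31.4% and 60.3% figures, and the negative county-seat coefficient for 211 universities gives an RRR below one.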
As mentioned previously, university enrollment in China depends on national entrance examination scores, hence university tier is a proxy for academic ability at the end of high school. Since university tier does not respond uniformly to family background (parents’ education and hukou), the family-university endogeneity problem raised by other scholars is less pertinent here.
In addition, there is a natural concern that students’ choice of university may be affected by tuition expenses. However, the typical public-versus-private comparison in the U.S. does not apply to China. In 2018, tuition for most majors in China’s private universities was around 12,000 to 30,000 yuan per year without public subsidy. With full government subsidy, tuition for most majors in 211 universities and ordinary public universities was between 5,000 and 6,000 yuan per year. In China, therefore, the best universities are also among the cheapest, and the preference for an upper-tier over a lower-tier university, especially when both are located in the same city, is straightforward. Hence the biases identified by Hoekstra (omitted family bias and university selection bias) are less relevant here [27].
Table 6
Discipline choice across Hukou locations

| Disciplines | Rural hukou, Obs. (percent) | County seat hukou, Obs. (percent) | Metropolis hukou, Obs. (percent) | Total, Obs. (percent) |
|---|---|---|---|---|
| Economics | 1226 (8.53) | 1786 (10.76) | 1391 (9.51) | 4403 (9.65) |
| Language | 1141 (7.94) | 1592 (9.59) | 1437 (9.82) | 4170 (9.14) |
| Science | 998 (6.94) | 1079 (6.50) | 1052 (7.19) | 3129 (6.86) |
| Engineering | 7094 (49.34) | 7514 (45.27) | 6992 (47.80) | 21,600 (47.36) |
| Agriculture | 480 (3.34) | 355 (2.14) | 147 (1.00) | 982 (2.15) |
| Medicine | 342 (2.38) | 371 (2.24) | 296 (2.02) | 1009 (2.21) |
| Management | 2869 (19.96) | 3683 (22.19) | 3053 (20.87) | 9605 (21.06) |
| Law | 191 (1.33) | 191 (1.15) | 232 (1.59) | 614 (1.35) |
| Education | 36 (0.25) | 28 (0.17) | 29 (0.20) | 93 (0.20) |
| Total | 14,377 (100.00) | 16,599 (100.00) | 14,629 (100.00) | 45,605 (100.00) |
It is also necessary to analyze potential endogeneity arising from family influence on major choice. Table 6 shows graduates’ discipline choices by hukou background; the percentages in parentheses are shares within each hukou cohort. With the exception of Agriculture (2% of total observations), where graduates with a rural hukou are about 3.3 times as likely to choose Agriculture as their metropolitan peers (3.34% vs. 1.00%), there is no discernible pattern of major choice across hukou origins. To validate this conclusion, the relationship between family and discipline choice is analyzed using logit models in Table 7. Discipline has nine categories: Economics (Econ.), Law, Education (Edu.), Language (Lang.), Science (Sci.), Engineering (Engr.), Agriculture (Agri.), Medicine (Med.) and Management (Mgmt.). Nine logit models are fitted separately, each with a dummy dependent variable indicating selection of one discipline; for instance, in the first model, Economics = 1 and all other disciplines = 0. The explanatory variables are hukou and parentedu, with university, gender and year controlled. Across the nine disciplines, hukou has a consistent (monotonic) effect on the choice of Language, Agriculture and Medicine: Agriculture and Medicine are less attractive to graduates with a more urban hukou, whereas Language becomes more popular as hukou moves from rural to metropolis. According to Table 1, these three disciplines represent 13.52% of the sample and 18.52% of the Chinese bachelor’s degree holding population. Additionally, a Wald test compares the coefficients of the two non-base hukou levels in each model, testing whether the coefficient for hukou = 1 equals the coefficient for hukou = 2. Among these three disciplines, the test shows a significant difference for Language and Agriculture but not for Medicine.
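The Wald test of equal coefficients compares two estimates from the same model, accounting for their covariance. This is a minimal sketch, assuming the coefficient vector and its estimated covariance matrix are available from the fitted logit; the function name is illustrative and the numbers below are made up, not the Table 7 estimates.

```python
import numpy as np
from scipy.stats import chi2

def wald_equal_coefs(b, V, j, k):
    """Wald test of H0: b[j] == b[k], given coefficients b and covariance V.
    Returns (statistic, p-value) from a chi-square with 1 degree of freedom."""
    r = np.zeros(len(b))
    r[j], r[k] = 1.0, -1.0   # restriction vector: b_j - b_k = 0
    diff = r @ b
    var = r @ V @ r          # Var(b_j) + Var(b_k) - 2 Cov(b_j, b_k)
    stat = diff ** 2 / var
    return stat, chi2.sf(stat, df=1)

# Illustrative values: coefficients for hukou = 1 and hukou = 2 and their covariance
b = np.array([0.2, 0.6])
V = np.array([[0.010, 0.002],
              [0.002, 0.010]])
stat, p = wald_equal_coefs(b, V, 0, 1)
print(round(stat, 3), round(p, 4))
```

Here the difference of -0.4 has variance 0.016, giving a statistic of 10 and a p value well below 0.01.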
We also find that county-seat graduates are significantly more likely than metropolis and rural graduates to choose Economics and Management, and less likely to choose Law and Science. Parent education is significantly associated with the choice of Economics, Law, Agriculture and Medicine: students with more educated parents are more likely to choose Economics and Law, while students whose parents did not attend high school tend to favor Agriculture and Medicine. These four disciplines represent 15.38% of the sample, or 17.73% of China’s bachelor’s degree holding population.
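The Wald test described above checks H0: β(hukou = 1) = β(hukou = 2) within one discipline’s logit model. A minimal standard-library sketch of that test follows, assuming the two coefficient estimates and their estimated variances and covariance are available from the fitted model. Table 7 does not report standard errors or covariances, so the variance values in the example are hypothetical, as is the helper `one_vs_rest`, which only illustrates how each discipline’s binary dependent variable is built.

```python
import math

def wald_equal_coefs(b1, b2, var1, var2, cov12):
    """Wald test of H0: b1 == b2 for two coefficients from the same model.

    W = (b1 - b2)^2 / Var(b1 - b2), with Var(b1 - b2) = var1 + var2 - 2*cov12,
    is chi-square with 1 degree of freedom under H0.
    """
    var_diff = var1 + var2 - 2.0 * cov12
    if var_diff <= 0:
        raise ValueError("Var(b1 - b2) must be positive")
    w = (b1 - b2) ** 2 / var_diff
    # For 1 df: P(chi2 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value

def one_vs_rest(labels, target):
    """Hypothetical helper: binary outcome for one discipline (target = 1, others = 0)."""
    return [1 if lab == target else 0 for lab in labels]

# Agriculture point estimates from Table 7; variances/covariance are made up.
w, p = wald_equal_coefs(-0.24, -0.66, var1=0.010, var2=0.012, cov12=0.004)
print(f"W = {w:.2f}, p = {p:.4f}")
print(one_vs_rest(["Econ.", "Law", "Econ.", "Agri."], "Econ."))
```

Using `math.erfc` avoids a SciPy dependency, because the chi-square distribution with one degree of freedom is the square of a standard normal.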
Table 7
Logit model: family effect on discipline choice

| | Econ. | Law | Edu. | Lang. | Sci. | Engr. | Agri. | Med. | Mgmt. |
|---|---|---|---|---|---|---|---|---|---|
| Parentedu | 0.16*** | 0.18* | -0.33 | -0.05 | -0.01 | 0.00 | -0.56*** | -0.24*** | 0.02 |
| Hukou = 1 | 0.24*** | -0.21* | -0.30 | 0.16*** | -0.11** | -0.15 | -0.24*** | -0.09 | 0.19*** |
| Hukou = 2 | 0.20*** | 0.00 | -0.29 | 0.23*** | -0.07 | -0.15 | -0.66*** | -0.39*** | 0.15*** |
| AUC | 0.73 | 0.70 | 0.76 | 0.71 | 0.63 | 0.74 | 0.63 | 0.70 | 0.69 |
| Wald test (p value) | 0.00*** | 0.05* | 0.80 | 0.00*** | 0.018** | 0.67 | 0.00*** | 0.26 | 0.00*** |

***p < 0.01, **p < 0.05, *p < 0.1
AUC (area under the ROC curve) is also reported. The p value of the Wald test is presented with its significance level. The coefficients of the control variables are omitted.
Consequently, in contrast to prior research, we find, first, that hukou and parent education confer no clear advantage in access to higher university tiers and, second, that there is no evidence that family variables influence most students’ discipline choices. Admittedly, the models show that family variables may still influence university tier and major choice, though inconsistently. For example, compared with a rural hukou, a metropolis hukou has a “polarizing” effect on university tier, as metropolis graduates are the least likely to attend middle-ranking universities. Similarly, Law and Science are the least popular disciplines among county-seat graduates, whose preferred disciplines are Economics and Management.
Another endogeneity concern relates to sample selection bias. The selection problem among fresh college graduates differs somewhat from the classic scenario, in which women have a higher threshold for not working than men. In this dataset, no single variable dominates in terms of missing values, but all job-related variables have around 70% missing values. This is because nearly two thirds of the graduates who completed Survey One did not complete Survey Two, and for any respondent who missed Survey Two, all job-related variables are recorded as missing. As illustrated earlier, these missing values can be attributed to multiple sources, including: (1) some graduates did not access their school email; (2) some left for further study and therefore could not report their occupation-related variables; (3) some were unemployed; (4) some may have been reluctant to disclose personal information; and (5) some of the employed may have been too busy to respond to the survey. According to an official report covering all universities in Jiangsu Province, the number of graduates who go on to further study is three times the number of unemployed graduates [52]. However, these reasons are unlikely to be mutually exclusive. For example, an unemployed graduate could choose further study after failing to obtain a satisfactory job offer, and a graduate who went on to further study may have stopped checking their former school email. Because the sources of the missing values overlap and are unobserved, no additional information is available to explain the occurrence of the missing data. To control for the resulting sample selection bias, we use the Heckman two-step model, in which the inverse Mills ratio estimated in the first step is included as a regressor in the second step. The results are reported in Sect. “Heckman Model”.

Heckman Model

Linear models are the traditional and still the most widely used family of methods in wage studies, so we use them as a baseline for comparison with the machine learning models. The first step of the Heckman model is:
$$\Pr\left({Record}_{i}=1\right)=\Phi\left({\alpha}_{0}+{\beta}_{1}{MSpec}_{i}+{\beta}_{2}{X}_{i}+{\beta}_{3}{Z}_{i}\right),$$
where Φ(·) is the standard normal cumulative distribution function of the probit model and Record_i is the non-missing status of the wage variable (0 = missing, 1 = non-missing). MSpec_i is major specificity, measured by the job-wise or industry-wise indicator. The vector X_i contains the control variables used in both steps, collected by Survey One: gender, university tier, year and discipline dummies. The vector Z_i includes satisfaction with college education (satisfaction: 0 = bad, 1 = neutral, 2 = good, 3 = very good), parent education (parentedu) and hukou, which serve as exclusion restrictions following Lennox et al. [32].
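In the two-step procedure, the first-step probit index z_i is converted into the inverse Mills ratio, IMR_i = φ(z_i)/Φ(z_i), which then enters the second step as an additional regressor. A minimal standard-library sketch of that conversion follows, assuming the probit coefficients have already been estimated; the coefficient values, the dummy name `satisfaction_good` and the covariate row are hypothetical, chosen only to illustrate the arithmetic.

```python
import math

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal CDF Phi(z), computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_index(coefs, intercept, row):
    """Linear index z_i = alpha_0 + sum_k beta_k * x_ik of the first-step probit."""
    return intercept + sum(coefs[name] * value for name, value in row.items())

def inverse_mills_ratio(z):
    """IMR for selected observations (Record_i = 1): phi(z) / Phi(z)."""
    return norm_pdf(z) / norm_cdf(z)

# Hypothetical probit coefficients and one graduate's covariates.
coefs = {"mspec": -0.45, "parentedu": -0.24, "satisfaction_good": 0.23}
row = {"mspec": 0.3, "parentedu": 1, "satisfaction_good": 1}
z = probit_index(coefs, intercept=-0.13, row=row)
print(f"Pr(Record=1) = {norm_cdf(z):.3f}, IMR = {inverse_mills_ratio(z):.3f}")
```

The IMR falls as the predicted probability of being observed rises, so it carries most information for graduates who were unlikely to report a wage.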
As discussed in Subsection “Family Effect on University Tier and Major Choice”, multiple sources may lead to missing values, which makes the selection of appropriate exclusion restrictions difficult. The three exclusion restriction variables, hukou, parentedu and satisfaction, are all significant and have the predicted signs in the Heckman first step (Table 8), but are not significant when added to the Heckman second step. Furthermore, including these three variables in the second step pushes the VIF of the inverse Mills ratio above ten, indicating a severe multicollinearity problem. To measure satisfaction, the survey asked students to rate their satisfaction with the college education they received. Students’ satisfaction is strongly associated with school status and other social connections [49, 55]. Hence, lower satisfaction with college education may lead graduates to use their school email infrequently after graduation and consequently to miss Survey Two, which corresponds to missing data source (1) discussed in Subsection “Family Effect on University Tier and Major Choice”: some graduates missed Survey Two because they did not check their school email. Meanwhile, hukou and parentedu are negatively associated with non-missing values; that is, urban graduates and graduates with more educated parents were more likely to miss Survey Two. This is explained by missing data source (2) in the same subsection: graduates from better-off families are more likely to leave for further study, so their occupation-related variables are missing. These three variables are therefore selected as the potential exclusion restrictions.
Although satisfaction, hukou and parentedu are not significant when added to the Heckman second step, their exogeneity in the wage function is disputed in the academic literature. Some scholars argue that students’ satisfaction with their college education is affected by their valuation and image of the school and by their expectations [2], or that satisfaction may predict students’ productivity in school [47]. Satisfaction could therefore be a wage determinant through its association with students’ valuation of their school or their productivity in school. As for the family variables, some scholars argue that wage differences are mainly caused by educational attainment rather than rural-urban hukou differences [38, 40]. However, some empirical studies find that hukou is relevant to jobs and wages, either because college education and hukou transfer interact in wage pricing [56] or because hukou can affect graduates’ job-seeking intensity [54]. Similarly, parent education, or socioeconomic status (SES) in a broader sense, may also affect graduates’ job-seeking behavior and wages [9, 13, 35]. We therefore acknowledge that, given these exogeneity concerns, the three exclusion restrictions are somewhat ad hoc.
Table 8
Heckman first step probit: non-missing status of the wage data

| | Model 1 | Model 2 |
|---|---|---|
| Mspec | -0.447*** | -0.089 |
| University = 1 | -0.043** | -0.039** |
| University = 2 | -0.187*** | -0.188*** |
| Parentedu | -0.239*** | -0.241*** |
| Hukou = 1 | -0.081*** | -0.082*** |
| Hukou = 2 | -0.210*** | -0.211*** |
| Satisfaction = 1 | 0.147* | 0.147* |
| Satisfaction = 2 | 0.235** | 0.233** |
| Satisfaction = 3 | 0.289* | 0.286* |
| Gender | 0.031* | 0.032* |
| Intercept | -0.128 | -0.227** |
| Discipline Dummies | Yes | Yes |
| Year Dummies | Yes | Yes |
| AUC | 0.665 | 0.663 |
| Observations | 45,661 | 45,661 |

***p < 0.01, **p < 0.05, *p < 0.1
Table 8 shows the selection equation (first step) of the Heckman model. Model 1 uses the job-wise HHI and Model 2 the industry-wise HHI as the major specificity indicator. The major-related variables in the first step are major specificity and the discipline dummies. Industry-wise major specificity has an insignificant negative coefficient, whereas job-wise major specificity is significantly negative, indicating that majors with high specificity may be associated with more missing values. One explanation is that high-specificity majors, such as those in Medicine and Science, have the highest rates of further study, so graduates in these majors are more likely to have missing values for the wage variable [41]. The discipline dummies are omitted from the table for brevity; among them, only Agriculture has a significant positive coefficient (with Economics as the base), indicating that Agriculture graduates in this dataset were more inclined to answer Survey Two. Beyond the major variables, a higher university tier, higher parent education and an urban hukou all increase the likelihood of missing wage values, whereas, as discussed earlier, higher satisfaction with college education decreases it. AUC stands for the area under the ROC curve; the AUC of about 0.66 indicates that the probit model has modest discriminatory power.
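The AUC reported in Tables 7 and 8 equals the probability that a randomly chosen positive case (here, a non-missing wage record) receives a higher predicted probability than a randomly chosen negative case, with ties counted as one half. A minimal standard-library sketch of this rank-based definition follows; the labels and scores are made up for illustration.

```python
def auc(labels, scores):
    """AUC as the Mann-Whitney probability P(score_pos > score_neg), ties = 0.5.

    Pairwise O(n_pos * n_neg) version, adequate for illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities of Record = 1 and observed outcomes.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.71, 0.40, 0.55, 0.38, 0.52, 0.30, 0.66, 0.45]
print(auc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why values around 0.66 are read as modest discrimination.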
The second step of the Heckman model is:
$$\begin{aligned}{Logwage}_{i}&={\alpha}_{0}+{\beta}_{1}{MSpec}_{i}+{\beta}_{2}{Rel}_{i}+{\beta}_{3}{MSpec}_{i}\:*{Rel}_{i}\\&+{\beta}_{4}{X}_{i}+{\beta}_{5}{X}_{i}^{\prime}\:+\:{\beta}_{6}{IMR}_{i}+{\varepsilon}_{i}\:,\end{aligned}$$
where Logwage_i is the logarithm of wage, MSpec_i is major specificity, and Rel_i is major-occupation relevancy (0 = Not Relevant, 1 = Relevant); the interaction between MSpec_i and Rel_i is also included. The vector X_i contains the same control variables as in the first step, collected by Survey One: gender, university tier, year and discipline dummies. The vector X′_i contains the job search variables (searching time, number of applications and number of offers), job dummies and industry dummies, all collected by Survey Two. IMR_i is the inverse Mills ratio.
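Because the second step contains the interaction term, β1 and β2 alone do not give the full wage effects of specificity and relevancy. Differentiating the second-step equation, with all other regressors (including IMR_i) held fixed, makes the dependence explicit:

$$\begin{aligned}\frac{\partial\,{Logwage}_{i}}{\partial\,{MSpec}_{i}}&={\beta}_{1}+{\beta}_{3}{Rel}_{i},\\{Logwage}_{i}\big|_{{Rel}_{i}=1}-{Logwage}_{i}\big|_{{Rel}_{i}=0}&={\beta}_{2}+{\beta}_{3}{MSpec}_{i}.\end{aligned}$$

The specificity premium is therefore β1 + β3 for graduates in relevant jobs but only β1 for graduates in non-relevant jobs, so a positive β3 means the payoff to specificity is realized mainly through relevant jobs.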
In Table 9, Models 1 and 3 use the job-wise HHI, while Models 2 and 4 use the industry-wise HHI as the indicator of major specificity. The VIF values of MSpec, Rel and IMR are below ten, indicating that the variables of interest do not suffer from serious multicollinearity. Models 1 and 2 show that major-job relevancy and major specificity are significant wage determinants, both positively correlated with wages. With the interaction term added, Models 3 and 4 indicate that neither relevancy nor major specificity alone guarantees a high payoff, but their interaction is positive and significant. If major specificity is a measure of major productivity, then the realization of the major premium in the labor market is conditional on major-job relevancy. For example, in Model 3 the coefficient on the interaction between major specificity and major-job relevancy is 0.254. Since the dependent variable is log-transformed, an increase in the interaction term from zero to one corresponds to an e^0.254 − 1 ≈ 28.9% increase in wages. Table 12 in the Appendix shows the OLS results for comparison with the Heckman model; OLS and the Heckman second step give almost the same coefficients for major specificity and its interaction with major-job relevancy. As the dataset does not record more detailed information on the missing values, the missing value analysis cannot go any further, and the machine learning models below therefore do not control for sample selection bias.
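The conversion from a log-wage coefficient to a percentage wage change, 100 × (e^β − 1), can be checked directly. The sketch below also applies the relevancy gap β2 + β3·MSpec to Model 3’s point estimates from Table 9 at a few MSpec values; those MSpec values are hypothetical, and the insignificant Rel main effect is included only for the arithmetic.

```python
import math

def pct_change(log_coef):
    """Percentage change in wages implied by a coefficient on a log-wage outcome."""
    return 100.0 * (math.exp(log_coef) - 1.0)

# Point estimates from Table 9, Model 3 (job-wise HHI).
BETA_REL = -0.036    # Rel main effect (not significant)
BETA_INTER = 0.254   # MSpec * Rel

print(f"interaction 0 -> 1: {pct_change(BETA_INTER):.1f}%")

# Log-wage gap between relevant and non-relevant jobs at a given specificity:
# beta_2 + beta_3 * MSpec (MSpec values below are hypothetical).
for mspec in (0.1, 0.3, 0.5):
    gap = BETA_REL + BETA_INTER * mspec
    print(f"MSpec = {mspec:.1f}: relevancy premium {pct_change(gap):.1f}%")
```

For small coefficients e^β − 1 ≈ β, but at β = 0.254 the exact conversion (about 28.9%) is noticeably larger than the naive 25.4%.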
Table 9
Heckman second step OLS: wage estimation

| | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|
| MSpec | 0.167*** | 0.235*** | -0.016 | 0.075 |
| Rel | 0.028*** | 0.026*** | -0.036 | -0.021 |
| MSpec*Rel | | | 0.254*** | 0.207*** |
| University = 1 | 0.041*** | 0.037*** | 0.040*** | 0.037*** |
| University = 2 | 0.246*** | 0.243*** | 0.244*** | 0.242*** |
| Gender | -0.063*** | -0.063*** | -0.063*** | -0.063*** |
| IMR | 0.150*** | 0.143*** | 0.150*** | 0.145*** |
| Intercept | 8.287*** | 8.283*** | 8.329*** | 8.312*** |
| Job Dummies | YES | YES | YES | YES |
| Job Searching Variables | YES | YES | YES | YES |
| Industry Dummies | YES | YES | YES | YES |
| Discipline Dummies | YES | YES | YES | YES |
| Year Dummies | YES | YES | YES | YES |
| Adjusted R² | 0.33 | 0.33 | 0.33 | 0.33 |
| VIF: MSpec | 2.75 | 3.10 | 6.05 | 7.04 |
| VIF: Rel | 1.22 | 1.22 | 6.73 | 3.78 |
| VIF: IMR | 1.61 | 1.44 | 5.85 | 1.44 |
| Observations | 29,311 | 29,209 | 29,311 | 29,209 |

***p < 0.01, **p < 0.05, *p < 0.1
The VIF values of the interaction term MSpec*Rel in Models 3 and 4 are 10.34 and 12.46, respectively, owing to the inclusion of MSpec and Rel in the same models. They drop to 3.74 and 2.90 when MSpec and Rel are excluded. As the results of Models 3 and 4 remain almost the same without MSpec and Rel, those estimates are omitted due to space constraints.
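The VIF of regressor j is 1/(1 − R²_j), where R²_j comes from regressing x_j on the other regressors. The standard-library sketch below computes it on simulated data (a uniform stand-in for MSpec and a 0/1 stand-in for Rel; these distributions are assumptions, not the paper’s data) to show why a product term such as MSpec*Rel has a large VIF when MSpec and Rel are in the same model.

```python
import random

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def r_squared(y, xs):
    """R^2 of an OLS regression of y on the columns in xs plus an intercept."""
    cols = [[1.0] * len(y)] + xs
    k = len(cols)
    xtx = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)] for i in range(k)]
    xty = [sum(p * q for p, q in zip(cols[i], y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations X'X beta = X'y
    fitted = [sum(beta[i] * cols[i][t] for i in range(k)) for t in range(len(y))]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def vif(columns, j):
    """Variance inflation factor of columns[j] given the other columns."""
    others = [c for i, c in enumerate(columns) if i != j]
    if not others:
        return 1.0
    return 1.0 / (1.0 - r_squared(columns[j], others))

random.seed(0)
n = 500
mspec = [random.random() for _ in range(n)]                       # stand-in for an HHI in [0, 1]
rel = [1.0 if random.random() < 0.6 else 0.0 for _ in range(n)]  # stand-in for 0/1 relevancy
inter = [m * r for m, r in zip(mspec, rel)]                      # MSpec * Rel

full = [mspec, rel, inter]
print("VIF (MSpec, Rel, MSpec*Rel):", [round(vif(full, j), 2) for j in range(3)])
```

Centering MSpec before forming the product is a common way to reduce this kind of structural collinearity without changing the interaction coefficient.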

Machine Learning Models

As major specificity measured by industry yields results very similar to those measured by job, we report only the latter. Ten numerical and ordinal independent variables are used to fit the machine learning models. To compute the interaction between major specificity and relevancy, the KNN and Random Forest regressions are fitted first. As both models are regressions rather than classifiers, the tuning metric is MSE (mean squared error) or RMSE (root mean squared error). As shown in Fig. 3 in the Appendix, the k value of the KNN regression is tuned with 5-fold cross-validation, using MSE as the evaluation metric. The MSE flattens out when k exceeds 25; the optimal k is 63, with the smallest MSE of 0.112 (RMSE = 0.335). As shown in Fig. 4 in the Appendix, the number of trees (n_estimators) and the number of variables considered at each split (max_features) are optimized for the Random Forest regression. With 5-fold cross-validation, the number of trees is initially searched within [1, 1000]. The RMSE flattens out when n_estimators exceeds 300, and the smallest RMSE is obtained at n_estimators = 464; the search range for the number of trees is therefore set to [1, 500]. Because there are ten independent variables, max_features is tuned within [1, 10]. Using 5-fold cross-validation, the optimal value of max_features is 3, with the smallest RMSE of 0.322. Figure 4 also compares the RMSE for max_features = Best (max_features = 3) with the baseline max_features = None (max_features = 10).
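The KNN tuning step can be sketched with the standard library: a k-nearest-neighbour regressor (Euclidean distance, unweighted mean of the neighbours’ targets) whose k is chosen by the smallest 5-fold cross-validated MSE. The simulated data, the candidate grid and the fold assignment are assumptions for illustration; the paper does not specify details such as feature scaling or the distance metric.

```python
import math
import random

def knn_predict(train_x, train_y, x, k):
    """Mean target of the k training points nearest to x (Euclidean distance)."""
    nearest = sorted(
        (math.dist(xi, x), yi) for xi, yi in zip(train_x, train_y)
    )[:k]
    return sum(yi for _, yi in nearest) / len(nearest)

def cv_mse(xs, ys, k, folds=5, seed=0):
    """Mean squared error of KNN(k) estimated by `folds`-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    sq_err = 0.0
    for f in range(folds):
        held_out = set(idx[f::folds])  # every folds-th shuffled index forms one fold
        tr_x = [xs[i] for i in idx if i not in held_out]
        tr_y = [ys[i] for i in idx if i not in held_out]
        for i in held_out:
            sq_err += (knn_predict(tr_x, tr_y, xs[i], k) - ys[i]) ** 2
    return sq_err / len(xs)  # each point is held out exactly once

# Simulated stand-in data: two features and a noisy log-wage-like target.
rng = random.Random(42)
xs = [(rng.random(), rng.random()) for _ in range(200)]
ys = [8.3 + 0.2 * a + 0.25 * a * b + rng.gauss(0, 0.1) for a, b in xs]

grid = [1, 5, 15, 25, 45, 63]
scores = {k: cv_mse(xs, ys, k) for k in grid}
best_k = min(scores, key=scores.get)
print(f"best k = {best_k}, CV MSE = {scores[best_k]:.4f}, RMSE = {math.sqrt(scores[best_k]):.4f}")
```

In practice a library implementation such as scikit-learn’s `KNeighborsRegressor` with `GridSearchCV` would replace these loops; the sketch only makes the cross-validation logic explicit.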
Table 10 compares the performance of the KNN regression (k = 63) and the Random Forest regression (n_estimators = 464, max_features = 3) under 5-fold cross-validation, using three metrics: RMSE, MAE and R² score. The Random Forest achieves a higher R² and a lower RMSE and MAE than KNN, so it outperforms KNN in this case.
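The three metrics in Table 10 can be computed from cross-validated predictions as in the standard-library sketch below, where R² is 1 − SS_res/SS_tot (the convention used by scikit-learn’s `r2_score`). The sketch also checks that the reported KNN RMSE of 0.335 is consistent with the tuning MSE of 0.112; the observed and predicted values in the example are made up.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for paired observed and predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot  # mse * n is the residual sum of squares
    return math.sqrt(mse), mae, r2

# Consistency check between the reported KNN MSE and RMSE.
print(round(math.sqrt(0.112), 3))

# Hypothetical observed and predicted log wages.
y_true = [8.2, 8.5, 8.9, 8.4]
y_pred = [8.3, 8.4, 8.7, 8.5]
rmse, mae, r2 = regression_metrics(y_true, y_pred)
print(f"RMSE = {rmse:.3f}, MAE = {mae:.3f}, R2 = {r2:.3f}")
```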
Table 10
KNN and random forest performance

| | RMSE | MAE | R² |
|---|---|---|---|
| KNN | 0.335 | 0.251 | 0.170 |
| Random Forest | 0.322 | 0.242 | 0.263 |
Next, the KNN and Random Forest fits are used to calculate variable interactions. The partial dependence function is calculated first, to measure the change in the average predicted value as specified variables vary over their marginal distribution. To measure the interaction between a pair of variables, the adjusted H-index is then calculated by comparing their joint partial dependence with their individual partial dependence functions [28]. In addition, the model-agnostic permutation approach of Fisher et al. is used to calculate variable importance [20, 28]: a variable is considered important if permuting its values increases the model’s prediction error. Table 13 in the Appendix reports the detailed variable importance index. Compared with the education-related variables, family variables such as hukou and parentedu are less important wage determinants.
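Both model-agnostic quantities described above can be sketched with the standard library for any fitted prediction function `predict(row)`: Friedman’s H-statistic for a feature pair, built from centred partial dependence functions evaluated at the observed data points, and permutation importance measured as the increase in MSE after shuffling one feature. The toy model and data are hypothetical, and the adjusted H-index of [28] and the model-reliance measure of [20] may differ in normalisation details, so this illustrates the idea rather than reproducing Table 11 or Table 13.

```python
import math
import random

def partial_dependence(predict, data, features, grid_points):
    """Centred average prediction with `features` fixed at each grid point."""
    values = []
    for point in grid_points:
        total = 0.0
        for row in data:
            modified = list(row)
            for f, v in zip(features, point):
                modified[f] = v
            total += predict(modified)
        values.append(total / len(data))
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def friedman_h(predict, data, j, k):
    """Friedman's H-statistic for features (j, k), evaluated at the observed points."""
    pd_jk = partial_dependence(predict, data, (j, k), [(r[j], r[k]) for r in data])
    pd_j = partial_dependence(predict, data, (j,), [(r[j],) for r in data])
    pd_k = partial_dependence(predict, data, (k,), [(r[k],) for r in data])
    num = sum((a - b - c) ** 2 for a, b, c in zip(pd_jk, pd_j, pd_k))
    den = sum(a * a for a in pd_jk)
    return math.sqrt(num / den) if den > 0 else 0.0

def permutation_importance(predict, data, y, j, seed=0):
    """Increase in MSE after randomly permuting feature j."""
    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)
    shuffled = [r[j] for r in data]
    random.Random(seed).shuffle(shuffled)
    permuted = [list(r[:j]) + [v] + list(r[j + 1:]) for r, v in zip(data, shuffled)]
    return mse(permuted) - mse(data)

# Toy "fitted model": features 0 and 1 interact, feature 2 is ignored.
def toy_model(row):
    return 1.0 + 0.5 * row[0] + 0.3 * row[1] + 0.8 * row[0] * row[1]

rng = random.Random(1)
data = [[rng.random(), rng.random(), rng.random()] for _ in range(60)]
y = [toy_model(r) + rng.gauss(0, 0.05) for r in data]

print("H(0,1) =", round(friedman_h(toy_model, data, 0, 1), 3))
print("H(0,2) =", round(friedman_h(toy_model, data, 0, 2), 3))
print("importance:", [round(permutation_importance(toy_model, data, y, j), 4) for j in range(3)])
```

An additive pair yields H near zero, while a genuine product term yields a clearly positive H; a feature the model never uses has zero permutation importance.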
Table 11
Adjusted H-index: interaction with major specificity

| Model | Rel | University | Gender | Apply | Year | Time | Other Variables |
|---|---|---|---|---|---|---|---|
| KNN | 0.023 | 0.021 | 0.025 | 0.013 | 0.019 | < 0.010 | < 0.010 |
| R.F. | 0.015 | 0.021 | 0.021 | 0.012 | < 0.010 | < 0.010 | < 0.010 |

R.F. stands for Random Forest model. Rel refers to the major-job relevancy. Family variables are hukou and parentedu.
Table 11 reports part of the adjusted Friedman H-index, which measures major specificity’s interaction with the other variables, calculated from the KNN and Random Forest fits. The KNN’s adjusted H-index reveals a stronger interaction between major specificity and major-job relevancy (0.023) than its Random Forest counterpart (0.015). In the KNN fit, major specificity interacts most with gender, major-job relevancy and university tier; in the Random Forest fit, it interacts most strongly with university tier, gender and major-job relevancy. Across both fits, therefore, major specificity’s strongest interactions are with major-job relevancy (rel), university tier (university) and gender. If major specificity represents one dimension of major productivity, university tier represents another dimension of graduates’ overall quality, while major-job relevancy measures the match between major and job. Taken together, major specificity’s interactions with university tier and major-job relevancy reflect a candidate’s specialized skills, overall abilities and career compatibility. Graduates’ starting wage determination can therefore be viewed as an integration of major, university tier and education-career relevance.
Figure 1 shows heatmaps of the KNN and Random Forest fits. The off-diagonal cells (Vint) show variable interaction and the diagonal cells (Vimp) show variable importance. Comparing color contrast on the same Vint scale, the KNN fit identifies more pairwise interactions. Fig. 2 contains the same information as the heatmaps but is better suited to detecting interactions: pairwise interactions are shown by the color intensity of the connecting lines, and variable importance by node size. The interaction between major specificity (mspec) and major-job relevancy (rel) appears in both figures. The figures also show that family factors such as hukou and parentedu have no obvious interaction with educational variables such as university tier or major specificity.
Fig. 1
Heatmaps of the KNN and random forest fits
Fig. 2
Network plots of the KNN and random forest fits

Conclusion

With dramatic changes such as the steady spread of higher education, an aging population and shifts in the structure of labor demand, graduate employment in contemporary China has become an issue of concern for the government, academia and the general public. The urgency of this issue has overtaken the former focus on the employment prospects of China’s migrant labor force. As Di notes, the job market for college graduates is currently China’s biggest employment problem and will remain so over the long term [18]. In light of this, this paper studies how college majors are priced in the Chinese labor market, investigating Chinese graduate employment using machine learning approaches.
Major specificity quantifies and distinguishes the value of each major in the labor market. We find that majors with higher specificity command higher starting wages. Additionally, the coefficients for major specificity and its interaction with major-job relevancy in the Heckman models are very similar to their OLS counterparts, implying that the Heckman sample selection adjustment does not change these estimates. The Random Forest and KNN models are therefore fitted to the wage function without a selection correction. Random Forest achieves better predictive accuracy, while KNN identifies stronger interaction relationships. Both the Heckman model and the machine learning models show that a major’s wage premium depends on the job’s relevance to the major, so the interaction between major specificity and major-job relevancy further explains the wage mechanism for new graduates. The models also detect an interaction between major specificity and university tier. College major is thus found to interact with both university tier and major-job relevancy, providing an overall picture of the effect of college education on wages.
With respect to social mobility in China’s graduate job market, we find that university tier does not respond positively to family variables such as hukou and parental education, a finding that can be attributed to China’s unique education system and its college entrance exam. The rigorous Chinese college entrance exam and the “the cheapest is the best” rule for Chinese universities make China’s higher education less associated with family background. Therefore, China’s university admission policy more fully enables equality of educational opportunity than its counterparts in developed countries. In addition, there is no evidence that family variables influence the discipline choice of most students. As a result, in contrast to research outside China, the estimated effect of education on the starting wages of Chinese bachelor’s degree graduates is less sensitive to family-related endogeneity. Regarding hukou, the models in this study provide no evidence that an urban hukou confers a clear advantage in university tier or wage outcomes. However, hukou or family background may still affect education decisions: graduates from better-off families are more likely to have missing wage values, which can be attributed to their propensity to undertake further study. Hence, hukou, previously an instrument of population management, now has diminished effects on university tier, major choice and starting wages. Consequently, both the comparatively equitable education system and the less strict hukou policy enhance social mobility in contemporary China.
In terms of research implications, this study is instructive to both the government and individuals. For policy makers, major specificity signals how well majors match labor demand in the context of technological progress and the business cycle. The government should add this index to its unemployment monitoring database alongside the youth unemployment rate. For majors whose scores on this index decline significantly, the government should be proactive in reducing enrollments, driving curriculum change and holding targeted campus job fairs to address the mismatch between these majors and the labor market. This index can also be used to compare universities as a way of evaluating the employment prospects for a given major. For individuals, this study is particularly valuable for lower-class families facing decisions about majors. First, these families are generally less exposed to information about majors and their respective career paths. Second, graduates from such families have fewer family resources to draw on during the job search and rely more heavily on their starting wages. Therefore, students from households with low socioeconomic status should be very cautious when selecting low-specificity majors.
In terms of research methods, whereas previous studies usually group hundreds of university majors into several discipline dummy variables, this study uses the major specificity index to explore the pricing mechanism beyond presenting the wage gap between majors. First, this approach has implications for other studies in which the variable of interest has dozens of categories. Second, studies in public policy analysis often involve multiple interacting independent variables. The advantage of the machine learning approach visualized in this study is that it enumerates all pairwise interactions and presents them intuitively, so the interaction and visualization methods used here can be adopted for policy analysis.
Admittedly, this study has limitations. First, since the dataset did not record the exact reasons for missing values, the missing value analysis is necessarily limited. Second, the major-job relevancy measure used in this study is based on respondents’ subjective evaluation, which is likely to introduce systematic bias. Third, starting wages differ from lifetime wages. There is thus much room for future research on China’s unique education system and its relationship to social mobility.

Declarations

Conflicting Interests

None of the authors declared a conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


Dr. Tian Hang

obtained his PhD degree in Public Policy and Political Economy at the University of Texas at Dallas. Currently he is a postdoctoral associate at the Joint Postdoctoral Workstation of the School of Management and Engineering at Nanjing University & Holly Futures Co., Ltd. His research interests lie in supply chain analysis, political economy and business management.

Dr. Yong Zhou

is the Chairman of the Jiangsu SOHO Holdings Group. An expert in receipt of the Chinese State Council Allowance, he is a professor at the Business School of Hohai University and a postgraduate supervisor at Nanjing Audit University and Nanjing University of Finance and Economics. His research areas include political economy, business management, finance, and the Belt and Road Initiative.
Authors: Tian Hang, Yong Zhou
Publication date: 10.01.2025
Publisher: Springer Netherlands
Published in: Journal of Chinese Political Science, Issue 3/2025
Print ISSN: 1080-6954
Electronic ISSN: 1874-6357
DOI: https://doi.org/10.1007/s11366-024-09902-5

Appendix

Table 12
OLS: wage estimation

| | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|
| Mspec | 0.043 | 0.092* | 0.034 | 0.082 |
| Rel | -0.035 | -0.020 | -0.036 | -0.021 |
| Mspec*Rel | 0.252*** | 0.203*** | 0.253*** | 0.206*** |
| University = 1 | 0.041*** | 0.037*** | 0.044*** | 0.04*** |
| University = 2 | 0.268*** | 0.264*** | 0.264*** | 0.261*** |
| Gender | -0.068*** | -0.068*** | -0.067*** | -0.067*** |
| Parentedu | | | 0.035*** | 0.035*** |
| Hukou = 1 | | | 0.012 | 0.011 |
| Hukou = 2 | | | 0.007 | 0.007 |
| Satisfaction = 1 | | | -0.024 | -0.022 |
| Satisfaction = 2 | | | -0.053 | -0.050 |
| Satisfaction = 3 | | | -0.048 | -0.044 |
| Intercept | 8.482 | 8.482*** | 8.428*** | 8.416*** |
| Job Searching Variables | YES | YES | YES | YES |
| Job Dummies | YES | YES | YES | YES |
| Industry Dummies | YES | YES | YES | YES |
| Year Dummies | YES | YES | YES | YES |
| Discipline Dummies | YES | YES | YES | YES |
| Adjusted R² | 0.27 | 0.27 | 0.27 | 0.27 |
| Observations | 9,612 | 9,568 | 9,612 | 9,568 |

***p < 0.01, **p < 0.05, *p < 0.1
Table 13
Variable importance index

| Rank | KNN: Variable | KNN: Index | Random Forest: Variable | Random Forest: Index |
|---|---|---|---|---|
| 1 | University | 0.117 | University | 0.124 |
| 2 | Year | 0.102 | Mspec | 0.113 |
| 3 | Mspec | 0.090 | Gender | 0.104 |
| 4 | Gender | 0.085 | Year | 0.092 |
| 5 | Hukou | 0.075 | Apply | 0.078 |
| 6 | Rel | 0.066 | Offer | 0.057 |
| 7 | Parentedu | 0.059 | Time | 0.051 |
| 8 | Time | 0.057 | Rel | 0.042 |
| 9 | Offer | 0.056 | Hukou | 0.041 |
| 10 | Apply | 0.045 | Parentedu | 0.034 |

The importance of major specificity (Mspec) ranks third in KNN and second in Random Forest. Hukou and parentedu rank fifth and seventh in KNN, and ninth and tenth in Random Forest.
Fig. 3
Tuning of the KNN regression
Fig. 4
Tuning of the random forest regression
1.
Zurück zum Zitat Altonji, J. G., E. Blom, and C. Meghir. 2012. Heterogeneity in human capital investments: High school curriculum, college major, and careers. Annual Review of Economics 4(1): 185–223.CrossRef
2.
Zurück zum Zitat Alves, H., and M. Raposo. 2007. Conceptual model of student satisfaction in higher education. Total Quality Management 18(5): 571–588.CrossRef
3.
Zurück zum Zitat Andrews, R. J., J. Li, and M. F. Lovenheim. 2016. Quantile treatment effects of college quality on earnings. Journal of Human Resources 51(1): 200–238.CrossRef
4.
Zurück zum Zitat Apley, D. W., and J. Zhu. 2020. Visualizing the effects of predictor variables in black box supervised learning models. Journal of the Royal Statistical Society Series B: Statistical Methodology 82(4): 1059–1086.CrossRef
5.
Zurück zum Zitat Arcidiacono, P. 2004. Ability sorting and the returns to college major. Journal of Econometrics 121(1–2): 343–375.CrossRef
6.
Zurück zum Zitat Becker, G. S. 1962. Investment in human capital: A theoretical analysis. Journal of Political Economy 70(5): 9–49.CrossRef
7.
Zurück zum Zitat Becker, G. S. 1964. Human capital: A theoretical and empirical analysis with special reference to education. 3rd Edition. Chicago: The University of Chicago Press.
8.
Zurück zum Zitat Berger, M. C. 1988. Predicted future earnings and choice of college major. ILR Review 41(3): 418–429.CrossRef
9.
Zurück zum Zitat Blanden, J., P. Gregg, and S. Machin. 2005. Intergenerational mobility in Europe and North America. A report supported by the Sutton Trust. London School of Economics.
10.
Zurück zum Zitat Blom, E., B. C. Cadena, and B. J. Keys. 2021. Investment over the business cycle: Insights from College Major Choice. Journal of Labor Economics 39(4): 1043–1082.CrossRef
11.
Zurück zum Zitat Bretz Jr, R. D. 1989. College grade point average as a predictor of adult success: A meta-analytic review and some additional evidence. Public Personnel Management 18(1): 11–22.CrossRef
12.
Zurück zum Zitat Cai, F. 2011. Hukou system reform and unification of rural–urban social welfare. China & World Economy 19(3): 33–48.CrossRef
13.
Zurück zum Zitat Causa, O., and A. Johansson. 2011. Intergenerational social mobility in OECD countries. OECD Journal: Economic Studies 1: 1–44.
14.
Zurück zum Zitat Chan, K. W. 2015. Five decades of the Chinese hukou system. In Handbook of Chinese migration, ed. R. R. Iredale. 23–47. Cheltenham: Edward Elgar Publishing.
15.
Zurück zum Zitat Cheng, T., and M. Selden. 1994. The origins and social consequences of China’s hukou system. The China Quarterly 139: 644–668.CrossRef
16.
Zurück zum Zitat Chevalier, A. 2011. Subject choice and earnings of UK graduates. Economics of Education Review 30(6): 1187–1201.CrossRef
17.
Zurück zum Zitat Di, D.-S. 2009. The population curve contains business opportunities. IT Managerial World 19: 4–6.
18.
Zurück zum Zitat Di, D.-S. 2020. It is a reference for decision makers and a wingman for investors, please see today’s demographic interpretation. Retrieved from https://www.youtube.com/watch?v=aLXxsszjRFI%26t=542s.
19.
Zurück zum Zitat Earle, D., and C. B. Hurley. 2015. Advances in dendrogram seriation for application to visualization. Journal of Computational and Graphical Statistics 24(1): 1–25.CrossRef
20.
Zurück zum Zitat Fisher, A., C. Rudin, and F. Dominici. 2019. All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously. Journal of Machine Learning Research 20(177): 1–81.
21.
Friedman, J. H. 2001. Greedy function approximation: A gradient boosting machine. Annals of Statistics 29: 1189–1232.
22.
Friedman, J. H., and B. E. Popescu. 2008. Predictive learning via rule ensembles. The Annals of Applied Statistics 2: 916–954.
23.
Frieze, I. H., J. E. Olson, and D. C. Good. 1990. Perceived and actual discrimination in the salaries of male and female managers. Journal of Applied Social Psychology 20(1): 46–67.
24.
Golsteyn, B. H., and A. Stenberg. 2017. Earnings over the life course: General versus vocational education. Journal of Human Capital 11(2): 167–212.
25.
Greene, W. H. 1994. Accounting for excess zeros and sample selection in Poisson and negative binomial regression models. ERN: Discrete Regression & Qualitative Choice Models (Single) (Topic).
26.
Heckman, J. J. 1976. The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement 5: 475–492.
27.
Hoekstra, M. 2009. The effect of attending the flagship state university on earnings: A discontinuity-based approach. The Review of Economics and Statistics 91(4): 717–724.
28.
Inglis, A., A. Parnell, and C. B. Hurley. 2022. Visualizing variable importance and variable interaction effects in machine learning models. Journal of Computational and Graphical Statistics 31(3): 766–778.
29.
Kong, J. 2017. Determinants of graduates’ job opportunities and initial wages in China. International Labour Review 156(1): 99–112.
30.
Leighton, M., and J. D. Speer. 2020. Labor market returns to college major specificity. European Economic Review 128: 103489.
31.
Lemieux, T. 2014. Occupations, fields of study and returns to education. Canadian Journal of Economics/Revue canadienne d’économique 47(4): 1047–1077.
32.
Lennox, C. S., J. R. Francis, and Z. Wang. 2012. Selection models in accounting research. The Accounting Review 87(2): 589–616.
33.
Li, F.-L., X.-H. Ding, and W. J. Morgan. 2009. Higher education and the starting wages of graduates in China. International Journal of Educational Development 29(4): 374–381.
34.
Li, H., L. Meng, X. Shi, and B. Wu. 2012. Does having a cadre parent pay? Evidence from the first job offers of Chinese college graduates. Journal of Development Economics 99(2): 513–520.
35.
Lillard, L. A., and R. J. Willis. 1994. Intergenerational educational mobility: Effects of family and state in Malaysia. Journal of Human Resources 29: 1126–1166.
36.
Liu, C., Y. Tao, and C. Yi. 2023. How does workplace affect employee political efficacy in China? Journal of Chinese Political Science 28(2): 301–329.
37.
Liu, X. 2020. The development of private universities in socialist China. Higher Education Policy 33(1): 1–19.
38.
Lu, Z., and S. Song. 2006. Rural-urban migration and wage determination: The case of Tianjin, China. China Economic Review 17(3): 337–345.
39.
Ma, Y. 2024. Reconceptualizing policy change in China: From soft to harder forms of law in the household registration system reform. Journal of Chinese Governance 9(1): 23–48.
40.
Meng, X., and J. Zhang. 2001. The two-tier labor market in urban China: Occupational segregation and wage differentials between urban residents and rural migrants in Shanghai. Journal of Comparative Economics 29: 485–504.
41.
Mycos Research Institute. 2021. Chinese 4-Year College Graduates’ Employment Annual Report. Retrieved from https://www.pishu.com.cn/skwx_ps/bookdetail?SiteID=14&ID=12501799
42.
National Bureau of Statistics. 2021. Communiqué of the Seventh National Population Census (No. 7). Retrieved from http://www.stats.gov.cn/english/PressRelease/202105/t20210510_1817192.html
43.
National Bureau of Statistics. 2023. Government Information. Retrieved from http://www.stats.gov.cn/xxgk/jd/sjjd2020/202301/t20230118_1892285.html
44.
National Ministry of Education. 2016. China Education Yearbook 2015. Retrieved from http://www.moe.gov.cn/jyb_sjzl/moe_364/zgjynj_2015/
45.
National Ministry of Education. 2023. 2022 Education Statistics. Retrieved from http://www.moe.gov.cn/jyb_sjzl/sjzl_fztjgb/202307/t20230705_1067278.html
46.
Peters, M. A., and T. Besley. 2018. China’s double first-class university strategy: Double first class. Educational Philosophy and Theory 50(12): 1075–1079.
47.
Rode, J. C., M. L. Arthaud-Day, C. H. Mooney, J. P. Near, T. T. Baldwin, W. H. Bommer, and R. S. Rubin. 2005. Life satisfaction and student performance. Academy of Management Learning & Education 4(4): 421–433.
48.
Schultz, T. 1975. Human capital and disequilibrium. Journal of Economic Literature 13(3): 827–846.
49.
Sharma, S. 2024. Role of social capital in determining happiness and life satisfaction: Mediation of self-reported health using path analysis. Fudan Journal of the Humanities and Social Sciences 17(2): 211–242.
50.
Solon, G. 1992. Intergenerational income mobility in the United States. The American Economic Review 82: 393–408.
51.
Spence, M. 1981. Signaling, screening, and information. In Studies in labor markets, ed. S. Rosen. 319–358. Chicago: University of Chicago Press.
52.
Su, C.-H. 2018. Report of the employment survey of the 2017’s bachelor graduates in Jiangsu Province. Nanjing: Phoenix Education Publishing.
53.
Wang, Q., and Z. Sun. 2021. Geographic location, development of higher education and donations to Chinese non-public foundations. Journal of Chinese Governance 8(3): 373–398.
54.
Wang, W., and P. G. Moffatt. 2008. Hukou and graduates’ job search in China. Asian Economic Journal 22(1): 1–23.
55.
Wilcox, G., and D. Nordstokke. 2019. Predictors of university student satisfaction with life, academic self-efficacy, and achievement in the first year. Canadian Journal of Higher Education 49(1): 104–124.
56.
Xiao, Y., and Y. Bian. 2018. The influence of hukou and college education in China’s labour market. Urban Studies 55(7): 1504–1524.
57.
Xiong, Y., and M. Li. 2021. Industrial ecology and local citizenship of migrant children in urban China. Journal of Chinese Governance 7(3): 466–488.
58.
Young, J. 2013. China’s hukou system: Markets, migrants and institutional change. New York: Palgrave Macmillan.
59.
Yu, N., Y. Jin, and Z. Wang. 2024. Political cycles of infrastructure investment: How career concerns shape strategic behavior of city leaders? Journal of Chinese Political Science 1–27. https://doi.org/10.1007/s11366-024-09896-0
60.
Zong, X., and W. Zhang. 2019. Establishing World-class universities in China: Deploying a quasi-experimental design to evaluate the net effects of Project 985. Studies in Higher Education 44(3): 417–431.