Introduction
Background
Contributions
Preliminary knowledge
Medical text data
BERT
Methodology
Cloud computing framework
Disease prediction and department recommendation model
Pre-training a Chinese medical BERT model
Fine-tuning CHMBERT for disease prediction and department recommendation
Experiments
Dataset
Baselines
- TextCNN [15]. A text classification algorithm based on CNN. It applies multiple convolution kernels of different sizes to extract key information from sentences, capturing local correlations. TextCNN has a simple architecture and fast training speed, and has achieved state-of-the-art results on multiple datasets.
- BiLSTM [16]. RNNs are widely applied NLP models that can process variable-length text sequences and learn long-distance dependencies in sentences. In this experiment, a single-layer bidirectional LSTM network is used to classify the input text.
- LEAM [17]. An attention-based model that performs well in text representation by learning a joint embedding of words and labels in the same space. Compared with other attention-based models, LEAM requires fewer parameters, converges faster, and offers good interpretability.
- Transformer [9]. A sequence-processing model based on the self-attention mechanism that can learn long-distance dependencies in sentences. It runs in parallel and is the basis of BERT and other pre-trained models.
- BERT-base [4]. The original Chinese BERT pre-trained model published by Google, which achieves state-of-the-art performance on many text classification tasks.
- BERT-wwm [18]. An updated version of BERT published by Harbin Institute of Technology: a Chinese pre-trained model based on Whole Word Masking. Its performance is slightly better than the original BERT on sentence classification tasks.
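To illustrate the Whole Word Masking idea behind BERT-wwm: Chinese BERT tokenizes character by character, so standard masking can hide only part of a multi-character word, while WWM masks all characters of a segmented word together. The sketch below is illustrative only (not the published pre-training code); the segmented input phrase is a made-up example.

```python
import random

MASK = "[MASK]"

def whole_word_mask(words, mask_prob=0.15, seed=0):
    """Mask whole segmented words: when a word is selected, every
    one of its characters becomes [MASK], instead of masking
    characters independently as in the original BERT."""
    rng = random.Random(seed)
    tokens, labels = [], []
    for word in words:
        chars = list(word)
        if rng.random() < mask_prob:
            tokens.extend([MASK] * len(chars))   # mask the whole word
            labels.extend(chars)                 # characters the model must predict
        else:
            tokens.extend(chars)
            labels.extend([None] * len(chars))   # not a prediction target
    return tokens, labels

# Hypothetical segmented clinical phrase: "patient" / "cough" / "three" / "days"
tokens, labels = whole_word_mask(["患者", "咳嗽", "三", "天"], mask_prob=0.5, seed=1)
```

Either all characters of a word are masked or none are, which forces the model to predict complete medical terms rather than recover a single character from its neighbors within the same word.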
Implementation details
Experimental results
Disease prediction results (average macro, %):

| Methods | Top-1 Accuracy | Top-1 F1-score | Top-5 Accuracy | Top-5 F1-score | Top-10 Accuracy | Top-10 F1-score |
|---|---|---|---|---|---|---|
| TextCNN | 65.38 | 60.60 | 92.91 | 91.49 | 97.27 | 96.73 |
| BiLSTM | 64.18 | 59.68 | 91.90 | 90.08 | 96.61 | 95.87 |
| LEAM | 63.44 | 55.79 | 92.10 | 90.20 | 96.90 | 96.30 |
| Transformer | 65.11 | 59.97 | 92.74 | 91.24 | 97.12 | 96.60 |
| BERT-base | 65.95 | 61.36 | 93.11 | 91.59 | 97.28 | 96.78 |
| BERT-wwm | 66.12 | 61.56 | 93.06 | 91.47 | 97.34 | 96.82 |
| CHMBERT | 66.28 | 61.95 | 93.08 | 91.58 | 97.27 | 96.83 |
Department recommendation results (average macro, %):

| Methods | Top-1 Accuracy | Top-1 F1-score | Top-2 Accuracy | Top-2 F1-score | Top-3 Accuracy | Top-3 F1-score |
|---|---|---|---|---|---|---|
| TextCNN | 85.87 | 72.32 | 94.74 | 84.52 | 97.16 | 89.77 |
| BiLSTM | 84.72 | 70.98 | 94.05 | 83.46 | 96.55 | 89.00 |
| LEAM | 84.64 | 68.35 | 94.02 | 83.04 | 96.73 | 88.88 |
| Transformer | 84.98 | 69.59 | 94.37 | 83.87 | 96.95 | 89.51 |
| BERT-base | 86.47 | 73.36 | 95.04 | 85.29 | 97.32 | 89.62 |
| BERT-wwm | 86.52 | 73.47 | 95.03 | 84.47 | 97.24 | 89.73 |
| CHMBERT | 86.66 | 74.06 | 95.18 | 85.30 | 97.44 | 90.67 |
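The tables above report accuracy and macro-averaged F1 at several Top-k cut-offs: a sample counts as a Top-k hit when the true label appears among the model's k highest-scoring classes. A minimal sketch of how such metrics can be computed (illustrative only, not the paper's evaluation code; macro F1 is shown over top-1 predictions, and the scores and labels are made up):

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for s, y in zip(scores, labels):
        topk = sorted(range(len(s)), key=lambda c: -s[c])[:k]
        hits += y in topk
    return hits / len(labels)

def macro_f1(preds, labels, num_classes):
    """Unweighted mean of per-class F1 scores, so rare classes
    weigh as much as frequent ones."""
    f1s = []
    for c in range(num_classes):
        tp = sum(p == c and y == c for p, y in zip(preds, labels))
        fp = sum(p == c and y != c for p, y in zip(preds, labels))
        fn = sum(p != c and y == c for p, y in zip(preds, labels))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / num_classes

# Toy example: 3 classes, 4 samples, made-up class scores.
scores = [[0.7, 0.2, 0.1],   # true 0, top-1 hit
          [0.1, 0.3, 0.6],   # true 1, hit only from top-2 on
          [0.2, 0.5, 0.3],   # true 1, top-1 hit
          [0.4, 0.4, 0.2]]   # true 2, miss even at top-2
labels = [0, 1, 1, 2]
preds = [sorted(range(3), key=lambda c: -s[c])[0] for s in scores]
```

This also explains why the gaps between models shrink as k grows: with k = 10, even the weaker baselines almost always include the true disease in their candidate list.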