Introduction
Related work
Text semantic representation modelling
Chinese text error correction
The proposed method
Overview
Associative knowledge network modelling
Noun entity node creation
Associative relationship creation
Associative strength computation
Incremental updating of associative relationship
Method of checking the semantic coherence of noun context
Label | Text sentence |
---|---|
Correct: | 杜仲是一种很滋补的药材, 对我们很多的疾病都有很好治疗效果。 Eucommia ulmoides is a very nourishing medicinal material, which has a good therapeutic effect on many of our diseases |
Incorrect: ① ② | 杜仲是一种很滋补的食物, 对我们很多的疾病都有很好治疗效果作用。 Eucommia ulmoides is a very nourishing food, which has a good therapeutic effect and function on many of our diseases |
Correct: | 食欲不振的人, 吃龙眼可以得到很好的改善。 People with poor appetite can see a good improvement by eating longan |
Incorrect: ① ③ | 吃龙眼食欲不振的人, 可以得到很好的作用改善。 People with poor appetite after eating longan can get a good effect improvement |
Correct: | 对于贫血的人、体质虚弱的人吃龙眼是很有益处的。 Longan is very beneficial for anaemic people and people with weak constitution |
Incorrect: ① ② | 对于贫血的人、健康虚弱的人吃龙眼水果是很有益处的。 Longan fruit is very beneficial for anaemic people and people with weak healthy |
Correct: | 大力发展社会主义先进文化。 Vigorously developing advanced socialist culture |
Incorrect: ② | 大力发展社会主义先进艺术。 Vigorously developing advanced socialist art |
Correct: | 历史和现实都告诉我们, 法治兴则国兴, 法治强则国强。 Both history and reality tell us that when the rule of law prospers, the country prospers, and when the rule of law is strong, the country is strong |
Incorrect: ② ③ | 历史和今天都告诉我们, 国兴则法治兴, 法律强则国强。 Both history and today tell us that, the prosperity of the country makes the rule of law flourish, while the strong of the law makes the country strong |
Correct: | 中国共产党是中国工人阶级的先锋队, 同时是中国人民和中华民族的先锋队。 The Communist Party of China (CPC) is the vanguard of the Chinese working class, and also the vanguard of the Chinese people and the Chinese nation |
Incorrect: ② | 中国共产党是中国人民的先锋队, 同时是中国人民和中华民族的未来。 The Communist Party of China (CPC) is the vanguard of the Chinese people, and also the future of the Chinese people and the Chinese nation |
Current document preprocessing
Acquisition of multilevel contextual relationships of noun entities
Associative coupling degree computing
Multilevel associative coupling degree features
Coherence checking using interpretable classification
Experimental methods
Method parameters
Evaluation metrics
Experimental datasets
Statistic | Corpus I | Corpus II |
---|---|---|
Number of texts used to construct the background knowledge network | 10,697 | 7,149 |
Number of texts used for coherence checking | 100 | 100 |
Numerical results and discussions
Name (Chinese) | Name (English) | \(Vacd_{inside}\) | \(Vacd_{between}\) | \(Vacd_{1}\) | \(Vacd_{2}\) | \(Vacd_{3}\) | \(Vacd_{4}\) |
---|---|---|---|---|---|---|---|
市场 | Market | 19.273 | 19.273 | 12.628 | 11.691 | 11.076 | 6.4567 |
牛奶 | Milk | 97.443 | 12.715 | 206.32 | 112.00 | 28.551 | 20.597 |
种类 | Type | 11.973 | 11.973 | 27.971 | 14.623 | 11.691 | 10.680 |
牛奶 | Milk | 97.443 | 1.3333 | 206.32 | 112.00 | 28.551 | 20.597 |
阿胶 | Donkey-hide gelatine | 0.0000 | 0.0000 | 1.0000 | 0.0000 | 0.0000 | 0.0000 |
理念 | Idea | 1.3333 | 1.3333 | 5.6206 | 2.4995 | 1.5000 | 1.3333 |
牛奶 | Milk | 97.443 | 134.87 | 206.32 | 112.00 | 28.551 | 20.597 |
关键 | Key | 9.1841 | 9.1841 | 12.968 | 9.9160 | 8.5892 | 3.7447 |
营养 | Nutrition | 154.41 | 154.41 | 112.00 | 109.71 | 66.647 | 32.737 |
含量 | Content | 132.22 | 132.22 | 206.32 | 109.71 | 27.971 | 19.570 |
… | … | … | … | … | … | … | … |
抵抗力 | Resistance | 0.0000 | 0.0000 | 9.3108 | 4.8435 | 1.0000 | 0.0000 |
高钙 | High calcium | 0.0000 | 0.0000 | 1.0000 | 0.0000 | 0.0000 | 0.0000 |
牛奶 | Milk | 42.992 | 0.0000 | 79.439 | 31.765 | 22.357 | 13.311 |
事情 | Thing | 0.0000 | 0.0000 | 5.9122 | 2.3333 | 1.0000 | 0.0000 |
牛奶 | Milk | 28.479 | 28.479 | 79.439 | 31.765 | 22.357 | 13.311 |
含钙量 | Calcium content | 5.3162 | 5.3162 | 13.311 | 6.7186 | 2.5000 | 1.0000 |
食物 | Food | 26.531 | 26.531 | 58.442 | 37.833 | 31.765 | 5.9122 |
钙 | Calcium | 0.0000 | 1.0000 | 79.439 | 37.833 | 8.9932 | 6.7186 |
碳酸钙 | Calcium carbonate | 0.0000 | 0.0000 | 2.1067 | 1.0000 | 0.0000 | 0.0000 |
滋养品 | Nourishment | 0.0000 | 0.0000 | 1.0000 | 0.0000 | 0.0000 | 0.0000 |
… | … | … | … | … | … | … | … |
中医学 | Traditional Chinese medicine | 0.0000 | 0.0000 | 1.0000 | 1.0000 | 1.0000 | 0.0000 |
… | … | … | … | … | … | … | … |
雪蛤膏 | Snow clam paste | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 0.0000 | 0.0000 |
Name (Chinese) | Name (English) | \(Vacd_{inside}\) | \(Vacd_{between}\) | \(Vacd_{1}\) | \(Vacd_{2}\) | \(Vacd_{3}\) | \(Vacd_{4}\) |
---|---|---|---|---|---|---|---|
中华民族 | Chinese nation | 6.6407 | 6.6407 | 11.851 | 9.0341 | 6.1143 | 2.6826 |
中国共产党 | Communist Party of China | 11.343 | 11.343 | 33.167 | 18.969 | 15.120 | 9.6730 |
人 | People | 5.8105 | 5.8105 | 7.4738 | 4.1817 | 3.5780 | 3.4692 |
历史使命 | Historical mission | 80.034 | 80.034 | 86.296 | 52.163 | 43.588 | 33.167 |
公共设施 | Public facilities | 0.0000 | 0.0000 | 1.9668 | 1.0000 | 0.0000 | 0.0000 |
历史使命 | Historical mission | 15.565 | 15.565 | 14.709 | 11.529 | 1.5750 | 1.4210 |
中国共产党 | Communist Party of China | 24.830 | 0.0000 | 4.9653 | 2.4979 | 1.0000 | 0.0000 |
领导 | Leader | 2.9734 | 2.9734 | 2.2328 | 1.5893 | 1.0000 | 0.8201 |
… | … | … | … | … | … | … | … |
市场 | Market | 3.6933 | 1.0000 | 15.275 | 5.3381 | 3.5209 | 3.1440 |
一代人 | A generation | 42.909 | 42.909 | 71.694 | 55.993 | 16.068 | 12.536 |
一代人 | A generation | 0.7358 | 0.7358 | 1.0000 | 0.7358 | 0.6734 | 0.5329 |
长征路 | Long March road | 57.313 | 4.2077 | 222.94 | 37.288 | 34.500 | 18.595 |
… | … | … | … | … | … | … | … |
危房 | Dilapidated houses | 0.0000 | 0.0000 | 1.0000 | 0.7597 | 0.5794 | 0.2893 |
基层 | Grass roots | 11.828 | 11.828 | 11.828 | 1.0000 | 0.0000 | 0.0000 |
… | … | … | … | … | … | … | … |
事业 | Cause | 0.0000 | 0.0000 | 1.0000 | 0.0000 | 0.0000 | 0.0000 |
… | … | … | … | … | … | … | … |
消费结构 | Consumption structure | 0.0000 | 0.0000 | 1.2438 | 1.0000 | 0.8161 | 0.0000 |
… | … | … | … | … | … | … | … |
公益 | Public welfare | 0.0000 | 0.0000 | 1.0000 | 0.0000 | 0.0000 | 0.0000 |
Performance analysis with different numbers of paragraph features
Dataset | Only basic features | With 4 added paragraph features |
---|---|---|
Dataset I | 91.92 ± 0.53 | 92.74 ± 1.12 |
Dataset II | 91.70 ± 0.65 | 92.35 ± 0.97 |
Performance under different capacity scales of the background knowledge network
Performance analysis using different relationship measurements
Performance analysis of comparable methods
Complexity analysis
Method | Training stage (3000 texts): time / memory usage | Detection stage (10 texts): time / memory usage |
---|---|---|
AssoCheck | 5.0 h/170.700 MB | 108.5 s/85.504 MB |
SoftMB [38] | 2.9 h/336.289 MB | 26.40 s/22.427 MB |