Introduction
Related works
Fuzzy inference model
Concept semantic similarity
Authors | Methods | Classification |
---|---|---|
Wu et al. | Method based on the path between concepts and the depth of their common parent node [28] | Path distance method |
Fellbaum et al. | Method based on the number of nodes between concepts and the maximum depth [31] | Path distance method |
Fellbaum et al. | Method based on the number of direction changes along the traversal path [32] | Path distance method |
Lin | Method based on the information shared by the concepts and their total information [29] | Information content method |
Resnik | Method based on the maximum information content of the common ancestors [33] | Information content method |
Jiang et al. | Method based on the maximum semantic similarity of word pairs [34] | Information content method |
Tversky | Method based on the number of common attributes [30] | Attribute feature method |
Xun et al. | Method based on synonyms and the vector space model [35] | Attribute feature method |
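The three families in the table above can be illustrated on a small taxonomy. The Python sketch below uses the commonly cited forms of the Wu and Palmer (path distance), Resnik and Lin (information content) and Tversky ratio (attribute feature) measures. The taxonomy, the concept probabilities and the Tversky weights are hypothetical values chosen for illustration, not data or parameter settings from this paper.

```python
import math

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "technology": "entity",
    "network": "technology",
    "neural_network": "network",
    "mobile_network": "network",
    "software": "technology",
    "search_engine": "software",
}

# Hypothetical concept probabilities; a parent is never less probable than its child.
PROB = {
    "entity": 1.0,
    "technology": 0.5,
    "network": 0.2,
    "neural_network": 0.05,
    "mobile_network": 0.08,
    "software": 0.25,
    "search_engine": 0.04,
}


def ancestors(concept):
    """Return the concept followed by its ancestors, ending at the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def depth(concept):
    # The root has depth 1, so depth counts the nodes on the path to the root.
    return len(ancestors(concept))


def lowest_common_ancestor(a, b):
    # Walking upward from a, the first node that is also above b is the deepest shared one.
    b_chain = set(ancestors(b))
    return next(c for c in ancestors(a) if c in b_chain)


def wu_palmer(a, b):
    """Path distance: 2 * depth(LCA) / (depth(a) + depth(b))."""
    return 2 * depth(lowest_common_ancestor(a, b)) / (depth(a) + depth(b))


def ic(concept):
    """Information content: -log p(concept)."""
    return -math.log(PROB[concept])


def resnik(a, b):
    """Information content: maximum IC over the common ancestors."""
    common = set(ancestors(a)) & set(ancestors(b))
    return max(ic(c) for c in common)


def lin(a, b):
    """Shared information relative to the total information of both concepts."""
    total = ic(a) + ic(b)
    return 1.0 if total == 0 else 2 * resnik(a, b) / total


def tversky(features_a, features_b, alpha=0.5, beta=0.5):
    """Attribute feature ratio model; alpha = beta = 0.5 gives the Dice coefficient."""
    common = len(features_a & features_b)
    denom = common + alpha * len(features_a - features_b) + beta * len(features_b - features_a)
    return common / denom if denom else 0.0


if __name__ == "__main__":
    print(wu_palmer("neural_network", "mobile_network"))  # 0.75
    print(lin("neural_network", "mobile_network"))
    print(tversky({"layer", "neuron", "weight"}, {"neuron", "weight", "signal"}))
```

In a tree whose probabilities shrink toward the leaves, the maximum-IC common ancestor used by Resnik is simply the lowest common ancestor; taking the maximum explicitly keeps the sketch faithful to the "maximum information content" wording when a concept has several parents.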
Focused crawler based on SDVSM
Semantic disambiguation graph
Topic graph construction
Ambiguous term identification
Indicator calculation
Ambiguity resolution
Rule number | Topic relevance (TR) | Topic popularity (TP) | Topic importance (TI) | Rule result |
---|---|---|---|---|
1 | L | L | L | H |
2 | L | L | H | H |
3 | L | H | L | H |
4 | H | L | L | H |
5 | L | H | H | L |
6 | H | L | H | L |
7 | H | H | L | L |
8 | H | H | H | L |
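The eight rules above can be encoded directly, and one pattern is visible in the table: a rule yields H exactly when at most one of TR, TP and TI is H. The Python sketch below checks that pattern and, as an assumption, evaluates the rule base with min-max (Mamdani-style) inference. The linear membership functions and the singleton outputs (L = 0, H = 1) are hypothetical choices, since the paper's own membership functions and defuzzification step are not given in this section.

```python
# Rule base copied from the table above: (TR, TP, TI) -> rule result.
RULES = {
    ("L", "L", "L"): "H",
    ("L", "L", "H"): "H",
    ("L", "H", "L"): "H",
    ("H", "L", "L"): "H",
    ("L", "H", "H"): "L",
    ("H", "L", "H"): "L",
    ("H", "H", "L"): "L",
    ("H", "H", "H"): "L",
}


def crisp_result(tr, tp, ti):
    """Look up the rule result for crisp L/H labels."""
    return RULES[(tr, tp, ti)]


def memberships(x):
    """Assumed linear membership functions on [0, 1]: mu_L = 1 - x, mu_H = x."""
    x = min(max(x, 0.0), 1.0)
    return {"L": 1.0 - x, "H": x}


def infer(tr, tp, ti):
    """Min-max inference with assumed singleton outputs L = 0 and H = 1."""
    inputs = [memberships(v) for v in (tr, tp, ti)]
    strength = {"L": 0.0, "H": 0.0}
    for antecedent, result in RULES.items():
        # A rule fires as strongly as its weakest antecedent.
        fire = min(mu[label] for mu, label in zip(inputs, antecedent))
        # Rules sharing a consequent are aggregated with max.
        strength[result] = max(strength[result], fire)
    # Because mu_L + mu_H = 1 for every input, some rule fires with strength >= 0.5,
    # so the denominator is never zero.
    return strength["H"] / (strength["H"] + strength["L"])


if __name__ == "__main__":
    print(crisp_result("H", "L", "L"))  # H
    print(infer(0.9, 0.1, 0.1))
```

Replacing the singleton outputs with full output membership functions and centroid defuzzification would be the usual next step; the rule table itself stays unchanged either way.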
Disambiguation term extraction
Semantic vector space model
Hyperlink priority prediction
Web term disambiguation
Hyperlink priority estimation
Experiment
Experimental design
Experimental focused crawler
Experimental initial data
Topics | Initial URLs |
---|---|
1. Fifth-generation mobile networks | |
2. Artificial neural networks | |
3. Information retrieval | |
4. Web search engine | |
5. Driverless | |
6. Distributed computing | |
7. Virtual reality | |
8. Data mining | |
9. Data analysis | |
10. Network security | |
Experimental evaluation indicator
Experimental crawling results
First group results
Number of retrieved pages | BF HR | BF AS | VSM HR | VSM AS | VSM AE | SSRM HR | SSRM AS | SSRM AE | SDVSM HR | SDVSM AS | SDVSM AE |
---|---|---|---|---|---|---|---|---|---|---|---|
100 | 0.464 | 0.448 | 0.442 | 0.407 | 0.399 | 0.459 | 0.459 | 0.246 | 0.504 | 0.416 | 0.148 |
200 | 0.459 | 0.425 | 0.460 | 0.412 | 0.411 | 0.449 | 0.449 | 0.239 | 0.476 | 0.418 | 0.148 |
300 | 0.432 | 0.440 | 0.486 | 0.414 | 0.453 | 0.447 | 0.447 | 0.213 | 0.471 | 0.416 | 0.139 |
400 | 0.439 | 0.431 | 0.472 | 0.416 | 0.470 | 0.445 | 0.445 | 0.194 | 0.462 | 0.414 | 0.131 |
500 | 0.438 | 0.424 | 0.472 | 0.418 | 0.476 | 0.437 | 0.437 | 0.181 | 0.474 | 0.412 | 0.125 |
600 | 0.425 | 0.422 | 0.460 | 0.419 | 0.485 | 0.435 | 0.435 | 0.178 | 0.489 | 0.413 | 0.124 |
700 | 0.421 | 0.420 | 0.455 | 0.419 | 0.484 | 0.434 | 0.434 | 0.177 | 0.483 | 0.414 | 0.123 |
800 | 0.436 | 0.431 | 0.447 | 0.419 | 0.473 | 0.432 | 0.432 | 0.177 | 0.470 | 0.415 | 0.123 |
900 | 0.446 | 0.428 | 0.450 | 0.420 | 0.463 | 0.432 | 0.432 | 0.180 | 0.465 | 0.416 | 0.122 |
1000 | 0.439 | 0.425 | 0.451 | 0.419 | 0.452 | 0.430 | 0.430 | 0.183 | 0.473 | 0.415 | 0.121 |
1500 | 0.427 | 0.412 | 0.449 | 0.422 | 0.403 | 0.425 | 0.425 | 0.205 | 0.476 | 0.418 | 0.118 |
2000 | 0.423 | 0.416 | 0.446 | 0.422 | 0.372 | 0.425 | 0.425 | 0.233 | 0.468 | 0.421 | 0.127 |
2500 | 0.420 | 0.422 | 0.464 | 0.424 | 0.369 | 0.424 | 0.424 | 0.240 | 0.475 | 0.425 | 0.129 |
3000 | 0.444 | 0.417 | 0.461 | 0.422 | 0.378 | 0.425 | 0.425 | 0.228 | 0.474 | 0.423 | 0.124 |
3500 | 0.436 | 0.414 | 0.458 | 0.421 | 0.391 | 0.423 | 0.423 | 0.212 | 0.465 | 0.423 | 0.122 |
4000 | 0.448 | 0.413 | 0.461 | 0.421 | 0.403 | 0.423 | 0.423 | 0.202 | 0.463 | 0.424 | 0.121 |
4500 | 0.452 | 0.413 | 0.462 | 0.422 | 0.411 | 0.422 | 0.422 | 0.196 | 0.463 | 0.423 | 0.118 |
5000 | 0.448 | 0.409 | 0.465 | 0.422 | 0.411 | 0.421 | 0.421 | 0.194 | 0.471 | 0.424 | 0.117 |
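The results are reported at cumulative page-count checkpoints from 100 to 5000 retrieved pages. The Python sketch below shows one way such checkpoint values can be computed from a crawl log. It assumes that HR is the harvest rate (the share of retrieved pages judged relevant) and that AS is the mean topic similarity of the retrieved pages; the relevance threshold is a hypothetical parameter, and AE is not sketched because its definition does not appear in this section.

```python
def checkpoint_metrics(similarities, checkpoints, threshold):
    """Cumulative HR and AS at each checkpoint.

    similarities: topic similarity of each retrieved page, in crawl order.
    HR (assumed harvest rate): share of pages with similarity >= threshold.
    AS (assumed average similarity): mean similarity of all retrieved pages.
    Returns a list of (page_count, hr, as_value) tuples.
    """
    wanted = set(checkpoints)
    relevant = 0
    total_sim = 0.0
    rows = []
    for count, sim in enumerate(similarities, start=1):
        total_sim += sim
        if sim >= threshold:
            relevant += 1
        if count in wanted:
            rows.append((count, relevant / count, total_sim / count))
    return rows


if __name__ == "__main__":
    # Hypothetical similarity scores for four retrieved pages.
    for row in checkpoint_metrics([0.9, 0.2, 0.6, 0.4], checkpoints=[2, 4], threshold=0.5):
        print(row)
```

Checkpoints beyond the length of the crawl log are simply skipped, so a crawl that stops early yields fewer rows rather than padded values.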
Second group results
Number of retrieved pages | BF HR | BF AS | VSM HR | VSM AS | VSM AE | SSRM HR | SSRM AS | SSRM AE | SDVSM HR | SDVSM AS | SDVSM AE |
---|---|---|---|---|---|---|---|---|---|---|---|
100 | 0.532 | 0.405 | 0.434 | 0.419 | 0.401 | 0.444 | 0.430 | 0.216 | 0.350 | 0.431 | 0.169 |
200 | 0.468 | 0.417 | 0.433 | 0.427 | 0.464 | 0.416 | 0.429 | 0.213 | 0.377 | 0.438 | 0.176 |
300 | 0.412 | 0.411 | 0.439 | 0.427 | 0.505 | 0.407 | 0.431 | 0.201 | 0.388 | 0.434 | 0.159 |
400 | 0.407 | 0.417 | 0.407 | 0.429 | 0.543 | 0.390 | 0.434 | 0.184 | 0.379 | 0.435 | 0.145 |
500 | 0.398 | 0.413 | 0.389 | 0.426 | 0.552 | 0.382 | 0.435 | 0.181 | 0.376 | 0.434 | 0.135 |
600 | 0.408 | 0.414 | 0.392 | 0.425 | 0.542 | 0.382 | 0.437 | 0.185 | 0.367 | 0.435 | 0.130 |
700 | 0.386 | 0.410 | 0.390 | 0.427 | 0.528 | 0.379 | 0.436 | 0.189 | 0.361 | 0.437 | 0.128 |
800 | 0.371 | 0.407 | 0.380 | 0.426 | 0.510 | 0.384 | 0.435 | 0.196 | 0.361 | 0.436 | 0.128 |
900 | 0.385 | 0.411 | 0.381 | 0.428 | 0.495 | 0.382 | 0.435 | 0.204 | 0.364 | 0.436 | 0.130 |
1000 | 0.400 | 0.408 | 0.387 | 0.430 | 0.475 | 0.381 | 0.436 | 0.213 | 0.371 | 0.438 | 0.134 |
1500 | 0.436 | 0.404 | 0.400 | 0.433 | 0.489 | 0.403 | 0.443 | 0.214 | 0.413 | 0.451 | 0.141 |
2000 | 0.391 | 0.402 | 0.414 | 0.442 | 0.528 | 0.405 | 0.444 | 0.199 | 0.415 | 0.450 | 0.130 |
2500 | 0.369 | 0.397 | 0.411 | 0.444 | 0.543 | 0.410 | 0.442 | 0.198 | 0.412 | 0.449 | 0.123 |
3000 | 0.357 | 0.392 | 0.415 | 0.443 | 0.537 | 0.411 | 0.442 | 0.200 | 0.420 | 0.449 | 0.122 |
3500 | 0.337 | 0.392 | 0.421 | 0.441 | 0.514 | 0.416 | 0.442 | 0.200 | 0.423 | 0.448 | 0.118 |
4000 | 0.336 | 0.392 | 0.417 | 0.441 | 0.498 | 0.417 | 0.444 | 0.202 | 0.425 | 0.448 | 0.115 |
4500 | 0.344 | 0.394 | 0.419 | 0.441 | 0.484 | 0.414 | 0.443 | 0.204 | 0.426 | 0.446 | 0.114 |
5000 | 0.361 | 0.400 | 0.426 | 0.439 | 0.472 | 0.418 | 0.443 | 0.208 | 0.426 | 0.444 | 0.114 |
Third group results
Number of retrieved pages | BF HR | BF AS | VSM HR | VSM AS | VSM AE | SSRM HR | SSRM AS | SSRM AE | SDVSM HR | SDVSM AS | SDVSM AE |
---|---|---|---|---|---|---|---|---|---|---|---|
100 | 0.498 | 0.426 | 0.438 | 0.413 | 0.400 | 0.440 | 0.445 | 0.231 | 0.427 | 0.423 | 0.158 |
200 | 0.464 | 0.421 | 0.447 | 0.419 | 0.437 | 0.447 | 0.439 | 0.226 | 0.427 | 0.428 | 0.162 |
300 | 0.422 | 0.425 | 0.463 | 0.421 | 0.479 | 0.438 | 0.439 | 0.207 | 0.430 | 0.425 | 0.149 |
400 | 0.423 | 0.424 | 0.439 | 0.422 | 0.507 | 0.420 | 0.439 | 0.189 | 0.421 | 0.425 | 0.138 |
500 | 0.418 | 0.419 | 0.431 | 0.422 | 0.514 | 0.410 | 0.436 | 0.181 | 0.425 | 0.423 | 0.130 |
600 | 0.416 | 0.418 | 0.426 | 0.422 | 0.513 | 0.412 | 0.436 | 0.181 | 0.428 | 0.424 | 0.127 |
700 | 0.403 | 0.415 | 0.423 | 0.423 | 0.506 | 0.411 | 0.435 | 0.183 | 0.422 | 0.426 | 0.126 |
800 | 0.404 | 0.419 | 0.414 | 0.423 | 0.492 | 0.412 | 0.433 | 0.187 | 0.416 | 0.426 | 0.125 |
900 | 0.415 | 0.420 | 0.416 | 0.424 | 0.479 | 0.413 | 0.433 | 0.192 | 0.414 | 0.426 | 0.126 |
1000 | 0.420 | 0.416 | 0.419 | 0.425 | 0.463 | 0.412 | 0.433 | 0.198 | 0.422 | 0.427 | 0.128 |
1500 | 0.431 | 0.408 | 0.425 | 0.428 | 0.446 | 0.424 | 0.434 | 0.210 | 0.445 | 0.434 | 0.129 |
2000 | 0.407 | 0.409 | 0.430 | 0.432 | 0.450 | 0.426 | 0.434 | 0.216 | 0.442 | 0.436 | 0.129 |
2500 | 0.394 | 0.410 | 0.438 | 0.434 | 0.456 | 0.433 | 0.433 | 0.219 | 0.443 | 0.437 | 0.126 |
3000 | 0.401 | 0.405 | 0.438 | 0.433 | 0.457 | 0.430 | 0.433 | 0.214 | 0.447 | 0.436 | 0.123 |
3500 | 0.387 | 0.403 | 0.439 | 0.431 | 0.453 | 0.428 | 0.433 | 0.206 | 0.444 | 0.435 | 0.120 |
4000 | 0.392 | 0.403 | 0.439 | 0.431 | 0.451 | 0.426 | 0.433 | 0.202 | 0.444 | 0.436 | 0.118 |
4500 | 0.398 | 0.404 | 0.440 | 0.431 | 0.447 | 0.423 | 0.432 | 0.200 | 0.444 | 0.435 | 0.116 |
5000 | 0.405 | 0.405 | 0.445 | 0.430 | 0.441 | 0.424 | 0.432 | 0.201 | 0.449 | 0.434 | 0.115 |
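To make the AE columns of the third group easier to compare, the Python sketch below copies them verbatim from the table above, computes the mean AE of each crawler over all checkpoints, and checks whether the SDVSM crawler has the lowest AE at every checkpoint. Only the table values are taken from the source; the helper names are illustrative.

```python
from statistics import mean

# Checkpoints and AE columns copied from the third-group table.
PAGES = [100, 200, 300, 400, 500, 600, 700, 800, 900,
         1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000]
AE = {
    "VSM": [0.400, 0.437, 0.479, 0.507, 0.514, 0.513, 0.506, 0.492, 0.479,
            0.463, 0.446, 0.450, 0.456, 0.457, 0.453, 0.451, 0.447, 0.441],
    "SSRM": [0.231, 0.226, 0.207, 0.189, 0.181, 0.181, 0.183, 0.187, 0.192,
             0.198, 0.210, 0.216, 0.219, 0.214, 0.206, 0.202, 0.200, 0.201],
    "SDVSM": [0.158, 0.162, 0.149, 0.138, 0.130, 0.127, 0.126, 0.125, 0.126,
              0.128, 0.129, 0.129, 0.126, 0.123, 0.120, 0.118, 0.116, 0.115],
}


def mean_ae():
    """Mean AE of each crawler over all checkpoints."""
    return {name: mean(values) for name, values in AE.items()}


def always_lowest(target):
    """True if the target crawler has the smallest AE at every checkpoint."""
    return all(
        AE[target][i] == min(values[i] for values in AE.values())
        for i in range(len(PAGES))
    )


if __name__ == "__main__":
    for name, value in mean_ae().items():
        print(f"{name}: mean AE = {value:.3f}")
    print("SDVSM lowest at every checkpoint:", always_lowest("SDVSM"))
```

The same two helpers apply unchanged to the AE columns of the first and second groups once those values are copied in.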