On the Web, information about an entity is often spread over a set of pages that together form a logical page group, and such groups must be handled properly. This paper proposes a method for collecting researchers’ homepages (or entry pages) using new, simple, and effective page group models that combine page group structure with page content. The goal is to narrow down the candidates before more precise and computationally heavier processing. We focus mainly on high recall rather than on precision.
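The abstract does not spell out the page group models. The minimal Python sketch below only illustrates the general idea of pooling content evidence over a page group and applying a recall-oriented threshold. Everything in it is a hypothetical assumption and not the authors' method: the grouping rule in `group_key` (host plus first path directory), the cue-word set `RESEARCHER_CUES`, the entry-page heuristic in `entry_page`, and the `threshold=0.2` default.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical cue words; the paper's actual content features are not given here.
RESEARCHER_CUES = {"publications", "research", "professor", "phd", "lab", "cv"}
# Hypothetical file names treated as a group's entry page.
INDEX_NAMES = {"", "index.html", "index.htm"}


def group_key(url: str) -> str:
    """Assumed page group model: host plus the first directory of the URL path."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    # A last component containing a dot is treated as a file, not a directory.
    if parts and "." in parts[-1]:
        parts = parts[:-1]
    first_dir = parts[0] if parts else ""
    return f"{parsed.netloc}/{first_dir}"


def group_score(texts):
    """Fraction of cue words found anywhere in the group (naive whitespace tokens)."""
    words = set()
    for text in texts:
        words.update(text.lower().split())
    return len(RESEARCHER_CUES & words) / len(RESEARCHER_CUES)


def entry_page(urls):
    """Prefer an index-like page; otherwise take the shortest path."""
    def rank(url):
        path = urlparse(url).path
        name = path.rsplit("/", 1)[-1]
        return (name not in INDEX_NAMES, len(path), url)
    return min(urls, key=rank)


def candidate_homepages(pages, threshold=0.2):
    """Return entry pages of groups whose pooled score meets a low, recall-oriented threshold."""
    groups = defaultdict(list)
    for url, text in pages:
        groups[group_key(url)].append((url, text))
    candidates = []
    for members in groups.values():
        score = group_score([text for _, text in members])
        if score >= threshold:
            candidates.append(entry_page([url for url, _ in members]))
    return sorted(candidates)


pages = [
    ("http://www.example.ac.jp/~tanaka/index.html", "Welcome to my homepage"),
    ("http://www.example.ac.jp/~tanaka/pubs.html", "Publications and research projects"),
    ("http://www.example.ac.jp/news/2005.html", "Campus news and events"),
]
print(candidate_homepages(pages))
```

In this toy input the index page contains no cue words at all, yet its group is kept because the sibling publications page supplies the evidence. The low threshold deliberately trades precision for recall, consistent with the stated aim of narrowing candidates for later, heavier processing.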
- A Method for Creating a High Quality Collection of Researchers’ Homepages from the Web
- Springer Berlin Heidelberg