2010 | OriginalPaper | Book chapter
Web Content Mining Using MicroGenres
Authors: Václav Snášel, Miloš Kudělka, Zdeněk Horák
Published in: Advanced Techniques in Web Intelligence - I
Publisher: Springer Berlin Heidelberg
The size and growth of the Web continue to pose new challenges for researchers. One of these challenges is improving users' familiarity with a large number of Web pages. Today's search engines provide tools that allow users to refine their queries, and one approach is to refine a query based on an analysis of Web content. Possible outcomes include not only recommended collocations but also recommended page genres (e.g., discussion forums). It is also proving very useful to provide details of the page content when viewing the page: not only text snippets, but also parts of the page menu and, for certain pages, how many posts a discussion contains, what day a review was created, or the price of a product sold on the page. Obtaining this information from unstructured or semi-structured content is not straightforward. This chapter addresses the development of methods capable of detecting and extracting information from Web pages. The concept of objects called MicroGenres is presented. Finally, we present experiments with our own Pattrio method, which provides a way to detect objects placed on Web pages.