The previous chapter introduced kernel-based techniques and their baseline application to text classification. In this chapter we develop and explore learning techniques that integrate knowledge into the classification task to improve the performance of support vector machines (SVMs) in text classification applications.
First, we investigate the introduction of unlabeled data in the learning stage. With the deluge of digital text data, unlabeled texts are ubiquitous: the Internet, email servers, database files, and plain file systems are among the countless sources of digital text. Labeling such texts, however, is mostly manual and costly. As a result, a research field devoted to the study and use of unlabeled texts has been emerging. Second, we exploit the potential of several learning machines organized in a committee. Since no single classifier suits all situations, the focus is on using the diversity of classifiers to enhance performance.