2005 | OriginalPaper | Book chapter
Finding Structure in a Document Collection
Authors: Sholom M. Weiss, Nitin Indurkhya, Tong Zhang, Fred J. Damerau
Published in: Text Mining
Publisher: Springer New York
Prediction methods examine stored examples with known correct answers and project answers for new examples. One would expect that if answers cannot be obtained for the training examples, the process cannot be completed. Given a collection of documents, we have no problem transforming the unstructured set of words in each document into a structured spreadsheet. But the last column, which holds the label, must also be filled in. In Figure 5.1, we see a spreadsheet, a list of labels, and the spreadsheet column containing the labeled answers. Someone must compose a list of potential labels and, given that list, assign labels to the documents. Sometimes label assignment can be automated, for example a label indicating that a company's stock price has risen. In most instances, such as topic assignment for newswire articles, labels are assigned by humans, which can be tedious and expensive. Is there any way to assign labels automatically to a document collection? We will discuss this task. Not only will the labels be assigned, but the list of labels itself will also be determined automatically. Because such key information is missing from the problem description, we should expect lower predictive accuracy than in standard prediction applications with labeled data.
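As a rough illustration of the task described above, the following sketch builds a word-count spreadsheet from a few toy documents and then groups the rows, so that each group can serve as an automatically discovered label. It is a minimal sketch under stated assumptions: the toy documents, the helper names (`build_spreadsheet`, `cosine`, `cluster_labels`), the k-means-style procedure with cosine similarity, and the choice of the first k rows as starting centroids are all illustrative and are not taken from the chapter.

```python
from collections import Counter
import math


def build_spreadsheet(docs):
    """Turn raw documents into rows of word counts over a shared vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[word] for word in vocab])
    return vocab, rows


def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_labels(rows, k, iterations=10):
    """Assign each row a cluster id with a simple k-means-style loop.

    Assumption: the first k rows seed the centroids, which keeps the
    sketch deterministic but is not a recommended initialization.
    """
    centroids = [[float(x) for x in row] for row in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each document joins its most similar centroid.
        labels = [
            max(range(k), key=lambda c: cosine(row, centroids[c]))
            for row in rows
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy collection: two finance documents and two sports documents.
docs = [
    "stock price rises on strong earnings",
    "team wins match in final minutes",
    "earnings report lifts stock price",
    "coach praises team after match win",
]

vocab, rows = build_spreadsheet(docs)
labels = cluster_labels(rows, k=2)
print(labels)
```

The cluster ids produced this way are arbitrary integers. Turning them into meaningful label names still requires inspecting the documents, or the most frequent words, in each group.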