tutorial

Bringing structure to text: mining phrases, entities, topics, and hierarchies

Authors:
Jiawei Han, The University of Illinois at Urbana Champaign, Urbana, IL, USA
Chi Wang, The University of Illinois at Urbana Champaign, Urbana, IL, USA
Ahmed El-Kishky, The University of Illinois at Urbana Champaign, Urbana, IL, USA

KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, August 2014, Page 1968, https://doi.org/10.1145/2623330.2630804

Published: 24 August 2014

ABSTRACT

Mining phrases, entity concepts, topics, and hierarchies from massive text corpora is an essential problem in the age of big data. Text data in electronic form is ubiquitous, ranging from scientific articles to social networks, enterprise logs, news articles, social media, and general web pages. It is highly desirable but challenging to bring structure to unstructured text data, uncover underlying hierarchies, relationships, patterns, and trends, and gain knowledge from such data.

In this tutorial, we provide a comprehensive survey of the state of the art in data-driven methods that automatically mine phrases, extract and infer latent structures from text corpora, and construct multi-granularity topical groupings and hierarchies of the underlying themes. We study their principles, methodologies, algorithms, and applications using several real datasets, including research papers and news articles, and demonstrate how these methods work and how the uncovered latent entity structures may help text understanding, knowledge discovery, and management.

Supplemental Material

p1968-sidebyside1.mp4 (mp4, 1.7 GB)
p1968-sidebyside2.mp4 (mp4, 1.6 GB)

Index Terms

Computing methodologies → Machine learning → Machine learning approaches

Published in
KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2014
2028 pages
ISBN: 9781450329569
DOI: 10.1145/2623330
General Chairs: Sofus Macskassy (Facebook), Claudia Perlich (Dstillery)
Program Chairs: Jure Leskovec (Stanford University), Wei Wang (UCLA), Rayid Ghani (University of Chicago)
Copyright © 2014 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
information networks
phrase mining
text mining
topic model
Qualifiers
- tutorial
Acceptance Rates
KDD '14 Paper Acceptance Rate: 151 of 1,036 submissions, 15%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%