We present a symbolic and graph-based approach for mapping knowledge domains. The symbolic component relies on shallow linguistic processing of texts to extract multi-word terms and cluster them based on lexico-syntactic relations. The clusters are then subjected to graph decomposition based on inherent graph-theoretic properties of association graphs of items (multi-word terms and authors). This includes the search for complete minimal separators that decompose the graphs into central (core topics) and peripheral atoms. The methodology is implemented in the TermWatch system and can be used for several text mining tasks. In this paper, we apply our methodology to map the dynamics of terrorism research between 1990 and 2006. We also mined for frequent itemsets as a means of revealing dependencies between formal concepts in the corpus. A comparison of the extracted frequent itemsets with the structure of the central atom shows an interesting overlap. The main feature of our approach lies in the combination of state-of-the-art techniques from Natural Language Processing (NLP), clustering and graph theory to develop a system and a methodology adapted to uncovering hidden sub-structures from texts.
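To make the notion of a complete minimal separator concrete, the following is a minimal sketch in Python and not the TermWatch implementation. It assumes an undirected association graph stored as a dictionary mapping each vertex to the set of its neighbours. The function names and the four-vertex toy graph (`t1` to `t4`) are hypothetical and do not come from the corpus. The test relies on a standard characterisation: a vertex set is a minimal separator when at least two of the components left after removing it are "full", meaning that every separator vertex has a neighbour in the component. The set is complete when its vertices are pairwise adjacent.

```python
from itertools import combinations


def components_without(graph, removed):
    """Connected components of the graph after deleting the vertices in `removed`."""
    remaining = set(graph) - set(removed)
    seen, comps = set(), []
    for start in remaining:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:  # iterative depth-first search
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(n for n in graph[v] if n in remaining and n not in comp)
        seen |= comp
        comps.append(comp)
    return comps


def is_complete_minimal_separator(graph, candidate):
    """True if `candidate` is a clique and a minimal separator of `graph`."""
    sep = set(candidate)
    if not sep:
        return False
    # Complete: every pair of separator vertices is adjacent.
    if any(b not in graph[a] for a, b in combinations(sep, 2)):
        return False
    # Minimal: at least two components in which every separator vertex has a neighbour.
    full = [c for c in components_without(graph, sep)
            if all(graph[s] & c for s in sep)]
    return len(full) >= 2


def split_on_separator(graph, candidate):
    """Pieces obtained by copying the separator into each remaining component."""
    sep = set(candidate)
    return [comp | sep for comp in components_without(graph, sep)]


# Hypothetical toy graph: two triangles sharing the edge t2-t3.
toy = {
    "t1": {"t2", "t3"},
    "t2": {"t1", "t3", "t4"},
    "t3": {"t1", "t2", "t4"},
    "t4": {"t2", "t3"},
}
```

On this toy graph, `is_complete_minimal_separator(toy, {"t2", "t3"})` holds, and `split_on_separator(toy, {"t2", "t3"})` yields the two triangles. In a full decomposition each piece would be split again until no complete minimal separator remains; the resulting pieces are the atoms referred to above.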
- TermWatch II: Unsupervised Terminology Graph Extraction and Decomposition
- Springer Berlin Heidelberg