27-08-2023
An Approach for Analyzing Unstructured Text Data Using Topic Modeling Techniques for Efficient Information Extraction
Authors:
Ashwini Zadgaonkar, Avinash J. Agrawal
Published in:
New Generation Computing
Abstract
Topic modeling techniques are widely used for document clustering, large-scale text analysis, information extraction from unstructured text documents, feature selection from large corpora, and various recommendation systems. This work proposes a framework that uses topic modeling techniques to extract legal information from the unstructured judgments of the Indian judicial system. The proposed approach aims to replace time-consuming manual judgment analysis with automated analysis that can examine a large number of judgments in less time. In this work, we experimented with different topic modeling methodologies for information extraction. The proposed framework is built on Latent Dirichlet Allocation (LDA) to categorize legal judgments into the extracted topic groups. Judgments of the Indian Supreme Court are used for the experimental setting. The three main elements of the framework are pre-processing, application of the topic model, and model evaluation using a coherence score metric. The framework was successfully applied in batches to corpora of 100, 500, and 1000 legal judgments. To provide a quantitative evaluation, the proposed framework is used to measure the similarity between legal judgments. As future scope, various legal tasks whose performance could benefit from the proposed framework are suggested.
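The abstract names three stages (pre-processing, LDA topic modeling, and coherence-based evaluation) plus a judgment-similarity application. The following is a minimal, self-contained sketch of such a pipeline in pure Python, written for illustration only; it is not the authors' implementation. Several choices here are assumptions, not taken from the paper: the tiny stop-word list, the hyperparameters (`alpha`, `beta`, iteration count), collapsed Gibbs sampling as the LDA inference method, UMass as the coherence measure (the abstract does not say which coherence score was used), and cosine similarity over document-topic distributions as the similarity measure. All function names are hypothetical, and the toy "judgments" are invented.

```python
import math
import random
import re

# Hypothetical, deliberately tiny stop-word list; a real legal corpus needs a far larger one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was", "by", "on", "for", "with", "that"}


def preprocess(text):
    """Pre-processing: lowercase, keep alphabetic tokens, drop stop words and tokens under 3 chars."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """LDA fitted by collapsed Gibbs sampling; returns (doc-topic distributions, topic-word counts, vocab)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]       # topic counts per document
    nkw = [[0] * V for _ in range(K)]   # word counts per topic
    nk = [0] * K                        # total tokens assigned to each topic
    z = []                              # current topic of every token
    for d, doc in enumerate(docs):      # random initial topic assignment
        z.append([rng.randrange(K) for _ in doc])
        for w, k in zip(doc, z[d]):
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                ndk[d][k] -= 1          # remove the token's current assignment
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1
    theta = [[(ndk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
             for d, doc in enumerate(docs)]
    return theta, nkw, vocab


def top_words(nkw, vocab, n=5):
    """The n highest-count words of each topic, most frequent first."""
    return [[vocab[i] for i in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]] for row in nkw]


def umass_coherence(topics, docs):
    """Mean UMass coherence: sum over ordered top-word pairs of log((D(w_i, w_j) + 1) / D(w_j))."""
    doc_sets = [set(doc) for doc in docs]

    def df(*words):  # number of documents containing all the given words
        return sum(1 for s in doc_sets if all(w in s for w in words))

    scores = [sum(math.log((df(ws[i], ws[j]) + 1) / df(ws[j]))
                  for i in range(1, len(ws)) for j in range(i))
              for ws in topics]
    return sum(scores) / len(scores)


def cosine(p, q):
    """Cosine similarity between two topic distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q)))


if __name__ == "__main__":
    # Hypothetical toy "judgments", invented for illustration only.
    judgments = [
        "The accused was convicted of murder and the trial court imposed life imprisonment.",
        "The appellant challenged the conviction for murder and the sentence of imprisonment.",
        "The tenant disputed the eviction order and the arrears of rent claimed by the landlord.",
        "The landlord sought eviction of the tenant for non-payment of rent.",
    ]
    docs = [preprocess(j) for j in judgments]
    theta, nkw, vocab = lda_gibbs(docs, num_topics=2)
    topics = top_words(nkw, vocab)
    print("topics:", topics)
    print("UMass coherence:", round(umass_coherence(topics, docs), 3))
    print("similarity(0, 1):", round(cosine(theta[0], theta[1]), 3))
    print("similarity(0, 2):", round(cosine(theta[0], theta[2]), 3))
```

The sampler is written out by hand only so that every stage is visible; on real corpora of hundreds of judgments, a library implementation (for example, gensim's `LdaModel` together with its `CoherenceModel`) would be the more practical choice.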