Volume: 28 | Article ID: art00012
Recognizing Predatory Chat Documents using Semi-supervised Anomaly Detection
DOI: 10.2352/ISSN.2470-1173.2016.17.DRR-063 | Published Online: February 2016
Abstract

Chat logs are informative documents available to today's social network providers. Providers and law enforcement agencies tend to use these huge logs anonymously for automatic online Sexual Predator Identification (SPI), a relatively new application area. The task plays an important role in protecting children and juveniles from exploitation by online predators. Pattern recognition techniques help law enforcement automatically identify harmful conversations in cyberspace. These techniques usually require a large volume of high-quality training instances of both predatory and non-predatory documents. However, collecting non-predatory documents is not practical in real-world applications, since this category covers a large variety of documents on many topics, including politics, sports, science, and technology. To mitigate this problem, we used a new semi-supervised approach that adapts an anomaly detection technique called One-class Support Vector Machine, which does not require non-predatory samples for training. We compared the performance of this approach against other state-of-the-art methods that use both positive and negative instances. We observed that although the anomaly detection approach uses only one class label for training (a very desirable property in practice), its performance is comparable to that of binary SVM classification. In addition, this approach outperforms the classic two-class Naïve Bayes algorithm, which we used as our baseline, in terms of both classification accuracy and precision.
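
The one-class training setup described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, TF-IDF features, a linear kernel, and `nu=0.1`, none of which are stated in the abstract. The helper names `train_one_class` and `classify` and the toy token strings are hypothetical placeholders standing in for real chat documents.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import OneClassSVM

# Placeholder documents standing in for the single labeled class
# (the predatory class in the paper). Real chat logs would go here.
target_class_docs = [
    "alpha beta gamma delta",
    "alpha gamma delta epsilon",
    "beta gamma delta alpha",
    "gamma delta alpha beta epsilon",
]


def train_one_class(docs, nu=0.1):
    """Fit a one-class SVM on documents from a single class only.

    Assumptions (not from the paper): TF-IDF features, linear kernel.
    nu is an upper bound on the fraction of training outliers.
    """
    vectorizer = TfidfVectorizer()
    features = vectorizer.fit_transform(docs)
    model = OneClassSVM(kernel="linear", nu=nu)
    model.fit(features)
    return vectorizer, model


def classify(vectorizer, model, docs):
    """Return +1 for documents resembling the training class, -1 otherwise."""
    return model.predict(vectorizer.transform(docs))


vectorizer, model = train_one_class(target_class_docs)
unseen_docs = ["alpha beta gamma", "politics sports science technology"]
predictions = classify(vectorizer, model, unseen_docs)
print(predictions)
```

No negative (non-predatory) examples appear anywhere in training. The model only learns a boundary around the one labeled class and flags anything outside it as an anomaly.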

  Cite this article 

Mohammadreza Ebrahimi, Ching Y. Suen, Olga Ormandjieva, Adam Krzyzak, "Recognizing Predatory Chat Documents using Semi-supervised Anomaly Detection," in Proc. IS&T Int’l. Symp. on Electronic Imaging: Document Recognition and Retrieval XXIII, 2016, https://doi.org/10.2352/ISSN.2470-1173.2016.17.DRR-063

  Copyright statement 
Copyright © Society for Imaging Science and Technology 2016
Electronic Imaging
2470-1173
Society for Imaging Science and Technology