2018 | OriginalPaper | Chapter

Detecting Complex Sensitive Information via Phrase Structure in Recursive Neural Networks

Authors: Jan Neerbek, Ira Assent, Peter Dolog

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer International Publishing

Abstract

State-of-the-art sensitive information detection in unstructured data relies on the frequency of co-occurrence of keywords with sensitive seed words. In practice, however, this may fail to detect more complex patterns of sensitive information. In this work, we propose learning phrase structures that separate sensitive from non-sensitive documents in recursive neural networks. Our evaluation on real data with human-labeled sensitive content shows that our new approach outperforms existing keyword-based strategies.
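
To make the idea of scoring a sentence through its phrase structure concrete, the following is a minimal sketch of a recursive neural network forward pass over a binary parse tree. It is not the authors' model: the tree encoding (nested tuples), the embedding size `DIM`, the random untrained weights, and the function names `make_params`, `embed`, `compose`, and `sensitivity_score` are all assumptions made for illustration. A real system would learn the weights from labeled documents and obtain trees from a parser.

```python
import math
import random

DIM = 4  # hypothetical embedding size, chosen only for illustration


def make_params(dim=DIM, seed=0):
    """Create random, untrained parameters (an assumption; real weights are learned)."""
    rng = random.Random(seed)
    return {
        "dim": dim,
        "rng": rng,
        # Composition matrix W maps the concatenated children (2*dim) to a parent (dim).
        "W": [[rng.uniform(-0.5, 0.5) for _ in range(2 * dim)] for _ in range(dim)],
        "b": [0.0] * dim,
        # Linear classifier applied to the root vector.
        "u": [rng.uniform(-0.5, 0.5) for _ in range(dim)],
        "c": 0.0,
        "emb": {},  # word -> vector, filled lazily
    }


def embed(word, params):
    """Return a word vector, creating a random one the first time a word is seen."""
    if word not in params["emb"]:
        rng = params["rng"]
        params["emb"][word] = [rng.uniform(-0.5, 0.5) for _ in range(params["dim"])]
    return params["emb"][word]


def compose(tree, params):
    """Compute a phrase vector bottom-up.

    A tree is either a word (str) or a pair (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return embed(tree, params)
    left = compose(tree[0], params)
    right = compose(tree[1], params)
    children = left + right  # list concatenation: [left; right]
    return [
        math.tanh(sum(w * x for w, x in zip(row, children)) + bias)
        for row, bias in zip(params["W"], params["b"])
    ]


def sensitivity_score(tree, params):
    """Map the root vector to a score in (0, 1) with a logistic output unit."""
    root = compose(tree, params)
    z = sum(u * h for u, h in zip(params["u"], root)) + params["c"]
    return 1.0 / (1.0 + math.exp(-z))


params = make_params()
sentence = (("the", "merger"), ("is", "confidential"))
score = sensitivity_score(sentence, params)
```

Because the parent vector depends on how words are bracketed, the same words under a different parse yield a different representation, which a keyword co-occurrence count cannot distinguish. With untrained weights the score itself is meaningless; the sketch only shows the data flow.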

Title: Detecting Complex Sensitive Information via Phrase Structure in Recursive Neural Networks
Authors: Jan Neerbek, Ira Assent, Peter Dolog
Publisher: Springer International Publishing
Book: Advances in Knowledge Discovery and Data Mining
Print ISBN: 978-3-319-93039-8
Electronic ISBN: 978-3-319-93040-4

Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-319-93040-4_30
