Skip to main content
Top

2018 | OriginalPaper | Chapter

Detecting Complex Sensitive Information via Phrase Structure in Recursive Neural Networks

Authors : Jan Neerbek, Ira Assent, Peter Dolog

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

State-of-the-art sensitive information detection in unstructured data relies on the frequency of co-occurrence of keywords with sensitive seed words. In practice, however, this may fail to detect more complex patterns of sensitive information. In this work, we propose learning phrase structures that separate sensitive from non-sensitive documents in recursive neural networks. Our evaluation on real data with human labeled sensitive content shows that our new approach outperforms existing keyword based strategies.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Berardi, G., Esuli, A., Macdonald, C., Ounis, I., Sebastiani, F.: Semi-automated text classification for sensitivity identification. In: CIKM, pp. 1711–1714 (2015) Berardi, G., Esuli, A., Macdonald, C., Ounis, I., Sebastiani, F.: Semi-automated text classification for sensitivity identification. In: CIKM, pp. 1711–1714 (2015)
2.
go back to reference Chow, R., Philippe, G., Staddon, J.: Detecting privacy leaks using corpus-based association rules. In: ACM SIGKDD, pp. 893–901 (2008) Chow, R., Philippe, G., Staddon, J.: Detecting privacy leaks using corpus-based association rules. In: ACM SIGKDD, pp. 893–901 (2008)
3.
go back to reference Cormack, G.V., Grossman, M.R., Hedin, B., Oard, D.W.: Overview of the TREC 2010 legal track. In: TREC (2010) Cormack, G.V., Grossman, M.R., Hedin, B., Oard, D.W.: Overview of the TREC 2010 legal track. In: TREC (2010)
4.
go back to reference Goller, C., Kuchler, A.: Learning task-dependent distributed representations by backpropagation through structure. In: IEEE ICNN, pp. 347–352 (1996) Goller, C., Kuchler, A.: Learning task-dependent distributed representations by backpropagation through structure. In: IEEE ICNN, pp. 347–352 (1996)
5.
go back to reference Grechanik, M., McMillan, C., Dasgupta, T., Poshyvanyk, D., Gethers, M.: Redacting sensitive information in software artifacts. In: ICPC, pp. 314–325 (2014) Grechanik, M., McMillan, C., Dasgupta, T., Poshyvanyk, D., Gethers, M.: Redacting sensitive information in software artifacts. In: ICPC, pp. 314–325 (2014)
7.
go back to reference Irsoy, O., Cardie, C.: Deep recursive neural networks for compositionality in language. In: NIPS, pp. 2096–2104 (2014) Irsoy, O., Cardie, C.: Deep recursive neural networks for compositionality in language. In: NIPS, pp. 2096–2104 (2014)
9.
go back to reference Pennington, J., Socher, R., Manning, C.D.: Glove: global vectors for word representation. In: EMNLP, pp. 1532–1543 (2014) Pennington, J., Socher, R., Manning, C.D.: Glove: global vectors for word representation. In: EMNLP, pp. 1532–1543 (2014)
10.
go back to reference Sánchez, D., Batet, M.: C-sanitized: a privacy model for document redaction and sanitization. JASIST 67, 148–163 (2016) Sánchez, D., Batet, M.: C-sanitized: a privacy model for document redaction and sanitization. JASIST 67, 148–163 (2016)
11.
go back to reference Socher, R., Huang, E.H., Pennin, J., Manning, C.D., Ng, A.Y.: Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. In: NIPS (2011) Socher, R., Huang, E.H., Pennin, J., Manning, C.D., Ng, A.Y.: Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. In: NIPS (2011)
12.
go back to reference Socher, R., Lin, C.C., Manning, C., Ng, A.Y.: Parsing natural scenes and natural language with recursive neural networks. In: ICML, pp. 129–136 (2011) Socher, R., Lin, C.C., Manning, C., Ng, A.Y.: Parsing natural scenes and natural language with recursive neural networks. In: ICML, pp. 129–136 (2011)
13.
go back to reference Socher, R., Manning, C.D., Ng, A.Y.: Learning continuous phrase representations and syntactic parsing with recursive neural networks. In: NIPS WS Deep Learning and Unsupervised Feature Learning, pp. 1–9 (2010) Socher, R., Manning, C.D., Ng, A.Y.: Learning continuous phrase representations and syntactic parsing with recursive neural networks. In: NIPS WS Deep Learning and Unsupervised Feature Learning, pp. 1–9 (2010)
14.
go back to reference Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C.: Recursive deep models for semantic compositionality over a sentiment treebank. In: EMNLP, pp. 1631–1642 (2013) Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C.: Recursive deep models for semantic compositionality over a sentiment treebank. In: EMNLP, pp. 1631–1642 (2013)
16.
go back to reference Tomlinson, S.: Learning task experiments in the TREC 2010 legal track. In: TREC (2010) Tomlinson, S.: Learning task experiments in the TREC 2010 legal track. In: TREC (2010)
Metadata
Title
Detecting Complex Sensitive Information via Phrase Structure in Recursive Neural Networks
Authors
Jan Neerbek
Ira Assent
Peter Dolog
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-93040-4_30

Premium Partner