Linguistic steganalysis using the features derived from synonym frequency

Multimedia Tools and Applications

Abstract

A linguistic steganalysis method is proposed to detect synonym substitution (SS) steganography, which embeds a secret message in a text by substituting words with their synonyms. First, the attribute pair of a synonym is introduced to represent two properties: the synonym's position in its synonym set when the set is sorted in descending order of frequency, and the number of synonyms in that set. Synonym substitutions tend to reduce the number of high-frequency attribute pairs and increase the number of low-frequency attribute pairs. Based on a theoretical analysis of how SS steganography changes the statistical characteristics of attribute pairs, a feature vector built from the differences between the relative frequencies of different attribute pairs is used to detect the secret message. Finally, the impact of synonym coding strategies on the extracted feature vector is analyzed. Experimental results demonstrate that the proposed linguistic steganalysis method achieves better detection performance than previous methods.
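The attribute-pair idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact feature definition, so the synsets, the frequency counts, the 0-based position convention, and the particular choice of differences (top-ranked pair minus each lower-ranked pair of the same synset size) are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy synsets with invented corpus frequencies (not from the paper).
SYNSETS = [
    {"big": 900, "large": 700, "huge": 200},
    {"start": 800, "begin": 600},
]


def build_attribute_pairs(synsets):
    """Map each word to (position, size): its 0-based rank in its synset
    sorted by descending frequency, and the number of words in that synset."""
    pairs = {}
    for synset in synsets:
        ordered = sorted(synset, key=synset.get, reverse=True)
        for position, word in enumerate(ordered):
            pairs[word] = (position, len(ordered))
    return pairs


def relative_frequencies(tokens, pairs):
    """Relative frequency of each attribute pair among the synonyms in tokens."""
    counts = Counter(pairs[t] for t in tokens if t in pairs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


def feature_vector(tokens, pairs):
    """Assumed feature: for each synset size, the relative frequency of the
    top-ranked pair minus that of each lower-ranked pair of the same size."""
    freqs = relative_frequencies(tokens, pairs)
    sizes = sorted({size for _, size in pairs.values()})
    features = []
    for size in sizes:
        top = freqs.get((0, size), 0.0)
        for position in range(1, size):
            features.append(top - freqs.get((position, size), 0.0))
    return features


PAIRS = build_attribute_pairs(SYNSETS)
cover_tokens = "big start big begin large".split()
stego_tokens = "huge begin large begin huge".split()  # low-frequency synonyms substituted in
cover_features = feature_vector(cover_tokens, PAIRS)
stego_features = feature_vector(stego_tokens, PAIRS)
```

In this toy example the stego text uses more low-frequency synonyms, so its differences come out smaller than those of the cover text. The paper feeds its features to a trained classifier; that step is outside the scope of this sketch.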

Notes

  1. A synset is defined as a set of words with identical or similar meanings.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Nos. 60973128, 61073191, 61070196, 61070195, 61103215, 61173141, 61173142, and 61232016), the National Basic Research Program 973 of China (Nos. 2010CB334706 and 2011CB311808), 2011GK2009, GYHY201206033, 201301030, 0S2013GR0445, and the PAPD fund.

Author information

Corresponding author

Correspondence to Xingming Sun.

About this article

Cite this article

Xiang, L., Sun, X., Luo, G. et al. Linguistic steganalysis using the features derived from synonym frequency. Multimed Tools Appl 71, 1893–1911 (2014). https://doi.org/10.1007/s11042-012-1313-8

  • DOI: https://doi.org/10.1007/s11042-012-1313-8
