Abstract
A linguistic steganalysis method is proposed to detect synonym substitution (SS)-based steganography, which embeds a secret message into a text by substituting words with their synonyms. First, the attribute pair of a synonym is introduced to represent its position in an ordered synonym set, sorted in descending order of frequency, together with the number of its synonyms. As a result of synonym substitutions, the number of high-frequency attribute pairs may be reduced while the number of low-frequency attribute pairs increases. Based on a theoretical analysis of the changes that SS steganography causes in the statistical characteristics of attribute pairs, a feature vector built from the differences between the relative frequencies of different attribute pairs is used to detect the presence of a secret message. Finally, the impact of synonym coding strategies on the extracted feature vector is analyzed. Experimental results demonstrate that the proposed linguistic steganalysis method achieves better detection performance than previous methods.
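The attribute-pair idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the synsets, the word frequencies, and the function names are hypothetical, the position is assumed to start at 0 for the most frequent word, and the "number of synonyms" is assumed to be the size of the synset. The sketch only computes the relative frequencies of attribute pairs in a text; the exact feature vector used in the paper is not reproduced here.

```python
from collections import Counter

# Hypothetical toy synsets: each maps its words to made-up corpus frequencies.
SYNSETS = [
    {"big": 900, "large": 700, "sizable": 20},
    {"begin": 500, "commence": 30},
]


def build_attribute_pairs(synsets):
    """Map each word to (position, synset_size).

    Assumption: position 0 is the most frequent word in its synset, and
    synset_size counts every word in the synset, including the word itself.
    """
    pairs = {}
    for synset in synsets:
        ordered = sorted(synset, key=synset.get, reverse=True)
        for position, word in enumerate(ordered):
            pairs[word] = (position, len(ordered))
    return pairs


def relative_frequencies(tokens, pairs):
    """Relative frequency of each attribute pair among the synonyms in tokens."""
    counts = Counter(pairs[t] for t in tokens if t in pairs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


PAIRS = build_attribute_pairs(SYNSETS)

# A made-up cover sentence and a version with two synonyms substituted.
cover = "a big dog will begin to chase a large cat".split()
stego = "a sizable dog will commence to chase a large cat".split()

cover_freq = relative_frequencies(cover, PAIRS)
stego_freq = relative_frequencies(stego, PAIRS)

# Compare the two distributions pair by pair.
for pair in sorted(set(cover_freq) | set(stego_freq)):
    print(pair, cover_freq.get(pair, 0.0), stego_freq.get(pair, 0.0))
```

In this toy case the substitutions move relative frequency away from the position-0 (high-frequency) attribute pairs toward lower-frequency positions, which is the kind of shift the abstract says the feature vector is designed to capture.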
Notes
A synset is defined as a set of words with identical or similar meanings.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (Nos. 60973128, 61073191, 61070196, 61070195, 61103215, 61173141, 61173142, and 61232016), the National Basic Research Program 973 of China (Nos. 2010CB334706 and 2011CB311808), 2011GK2009, GYHY201206033, 201301030, 0S2013GR0445, and the PAPD fund.
Cite this article
Xiang, L., Sun, X., Luo, G. et al. Linguistic steganalysis using the features derived from synonym frequency. Multimed Tools Appl 71, 1893–1911 (2014). https://doi.org/10.1007/s11042-012-1313-8
DOI: https://doi.org/10.1007/s11042-012-1313-8