2014 | OriginalPaper | Chapter

Semantic Features from Web-Traffic Streams

Author: Steve Hutchinson

Published in: Network Science and Cybersecurity

Publisher: Springer New York

Abstract

We describe a method for converting textual web-traffic streams into a corpus of documents, so that established linguistic tools can be used to study semantics, topic evolution, and token-combination signatures. We also describe a novel web-document corpus that represents the semantic features of each batch for subsequent analysis. An American-English lexicon is used to create a canonical representation of each corpus, in which each TermID maps consistently to the corresponding lexicon word or token. Finally, each corpus member is represented as a 'document' by combining the HTTP request string with the concatenation of all responses to it. This representation associates the request-string tokens with the resulting content, for consumption by document classification and comparison algorithms.
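
The document construction summarized in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the six-word lexicon, the regular-expression tokenizer, the sample request and responses, and the choice to drop out-of-lexicon tokens are all assumptions made for illustration, and the names `make_document` and `to_term_counts` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical miniature lexicon; the chapter uses an American-English
# lexicon whose contents are not reproduced here.
LEXICON_WORDS = ["account", "login", "news", "password", "sports", "weather"]

# Canonical mapping: sorting the lexicon means a given word receives the
# same TermID in every corpus built from it.
TERM_ID = {word: i for i, word in enumerate(sorted(LEXICON_WORDS))}

TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text):
    """Lowercase the text and extract alphabetic runs (an assumed tokenizer)."""
    return TOKEN_RE.findall(text.lower())


def make_document(request, responses):
    """Combine the request string with the concatenation of all its responses."""
    return request + " " + " ".join(responses)


def to_term_counts(document):
    """Count TermIDs of in-lexicon tokens; other tokens are dropped (assumption)."""
    ids = (TERM_ID[t] for t in tokenize(document) if t in TERM_ID)
    return dict(Counter(ids))


# Illustrative request and responses, not taken from the chapter.
request = "GET /news/weather?city=boston HTTP/1.1"
responses = [
    "<html><title>Weather news</title>",
    "<p>Sports and weather</p></html>",
]

doc = make_document(request, responses)
counts = to_term_counts(doc)
print(counts)
```

Because the request tokens and the response content land in the same bag of TermIDs, a downstream classifier or comparison routine sees both together, which is the association the abstract describes.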

Metadata
Title: Semantic Features from Web-Traffic Streams
Author: Steve Hutchinson
Copyright Year: 2014
Publisher: Springer New York
DOI: https://doi.org/10.1007/978-1-4614-7597-2_14
