
About this book

Software similarity and classification is an emerging topic with wide applications. It is applicable to the areas of malware detection, software theft detection, plagiarism detection, and software clone detection. Extracting program features, processing those features into suitable representations, and constructing distance metrics to define similarity and dissimilarity are the key methods for identifying software variants, clones, derivatives, and classes of software. Software Similarity and Classification reviews the literature on those core concepts, in addition to the relevant literature in each application, and demonstrates that treating these applied problems as similarity and classification problems enables techniques to be shared between areas. Additionally, the authors present in-depth case studies using the software similarity and classification techniques developed throughout the book.



Chapter 1. Introduction

This chapter introduces the major applications related to software similarity and classification. The applications include malware classification, software theft detection, plagiarism detection and code clone detection. The motivations for these applications are examined and an underlying theory is formalized. This theory is based on extracting signatures from programs, known as birthmarks, that are amenable to approximate matching, which indicates how similar those programs are.
Silvio Cesare, Yang Xiang

Chapter 2. Taxonomy of Program Features

All programs have common features and abstractions which are used to create birthmarks. Features can be divided into syntactic and semantic groups. Syntactic features concern themselves with program structure and program form. Semantic features examine the meaning of the program. In this chapter we examine those syntactic and semantic features of programs. Syntactic features include: (1) Raw Code, (2) Abstract Syntax Trees, (3) Variables, (4) Pointers, (5) Instructions, (6) Basic Blocks, (7) Procedures, (8) Control Flow Graphs, (9) Call Graphs, and (10) Object Inheritances and Dependencies. Semantic features include: (1) API Calls, (2) Data Flow, (3) Procedure Dependence Graphs, and (4) System Dependence Graphs.
Silvio Cesare, Yang Xiang

Chapter 3. Program Transformations and Obfuscations

Software feature extraction must cope with transformations that are intended to obscure, evolve, or rewrite the program. For example, malware polymorphism and metamorphism are transformations applied to malicious code to evade signature detection. Robust signatures must identify the birthmarks that remain invariant under these transformations. This chapter focuses on analysing these types of program transformations and obfuscations, including compiler optimisations, recompilation, plagiarism, software theft, derivative works, malware packing, malware polymorphism and malware metamorphism.
Silvio Cesare, Yang Xiang

Chapter 4. Formal Methods of Program Analysis

Feature extraction is a necessary component to construct a birthmark, show similarity and classify a program as belonging to a particular class. Program analysis is an important component in feature extraction. The analysis reveals information on the syntax, semantics, and behaviour of the program being inspected. This chapter focuses on formal methods of program analysis which can be used for the purpose of property and feature extraction.
Silvio Cesare, Yang Xiang

Chapter 5. Static Analysis of Binaries

Static analysis of binaries is more difficult than static analysis when source code is available. In many cases, the analyses are unsound and behaviours are omitted to make problems feasible. Heuristics may be required to separate code and data in a disassembly, or pointer behaviour may be weakly modelled to make statically analysing programs feasible. Nevertheless, static analysis of binaries is an important area of research with a number of practical applications, including the detection of software theft and the classification and detection of malware. This chapter examines static analysis of binaries with the intent that properties and features of binary programs can be extracted to create useful birthmarks for software similarity and classification.
Silvio Cesare, Yang Xiang

Chapter 6. Dynamic Analysis

In the previous chapters we have examined static extraction of program features for the purpose of birthmark construction. Dynamic analysis is examined in this chapter. It is an alternative approach to static analysis that can be used for birthmark construction. Dynamic analysis concerns itself with analysing a running program. The program being run is typically isolated in an environment which allows its behaviour to be inspected. A typical behaviour that is extracted is the API call sequence. Instruction sequences, basic block sequences and control flow are among the other behaviours that can also be identified.
Silvio Cesare, Yang Xiang
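
As an illustration of the API call sequences mentioned in the Chapter 6 summary, the following minimal sketch turns a recorded call trace into a bag of n-grams that could serve as a simple dynamic birthmark. It is not taken from the book: the function name `call_ngrams` and the example trace are hypothetical, and the trace is assumed to have already been captured by some sandbox or tracer.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def call_ngrams(trace: Iterable[str], n: int = 2) -> Counter:
    """Count the n-grams (length-n sliding windows) of an API call trace."""
    calls: List[str] = list(trace)
    windows: List[Tuple[str, ...]] = [
        tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)
    ]
    return Counter(windows)


# Hypothetical trace, standing in for what a tracer might record.
trace = ["open", "read", "read", "close"]
birthmark = call_ngrams(trace, n=2)
```

Counting n-grams rather than storing the whole sequence is one possible design choice: it tolerates small reorderings in the trace at the cost of losing long-range order.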

Chapter 7. Feature Extraction

In the previous chapters we have examined static and dynamic methods of program analysis. The features that these analyses extract must be translated into mathematical representations and birthmarks to be useful. Furthermore, mathematical representations may be embedded in other mathematical types to make birthmarks more amenable to similarity comparisons and to use in classification algorithms. Another approach is to represent features using kernels. This allows classification algorithms, including the support vector machine, to be used on complex data types. This chapter examines the mathematical representations that we use to describe program features.
Silvio Cesare, Yang Xiang
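
To make the idea of translating features into a mathematical representation concrete, here is a minimal sketch that embeds a list of extracted features in a fixed-length count vector. It is an assumption-laden example rather than the book's method: `to_vector`, the instruction mnemonics and the vocabulary are all hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def to_vector(features: Sequence[str], vocabulary: List[str]) -> List[int]:
    """Embed a feature list as counts over a fixed, ordered vocabulary.

    Features that are not in the vocabulary are ignored.
    """
    counts = Counter(features)
    return [counts[term] for term in vocabulary]


# Hypothetical vocabulary of instruction mnemonics.
vocabulary = ["call", "jmp", "mov", "push"]
vector = to_vector(["mov", "push", "mov", "call"], vocabulary)
```

A fixed vocabulary gives every program a vector of the same length, which is what vector-based distance measures and many classification algorithms expect.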

Chapter 8. Software Birthmark Similarity

Comparing birthmarks is necessary to identify similarities between software. If two birthmarks are similar, then the software is similar. Birthmarks may be compared to show similarity or, alternatively, to show dissimilarity or distance. Similarity measures and metrics exist for different types of data, such as strings, vectors, trees and graphs. This chapter examines the different similarity measures and metrics for the different classes of birthmarks.
Silvio Cesare, Yang Xiang
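
The relationship between similarity and distance described in the Chapter 8 summary can be sketched for set-valued birthmarks with the Jaccard index. This is one well-known measure chosen for illustration, not necessarily the one the book favours; the feature names are hypothetical.

```python
from typing import Set


def jaccard_similarity(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    if not a and not b:
        return 1.0  # Convention assumed here: two empty birthmarks are identical.
    return len(a & b) / len(a | b)


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Dissimilarity expressed as one minus the similarity."""
    return 1.0 - jaccard_similarity(a, b)


# Hypothetical birthmarks, each a set of feature identifiers.
original = {"f1", "f2", "f3"}
suspect = {"f2", "f3", "f4"}
score = jaccard_similarity(original, suspect)
```

Here the two birthmarks share two of four distinct features, so the similarity is 0.5 and the distance is also 0.5.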

Chapter 9. Software Similarity Searching and Classification

The ultimate problem addressed in this book is to search a database for software similar to a query program and to classify a program as belonging to a particular class. This chapter examines how we transform the pair-wise similarity problem into a similarity search problem over a database. Moreover, we examine statistical classification of birthmarks to identify the class of software to which a program belongs.
Silvio Cesare, Yang Xiang
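
The step from pair-wise comparison to database search and classification can be sketched as a linear scan with a similarity threshold, plus a nearest-neighbour label lookup. This is a deliberately naive illustration under stated assumptions: the names `search` and `classify`, the sample database, the family labels and the threshold are all hypothetical, and a realistic system would use an index rather than scanning every entry.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Birthmark = FrozenSet[str]


def similarity(a: Birthmark, b: Birthmark) -> float:
    """Jaccard index, used here as a stand-in pair-wise measure."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def search(query: Birthmark, database: Dict[str, Birthmark],
           threshold: float = 0.6) -> List[Tuple[str, float]]:
    """Return entries at least `threshold` similar to the query, best first."""
    hits = [(name, similarity(query, bm)) for name, bm in database.items()]
    matches = [hit for hit in hits if hit[1] >= threshold]
    return sorted(matches, key=lambda hit: hit[1], reverse=True)


def classify(query: Birthmark, database: Dict[str, Birthmark],
             labels: Dict[str, str]) -> Optional[str]:
    """Assign the label of the single most similar entry (1-nearest neighbour)."""
    if not database:
        return None
    best = max(database, key=lambda name: similarity(query, database[name]))
    return labels[best]


# Hypothetical database of known samples and their classes.
database = {
    "sample_a": frozenset({"f1", "f2", "f3"}),
    "sample_b": frozenset({"f7", "f8"}),
}
labels = {"sample_a": "family_x", "sample_b": "family_y"}
query = frozenset({"f1", "f2", "f4"})
```

With this data, `search(query, database, threshold=0.5)` returns only `sample_a`, and `classify(query, database, labels)` assigns the query to `family_x`.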

Chapter 10. Applications

This chapter surveys the application-specific literature in software similarity and classification. It examines malware classification, software theft detection, plagiarism detection and code clone detection. We group the literature based on the class of program feature that is used to construct birthmarks. Finally, we critically analyse the approaches used.
Silvio Cesare, Yang Xiang

Chapter 11. Future Trends and Conclusion

This chapter looks at future trends in software similarity and classification research and engineering. We look at the technology becoming unified and its applications in cloud services and mobile platforms. We conclude the book with some final thoughts.
Silvio Cesare, Yang Xiang