

AGH University of Science and Technology in Krakow
Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering
Department of Applied Computer Science

PhD Thesis

Przemysław Bereziński, M.Sc. Eng.

Entropy-based Network Anomaly Detection

Supervisor: Marcin Szpyrka, Ph.D., D.Sc.
Auxiliary supervisor: Bartosz Jasiul, Ph.D., Lt. Col.

Krakow 2015

Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie (AGH University of Science and Technology in Krakow)
Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering
Department of Applied Computer Science

Doctoral Dissertation

Przemysław Bereziński, M.Sc. Eng. (mgr inż.)

Network Traffic Anomaly Detection Using Entropy Measures (Detekcja anomalii w ruchu sieciowym z wykorzystaniem miar entropijnych)

Supervisor: dr hab. Marcin Szpyrka, prof. AGH
Auxiliary supervisor: Lt. Col. Bartosz Jasiul, Ph.D. Eng. (ppłk dr inż.)

Kraków 2015

Working on the Ph.D. has been a wonderful but sometimes overwhelming experience. I would like to express my sincere gratitude to all those who gave me the opportunity to complete this Thesis. First and foremost, I would like to thank my supervisors, prof. Marcin Szpyrka and dr Bartosz Jasiul, for enabling and supporting the preparation of this Dissertation and for ensuring freedom of work. Their guidance helped me throughout the research and the writing of this Thesis. Besides my supervisors, I would like to thank dr Joanna Śliwa and dr Rafał Piotrowski for the opportunity to work on many interesting cyber security projects. Special thanks go to my labmates: dr Marek Małowidzki, Tomasz Dalecki, Michał Mazur and Robert Goniacz, for their contribution to the software implemented during this research and for inspiring discussions regarding not only cyber security. Last but not least, I would like to thank my family, my wife Marzena and my sons, for all their love and encouragement. My sincere thanks also go to my Mother for motivating me throughout my life.

Przemysław Bereziński

Abstract

This Dissertation focuses on the application of anomaly detection in the field of network intrusion detection. This is a very important issue, as the number of cyber-attacks is alarmingly high and, to make things worse, it increases each year. This is partially due to the fact that widely used security solutions are ineffective against modern malicious software (malware). Damage from malware, especially malware acting in botnets, can take many serious forms, including loss of important data, reputation or money. Typically, a botnet is a group of infected hosts (bots) operated by cybercriminals who are focused on making money. Recently, botnets have also been used in cyber warfare to conduct sabotage and espionage. Network anomaly detection is a very broad and heavily explored area. The first methods were proposed almost 30 years ago, but the problem of finding a generic network anomaly detection method still remains unsolved. Dedicated methods for different types of network anomalies caused by malware can be found in the literature. Recently, entropy-based methods for detecting various types of anomalies have gained a lot of attention. The use of entropy to detect botnet-like malware has not been investigated so far. The main goal of this Dissertation is to prove that an entropy-based approach is suitable for detecting modern botnet-like malware in local networks and thus can be used to complement existing signature-based solutions. In order to reach this goal and prove the claim of the Thesis, the Dissertation makes several original contributions. A comparison of different entropy measures for use in network anomaly detection is provided. An original network anomaly detection method based on parameterized entropies and supervised machine learning is proposed, implemented and verified with a representative semi-synthetic dataset, prepared for this purpose due to the lack of realistic, complete and up-to-date datasets. Moreover, an analysis of proper parameters, suitable network features and the right classifier to use with the method is conducted. The results of the verification show that the proposed method with parameterized Renyi or Tsallis entropy, acting together with a classifier based on logistic regression, detects botnet-like malware with a satisfactory detection rate while keeping the rate of false alarms low. Comparable detection based on Shannon entropy or volume counters (number of flows, packets and bytes) turns out to be ineffective.

Streszczenie (Abstract)

This doctoral dissertation concerns anomaly detection in the area of network intrusion detection. The topic is very important, since the number of cyber attacks is alarmingly high and, worse still, grows year by year. This is partly because commonly used cyber protection solutions are ineffective at detecting current malicious software. The damage caused by such software, especially software operating within botnets, includes loss of data, reputation or money. Typically, a botnet is a group of infected hosts (bots) controlled by cybercriminals for financial gain. Nowadays, botnets are also used in cyber warfare for sabotage and espionage. Network anomaly detection is a broad and heavily explored topic. The first methods appeared almost 30 years ago, but the problem of finding generic methods has not been solved so far. There are methods dedicated to specific types of anomalies related to malicious software, including methods based on entropy measures, which have recently become very popular. However, no one has so far applied them to the detection of botnet-type malicious software. The main goal of this dissertation is to prove that the use of entropy measures enables the detection of botnet-type malicious software in local networks and that this approach can complement the currently used signature-based methods. To confirm the thesis, the dissertation presents an original contribution to the current state of knowledge. Several entropy measures were compared with respect to their application in network anomaly detection. An original method based on parameterized entropies and supervised machine learning was proposed, implemented and verified. The verification was carried out on the author's own representative dataset, since the available datasets proved to be unrealistic, incomplete and outdated. Additionally, analyses were carried out to determine suitable parameter values, appropriate network traffic features and an appropriate classifier for the proposed method. The evaluation showed that the method using parameterized Renyi or Tsallis entropy together with a classifier based on logistic regression effectively detects anomalies related to botnet-type malicious software while maintaining a low level of false alarms. The corresponding detection based on Shannon entropy, or on a volume-based approach relying on simple counters such as the number of flows, packets and bytes, turns out to be ineffective.

Contents

Abstract
Streszczenie
1. Introduction
   1.1. Motivation, Scope and Research Problem
   1.2. Goal and Plan of the Work
   1.3. Original contribution
   1.4. Exclusions
2. Related work
   2.1. General overview of network anomaly techniques
   2.2. Closely related work
      2.2.1. Detection via network volume counters
      2.2.2. Detection via network feature distributions
   2.3. Existing Datasets
   2.4. Summary
3. Entropy-based network anomaly detector – preface
   3.1. Main features
   3.2. Classification of the approach
4. Entropy
   4.1. Shannon entropy
   4.2. Parameterized entropy
   4.3. Comparison
      4.3.1. Binomial distribution
      4.3.2. Uniform distribution
      4.3.3. Impact of frequent and rare events
      4.3.4. Entropy of exemplary distributions
5. Network flows
   5.1. Flows vs. packets
   5.2. Flow export
      5.2.1. Operating principle
      5.2.2. Problems and difficulties
   5.3. NetFlow export setup
6. Entropy-based network anomaly detector
   6.1. Architecture
   6.2. Implementation
7. Dataset
   7.1. Origin of the idea
   7.2. Legitimate traffic
   7.3. Scenario 1
   7.4. Scenario 2
   7.5. Scenario 3
   7.6. Anomaly generator
8. Verification of the approach
   8.1. Correlation
   8.2. Performance evaluation
   8.3. Conclusions
9. Conclusions and further work
   9.1. Conclusions
   9.2. Further work
      9.2.1. On-line analysis in a real environment
      9.2.2. Multi-classifier
      9.2.3. Multi-label approach
      9.2.4. Dataset
   9.3. Publications

P. Bereziński Entropy-based Network Anomaly Detection.

List of Abbreviations

ACC – Accuracy
AUC – Area Under the Curve
BDR – Bayesian Detection Rate
CEP – Complex Event Processing
CybOX – Cyber Observable Expression
DDoS – Distributed Denial of Service
DNS – Domain Name System
DoS – Denial of Service
FDR – False Discovery Rate
FN – False Negative
FNR – False Negative Rate
FP – False Positive
FPR – False Positive Rate
HIDS – Host-based Intrusion Detection System
ICMP – Internet Control Message Protocol
IDS – Intrusion Detection System
IP – Internet Protocol
IPFIX – IP Flow Information Export
IRC – Internet Relay Chat
NIDS – Network-based Intrusion Detection System
NPV – Negative Predictive Value
NTP – Network Time Protocol
P2P – Peer-to-Peer
PCA – Principal Component Analysis
PPV – Positive Predictive Value
PR – Precision-Recall
RDP – Remote Desktop Protocol
ROC – Receiver Operating Characteristic
RPC – Remote Procedure Call
SNMP – Simple Network Management Protocol
SQL – Structured Query Language
STIX – Structured Threat Information Expression
TCP – Transmission Control Protocol
TN – True Negative
TNR – True Negative Rate
TP – True Positive
TPR – True Positive Rate
UDP – User Datagram Protocol

1. Introduction

This chapter introduces the reader to the subject of the Thesis. It is divided into four sections. Section 1.1 presents the motivation and scope, briefly describes the research problem and shows why it is an important issue in the field of Computer Science. Section 1.2 specifies the main goal of the research and presents the steps taken to reach it. It also familiarizes the reader with the outline of this Dissertation and the contents of subsequent chapters. Section 1.3 emphasizes the results of the Thesis that are considered its original contribution. Section 1.4 discusses issues that are deliberately not addressed in this research.

1.1. Motivation, Scope and Research Problem

Data mining is an interdisciplinary subfield of Computer Science involving methods at the intersection of artificial intelligence, machine learning and statistics [HTF09]. One data mining task is anomaly detection, i.e. the analysis of large quantities of data to identify items, events or observations which do not conform to an expected pattern. Anomaly detection is applicable in a variety of domains, e.g. fraud detection [PLSG10], fault detection [Nai09] and system health monitoring [MSOS07], but this Dissertation focuses on its application in the field of network intrusion detection. The first anomaly detection method for intrusion detection was proposed almost 30 years ago by Denning [Den87]. Today network anomaly detection is a very broad and heavily explored subject, but the problem of finding a generic method for a wide range of network anomalies is still unsolved. Anomaly detectors have several problems that have to be addressed; the main challenges are high false alarm rates, long computation time, tuning and calibration, and root-cause identification [Bra10]. Because of that, anomaly detection techniques are rarely implemented in commercial Intrusion Detection Systems (IDS).
Such systems mostly rely on the common signature-based (or misuse-based) technique, whose shortcomings are well known [LDZ05], [CLLL12], [GOB11], [JSl14a], [JSl14b]. Signatures describe only illegal patterns in network traffic, so prior knowledge is required [LDZ05]. Signature-based solutions do not cope with evasion techniques and with attacks that are not yet known (0-days) [CLLL12], [JSl14a], [JSl14b]. Moreover, they are unable to detect a specific attack until a rule for the corresponding vulnerability is created, tested, released and deployed, which usually takes some time [GOB11]. As widely used intrusion detection systems are often ineffective against modern malicious software (malware), proper network anomaly detection, as one of the possible ways to complement the signature-based approach, is essential.

Recently, entropy-based methods which rely on network feature distributions have attracted great interest [Eim08], [WP05], [NSA+08], [Tel12], [YKW11], [KBHJ08]. It is crucial to check whether an entropy-based approach can successfully detect anomalous network activity caused by modern botnet-like malware [HP14]. This is a really important issue, as the amount of such malware as well as its level of sophistication increases each year [Sop14]. A botnet is a group of infected hosts (bots) controlled by Command and Control (C&C) servers operated by cybercriminals. According to recent reports by cyber security organizations [Ver14], [Sym14], [Cer13], [Sop14], botnets are one of the most sophisticated and popular types of cybercrime today. Damage from such malware can take many serious forms, including loss of important data, reputation or money. Moreover, botnets are nowadays also used in cyber warfare to conduct sabotage and espionage [SK14].

The use of entropy to detect anomalies caused by botnet-like malware in local networks has not been investigated so far. Some entropy-based methods proposed in the past, e.g. [TBSM09], [YKW11], [NSA+08], deal with massive spreads of rather old, non-botnet worms and with different types of Distributed Denial of Service (DDoS) attacks in high-speed backbone networks controlled by Internet Service Providers (ISP). In the work presented in this Dissertation we have tried to find the best way of using entropy to properly detect and categorize network anomalies which indicate the presence of botnet-like malware in local networks. Anomalies of this type are often very small and hidden in the overall traffic volume expressed by the number of flows, packets or bytes, so detecting them with popular solutions and methods that rely mostly on traffic volume changes, e.g. [NfS], [BKPR02], [MSHJSC+04], [Nto], is highly difficult.

1.2. Goal and Plan of the Work

The main goal of this Dissertation is to prove that:

An entropy-based approach is suitable for the detection of modern botnet-like malware in local networks based on network anomalies characteristic of such malware.

We will try to find answers to the following questions:
– Are entropy measures useful in the context of network anomaly detection?
– Is it possible to effectively detect and classify small and low-rate anomalies connected with botnet-like malware activity in local networks by means of entropy?
– Is the entropy-based approach better than the traditional volume-based approach?
– Do parameterized entropies help to improve the results obtained for Shannon entropy?
– What is the proper set of parameters for entropies to successfully detect network anomalies?
– Which network features should be taken into consideration in order to detect a broad spectrum of anomalies connected with botnet-like malware?
– Which popular classifiers work well with the entropy-based approach?

It is assumed that the goal of this work can be reached in the following steps:
1. Preparation of a concept of an original entropy-based network anomaly detection method.
2. Implementation of the method.
3. Preparation of an original dataset (due to the lack of appropriate benchmarking data).
4. Evaluation of the method.

These steps are discussed in detail in the remaining part of the Thesis, which is organized as follows:
– Chapter 2 reviews related work in the area of network anomaly detection. It presents a general overview of the latest advances in this broad subject as well as a detailed review of anomaly detection techniques closely related to the approach proposed in this Dissertation. Additionally, some comments on existing datasets for evaluating network anomaly detection systems are included.
– Chapter 3 provides a brief overview of the approach taken to prove the Thesis and introduces the reader to the proposed method. The main features as well as a general classification of the method are presented.
– Chapter 4 introduces the definition of Shannon entropy and describes the Renyi and Tsallis generalizations. A brief overview and a simulation-based comparison of entropy measures are provided.
– Chapter 5 describes the concept of network flows and compares this technique with the widely used packet-based approach. Additionally, the NetFlow [Cla04] export setup prepared to interact with the proposed method is presented.
– Chapter 6 presents the architecture of the proposed method. A detailed specification as well as the results of the implementation are given.
– Chapter 7 describes the dataset developed to evaluate the performance of the proposed method.
– Chapter 8 presents the results of the verification of the method.
– Chapter 9 concludes this Dissertation with a short summary and outlines future work.

1.3. Original contribution

The approach proposed in this Dissertation is superior to the state of the art in several aspects. The following are considered the original contributions of the Thesis:
1. The use of an entropy-based approach to detect botnet-like malware in local networks.
2. Concept and implementation of an original entropy-based network anomaly detection method.
3. Comparison of different entropy measures for use in entropy-based network anomaly detection.
4. Selection of a proper set of α-values for parameterized entropies and a proper set of network features to successfully detect various network anomalies.
5. Comparison of the performance of different classifiers working with the proposed method.
6. Comparison of entropy measures with volume-based counters for use in network anomaly detection.
7. Preparation of an original dataset which includes anomalies specific to the network activity of modern botnet-like malware.
8. Detailed performance evaluation of the method by means of both standard and novel (introduced for the purpose of this Thesis) metrics.

1.4. Exclusions

Network anomaly detection is a broad topic. Some of the issues that are deliberately not addressed in this Thesis are presented below.
1. This Thesis does not cover detection of anomalies or attacks visible in IP packets and their payloads. This is mainly because such anomalies are easily detectable with a signature-based approach, as long as the attack is known and the network traffic is not encrypted.
2. There is no empirical evaluation of the proposed method working on-line in a real environment, since this is planned as future work.
3. There is no comparison of the method with other summarization techniques, such as histograms or sketches, in this Thesis. The main reason is the lack of publicly available implementations of these methods. Moreover, such a comparison would be difficult and its results could be inaccurate, since the performance of these methods strongly depends on proper tuning.
4. There is no evaluation of the proposed method with a publicly available dataset, as at the time of preparing this Thesis none of them met all the necessary requirements, such as completeness, timeliness and correctness.

This Thesis has been partially supported by the Polish National Centre for Research and Development under project no. PBS1/A3/14/2012 SECOR and project no. 01.01.02-00-062/09 CybSecLab, and by the European Regional Development Fund under the Innovative Economy Operational Programme, project no. 01.01.02-00-062/09 INSIGMA.
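Contributions 3 and 4, like several of the research questions in Section 1.2, concern parameterized entropies and the choice of their α-values. The sketch below is a minimal illustration that assumes the standard textbook definitions, which may differ from the exact (possibly normalized) forms used later in this Thesis: Renyi entropy H_α = log2(Σ p_i^α) / (1 - α) and Tsallis entropy S_α = (1 - Σ p_i^α) / (α - 1), both of which approach Shannon entropy as α → 1. The function names and the port distribution in the usage lines are hypothetical.

```python
import math
from collections import Counter
from collections.abc import Hashable, Iterable


def probabilities(values: Iterable[Hashable]) -> list[float]:
    """Empirical probabilities of the distinct values (all strictly positive)."""
    counts = Counter(values)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def shannon(p: list[float]) -> float:
    """Shannon entropy in bits."""
    return -sum(pi * math.log2(pi) for pi in p)


def renyi(p: list[float], alpha: float) -> float:
    """Renyi entropy in bits; alpha == 1 falls back to the Shannon limit."""
    if math.isclose(alpha, 1.0):
        return shannon(p)
    return math.log2(sum(pi ** alpha for pi in p)) / (1.0 - alpha)


def tsallis(p: list[float], alpha: float) -> float:
    """Tsallis entropy; alpha == 1 falls back to the Shannon limit in nats."""
    if math.isclose(alpha, 1.0):
        return -sum(pi * math.log(pi) for pi in p)
    return (1.0 - sum(pi ** alpha for pi in p)) / (alpha - 1.0)


# Hypothetical destination-port distribution: one dominant port, a few rare ones.
ports = [80] * 90 + [443] * 6 + [22, 23, 25, 8080]
p = probabilities(ports)
for alpha in (0.5, 1.0, 2.0):
    print(f"alpha={alpha}: Renyi={renyi(p, alpha):.3f} bits, Tsallis={tsallis(p, alpha):.3f}")
```

Values of α below 1 give relatively more weight to the rare ports, while values above 1 are dominated by the most frequent one; this is the intuition behind evaluating several α-values instead of a single Shannon value.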

2. Related work

This chapter reviews related work in the area of network anomaly detection. It starts with a general overview of the latest advances in this broad subject. Then anomaly detection techniques closely related to the approach proposed in this Dissertation are presented in more detail and commented on. Finally, some remarks on existing datasets for evaluating network anomaly detection systems are given.

2.1. General overview of network anomaly techniques

The problem of anomaly detection in network traffic has been extensively studied. There are many surveys, review articles and books on this broad subject. A great deal of research on anomaly detection techniques can be found in several books, e.g. [WFH11], [BK13], [Agg13], [HTF09]. In surveys such as [CBK09], [HA04], the authors discuss anomaly detection in general and cover the network intrusion detection domain only briefly. Various network anomaly detection methods have been summarized in several review papers [ETGTDV04], [PP07], [Cal09], [CKS+09], [GTDVMFV09]. A recent, well-structured and comprehensive survey of anomaly-based network intrusion detection, covering the general overview, techniques, systems, tools and datasets, together with a discussion of challenges and recommendations, is presented by Bhuyan et al. [BBK13]. Another paper worth mentioning is the review of network intrusion detection by Sperotto et al. [SSS+10], which provides a valuable comparison of packet-based and flow-based approaches. The aforementioned surveys indicate that the most effective methods of network anomaly detection include Principal Component Analysis, wavelets, Markovian models, clustering, histograms, sketches and entropies. To familiarize the reader with these techniques and to facilitate understanding of Section 2.2, a short description of each of them is presented below.
Principal Component Analysis (PCA) is a popular dimension reduction technique in machine learning [HNG+07], [SCSC03], [LYW13]. PCA transforms a set of correlated random variables to a new coordinate system given by the principal components. Simply speaking, PCA transforms a set of correlated random variables into a smaller set of uncorrelated ones. The uncorrelated variables are linear combinations of the original ones and can be used to express the data in a reduced form.

Wavelet transformation is one of the time-frequency transformation techniques [LG09], [LTG08], [LWK10]. It is used for analyzing localized variations of power within a time series. By decomposing a time series into time-frequency space, one is able to find the dominant modes of variability and determine how those modes vary in time. There are some important differences between the well-known Fourier analysis [YZX+04] and wavelets. Fourier basis functions are localized in frequency but not in time: a small frequency change in the Fourier transform produces changes everywhere in the time domain. Wavelets are local in both frequency and time, which is an advantage in many cases.

Markov models are very useful for modeling sequences [YZB04], [SZH+13]. For a given system, a Markov model consists of a list of possible states, possible transition paths between those states and the rate parameters of those transitions. The simplest Markov model is a Markov chain. It models the state of a system with a random variable that changes through time; the distribution of the next state depends only on the current state. A hidden Markov model [JP05] is a Markov chain whose state is only partially observable. In other words, observations are related to the state of the system, but they are typically insufficient to determine the state precisely.

Cluster analysis (clustering) is a technique used to group objects of a similar kind into respective categories [SPBW12], [REHA13], [BSS+14]. This technique is based on unlabeled data. In machine learning, methods that use labeled samples are said to be supervised and methods which rely on unlabeled samples are said to be unsupervised [Alp10]. Clustering can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how clusters are identified. Usually, clustering-based techniques require computing the distance between pairs of objects.

Histograms, sketches and entropy-based approaches are methods that summarize random variable distributions, e.g. the distribution of addresses or ports in the domain of network anomaly detection. Histogram-based methods divide the entire range of values of a distribution into a series of small intervals called bins [KSD09], [SST+04]. The sketch-based approach relies on a set of histograms in which elements are assigned to bins using a set of different hash functions [SLBK08], [BDWS09]. Entropy is a measure of the uncertainty associated with a random variable [Sha48]. In general, the more random the variable, the higher the entropy. Entropy summarizes a probability distribution with a single value, which can be conveniently used to compare certain qualitative differences between probability distributions. Entropy fits network anomaly detection well, because some attacks or anomalies concentrate or disperse the probability distributions of network features [NSA+08], [TBSM09].

2.2. Closely related work

This section takes a closer look at works strictly related to the approach proposed in this Dissertation. Detection methods based on summarizing network feature distributions via entropy, histograms and sketches are reviewed, with special attention devoted to methods employing different forms of entropy, and some comments on the gaps noticed are given. The section starts with a comparison of the network feature distribution approach with the older but still more popular detection via network volume counters.
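The concentration and dispersion effects that make entropy attractive for summarizing feature distributions can be illustrated with a toy calculation. The sketch below is a hypothetical example rather than a method from any of the cited works: it computes the Shannon entropy (in bits) of destination addresses and destination ports over three made-up windows of flow records, each record reduced to a (dst_ip, dst_port) pair. A port scan disperses the port distribution and raises its entropy, while a flood towards a single host concentrates the address distribution and lowers its entropy.

```python
import math
from collections import Counter
from collections.abc import Hashable, Iterable


def shannon_entropy(values: Iterable[Hashable]) -> float:
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def feature_entropies(flows: list[tuple[str, int]]) -> dict[str, float]:
    """Entropy of each summarized feature for one window of (dst_ip, dst_port) flows."""
    return {
        "dst_ip": shannon_entropy(ip for ip, _ in flows),
        "dst_port": shannon_entropy(port for _, port in flows),
    }


# Baseline window: 100 flows spread evenly over 10 servers, mostly web and DNS ports.
baseline = [(f"10.0.0.{i % 10 + 1}", port)
            for i, port in enumerate([80] * 50 + [443] * 40 + [53] * 10)]
# Port scan: baseline plus 100 flows to one host, each to a different port.
port_scan = baseline + [("10.0.0.5", port) for port in range(1000, 1100)]
# Flood: baseline plus 400 flows to a single host and port.
flood = baseline + [("10.0.0.1", 80)] * 400

for name, window in [("baseline", baseline), ("port scan", port_scan), ("flood", flood)]:
    print(name, {k: round(v, 2) for k, v in feature_entropies(window).items()})
```

In practice such values would be computed per time window and per feature and compared against their normal range; which features to summarize is one of the questions this Thesis addresses.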

2.2.1. Detection via network volume counters
In the past, network anomalies were treated as deviations in traffic volume. Simple counters such as the number of flows, packets (total, forwarded, fragmented, discarded) and bytes (per packet, per second) were used. These counters can be obtained from network devices via the Simple Network Management Protocol (SNMP) [HPW02] or NetFlow [Cla04], [SBCQ09]. Barford et al. [BKPR02] applied wavelet analysis to distinguish between predictable and anomalous traffic volume changes using a very basic set of counters from NetFlow and SNMP data. They combined an advanced signal analysis technique with very simple metrics, i.e. the number of flows, packets and bytes. The authors reported some positive results in the detection of high-volume anomalies such as network failures, bandwidth floods and flash crowds. Kim et al. [MSHJSC+ 04] proposed a method in which many different Distributed Denial of Service (DDoS) attacks are described as traffic patterns in terms of flow characteristics. In particular, the authors focused on counters such as the number of flows, packets and bytes, the flow and packet sizes, the average flow size and the number of packets per flow. In the presented TCP SYN flood example, the following pattern was applied: a large number of flows, but a small number of small packets, and no constraints on the bandwidth or the total number of packets. This pattern differs significantly from the one generated for an ICMP/UDP flooding attack, which involves high bandwidth consumption and a large number of packets. Although the authors reported some good results, they also mentioned that common legitimate peer-to-peer (P2P) traffic may cause false alarms in their approach. A threshold-based detector measuring deviation from a mean value was used by Lee et al. [LPKL09] together with their traffic collection algorithm for frequent collection of SNMP data.
To assess the algorithm, the authors examined how it impacts the detection of volume anomalies. Only minor differences were reported in comparison to the original traffic collection algorithm. Casas et al. [CFVN09] introduced an anomaly detection algorithm based on SNMP data which deals with abrupt and large traffic changes. The authors proposed a novel linear parsimonious model for anomaly-free network flows. This model makes it possible to treat legitimate traffic as a nuisance parameter, remove it from the detection problem and detect anomalies in the residuals. The authors reported that with this approach they slightly improved on a previously introduced PCA-based approach in terms of false alarms. Many commercial and open source solutions that rely on SNMP or NetFlow counters are available on the market, e.g. NFSen [NfS], NtopNg [Nto], Plixer Scrutinizer [Scr], Paessler PRTG [Prt] and SolarWinds Network Traffic Analyzer [Sol]. All of them provide more or less the same functionality:
– browsing and filtering network data;
– statistics overview, e.g. top talkers, i.e. hosts or services that exchanged the most traffic;
– reporting, e.g. bandwidth reports, i.e. which user exchanged how much traffic;
– alerting when traffic thresholds are exceeded or when rules describing anomalous behavior are matched.
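A threshold rule of the kind used by counter-based detectors, such as the mean-deviation detector mentioned above for Lee et al. [LPKL09], can be sketched in a few lines of Python. The window length `history`, the multiplier `k` and the counter values are hypothetical; the cited work does not necessarily use this exact rule.

```python
import statistics

def volume_alarms(counts, history=5, k=3.0):
    """Indices of windows whose counter deviates from the mean of the previous
    `history` windows by more than `k` population standard deviations."""
    alarms = []
    for t in range(history, len(counts)):
        past = counts[t - history:t]
        mean = statistics.mean(past)
        std = statistics.pstdev(past) or 1.0  # flat history: fall back to a unit threshold
        if abs(counts[t] - mean) > k * std:
            alarms.append(t)
    return alarms

# Hypothetical flows-per-minute counter with a flooding attack in minute 7.
flows = [100, 104, 98, 101, 99, 102, 100, 950, 101, 103]
print(volume_alarms(flows))  # [7]
```

A flood that multiplies the flow count is flagged, but an anomaly that keeps the counter near its usual level passes unnoticed, which is exactly the limitation of counters discussed below.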

Several solutions available on the market, e.g. Invea-Tech FlowMon [Floc] or AKMA Labs FlowMatrix [Floa], offer some anomaly detection methods, which mostly rely on a predefined set of rules for detecting undesirable behavior patterns and on some simple long-term network behavior profiles in terms of services, traffic volume and communicating parties. Although vendors classify their solutions as anomaly detection, the use of rule-based heuristics describing well-known patterns corresponds more closely to the signature-based approach. Concluding this subsection, we note that although there are many methods that rely on counters, their capabilities are limited. The main problem with the counter-based approach is that it relies mostly on traffic volume. Nowadays, many network attacks and anomalies, such as low-rate DDoS, stealth scanning, or botnet-like worm propagation and communication, do not result in a substantial change of traffic volume. The presented counter-based methods handle large and abrupt traffic changes, such as bandwidth flooding attacks or flash crowds, well, but a large group of anomalies which do not cause volume changes remains undetected. Moreover, there is also a practical issue with counters, reported by Brauckhoff et al. [BTW+ 06]: packet sampling, used by many routers to save resources when collecting data, can influence counter-based anomaly detection metrics, but does not significantly affect the distributions of network features.

2.2.2. Detection via network feature distributions
Network anomaly detection via network feature distributions is becoming more and more popular. Several feature distributions have been suggested in the past [LCD05], [NSA+ 08], [TBSM09]: header-based (addresses, ports, flags), volume-based (host- or service-specific percentage of flows, packets and bytes) and behavior-based (in/out connections of a particular host).
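The contrast drawn above between volume counters and feature distributions can be illustrated with a small Python example. A window containing a port scan has exactly the same number of flows as a normal window, so a flow counter sees nothing, while the entropy of destination ports rises (dispersion) and the entropy of source addresses falls (concentration). The flow records and field layout are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def summarize(flows):
    """Flow count (a volume counter) plus entropies of two header-based features."""
    return {
        "flows": len(flows),
        "H(src_ip)": shannon_entropy([f[0] for f in flows]),
        "H(dst_port)": shannon_entropy([f[2] for f in flows]),
    }

# Hypothetical (src_ip, dst_ip, dst_port) flow records for two 1-minute windows.
normal = [(f"10.0.0.{i % 8}", "192.168.1.10", 443 if i % 4 == 0 else 80) for i in range(64)]
# Same number of flows, but half of them come from one host probing 32 different ports.
scan = [("10.0.0.66", "192.168.1.10", 1000 + i) for i in range(32)] + normal[:32]

for name, flows in (("normal", normal), ("scan", scan)):
    print(name, {k: round(v, 3) for k, v in summarize(flows).items()})
```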
However, it is unclear which network feature distributions perform best. Based on the results of pairwise correlation, Nychis et al. [NSA+ 08] reported dependencies between addresses and ports and recommended the use of volume-based and behavior-based feature distributions. In contrast, Tellenbach et al. [TBSM09] found no correlation among header-based features. In this Dissertation, original results on network feature correlation are presented and some interesting conclusions are drawn.

Shannon entropy
Entropy, as a measure of uncertainty, can be used to summarize feature distributions in a compact form, i.e. a single number. Many forms of entropy exist, but only a few have been applied to network anomaly detection. The most popular is the well-known Shannon entropy [Sha48]. The application of Shannon measures such as relative entropy and conditional entropy to network anomaly detection was proposed by Lee and Xiang [LX01]. Lakhina et al. [LCD05] also made use of Shannon entropy to summarize feature distributions of network flows. Using unsupervised learning, the authors showed that anomalies can be successfully clustered. Wagner and Plattner [WP05] used Kolmogorov complexity, which is related to Shannon entropy [GV03], [TMSA11], to detect worms in network traffic. Their work mostly focuses on implementation aspects and scalability and does not propose any specific analysis techniques. The authors reported that the method is able to detect worm outbreaks and massive scanning activities in near real time. Ranjan et al. [RSN+ 07] suggested another worm detection algorithm, which measures Shannon entropy ratios for pairs of traffic features and raises an alarm on sudden changes. Gu et al. [GMT05] used Shannon maximum entropy estimation to estimate the baseline distribution of the network and to give a multi-dimensional view of network traffic. The authors claim that with their approach they were able to distinguish anomalies that change the traffic either abruptly or slowly. Iglesias et al. [IZ14] proposed a fast, lightweight method to distinguish different attack types observed in an IP darkspace monitor. The method is based on Shannon entropy measures of network features and machine learning techniques. The explored data belongs to a portion of the Internet background radiation from a large IP darkspace.

Generalized entropy
Besides Shannon entropy, several generalizations of entropy have recently been introduced in the context of network anomaly detection. Eimann [SEB07], [ESB05], [Eim08] reported some positive results of using T-entropy [TNS+ 05] for intrusion detection based on packet analysis. T-entropy can be estimated from a string complexity measure called T-complexity [TNS+ 05]. String complexity is the minimum number of steps required to construct a given string. In contrast to entropy, where the probabilities (estimated from frequencies) can be permuted, in a complexity-based approach the order matters. A string is compressed with an algorithm and the output length is used to estimate its complexity. Finally, the complexity becomes an estimate of the entropy. Because the sequence of events is crucial in this approach, it fits fine-grained methods of network data analysis, such as full packet or packet header inspection. The problem is that this type of inspection does not scale with network speed. Some details about T-entropy are presented in our paper [PBPC12]. A parameterized generalization of entropy has also recently been reported as very promising.
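The point that order matters for complexity-based measures, unlike for frequency-based entropy, can be illustrated with a general-purpose compressor. The Python sketch below uses the zlib (DEFLATE) compressed length as a crude complexity proxy; this is a swapped-in technique, not T-complexity or T-entropy, and the payloads are hypothetical.

```python
import random
import zlib

def compression_complexity(data: bytes) -> float:
    """Compressed length / original length: a crude, order-sensitive complexity proxy.

    NOTE: this uses DEFLATE (zlib) as a stand-in; it is not T-complexity or T-entropy.
    """
    return len(zlib.compress(data, 9)) / len(data)

repetitive = b"GET /index.html HTTP/1.1\r\n" * 40   # hypothetical, highly regular payload
shuffled = bytearray(repetitive)
random.seed(1)
random.shuffle(shuffled)                           # same byte frequencies, order destroyed
noisy = bytes(random.randrange(256) for _ in range(len(repetitive)))  # little structure

for name, data in (("repetitive", repetitive), ("shuffled", bytes(shuffled)), ("noisy", noisy)):
    print(name, round(compression_complexity(data), 3))
```

The repetitive and shuffled payloads have identical byte frequencies, and therefore identical Shannon entropy, yet the shuffled one compresses much worse; this is the sense in which the order of symbols matters for a complexity-based measure.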
Shannon entropy assumes a tradeoff between contributions from the main mass of the distribution and from its tail. With the parameterized Tsallis [Tsa88] or Renyi [Ren70] entropy, one can control this tradeoff. In general, if the parameter, denoted as α, has a positive value, it exposes the main mass; if the value is negative, it exposes the tail. Ziviani et al. [ZGMR07] investigated Tsallis entropy in the context of the best value of the α parameter for DoS attack detection. They found that an α-value of around 0.9 is the best for detecting such attacks. Shafiq et al. [SKF08] did the same for port scan anomalies caused by malware. They reported that an α-value of around 0.5 is the best choice for detecting scan anomalies. A comparative study of the use of Shannon, Renyi and Tsallis entropy for attribute selection, aimed at obtaining an optimal attribute subset that increases the detection capability of decision tree and k-means classifiers, was presented by Lima et al. [LAS12]. The experimental results demonstrate that, for the DoS and scan attack categories, the performance of models built with smaller subsets of attributes is comparable to, and sometimes better than, that of models built with the complete set of attributes. The authors found that, for the DoS category, Renyi entropy with an α-value of around 0.5 and Tsallis entropy with an α-value of around 1.2 are the best for the decision tree classifier. We believe that the proper choice of the α-value depends on the anomaly, on the legitimate traffic used as a baseline, or on both, since none of the authors mentioned above reported similar results. Thus, goals such as finding the proper value of the parameter in order to improve the detection of a particular group of anomalies will remain unachieved. Some authors, e.g. Tellenbach et al. [TBSM09], [TBS+ 11], [Tel12], employed a set of α-values in their methods.
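The benefit of using a set of α-values rather than a single one can be illustrated with a small Python sketch. It computes Tsallis entropy (Eq. 4.12 in Chapter 4) for several α-values on a hypothetical baseline distribution and on the same distribution after two rare values appear. The α set and both distributions are assumptions chosen for illustration, not values from the cited works.

```python
def tsallis(probs, alpha):
    """Tsallis entropy (alpha != 1) of a probability vector; zero probabilities are skipped."""
    return (sum(p ** alpha for p in probs if p > 0) - 1) / (1 - alpha)

ALPHAS = [-2, -1, -0.5, 0.5, 2, 3]  # hypothetical set; alpha = 1 (the Shannon limit) is skipped

def entropy_spectrum(probs):
    """One entropy value per alpha: a small feature vector instead of a single number."""
    return [tsallis(probs, a) for a in ALPHAS]

baseline = [0.5, 0.3, 0.2]
with_rare = [0.5, 0.3, 0.19, 0.005, 0.005]  # two rare values appear, e.g. a stealthy contact

for a, before, after in zip(ALPHAS, entropy_spectrum(baseline), entropy_spectrum(with_rare)):
    print(f"alpha={a:>5}: {before:10.3f} -> {after:10.3f}")
```

With these numbers the value for α = 2 barely moves, while the value for α = −2 grows by orders of magnitude: negative α-values expose the tail and positive ones the main mass, so a single α-value can easily miss one of the two kinds of change.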
Tellenbach et al. proposed the Traffic Entropy Telescope prototype, based on Tsallis entropy and capable of detecting a broad spectrum of anomalies in backbone traffic

including fast-spreading worms (not so common nowadays), scans and different forms of DoS/DDoS attacks. Although Tsallis entropy seems to be more popular than Renyi entropy in the context of network anomaly detection, the latter has also been successfully applied to the detection of different anomalies. Examples are the work by Yang et al. [YKW11], who employed Renyi entropy for the early detection of low-rate DDoS attacks, and by Kopylova et al. [KBHJ08], who reported positive results of using Renyi conditional entropy to detect selected worms. We believe that parameterized entropy can overcome some limitations of Shannon entropy caused by its small descriptive capability [Tel12], which results in a limited ability to detect typical small or low-rate anomalies. Moreover, we think that with a properly chosen set of α-values this detection will be accurate, in terms of a low number of false alarms and a high detection rate. In this Thesis we present original results of our research on the proper set of α-values, as well as original research on the most suitable entropy type.

Other techniques
Apart from entropy, other techniques for summarizing feature distributions, namely sketches and histograms, are successfully used in the context of network anomaly detection. Soule et al. [SST+ 04] proposed a flow classification method based on modeling network flow histograms using Dirichlet Mixture Processes for random distributions. The authors validated their model against three synthetic test cases and achieved almost 100% accuracy. In [SLBK08], Stoecklin et al. introduced a two-layered sketch-based anomaly detection technique. The first layer models typical values of different feature components, e.g. the typical number of flows connecting to a specific port, while the second layer evaluates the differences between an observed feature distribution and the corresponding model.
The authors claim that the main strength of their method is the construction of fine-grained models that capture the details of feature distributions, instead of summarizing them into an entropy value. A more general approach was presented by Kind et al. [MSHJSC+ 04]. In their method, histogram-based baselines were constructed from some essential network feature distributions, such as addresses and ports. This work was extended by Brauckhoff et al. [BDWS09], who applied association rule mining to identify flows representing anomalous network traffic. Although the non-entropic techniques for summarizing feature distributions seem to work well, proper tuning is their main problem [Tel12]. The detection performance depends, to a great extent, on the choice of bin size, which may be difficult to set and control as network traffic changes.

2.3. Existing Datasets
One of the main problems in network anomaly detection is the lack of good, publicly available datasets for evaluation purposes. Researchers in this area have noted this situation [CRKM11], [Owe10], [EDD+ 13], [GGSZ14]. Some research works employ "what is available", that is, datasets that are outdated (from the point of view of both the legitimate traffic and the anomalies they contain); other works are based on their own datasets, prepared for the sole purpose of evaluating a proposed method "somehow" – as dataset creation was not a goal in itself, their quality is usually limited. In our paper [MBM15], a detailed review of existing datasets is presented, requirements are defined and dataset preparation methods are described. Real network traces are the most valuable

but because of privacy issues they are rarely published. One possible solution to the privacy problem is anonymization [CMRB09], [KAA+ 06], [FAAM07]. The goal of anonymization is to preserve the structure of the data while at the same time complying with privacy policies. Finding the right balance may sometimes be a difficult task [SSTG12]. Another problem with real traces is proper labeling, which in many cases has to be done manually. Real traffic traces can be found in some publicly available repositories, such as the Internet Traffic Archive [ITA], LBNL/ICSI Enterprise Tracing [LBN], SimpleWeb [Sim], CAIDA [Cai], MOME [MoM], WITS [WIT] and UMASS [UMa]. Unfortunately, these traces are usually old, unlabeled and not dedicated to anomaly detection. Alternative approaches cover synthetic or semi-synthetic datasets. To build such a dataset, deep domain knowledge and appropriate methods and tools are required in order to obtain realistic data. According to Brauckhoff et al. [BWM08], a realistic simulation of legitimate traffic is largely an unsolved problem today, and combining synthetic anomalies with real background traffic traces is one of the solutions. In [BWM08] and later in [Bra10], Brauckhoff introduced the FLAME tool, which allows hand-crafted anomalies to be injected into a given legitimate traffic flow trace. This tool is freely available, but the current distribution does not include any models reflecting anomalies. Another interesting concept was introduced by Shiravi et al. [SSTG12]. The authors proposed to describe network traffic (not only flows) by a set of so-called α and β profiles, which can subsequently be used to generate a dataset. The α-profiles consist of actions which should be executed to generate a given event in the network (such as an attack), while in β-profiles certain entities (packet sizes, number of packets per flow) are represented by a statistical model. Regrettably, this solution is not freely available.
The lack of traces of botnet-like malware behavior in the available network datasets calls their timeliness into question. Such traces should be included in contemporary datasets, and researchers should address anomalies typical of botnet-like malware in their methods, as nowadays this malware is one of the main threats. The number of datasets containing botnet-like malware anomalies is limited. Worth mentioning are those prepared by Shiravi et al. [STG+ 11] and Garcia et al. [GGSZ14]. The first one is a mixture of malicious and non-malicious datasets. Unfortunately, only one host in this dataset is infected with botnet-like malware. The second dataset, which has been made public recently, is much richer and consists of traces of 13 different scenarios of running bots from 7 different families. It was obtained by running real (mostly unmodified) malware on a subnetwork of infected hosts in a lab environment. This traffic was mixed with background traffic coming from a real network. A controversial (but beneficial from the point of view of the resulting dataset) decision was not to restrict the botnet's communication with the Internet in any way. For privacy reasons, the dataset contains NetFlow data; additionally, a full packet capture of botnet activity is included. The dataset is carefully labeled, although all traffic from infected hosts was marked as hostile. Unfortunately, this dataset was unavailable while this Thesis was being prepared. An interesting dataset has also been prepared by Sperotto et al. [SSVP09]. This dataset is based on data collected from a real honeypot (an isolated and monitored trap) which was running for several days. The honeypot featured common network services such as HTTP, SSH and FTP. The authors gathered about 14 million malicious network flows, most of which referred to the activity of web and network scanners. Some details about particular anomalies in this dataset are also presented in our paper [BPMP14].
Even though some valuable datasets are emerging, many researchers still make use of the very old and criticized DARPA [HLF+ 01] dataset and its modified versions, namely KDD99 [KDD] and NSL-KDD [TBLG09]. Besides strong

criticism by McHugh [McH00], Mahoney et al. [MC03] and Thomas [TSB08] for being unrealistic and unbalanced, nowadays DARPA datasets are simply out of date in the context of network services and attacks.

2.4. Summary
As one can see, network anomaly detection is a very broad and heavily explored area. The problem of a generic method for detecting network anomalies is still unsolved. The widely used security solutions are ineffective against modern botnet-like malware. The feature distribution approach is very promising, and entropy seems to be the best choice for summarizing feature distributions. Entropy fits network anomaly detection well, because some network attacks or anomalies result in concentrating or dispersing the probability distributions of network features but do not result in a significant change of traffic volume. It seems that parameterized entropy can overcome some limitations of Shannon entropy caused by its small descriptive capability, which results in a limited ability to detect typical small or low-rate anomalies. The use of a broad spectrum of α-values seems to be crucial because, unlike Ziviani, Shafiq or Lima, we do not believe that it is possible to find a single α-value that fits a particular anomaly type. None of the authors has applied entropy to detect anomalies indicating botnet-like malware. Current methods are dedicated to detecting massive worm spreading (not popular nowadays) and DDoS attacks in high-speed networks. The problem of finding a proper set of α-values, a proper set of network features and a proper classification (not just detection) method in order to find not only massive but also small and low-rate anomalies, such as those typical of botnet-like behavior in local networks, remains open. Addressing this problem may contribute to the current state of the art in botnet detection, which is limited to some non-entropic methods, e.g. those by Livadas et al.
[LWLS06], who proposed a machine learning technique to identify the C&C traffic of IRC-based botnets, Francois et al. [FWB+ 11], who presented a system that uses the PageRank algorithm to detect different families of peer-to-peer botnets via network flows, and Bilge et al. [BBR+ 12], who proposed an advanced knowledge-based botnet hunting system named DISCLOSURE. The possibility of using parameterized entropies to detect anomalies connected with botnet-like malware is confirmed in the following chapters. Because of the lack of realistic, up-to-date and representative datasets, an additional effort also had to be made to develop labeled traces based on real legitimate traffic and synthetic anomalies [BSJM14] reflecting botnet-like activity in a local network.

3. Entropy-based network anomaly detector – preface
In order to prove the claim of the Thesis, an entropy-based network anomaly detection module named Anode has been proposed. It is designed to cooperate with existing signature-based or known-pattern-based security solutions, such as popular Intrusion Detection Systems, e.g. Snort [Roe99] and Bro [Pax99], as well as flow-based network traffic analyzers, e.g. NfSen [NfS] and NtopNg [Nto]. We used such solutions in the SOPAS system [CKP+ 11], [BlPJ12], [JPB+ 12], developed to protect a set of connected heterogeneous systems which are not centrally managed. Currently, Anode is a component of the anomaly detection and security event data correlation system developed in the SECOR project [JSl14a], which is the successor of SOPAS. In SECOR, Anode is expected to detect network anomalies with an acceptable False Positive Rate [Faw06] and a high True Positive Rate [Faw06], categorize anomalies and report some details (timestamps, related addresses and ports) to the correlation engine, which correlates events coming from different anomaly detection modules and external sensors, such as the aforementioned Snort, in order to improve detection and limit false alarms. SECOR anomaly detectors are not limited to the network. For example, one of the components, named PRONTO [JSl14a], [JSl14b], detects obfuscated malware on infected hosts. The general operating principle of Anode is presented in Fig. 3.1.

Figure 3.1: Anode – Entropy-based network anomaly detection module

Anode analyzes network flows. Various network feature distributions based on flows, e.g. of addresses and ports, are summarized by means of entropy. There are two phases: training and detection. In the training phase, a profile of legitimate traffic is built and a model for classification is prepared. In the detection

phase, current observations are compared with the model. An abnormal dispersion or concentration in different network feature distributions indicates an anomaly. The extraction of anomaly details is also intended: related ports and addresses are obtained by looking at the top contributors to the entropy value. A much more detailed description of the architecture is provided in Chapter 6.

3.1. Main features
The main features of Anode are presented below:
– off-line and on-line analysis of network flows within fixed time intervals;
– supervised machine learning with training and detection phases;
– multi-class classification;
– summarization of network feature distributions with parameterized Tsallis or Renyi entropy;
– use of a selected range of α-values for entropy instead of a single best-fitting value;
– use of a selected set of network features in order to detect a broad spectrum of anomalies;
– use of a fine-grained legitimate network traffic profile;
– extraction of anomaly evidence by reporting the IP addresses and ports of attackers and victims.

3.2. Classification of the approach
On the basis of the main features, according to Figure 3.2, our approach can be classified as:
– anomaly detection;
– a Network-based Intrusion Detection System (NIDS);
– having a centralized architecture;
– with a detection module fed by network traffic data;
– analyzing incidents off-line and on-line.
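The training and detection phases described at the beginning of this chapter can be sketched in a simplified form. In the Python sketch below, the "model" is only a per-α range of Tsallis entropy values seen in legitimate training windows, and the anomaly details are the values whose share grew most relative to the training traffic. This is a hypothetical illustration: the α set, the range-based rule and all helper names are assumptions, whereas Anode itself uses supervised multi-class classification, described in Chapter 6.

```python
from collections import Counter

ALPHAS = (-2.0, -1.0, 0.5, 2.0)  # hypothetical subset; alpha = 1 (Shannon limit) is excluded

def tsallis(values, alpha):
    """Tsallis entropy (Eq. 4.12 in Chapter 4) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return (sum((c / total) ** alpha for c in counts.values()) - 1) / (1 - alpha)

def train_profile(windows):
    """Training phase: per-alpha [min, max] of entropy over legitimate windows."""
    profile = {}
    for a in ALPHAS:
        values = [tsallis(w, a) for w in windows]
        profile[a] = (min(values), max(values))
    return profile

def detect(window, profile):
    """Detection phase: alphas whose entropy falls outside the legitimate range."""
    return [a for a in ALPHAS if not profile[a][0] <= tsallis(window, a) <= profile[a][1]]

def top_contributors(window, baseline, k=2):
    """Anomaly details: values whose share grew most relative to training traffic."""
    cur, base = Counter(window), Counter(baseline)
    growth = {v: cur[v] / len(window) - base[v] / len(baseline) for v in cur}
    return sorted(growth, key=growth.get, reverse=True)[:k]

# Hypothetical destination addresses observed in three legitimate one-minute windows.
training = [["10.1.0.1"] * 5 + ["10.1.0.2"] * 3 + ["10.1.0.3"] * 2,
            ["10.1.0.1"] * 4 + ["10.1.0.2"] * 4 + ["10.1.0.3"] * 2,
            ["10.1.0.1"] * 6 + ["10.1.0.2"] * 2 + ["10.1.0.3"] * 2]
baseline = [v for w in training for v in w]
profile = train_profile(training)

attack = ["10.1.0.9"] * 20 + ["10.1.0.1"] * 5  # one new destination suddenly dominates
print(detect(attack, profile))             # [-2.0, -1.0, 0.5, 2.0]
print(top_contributors(attack, baseline))  # ['10.1.0.9', '10.1.0.1']
```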

Figure 3.2: Features of detection methods (based on [Ren11]).

4. Entropy
This chapter introduces the theoretical foundations of entropy. It starts with a brief overview of Shannon entropy. Next, the parameterized generalizations are presented; this part is especially important, as we decided to use this form of entropy in the approach presented in this Dissertation. Finally, a comparison of entropy measures based on simulations is provided.

4.1. Shannon entropy
The definition of entropy as a measure of disorder comes from thermodynamics and was proposed in the early 1850s by Clausius [CH67]. In 1948 Shannon [Sha48] adopted entropy in information theory, where entropy is a measure of the uncertainty associated with a random variable. The more random the variable, the bigger the entropy; in contrast, the greater the certainty of the variable, the smaller the entropy. For a probability distribution $p(X = x_i)$ of a discrete random variable X, the Shannon entropy is defined as:

$$H_S(X) = \sum_{i=1}^{n} p(x_i) \log_a \frac{1}{p(x_i)} \tag{4.1}$$

X is the feature that can take values $\{x_1, \ldots, x_n\}$ and $p(x_i)$ is the probability mass function of outcome $x_i$. The entropy of X can also be interpreted as the expected value of $\log_a \frac{1}{p(X)}$, where X is drawn according to the probability mass function $p(x)$. Depending on the base of the logarithm, different units are used: bits (a = 2), nats (a = e) or hartleys (a = 10). For the purpose of network anomaly detection, sampled probabilities estimated from the number of occurrences of $x_i$ in a time window t are typically used. The value of entropy depends on randomness (it attains its maximum when the probability $p(x_i)$ is equal for every $x_i$) but also on the value of n. In order to measure randomness only, normalized forms have to be employed. For example, an entropy value can be divided by n or by the maximum entropy, defined as $\log_a(n)$. Some important properties of Shannon entropy are listed below. More properties can be found in [Kar03] and [Csi08].
– Nonnegativity: $\forall_{p(x_i) \in [0,1]} \; H_S(X) \ge 0$
– Symmetry: $H_S(p(x_1), p(x_2), \ldots) = H_S(p(x_2), p(x_1), \ldots)$
– Maximality: $H_S(p(x_1), \ldots, p(x_n)) \le H_S\left(\frac{1}{n}, \ldots, \frac{1}{n}\right) = \log_a(n)$
– Additivity: $H_S(X, Y) = H_S(X) + H_S(Y)$ if X and Y are independent variables
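Eq. 4.1, the normalization by $\log_a(n)$ and two of the properties listed above can be checked numerically with a short Python sketch; the port list is hypothetical and the helper names are ours.

```python
import math
from collections import Counter

def shannon(probs, base=2):
    """Shannon entropy, Eq. 4.1; outcomes with zero probability contribute nothing."""
    return sum(p * math.log(1 / p, base) for p in probs if p > 0)

def normalized_shannon(probs, base=2):
    """Entropy divided by the maximum log_a(n), so that the result lies in [0, 1]."""
    n = len(probs)
    return shannon(probs, base) / math.log(n, base) if n > 1 else 0.0

# Sampled probabilities estimated from occurrences in one time window (hypothetical ports).
ports = [80, 80, 80, 443, 443, 53, 22, 80]
probs = [c / len(ports) for c in Counter(ports).values()]
print(round(shannon(probs), 3), round(normalized_shannon(probs), 3))  # 1.75 0.875

# Maximality: the uniform distribution over the same n = 4 outcomes reaches log_2(4) = 2 bits.
print(round(shannon([0.25] * 4), 3))  # 2.0

# Additivity: for independent X and Y the joint probabilities are p(x) * p(y).
px, py = [0.5, 0.5], [0.9, 0.1]
joint = [a * b for a in px for b in py]
print(math.isclose(shannon(joint), shannon(px) + shannon(py)))  # True
```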

If not only the degree of uncertainty is important but also the extent of the difference between the assumed and observed distributions, denoted as q and p respectively, the relative entropy, also known as the Kullback–Leibler divergence [Kul59], [Csi08], can be used:

$$D_{KL}(p \| q) = \sum_{i=1}^{n} p(i) \log_a \frac{p(i)}{q(i)} \tag{4.2}$$

This measure is not symmetric: in general, $D_{KL}(p \| q) \neq D_{KL}(q \| p)$. To measure how much uncertainty about X remains after observing Y, the conditional entropy (or equivocation) [CT06] may be employed:

$$H_S(X|Y) = -\sum_{i=1}^{n} \sum_{j=1}^{m} p(x_i, y_j) \log_a p(x_i | y_j) \tag{4.3}$$

4.2. Parameterized entropy
Shannon entropy assumes a tradeoff between contributions from the main mass of the distribution and from its tail [MD08]. To control this tradeoff, two parameterized generalizations of Shannon entropy were proposed, by Renyi (1970s) [Ren70] and Tsallis (late 1980s) [Tsa88] respectively. In general, if the parameter, denoted as α, has a positive value, it exposes the main mass (the concentration of events that occur often); if the value is negative, it exposes the tail (the dispersion caused by rare events). Both parameterized entropies (Renyi and Tsallis) are derived from the Kolmogorov–Nagumo generalization of an average [Mar05], [Wę12]:

$$\langle X \rangle_\varphi = \varphi^{-1}\left(\sum_{i=1}^{n} p(x_i)\, \varphi(x_i)\right) \tag{4.4}$$

where φ is a function which satisfies the postulate of additivity (only affine or exponential functions satisfy it) and $\varphi^{-1}$ is its inverse function. Under an affine transformation $\varphi(x_i) \to \gamma(x_i) = a\varphi(x_i) + b$ (where a and b are numbers), the inverse function is expressed as $\gamma^{-1}(x_i) = \varphi^{-1}\left(\frac{x_i - b}{a}\right)$. Renyi proposed the following function φ:

$$\varphi(x_i) = 2^{(1-\alpha)x_i} \tag{4.5}$$

Renyi entropy can be obtained from the Shannon entropy with the following transformation:

$$H_{R\alpha}(X) = \varphi^{-1}\left(\sum_{i=1}^{n} p(x_i)\, \varphi(-\log_2 p(x_i))\right) \tag{4.6}$$

Given $\varphi(x_i) = 2^{(1-\alpha)x_i}$ and $\varphi^{-1}(x_i) = \frac{1}{1-\alpha} \log_2 x_i$:

$$\begin{aligned}
H_{R\alpha}(X) &= \frac{1}{1-\alpha} \log_2\left(\sum_{i=1}^{n} p(x_i)\, 2^{-(1-\alpha)\log_2 p(x_i)}\right) \\
&= \frac{1}{1-\alpha} \log_2\left(\sum_{i=1}^{n} p(x_i)\, 2^{\log_2 p(x_i)^{(\alpha-1)}}\right) \\
&= \frac{1}{1-\alpha} \log_2\left(\sum_{i=1}^{n} p(x_i)\, p(x_i)^{(\alpha-1)}\right) \\
&= \frac{1}{1-\alpha} \log_2\left(\sum_{i=1}^{n} p(x_i)^{\alpha}\right)
\end{aligned} \tag{4.7}$$

After this transformation, the well-known form of Renyi entropy is obtained:

$$H_{R\alpha}(X) = \frac{1}{1-\alpha} \log_a\left(\sum_{i=1}^{n} p(x_i)^{\alpha}\right) \tag{4.8}$$

The Renyi entropy satisfies the same postulates as the Shannon entropy, and the following relations hold between the two:

$$H_{R\alpha_1}(X) \ge H_S(X) \ge H_{R\alpha_2}(X) \tag{4.9}$$

where $\alpha_1 < 1$ and $\alpha_2 > 1$, and

$$\lim_{\alpha \to 1} \frac{1}{1-\alpha} \log_a\left(\sum_{i=1}^{n} p(x_i)^{\alpha}\right) = H_S(X) = \sum_{i=1}^{n} p(x_i) \log_a \frac{1}{p(x_i)} \tag{4.10}$$

Tsallis proposed the following function φ:

$$\varphi(x_i) = \frac{2^{(1-\alpha)x_i} - 1}{1-\alpha} \tag{4.11}$$

After transformation, the well-known form of Tsallis entropy is as follows:

$$H_{T\alpha}(X) = \frac{1}{1-\alpha}\left(\sum_{i=1}^{n} p(x_i)^{\alpha} - 1\right) \tag{4.12}$$

As can be seen, this entropy is non-logarithmic. The following relations hold between the Shannon and the Tsallis entropy:

$$H_{T\alpha_1}(X) \ge H_S(X) \ge H_{T\alpha_2}(X) \tag{4.13}$$

where $\alpha_1 < 1$ and $\alpha_2 > 1$, and (with $\ln$ denoting the natural logarithm and $H_S$ computed with a = 2)

$$\lim_{\alpha \to 1} \frac{1}{1-\alpha}\left(\sum_{i=1}^{n} p(x_i)^{\alpha} - 1\right) = \ln 2 \cdot H_S(X) = \ln 2 \sum_{i=1}^{n} p(x_i) \log_2 \frac{1}{p(x_i)} \tag{4.14}$$

Moreover, the Tsallis entropy is nonextensive, i.e. it satisfies only the pseudo-additivity criterion. For independent discrete random variables X, Y:

$$H_{T\alpha}(X, Y) = H_{T\alpha}(X) + H_{T\alpha}(Y) + (1-\alpha)\, H_{T\alpha}(X)\, H_{T\alpha}(Y) \tag{4.15}$$

It means that:

$$H_{T\alpha}(X,Y) > H_{T\alpha}(X) + H_{T\alpha}(Y) \quad \text{for } \alpha \in (-\infty, 1)$$

and

$$H_{T\alpha}(X,Y) < H_{T\alpha}(X) + H_{T\alpha}(Y) \quad \text{for } \alpha \in (1, \infty)$$

To summarize, both parameterized entropies (Renyi and Tsallis):
– expose concentration for α > 1 and dispersion for α < 1;
– converge to the Shannon entropy for α → 1.

4.3. Comparison

In order to understand, compare and successfully apply parameterized entropies in our approach, several simulation experiments were conducted. First, the Shannon, Renyi and Tsallis entropies of a binomial probability distribution were compared. Then, the entropies of a uniform distribution were compared to check how they depend on the number of equal probabilities and on the α-value. Next, the impact of rare and frequent events on the entropy for different α-values was examined. Finally, we looked at exemplary network feature distributions of addresses and ports in order to summarize them with the Renyi and Tsallis entropies.

4.3.1. Binomial distribution

The Shannon, Renyi and Tsallis entropies of a binomial probability distribution, where the probability of success is p and the probability of failure is 1 − p, are depicted in Fig. 4.1, Fig. 4.2 and Fig. 4.3, respectively.

[Figure 4.1: Shannon entropy – binomial distribution (x-axis: p from 0.05 to 0.95; y-axis: H_S).]

It is noticeable that the Shannon entropy reaches its maximum when p = 1 − p. The Renyi and Tsallis entropies converge to the Shannon entropy for α → 1. Note: according to Eq. 4.14, values of the Tsallis entropy need to be multiplied by 1/ln 2 to obtain a curve similar to the Shannon one for α → 1. For α ≥ 1, the Renyi and Tsallis entropies behave similarly to Shannon, as both reach their maximum for p = 1 − p, although the Tsallis maximum entropy changes with α, while the Renyi maximum entropy is always equal to 1. For α ≤ 1, the Tsallis and Renyi entropy curves are concave, as in this case low probabilities are exposed.

[Figure 4.2: Renyi entropy of several α-values (α = −2, −1, 0, 1, 2) – binomial distribution (x-axis: p; y-axis: H_Rα).]

[Figure 4.3: Tsallis entropy of several α-values (α = −0.5, −0.1, 0, 1, 2) – binomial distribution (x-axis: p; y-axis: H_Tα).]

4.3.2. Uniform distribution

The Shannon, Renyi and Tsallis entropies of a uniform probability distribution are depicted in Fig. 4.4. For this distribution, the maximum entropy (the case when all probabilities are equal) is calculated for different n, where n is the number of equal probabilities. As can be seen, the entropy always grows with n. The Renyi entropy grows exactly like the Shannon entropy, no matter which α-value is used. The Tsallis entropy behaves differently, as it depends not only on n but also on α.

[Figure 4.4: Shannon, Renyi and Tsallis entropy – uniform distribution (curves: Shannon = Renyi for α ∈ (−∞, ∞), Tsallis α = −0.1, Tsallis α = 2; x-axis: n from 2 to 10; y-axis: H(X)).]

4.3.3. Impact of frequent and rare events

Example

Let us assume that a discrete random variable X represents the addresses observed in the network within the last minute, X = {“10.1.0.1”, “10.1.0.2”, “10.1.0.3”, “10.1.0.4”, “10.1.0.5”}, with the following numbers of occurrences for the subsequent addresses: Freq = {96, 1, 1, 1, 1}. Based on these frequencies, let us estimate the probability distribution of X (see Table 4.1). Let us examine the impact of a frequent event, p(X = “10.1.0.1”) = 0.96, and a rare event, p(X = “10.1.0.2”) = 0.01, on the Renyi and Tsallis entropies when α = −2 and α = 2 are used. To measure the impact of these events, we can check the results of the exponential expression p(x_i)^α present in both the Renyi and Tsallis formulas [Eq. 4.8, Eq. 4.12]. The results are presented in Table 4.2.
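The entries of Table 4.1 and Table 4.2 can be reproduced with a few lines of Python. This is a minimal sketch, not the implementation used in this Dissertation: the helper names `estimate_distribution`, `renyi` and `tsallis` are hypothetical, probabilities are estimated as relative frequencies, base-2 logarithms are assumed for Eq. 4.8, and α ≠ 1 is assumed (α = 1 would need the limits of Eq. 4.10 and Eq. 4.14).

```python
import math
from collections import Counter

# Occurrences of each address in the 1-minute window (Freq = {96, 1, 1, 1, 1})
freq = Counter({"10.1.0.1": 96, "10.1.0.2": 1, "10.1.0.3": 1,
                "10.1.0.4": 1, "10.1.0.5": 1})


def estimate_distribution(counts: Counter) -> dict:
    """Hypothetical helper: p(X = x) as the relative frequency of each value (Table 4.1)."""
    total = sum(counts.values())
    return {x: n / total for x, n in counts.items()}


def renyi(probs, alpha: float) -> float:
    """Renyi entropy, Eq. 4.8 with base a = 2 (assumes alpha != 1)."""
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)


def tsallis(probs, alpha: float) -> float:
    """Tsallis entropy, Eq. 4.12 (assumes alpha != 1)."""
    return (sum(p ** alpha for p in probs) - 1.0) / (1.0 - alpha)


p = estimate_distribution(freq)

# The term p(x_i)^alpha shared by Eq. 4.8 and Eq. 4.12 (Table 4.2)
for addr in ("10.1.0.1", "10.1.0.2"):
    for alpha in (-2, 2):
        print(f"p = {p[addr]:.2f}, alpha = {alpha:+d}: p^alpha = {p[addr] ** alpha:.6g}")

# Entropies of the whole distribution for the same alpha-values
for alpha in (-2, 2):
    print(f"alpha = {alpha:+d}: Renyi = {renyi(p.values(), alpha):.4f}, "
          f"Tsallis = {tsallis(p.values(), alpha):.4f}")
```

The printed factors agree with Table 4.2 up to rounding (e.g. 1.08507 for 0.96^−2). With α = −2 the four rare addresses dominate the sum, whereas with α = 2 the frequent address does.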

Table 4.1: Probability distribution of X.

| X | “10.1.0.1” | “10.1.0.2” | “10.1.0.3” | “10.1.0.4” | “10.1.0.5” |
|---|---|---|---|---|---|
| p(X = x) | 0.96 | 0.01 | 0.01 | 0.01 | 0.01 |

Table 4.2: Impact of frequent and rare events on the value of parameterized entropy (values of p(x_i)^α).

| p(x_i) \ α | −2 | 2 |
|---|---|---|
| 0.96 | 1.08 | 0.92 |
| 0.01 | 10000 | 0.0001 |

As can be seen, the impact of frequent events (p(x_i) = 0.96) on the entropy is greater than that of rare events (p(x_i) = 0.01) when positive α-values are used; in contrast, the impact of rare events is greater than that of frequent events when negative α-values are used.

4.3.4. Entropy of exemplary distributions

In this section, the entropy values of sample distributions reflecting both legitimate and anomalous network traffic are analyzed. The aim of these experiments is to show how entropies can highlight concentration (frequent events forming the main mass) and dispersion (rare events forming the tail) caused by typical network anomalies such as port and network scans. Anomalies of this type are characteristic of botnet-like malware. More details about scan anomalies can be found in [BI08] and [MFF14]. The experiments help to understand how parameterized entropies differ from the Shannon entropy. Moreover, they show how the Renyi, Tsallis and Shannon entropies differ in terms of sensitivity. Before analyzing distributions characteristic of anomalies, let us start with a very basic example of even, concentrated and dispersed distributions, as presented in Fig. 4.5. The Y axis shows the number of occurrences of particular instances, e.g. addresses or ports, which appear on the X axis. Now let us calculate the Tsallis, Renyi and Shannon entropies for each distribution. The results of this calculation are presented in Table 4.3. The change of the entropy value for the concentrated and dispersed distributions relative to the even distribution is presented in Table 4.4.
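Table 4.4 expresses each entry of Table 4.3 as a relative change with respect to the even distribution, i.e. (H − H_even)/H_even · 100%. The Python sketch below reproduces this arithmetic from the rounded values printed in Table 4.3; the dictionary layout and the helper names `relative_change` and `change_table` are illustrative assumptions. Because Table 4.3 is rounded, the results may differ from Table 4.4 by up to about one percentage point.

```python
# Entropy values from Table 4.3 (as printed, i.e. already rounded)
COLUMNS = ["Shannon", "Renyi a=2", "Renyi a=-2", "Tsallis a=2", "Tsallis a=-2"]
TABLE_4_3 = {
    "even":         [3.78, 3.64, 4.07, 0.92, 1581],
    "concentrated": [2.79, 1.82, 4.67, 0.72, 5508],
    "dispersed":    [4.82, 4.70, 5.00, 0.96, 10968],
}


def relative_change(value: float, reference: float) -> float:
    """Relative change in percent: (value - reference) / reference * 100."""
    return (value - reference) / reference * 100.0


def change_table(table: dict, reference_row: str = "even") -> dict:
    """Express every non-reference row as relative changes against the reference row."""
    ref = table[reference_row]
    return {
        row: [relative_change(v, r) for v, r in zip(values, ref)]
        for row, values in table.items()
        if row != reference_row
    }


for row, changes in change_table(TABLE_4_3).items():
    cells = ", ".join(f"{col}: {c:+.1f}%" for col, c in zip(COLUMNS, changes))
    print(f"{row}: {cells}")
```

The same helper applies unchanged to Table 4.7 and Table 4.9, with the legitimate-traffic values of Table 4.5 as the reference row.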
[Figure 4.5: Even, concentrated and dispersed distribution (three panels: even, concentrated, dispersed; x-axis: instances; y-axis: occurrences).]

Table 4.3: Entropy values for even, concentrated and dispersed distributions.

| | Shannon | Renyi α = 2 | Renyi α = −2 | Tsallis α = 2 | Tsallis α = −2 |
|---|---|---|---|---|---|
| even | 3.78 | 3.64 | 4.07 | 0.92 | 1581 |
| concentrated | 2.79 | 1.82 | 4.67 | 0.72 | 5508 |
| dispersed | 4.82 | 4.7 | 5 | 0.96 | 10968 |

Table 4.4: Entropy value change relative to the even distribution.

| | Shannon | Renyi α = 2 | Renyi α = −2 | Tsallis α = 2 | Tsallis α = −2 |
|---|---|---|---|---|---|
| concentrated | −26% | −50% | +14% | −22% | +248% |
| dispersed | +27.5% | +29% | +22% | +4% | +594% |

As can be seen, the Shannon and parameterized entropies behave similarly when positive α-values are used for the parameterized entropies. Higher concentration results in a decrease in the entropy value, while higher dispersion results in an increase. In this case, the Renyi entropy seems to be the most sensitive. For negative α-values, the situation is slightly different: the parameterized entropies differ from Shannon because the entropy value increases for both concentration and dispersion. The higher entropy value for the more concentrated distribution is due to the fact that, in this case, the probabilities in the tail (estimated from the numbers of occurrences) are lower and are therefore more exposed by a negative α-value. In general, for negative α-values, the Tsallis entropy is far more sensitive than the Renyi and Shannon entropies.

Now, suppose we have the following distributions of source and destination addresses as well as destination ports for 1 minute of legitimate network traffic (Fig. 4.6). Again, the Y axis shows the number of occurrences of particular addresses or ports, which appear on the X axis. As we can see, all distributions are quite even. Let us summarize these distributions by calculating the Tsallis, Renyi and Shannon entropies (Table 4.5).

Table 4.5: Entropy values for address and port distributions – legitimate traffic.

| | Shannon | Renyi α = 2 | Renyi α = −2 | Tsallis α = 2 | Tsallis α = −2 |
|---|---|---|---|---|---|
| src IP addresses | 4.79 | 4.57 | 5.44 | 0.96 | 27437 |
| dst IP addresses | 4.65 | 4.18 | 5.48 | 0.94 | 29925 |
| dst ports | 3.75 | 2.85 | 5.42 | 0.86 | 26164 |

Now let us simulate two different types of anomalies in this traffic. To do so, we have to inject characteristic concentration or dispersion into particular distributions.

Port scan

Typically, during a port scan, concentration in addresses and dispersion in ports are observable. Let us modify our distributions to simulate this type of anomaly. Suppose that a single host was scanned and the number of scanned ports was equal to 50. The modified distributions are depicted in Fig. 4.7. Now let us recalculate the entropy (Table 4.6) and compare the new results with those for the legitimate traffic (Table 4.7).
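The effect of injecting such a scan can be imitated on toy data. The Python sketch below is an illustration under stated assumptions, not the data behind Fig. 4.6 and Fig. 4.7: the legitimate histograms (10 source hosts and 10 destination ports, 5 flows each) are hypothetical, and the scan is modelled as one additional source host sending 50 probes, each to a previously unused port. The helper names `entropies` and `relative_changes` are likewise illustrative, and base-2 logarithms are assumed.

```python
import math
from typing import Dict, List

ALPHAS = (2.0, -2.0)  # the alpha-values used in Tables 4.3-4.9


def entropies(counts: List[int]) -> Dict[str, float]:
    """Shannon, Renyi and Tsallis entropies (base 2) of a histogram of occurrences."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    result = {"Shannon": -sum(pi * math.log2(pi) for pi in p)}
    for a in ALPHAS:
        s = sum(pi ** a for pi in p)
        result[f"Renyi a={a:g}"] = math.log2(s) / (1.0 - a)   # Eq. 4.8
        result[f"Tsallis a={a:g}"] = (s - 1.0) / (1.0 - a)    # Eq. 4.12
    return result


def relative_changes(before: List[int], after: List[int]) -> Dict[str, float]:
    """Relative entropy change in percent, anomalous versus legitimate histogram."""
    h0, h1 = entropies(before), entropies(after)
    return {k: (h1[k] - h0[k]) / h0[k] * 100.0 for k in h0}


# Hypothetical legitimate traffic: 10 source hosts and 10 destination ports, 5 flows each
legit_src = [5] * 10
legit_ports = [5] * 10

# Hypothetical port scan: one new source sends 50 probes, each to a different unused port
scan_src = legit_src + [50]            # concentration in source addresses
scan_ports = legit_ports + [1] * 50    # dispersion in destination ports

for name, before, after in (("src addresses", legit_src, scan_src),
                            ("dst ports", legit_ports, scan_ports)):
    changes = relative_changes(before, after)
    print(name, {k: f"{v:+.0f}%" for k, v in changes.items()})
```

With these toy histograms, the Shannon entropy and the entropies with α = 2 decrease for the concentrated source addresses, the entropies with α = −2 increase, and all entropies increase for the dispersed ports, which is the qualitative pattern of Table 4.7; the magnitudes depend entirely on the assumed counts.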
[Figure 4.6: Addresses and ports distributions – legitimate traffic (panels: source IP addresses, destination IP addresses, destination ports; x-axis: instances; y-axis: occurrences).]

Table 4.6: Entropy values for address and port distributions – port scan.

| | Shannon | Renyi α = 2 | Renyi α = −2 | Tsallis α = 2 | Tsallis α = −2 |
|---|---|---|---|---|---|
| src IP addresses (conc.) | 4.79 | 4.57 | 5.44 | 0.96 | 27437 |
| dst IP addresses (conc.) | 4.65 | 4.18 | 5.48 | 0.94 | 29925 |
| dst ports (disp.) | 3.75 | 2.85 | 5.42 | 0.86 | 26164 |

Table 4.7: Entropy value change relative to the legitimate traffic distributions – port scan.

| | Shannon | Renyi α = 2 | Renyi α = −2 | Tsallis α = 2 | Tsallis α = −2 |
|---|---|---|---|---|---|
| src IP addresses (conc.) | −23% | −50% | +10% | −17% | +217% |
| dst IP addresses (conc.) | −22% | −46% | +10% | −16% | +217% |
| dst ports (disp.) | +48% | +54% | +28% | +10% | +104% |

As can be seen, each entropy properly reported (as a value change) the concentration in source and destination addresses and the dispersion in destination ports, although the sensitivity of each entropy was different. For the concentration, the most significant changes were obtained for Renyi with a positive α-value (about 50% decrease) and for Tsallis with a negative α-value (more than 200% increase). The dispersion in destination ports was most distinctly exposed by the Tsallis entropy with a negative α-value (an increase of more than 100%).

Network scan

Typically, during a network scan, concentration in source addresses and destination ports as well as dispersion in destination addresses are observable. Let us modify our distributions to simulate this type of anomaly. Suppose that a single host scanned 100 hosts to check whether a particular service is running on these hosts. The modified distributions are depicted in Fig. 4.8. Now let us recalculate the entropy (Table 4.8) and compare the new results with those for the legitimate traffic (Table 4.9). As can be seen, each entropy properly reported (as a value change) the concentration in source addresses and destination ports as well as the dispersion in destination addresses, although, similarly to the previous example, the sensitivity of each entropy was different. For both concentration and dispersion, the most significant change was obtained for Tsallis with a negative α-value (an increase of more than 500% and more than 3500%, respectively).

Table 4.8: Entropy values for address and port distributions – network scan.

| | Shannon | Renyi α = 2 | Renyi α = −2 | Tsallis α = 2 | Tsallis α = −2 |
|---|---|---|---|---|---|
| src IP addresses (conc.) | 2.83 | 1.4 | 6.35 | 0.62 | 180164 |
| dst IP addresses (disp.) | 6.83 | 6.37 | 7.21 | 0.99 | 1093036 |
| dst ports (conc.) | 2.43 | 1.35 | 6.33 | 0.61 | 171809 |

Different network anomalies cause concentration or dispersion in different network feature distributions. The aforementioned addresses and ports are not the only features that can be used; they should be augmented by others,

[Figure 4.7: Addresses and ports distributions – port scan (panels: source IP addresses, destination IP addresses, destination ports; x-axis: instances; y-axis: occurrences).]