
(1) AGH University of Science and Technology in Krakow (Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie), Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering, Department of Applied Computer Science. Doctoral dissertation. Maj. Bartosz Jasiul, M.Sc. Eng. Polish title: "Modelowanie wybranych ataków cybernetycznych z wykorzystaniem ontologii i sieci Petriego" (Modeling of Selected Cyber Attacks with Ontology and Petri Nets). Supervisor: dr hab. Marcin Szpyrka, prof. AGH. Kraków 2013.

(2) AGH University of Science and Technology in Krakow, Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering, Department of Applied Computer Science. PhD Thesis. Maj. Bartosz Jasiul, M.Sc. Eng. Modeling of Selected Cyber Threats with Ontology and Petri Nets. Supervisor: Marcin Szpyrka, Ph.D., D.Sc. Krakow 2013.

(3) I would like to express my sincere gratitude to several individuals without whom I would not have been able to complete this Thesis. First and foremost, I would like to thank Professor Marcin Szpyrka, with whom I have cooperated for the last two years. I will never forget our first meeting. Having read his book about Real Time Colored Petri nets, I presented to him, on four pages, what I was planning to do in my dissertation. He agreed at once to supervise my work, and the results of our cooperation are visible in this Thesis. I would like to thank him for the research atmosphere he created and for the time he spent verifying my work and discussing it, even late in the evening. I sincerely thank Joanna Śliwa, who has always motivated me to complete this Thesis, for finding time for scientific support and for her involvement in my research. I greatly appreciate the assistance of Rafał Piotrowski, his valuable remarks, advice and evaluation of the results of my work. I also thank Kamil Gleba and Paweł Skarżyński for their help in the development of cyber defence applications. I am also grateful to Beata Sobiech for proofreading this Thesis. Finally, I would like to thank my family, my wife Mariola, daughter Ula, and son Adam, for understanding that lately I have had to sacrifice my family life to complete this research. Bartosz Jasiul.

(4) Contents

1. Introduction, p. 6
  1.1. Motivation, p. 6
  1.2. The problem overview, p. 7
  1.3. Aim, p. 8
  1.4. Claim, p. 8
  1.5. Work outline, p. 8
2. Related work, p. 10
  2.1. Malicious software, p. 10
    2.1.1. Malware features, p. 12
    2.1.2. Malware characteristics, p. 13
  2.2. Evading virus detection technologies, p. 15
  2.3. Malware detection techniques based on ontology and CP-nets – an overview, p. 15
3. PRONTO – malware hunting tool – preface, p. 18
  3.1. Approach to malware detection, p. 18
  3.2. The idea of PRONTO module, p. 19
  3.3. PRONTO module classification, p. 20
4. Ontology, p. 22
  4.1. Ontology definition, p. 22
  4.2. Semantic models, p. 23
  4.3. Rules, p. 30
  4.4. Ontology applications, p. 31
5. Colored Petri nets, p. 34
  5.1. Formal definition of non-hierarchical CP-nets, p. 34
  5.2. Places, p. 35
  5.3. Transitions and arcs, p. 37
  5.4. Hierarchical CP-nets, p. 38
  5.5. Applications of CP-nets, p. 41
6. The architecture of the solution, p. 43

(5)
  6.1. PRONTO module design, p. 43
  6.2. PRONTOlogy – events filtering, p. 46
    6.2.1. Ontology model, p. 46
    6.2.2. PRONTOlogy engine, p. 49
  6.3. PRONTOnet – malware tracking, p. 50
    6.3.1. An approach to malware tracking, p. 50
    6.3.2. Utilization of CP-net models for malware tracking, p. 51
7. Verification of the modeling approach, p. 56
  7.1. Verification of the model, p. 56
    7.1.1. PRONTOlogy.owl evaluation, p. 56
    7.1.2. Evaluation of cyber attack CP-net models construction, p. 60
  7.2. Cyber attacks detection – an experiment, p. 65
    7.2.1. Data acquisition, p. 65
    7.2.2. Scenarios of malware detection, p. 68
8. Conclusions and further works, p. 80
  8.1. Conclusions, p. 80
  8.2. Further works, p. 81

B. Jasiul. Modeling of Selected Cyber Threats with Ontology and Petri Nets.

(6) 1. Introduction

This chapter introduces the reader to the subject of the Thesis. First, it presents the motivation and briefly introduces the scope of the problem. Then, it defines the aim and the claim of the Thesis. Finally, it outlines the Dissertation and presents the contents of the subsequent chapters, in which the claim is proven.

1.1. Motivation

Computer system security rests on three main pillars: confidentiality (C), integrity (I), and availability (A). The CIA properties are provided mostly by cryptographic functions. Confidentiality is realized by encryption, physical protection, and separation of sensitive information from generally accessible information. Integrity can be ensured by using various hash functions, signatures, checksums or meta-labels. Availability refers to the accessibility of system functions and stored data to eligible users. Additionally, the system is often required to operate correctly for a certain percentage of time. These core principles of information and system security are extended by non-repudiation, authenticity and privacy.

The overwhelming majority of computer systems are connected to each other by a global network, the Internet, which makes it possible to produce results beyond those achievable by the individual systems alone [Buc05]. The outcomes of cooperative work and the accessibility of information are perceived and appreciated by probably all of its users. Unfortunately, the advantages of this technology are also available for hostile goals. The number of cyber threats is rising rapidly [AG10], [GN11], [Nam12], from 23 680 646 in 2008 [Gos09] to 1 595 587 670 in 2012 [MN13], and this is nowadays one of the most vexing problems in computer system security [CDPMM09]. At the end of 2012 Kaspersky Lab, the Russian producer of antivirus software, reported [Rai12] that it currently detects and blocks more than 200 000 new malicious programs every day, a significant increase from the first half of 2012, when 125 000 malicious programs were detected and blocked each day on average.

Although awareness of the necessary security appliances seems to be common and the tools used for that purpose are becoming more and more advanced, the number of successful attacks targeting computer systems is growing [TAEC13]. They are mostly related to denial of offered services, gaining access to or stealing private data, financial fraud, etc. Moreover, the evolution towards cloud computing, together with the increasing use of social networks, mobile and peer-to-peer networking technologies that are an intrinsic part of our

(7) life today, carrying many conveniences into our personal life, business and government, makes it possible to use them as tools for cyber criminals and as a potential path of malware propagation [ADR+10].

Computer systems are prone to cyber attacks even though a number of security controls are already deployed. Cyber criminals focus on finding a way to bypass security controls and gain access to the protected network. For that reason organizations, companies, governments and institutions, as well as ordinary citizens all over the world, are interested in detecting all attempts at malicious actions targeting their computer networks and individual machines.

Malicious activity detection relies on a variety of techniques. The success rate of the applied malware detection methods depends on the reliability of the malware model. Usually these models are based on code signatures. Security controls (e.g. antivirus tools) might be maladjusted because signatures of new threats have not been identified yet. Hackers often reuse existing parts of code to implement new types of malware. This, in turn, makes it possible to quickly develop signatures of new dangerous software. Therefore, the more signatures are deployed, the more malicious code is identified. On the other hand, one method of misleading signature-based detection systems is code obfuscation, whose aim is to generate, from already existing code, a new application that security controls cannot yet assess as risky. This technique is simple to use and potentially successful, so effective countermeasures are necessary. One example is to follow the behavior of malicious software in order to identify it and eliminate it from the protected system. This Thesis proposes a response to the current needs of both individual users and huge international organizations in terms of behavioral analysis of malware.

1.2. The problem overview

According to the Nomura Research Institute annual report on cyber security trends in 2012 [TMM12], one hundred percent of organizations had antivirus products installed. Despite this, according to the report, about thirty percent of organizations are systematically infected by malware. The reason for this situation is not, as might be expected, inadequate updating of operating systems and antivirus definition files, but the lack of signatures for all existing threats. Similarly, Kaspersky Lab estimated that in 2012 around 200 000 unique malware samples were detected every day. Most of them reused existing parts of malicious code. The ease of developing new malicious code from existing code, together with the effectiveness of obfuscation mechanisms, arms the attacker with a powerful weapon. Moreover, according to the study conducted in 2012 by the Verizon RISK Team in cooperation with many national federal organizations, including e.g. the Australian Federal Police, the Irish Reporting and Information Security Service, and the United States Secret Service [VAI]:
– 54% of malware took months to discover,
– 29% of malware took weeks to discover,
– 13% of malware took days to discover.
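The weakness of signature matching described in this section can be illustrated with a minimal sketch. It assumes, purely for illustration, that a signature is a SHA-256 digest of the whole code (real antivirus signatures are typically byte patterns, not whole-file hashes), and the payload strings are harmless hypothetical placeholders. A trivial modification that leaves the behavior unchanged is enough to evade the lookup.

```python
import hashlib


def signature(code: bytes) -> str:
    """Return a SHA-256 digest, used here as a stand-in for a code signature."""
    return hashlib.sha256(code).hexdigest()


# Hypothetical "known malicious" payload (placeholder text, not real malware).
known_sample = b"connect(); download(); run();"
signature_db = {signature(known_sample)}


def is_known(code: bytes) -> bool:
    """Signature-based check: exact match against the signature database."""
    return signature(code) in signature_db


# Trivial "obfuscation": a no-op is inserted, so the behavior is the same
# but the bytes (and therefore the signature) differ.
variant = known_sample.replace(b"download();", b"nop(); download();")

print(is_known(known_sample))  # True: the original sample is in the database
print(is_known(variant))       # False: the modified copy is not recognized
```

The variant is missed until its own signature is added to the database, which is the gap that behavioral analysis is meant to close.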

(8) The Verizon figures show how important it is to introduce new techniques that shorten the time of malware detection to hours. The authors of the report [TMM12] indicate that antivirus products should be supported by malware behavioral analysis tools in order to detect those attacks for which signatures have not been established. An existing example of an appliance that uses behavioral analysis for advanced persistent threat detection is Digital DNA by HBGary, which extends the capabilities of the McAfee Total Protection antivirus [McH]. Detailed technical specifications of this solution have not been released to the public. The product brochure states that multiple low-level behaviors are identified for every running program or binary. This leads to the conclusion that each application is observed from a behavioral perspective. McAfee claims that the solution made it possible to detect more 0-day attacks during the last year than in the previous five years combined. This indicates the scale of new malware development and the efficacy of the behavioral approach.

1.3. Aim

The aim of the Thesis is to propose, develop and verify a Method of modeling cyber threats directed at computer systems. Moreover, the goal is to prove that the Method makes it possible to create models that resemble the behavior of malware and support the detection of selected cyber attacks. The proposed approach to modeling cyber attacks is based on ontology and Colored Petri nets (abbr. CP-nets). This Thesis is addressed to cyber defence researchers, security architects and developers solving up-to-date problems regarding the detection and prevention of advanced persistent threats.

1.4. Claim

The Thesis is to prove the following claim: The malware modeling method based on ontology and Colored Petri nets makes it possible to detect cyber attacks whose code has been obfuscated.

The claim has been proven by performing the following tasks:
1) Development and verification of a cyber threat ontology and reasoning rules.
2) Showing that the ontological model and reasoning rules enable identification of single cyber incidents among regular activities.
3) Modeling of cyber attacks directed at a computer system with the use of Colored Petri nets.
4) Verification of the method combining ontology and CP-net models reflecting cyber threats, in order to prove that it is applicable to the detection of attacks on the monitored computer systems.

1.5. Work outline

Chapter 2 presents malware types, their features and characteristics. This is followed by a description of the methods for evading virus detection technologies. This chapter also presents described in the literature and

(9) considered beneficial techniques for cyber attack detection, limited to behavioral analysis and the use of ontology and Petri nets.

Chapter 3 provides an overview of the approach taken to prove the Thesis and introduces the reader to the proposed Method. Additionally, it places the Method among other existing ones.

Chapter 4 introduces the definition of ontology and reasoning rules. It also shows their use in computer systems and briefly presents their wide scope of application in different scientific and practical fields.

Chapter 5 formally defines Colored Petri nets, with examples that make it easy to understand their nature and application. It also presents applications of CP-nets in many areas.

Chapter 6 describes the architecture of the proposed solution. It introduces the concept of how ontology and CP-nets are used to model malicious actions in the monitored system and presents the approach to their application in threat tracking tools.

Chapter 7 presents the verification of the proposed Method and describes practical scenarios of malware detection with the use of the developed tools.

Chapter 8 briefly summarizes the achieved results, presents conclusions and outlines future work.

Acknowledgment

This Thesis has been partially supported by the National Centre for Research and Development project no. PBS1/A3/14/2012 "Sensor data correlation module for detection of unauthorized actions and support of decision process" and by the European Regional Development Fund, the Innovative Economy Operational Programme, under the INSIGMA project no. 01.01.02-00-062/09.

(10) 2. Related work

This chapter introduces the reader to the broad range of malicious software that threatens computer system security. It presents classifications of malware types from different viewpoints, their characteristics, and the problem of evading antivirus technologies. Finally, it summarizes the general results of the state-of-the-art analysis of cyber attack detection.

2.1. Malicious software

The term malware, in the area of computer science, is defined as malicious code that executes unwanted and possibly dangerous activities on a computer system. All malicious activities detected by antivirus tools are classified into particular groups. The groups may vary depending on the chosen classification approach. The classification proposed in this Thesis is based on two sources that take different points of view in this area: the book "Practical Malware Analysis. The Hands-On Guide to Dissecting Malicious Software" [SH12] and the Kaspersky Lab Classification Tree [Lab13].

Kaspersky Lab divides malicious software into the following main classes: Malware, AdWare, RiskWare, and PornWare. The main class, Malware, consists of the following disjoint sub-classes:
– Viruses and Worms – malicious programs that self-replicate on computers or via computer networks without the user being aware.
– Trojans – malicious programs that perform actions that are not authorized by the user: they delete, block, modify or copy data, and they disrupt the performance of computers or computer networks. Unlike viruses and worms, the threats that fall into this category are unable to make copies of themselves or self-replicate.
– Suspicious Packers – malicious programs compressed or packed using a variety of methods combined with file encryption in order to prevent reverse engineering of the program and to hinder analysis of program behavior with proactive and heuristic methods.
– Malicious Tools – programs designed to automatically create viruses, worms, or Trojans, conduct DoS attacks on remote servers, hack other computers, etc.

It should be noted that the Malware class includes software that may have different targets and effects, whereas the three other main classes in the Kaspersky Lab classification are intended to:
– display advertisements (usually in the form of banners), redirect search requests to advertising websites, and collect marketing-type data about the user (AdWare);

(11)
– cover legitimate programs that can cause damage when they fall into the hands of malicious users (and are used to delete, block, modify, or copy data, or to disrupt the performance of computers or networks) (RiskWare);
– display pornographic material to the user (PornWare).

[Figure 2.1: Overlapping of malware classifications. The figure relates the classes of [SH12] (Downloaders and Launchers, Backdoors, Credential Stealers, Persistence Mechanisms, Privilege Escalation, User-Mode Rootkits) to the Kaspersky Lab sub-classes (Viruses/Worms, Trojans, Suspicious Packers, Malicious Tools).]

As shown, the Kaspersky Lab definition focuses on the characteristics of the malicious code and its capabilities in terms of self-dissemination and code composition. This, however, is not the only interesting approach. A different classification is presented in the book "Practical Malware Analysis ..." [SH12], which lists other classes of malicious characteristics:
– Downloaders and Launchers that download or launch other malicious code, commonly installed by attackers when they first gain access to a system;
– Backdoors that install themselves onto a computer to allow the attacker access;
– Credential Stealers that collect information from a victim’s computer and usually send it to the attacker;
– Persistence Mechanisms that are used to keep the malware on the infected computer for a long time;
– Privilege Escalation that installs itself onto a privileged account;
– User-Mode Rootkits that conceal the existence of other code, usually paired with other malware, such as a backdoor, to allow remote access for the attacker and make the code difficult for the victim to detect.

In essence, this classification focuses mainly on the method of bypassing the border security controls and gaining access to the system, which is crucial in terms of malware behavior. Nevertheless, in the opinion of the Author of this Thesis, these two classifications overlap, as presented in Figure 2.1.
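One way to read the overlap of the two classifications is that a single sample can carry a label from each scheme at the same time. The sketch below is only an illustration of that idea: the class names are taken from the text, while the `Label` type and the example sample are hypothetical and do not reproduce the exact overlaps drawn in Figure 2.1.

```python
from dataclasses import dataclass

# Sub-classes of the Kaspersky Lab "Malware" class, as listed in the text.
KASPERSKY_MALWARE_SUBCLASSES = {
    "Viruses and Worms", "Trojans", "Suspicious Packers", "Malicious Tools",
}

# Behavior-oriented classes from [SH12], as listed in the text.
SH12_BEHAVIORS = {
    "Downloaders and Launchers", "Backdoors", "Credential Stealers",
    "Persistence Mechanisms", "Privilege Escalation", "User-Mode Rootkits",
}


@dataclass(frozen=True)
class Label:
    """Hypothetical label combining both classification schemes."""
    kaspersky_subclass: str
    behaviors: frozenset

    def __post_init__(self):
        # Reject names that belong to neither scheme.
        if self.kaspersky_subclass not in KASPERSKY_MALWARE_SUBCLASSES:
            raise ValueError(f"unknown sub-class: {self.kaspersky_subclass}")
        unknown = set(self.behaviors) - SH12_BEHAVIORS
        if unknown:
            raise ValueError(f"unknown behaviors: {sorted(unknown)}")


# Hypothetical sample: a trojan that opens a backdoor and steals credentials.
sample = Label("Trojans", frozenset({"Backdoors", "Credential Stealers"}))
print(sample.kaspersky_subclass, sorted(sample.behaviors))
```

Keeping the two schemes as separate fields reflects that one describes what the code is and the other how it behaves once inside the system.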

(12) 2.1.1. Malware features

Malware can be analyzed from different perspectives. The first is its destructiveness. Destructive malware are programs with malicious intent that are implemented in order to cause disruption, loss or theft. These programs are viruses, worms, botnets, spyware, trojan horses, rootkits, and backdoors. Malware that is designed only to advertise products or attract users to websites is called a disturber. Disturbers include spam and adware.

The second perspective is the objective of the malware. According to the ITU-T recommendation "X.805: Security architecture for systems providing end-to-end communications" [IT03], cyber attacks can cause the following results:
– destruction (an attack on availability) – extinction of information/systems/services/networks;
– disclosure (an attack on confidentiality) – unauthorized access to an asset;
– corruption (an attack on integrity) – unauthorized tampering with an asset;
– removal (an attack on availability) – theft, removal or loss of information and/or resources;
– interruption (an attack on availability) – information and/or network becomes unavailable or unusable.

In Table 2.1 cyber threats (according to X.805) are mapped to the security dimensions.

[Table 2.1: X.805 security threats mapped to the security dimensions. Columns (X.805 security threats): Destruction, Corruption, Removal, Disclosure, Interruption. Rows (security dimensions): Access control, Authentication, Non-repudiation, Data confidentiality, Communication security, Data integrity, Availability, Privacy. An X marks each security dimension that addresses a given threat.]

Malware can also be analyzed from the operational perspective. It can be perceived as [TS08]:
– tangible: when malware causes destruction in the victim’s machine;
– intangible: malware does not cause any destruction but may perform operations such as theft or duplication;
– manual adjustment: target of malware is determined manually by the attacker;

(13)
– self-propagation: malware chooses its target randomly and propagates itself from one machine to another;
– single operation: target of malware is only one computer or infrastructure;
– network operation: malware has more than one victim and executes multiple operations.

From the establishment method perspective malware can have the following status:
– centralized: the attack is executed from a single point of operation;
– distributed: the attack is run from multiple/parallel sources (e.g. DDoS attack);
– local: malware installed on the machine executes malicious activity on it;
– remote: malware attacks machines other than the one it is installed on.

From the communication perspective malicious software can be:
– autonomic: malware does not communicate with its creator;
– dependent: malware needs to communicate with its creator;
– centrally controlled: malware communicates with a command and control center (C&C) in order to download orders and additional code;
– without central control: malware does not communicate with a C&C.

The above analysis classifies malware types from five perspectives: destructiveness of malicious software, its objectives, realized operations, establishment method and types of communication.

2.1.2. Malware characteristics

As mentioned in section 1.1, malware carries out malicious activities on the victim’s machine/system. It can cause various kinds of damage and disorder, such as theft, removal of system security controls, destruction of files, etc. Since the subject of this Thesis refers directly to malware behavior, this section briefly presents selected malware types in order to show their specifics.

Viruses

A virus is autonomous code that can self-replicate on computers or via computer networks. Viruses are programs that inject themselves into other files in order to be perceived as legitimate programs. This allows them to propagate and execute themselves without the user being aware of it.

Worms

Worms, similarly to viruses, are programs that can self-replicate on a computer or via computer networks without being noticed by the users. A copy of a worm can self-replicate, too. The difference between viruses and worms is the number of methods of self-replication: worms have more than one, while viruses use only one method. Moreover, worms can easily spread over wide or local area networks without the need to be attached to a specific file. This makes the worm independent of the carrier. For instance, a worm can be embedded in a website, making every visitor a potential victim.

(14) Spyware

Spyware is a program developed to steal or copy data, or to inform its author about the activities of the users of the infected machine. The most popular spyware are keyloggers [Bal11], which record passwords and keys to protected user resources (e.g. private data, bank accounts) and report them back to an outside source.

Adware

At first, adware was used to advertise paid software licenses when free-of-charge software was installed. Later on, this possibility was used by hackers to attract users to paid web content (often pornographic sites). This type of malware becomes difficult to uninstall when hackers use advanced techniques that block any attempt to modify such software. To get rid of such unwanted code it is recommended to use free software frequently distributed by well-known antivirus companies.

Trojan horses

Trojan horses, or simply trojans, are programs that carry out activities not authorized by the user. They delete, block and modify data, and frequently degrade the performance of infected machines and networks. Unlike viruses and worms, trojans are unable to self-replicate and disseminate within the computer network.

Rootkits

The task of rootkits is to hide the existence of malicious applications from users or from programs that detect escalation of access privileges. They are usually installed after a hacker or malicious software obtains administrator account (so-called root) privileges.

Botnets

Botnets are sets of cooperating programs on various infected machines that perform the specific orders or tasks they were designed for. Botnets are often used for DDoS attacks in order to make a website or a network unreachable for legitimate users.

In the majority of cases, the malicious software types mentioned above perform their activities in five phases of a cyber attack:
– first contact: malware must find a way to make contact with users;
– local execution: threats use a variety of ways to enter a system and begin to write files to disk and modify the system in order to set up a base for downloading or executing the destructive code;
– establish presence: cyber attacks use several tricks to hide themselves from detection even before beginning their malicious work;
– malicious activity: cyber attacks start to do their business according to the intent of their developers, e.g. stealing passwords, bank fraud, selling fake antiviruses or programs;
– dissemination: malicious software distributes itself to other machines and systems in order to bring higher profit to the attacker.
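The five phases above can be read as an ordered sequence that an observed attack moves through. The following is a minimal sketch of that reading, not the tracking mechanism of the Thesis: the event names and their mapping to phases are hypothetical, chosen only to illustrate how far along the sequence a stream of observed events has progressed.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The five phases of a cyber attack, in the order given in the text."""
    FIRST_CONTACT = 1
    LOCAL_EXECUTION = 2
    ESTABLISH_PRESENCE = 3
    MALICIOUS_ACTIVITY = 4
    DISSEMINATION = 5


# Hypothetical mapping from observed event names to phases (an assumption).
EVENT_PHASE = {
    "email_attachment_opened": Phase.FIRST_CONTACT,
    "file_written_to_disk": Phase.LOCAL_EXECUTION,
    "autostart_entry_added": Phase.ESTABLISH_PRESENCE,
    "password_store_read": Phase.MALICIOUS_ACTIVITY,
    "copy_sent_to_other_host": Phase.DISSEMINATION,
}


def furthest_phase(events):
    """Return the furthest phase reached when phases occur in order, else None."""
    reached = 0
    for name in events:
        phase = EVENT_PHASE.get(name)
        # Advance only when the event belongs to the next expected phase;
        # unrelated or out-of-order events are ignored.
        if phase is not None and phase == reached + 1:
            reached = phase
    return Phase(reached) if reached else None


observed = [
    "email_attachment_opened",
    "file_written_to_disk",
    "window_resized",            # unrelated event, ignored
    "autostart_entry_added",
]
print(furthest_phase(observed))
```

A trace that starts in the middle of the sequence, for example only `password_store_read`, does not advance the tracker in this sketch, which is a deliberate simplification.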

(15) 2.2. Evading virus detection technologies. 15. The most important and expensive phase for attackers is establishing presence of the malicious code on the selected operating system or particular victim’s machine. The next section discusses how attackers approach and try to execute the attack without being noticed by particular system stakeholders. In this Thesis it is assumed that the proposed method supports the process of malware signatures development for malicious software components that were obfuscated as well as for 0-day attacks that use particular part of known destructive codes modified in order to make an attack successful.. 2.2. Evading virus detection technologies The method of evading antivirus tools is generally called obfuscation. It is a technique aimed at generating new software that realizes the same functions as the original one but does not have its specific code signatures. It can be realized by modification of java scripts, additional loops in the code that return to the point of execution (zero loops), encryption techniques run at program execution, etc. The list of obfuscation techniques includes, but is not limited to: – Parasitic obfuscation that is used to append, prepend, or insert code into data sections of files on disk [BKM07]. – Self-modification that allows malware to modify its code during every infection. Thus, each infected file contains different variant of the virus [LD03]. – Polymorphic coding that is an obfuscation that consists in infecting files with an encrypted copy of the virus [Auc96]. At each time an encryption key or even encryption method can be modified, therefore virus codes are different from one another in infections causing their signatures to be hard to detect [CPA+ 08]. If some part of code remains the same, an anti-virus tool can decrypt the code using an emulator. However, it is not always a successful technique. It allows to detect some malware and produce new signatures for them. 
– Metamorphic coding, which is a technique of rewriting the functions of the software in a different way every time [BM08], [RMI12]. Viruses that use this technique are very large and complex. Metamorphism makes viruses almost undetectable by signature-based tools.

Obfuscation techniques are very successful in hiding malicious code from byte-level content analysis [KWLP05], [KM06] and static analysis methods [CJS+05], [Fla04], [CJ03], which leaves such attacks undetected by these methods. Cyber criminals put significant effort into thwarting detection by anti-malware tools. Moreover, methods of evading antivirus products will be developed as long as cyber crime is profitable.

2.3. Malware detection techniques based on ontology and CP-nets – an overview

Great effort has been made lately in static analysis of malicious code, because this technique has generally brought good accuracy in malware detection [KKB+06], [KRFV04], [SYS+08]. Even though it is an appropriate technique [KRV04], its most difficult problem is handling obfuscated binaries [Szo05]. Additionally, static analysis of obfuscated code is perceived as an NP-hard problem [MKK07].

Dynamic malware analysis, on the other hand, aims at obtaining reliable information about malicious code while it is executed. It may be based on building behavior clusters from event sequences and measuring distances between single events [BAMJ07], [LM06]. However, this approach suffers from the lack of external rules for data analysis. According to [CJK07], [RHWD08], a successful method of dynamic malware analysis is to compare specifications of malicious behavior with processes hooked at the application level.

The approach to malware modeling proposed in this Thesis is based on the use of both ontology and Colored Petri nets during dynamic malware analysis. It must be emphasized that there exist related works that separately use ontology and CP-nets in detection or modeling of cyber threats, although their approach to malware modeling differs considerably from the one proposed here.

For instance, graph knowledge models and ontologies were used for modeling and reasoning over network attacks and for attack prediction [SA12]. Knowledge representation methods and network attacks modeled together with their prerequisites and consequences were used to provide description logic reasoning and inference over attack domain concepts. In this way an ontology-based system was proposed to predict potential attacks using inference and observation of information provided by sensors.

An ontology-based approach to instantiating security policy and reactions to network attacks was proposed to map alerts into attack contexts [CBCdV+08]. This solution was used to identify the policies to be applied in the network in order to prevent the threat.
Ontologies in this case were used to describe alerts, and inference rules were applied to map alerts into possible attacks and adequate policy rules. In general, security rules and policies can be applied both in hardware unified firewall systems [Nal07] and in web security systems [NL03], [ARS+10].

Ontologies and knowledge representation as a semantic model were successfully used for indirect association analysis to extract useful information about a terrorist social network [TCK10]. Ontological filtering was adapted to transform the semantic representation of a terrorist network into a set of complex networks. Then, for further processing, a structured graph [TK10] was produced. This made it possible to investigate the terrorist social network and find relations between criminals.

It is also worth mentioning that ontologies were successfully adapted for detection of cyber attacks in network traffic within the Federated Cyber Defence System (FCDS) [JPB+12], developed by the Polish Military Communication Institute, ITTI Ltd. and CERT Poland. The Author of this Thesis was one of the major architects and developers of this system. Ontologies in FCDS were also used to produce so-called general decision rules [CKP+11], [CK11] that mitigate the consequences of attacks. These general decision rules were translated into the language of a particular network or software security appliance in order to execute the rule and take appropriate action. For instance, system reaction units were able to block traffic or redirect it to a trap or back to the attacker, disable the affected network service (e.g. a web service), or notify the administrator [PJS+11] that sensors had detected suspicious activities in the network traffic.

Like ontologies, Colored Petri nets were successfully adapted for identification of cyber threats. In [KS94] the authors observed that the mathematical representation of Petri nets allows for modeling of computer misuse. The proposed mechanism consisted in representing a known attack as a sequence of events, so that the attack was expressed as a Petri net graph. Comparing observed misuse with the Petri net graph allowed unwanted actions to be detected.

Colored Petri nets were also used for detection of DoS attacks in Wide Area Networks [Hea09]. In this case Colored Petri nets were adapted to model router network connections in the area of the United States. It was shown that modifications in the network infrastructure caused by DoS attacks can be detected by comparing the current state with the modeled one. Moreover, this method was proposed as an early warning system against network attacks. Additionally, it can support the development of network infrastructure security strategies.

The next major contribution to the use of Colored Petri nets was identified in [TSD10a], [TSD10b], supported by the US Air Force Office of Scientific Research. This research presents a new approach to formal specification of malicious functionalities based on activity diagrams defined in an abstract domain. It introduces abstract functional objects that, along with system objects, can be used for creating generic specifications covering multiple realizations of a functionality. The methodology proposed in that work uses Colored Petri nets for recognition of functionalities at the system call level.

As we can see, ontology and CP-nets are applied in many disciplines where modeling of system behavior is critical. This kind of modeling is also central to the application of ontology and CP-nets to cyber defence in this Thesis, as is demonstrated in the following chapters.
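To make the Petri-net view of misuse detection discussed in this section more concrete, the following is a minimal sketch of an attack represented as a sequence of events in a plain place/transition net. It is an illustration only and not the mechanism of [KS94] or of this Thesis: the event names, the place names and the three-step attack are all hypothetical, and colors (token data) are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    event: str      # event label that may fire this transition
    inputs: tuple   # names of input places
    outputs: tuple  # names of output places


class PetriNet:
    def __init__(self, marking, transitions):
        self.marking = dict(marking)   # place name -> number of tokens
        self.transitions = list(transitions)

    def enabled(self, t):
        # A transition is enabled when every input place holds a token.
        return all(self.marking.get(p, 0) > 0 for p in t.inputs)

    def fire(self, event):
        # Fire the first enabled transition labeled with this event.
        for t in self.transitions:
            if t.event == event and self.enabled(t):
                for p in t.inputs:
                    self.marking[p] -= 1
                for p in t.outputs:
                    self.marking[p] = self.marking.get(p, 0) + 1
                return True
        return False  # unrelated or out-of-order events leave the marking unchanged


def attack_detected(events, final_place="compromised"):
    # Hypothetical attack model: drop a file, make it persistent, call home.
    net = PetriNet(
        marking={"start": 1},
        transitions=[
            Transition("drop_file", ("start",), ("file_written",)),
            Transition("add_autorun_key", ("file_written",), ("persistent",)),
            Transition("connect_c2", ("persistent",), ("compromised",)),
        ],
    )
    for event in events:
        net.fire(event)
    return net.marking.get(final_place, 0) > 0


observed = ["drop_file", "open_document", "add_autorun_key", "connect_c2"]
print(attack_detected(observed))
```

In this sketch a token reaching the final place means that all steps of the modeled attack were observed in order; events that do not match an enabled transition are simply ignored.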

3. PRONTO – malware hunting tool – preface

This chapter introduces the reader to the Method of malware modeling proposed in this Thesis and describes the approach to detection of malicious activity. Additionally, it presents the classification of the Method among other existing ones.

3.1. Approach to malware detection

On the basis of the problem stated in [Bon98], let us suppose that an existing malware called Ann has distinctive code features {f1, f2, f3} and signatures {s1, s2, s3} that are known to antivirus tools. Then, malware Ann is modified by someone using obfuscation methods (see section 2.2) in such a way that the codes Bob and Dan are developed. The new malware Bob and Dan have the same features {f1, f2, f3} as Ann but different code signatures: {s1, s4, s5} and {s6, s4, s5} respectively. Thus, signature-based detectors can easily detect malware Ann, and some of them will detect Bob, because it has one signature typical of Ann. None of the signature-based tools can detect the Dan code as suspicious or malicious, because it has no signature in common with the existing ones.

Now, let us assume that malware Dan is executed in some system and performs its malicious activity. Anti-virus scanners are unable to detect this malware, even though sensors spread across the system can deliver information about its activity. Observed independently, these activities are treated as regular actions of a user or of software. However, identifying in the system logs those events that are distinctive for this malware can prove that the system is infected by a particular modification of the Ann code.

In the proposed Method the basis for detection of malicious behavior are models of malware activities, which reflect modifications of system resources, affected components of infected systems, data exchanged with other malware or control stations, protocols used, etc.
Moreover, the Method identifies suspicious events in system logs and maps them onto stored malware characteristics in the form of Colored Petri net models. For this purpose a novel tool called PRONTO has been proposed, which traces system logs and matches sensor data with modeled malware activities. Because system log analysis requires processing a large amount of sensor data, it is proposed that ontology reasoning is used for identification and classification of events as suspicious, malicious or regular behavior.
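The Ann, Bob and Dan example from this section can be written down as a short sketch that contrasts the two detection ideas. The feature and signature sets come from the text above; the two detector functions are simplifying assumptions (a signature detector that fires on any shared signature, and a behavior detector that fires when all known features are observed), not the PRONTO implementation.

```python
# Signatures and features of the known malware Ann, as given in the text.
KNOWN_SIGNATURES = {"s1", "s2", "s3"}
KNOWN_FEATURES = {"f1", "f2", "f3"}

SAMPLES = {
    "Ann": {"signatures": {"s1", "s2", "s3"}, "features": {"f1", "f2", "f3"}},
    "Bob": {"signatures": {"s1", "s4", "s5"}, "features": {"f1", "f2", "f3"}},
    "Dan": {"signatures": {"s6", "s4", "s5"}, "features": {"f1", "f2", "f3"}},
}


def detected_by_signature(sample):
    # Assumed lenient detector: one shared signature is enough to raise a flag.
    return bool(sample["signatures"] & KNOWN_SIGNATURES)


def detected_by_behavior(sample):
    # Assumed behavior detector: all known features must be observed.
    return KNOWN_FEATURES <= sample["features"]


signature_hits = sorted(n for n, s in SAMPLES.items() if detected_by_signature(s))
behavior_hits = sorted(n for n, s in SAMPLES.items() if detected_by_behavior(s))

print("signature-based:", signature_hits)  # ['Ann', 'Bob']
print("behavior-based: ", behavior_hits)   # ['Ann', 'Bob', 'Dan']
```

Dan shares no signature with Ann and therefore escapes the signature check, while its unchanged features still match the behavior model.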

3.2. The idea of PRONTO module

In order to prove the claim of the Thesis, a behavior-oriented malware hunting tool called PRONTO has been proposed, which could be used in parallel with existing signature-based tools. The main assumption of the presented Method is that the malware has not yet been recognized by signature mechanisms. The aim therefore is to track its suspicious activities in order to find it while it is running in the system.

[Figure 3.1: Concept of PRONTO module. Diagram labels: Sensors (process monitor, registry monitor, file monitor, network monitor, ...); Events (XSD defined); Stage 1 – events filtering: PRONTOlogy engine (lifting, knowledge base, reasoning, SQWRL query results); Registered markings; Stage 2 – threats tracking: PRONTOnet engine (malware description CP-net models); Attack vector; Alarm: malware recognition.]

The Method is composed of two main parts (see Figure 3.1):
– Filtering of the system events registered by the system monitors (sensors) to discover the main features of the hostile activity. These features are related to particular objects and to the actions triggered on those objects, e.g. registry (add, modify or delete a registry entry, etc.), process (start or stop a process, etc.), file (copy, delete, run, open or close a file, etc.), domain (connect to, etc.), IP address (connect to, etc.);
– Tracking suspicious activity in order to discover malicious exploits running in the system. Filtered events are correlated in order to find similarities with the stored malware activities modeled in the form of Colored Petri nets. The result of malware tracking is an alarm that contains an information vector about the malicious activity, its similarity to known attacks and a list of the incidents that affected the system.
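The two parts listed above can be sketched as a small pipeline. This is a deliberately simplified illustration under stated assumptions: the rule set, the event fields, the model name and the threshold are hypothetical; the ontology reasoning of Stage 1 is replaced by a plain lookup table, and the Colored Petri net model of Stage 2 is replaced by an ordered list of expected steps.

```python
from typing import NamedTuple


class Event(NamedTuple):
    obj: str      # object type, e.g. "registry", "process", "file", "domain"
    action: str   # action triggered on the object
    target: str   # concrete resource the action was applied to


# Stage 1 stand-in: (object, action) pairs treated as suspicious (hypothetical).
SUSPICIOUS_RULES = {
    ("file", "run"),
    ("registry", "add_entry"),
    ("domain", "connect_to"),
}

# Stage 2 stand-in: each stored model is an ordered list of expected steps.
MODELS = {
    "hypothetical_dropper": [
        ("file", "run"),
        ("registry", "add_entry"),
        ("domain", "connect_to"),
    ],
}


def filter_events(events):
    """Stage 1: keep only events that the rules classify as suspicious."""
    return [e for e in events if (e.obj, e.action) in SUSPICIOUS_RULES]


def similarity(filtered, model):
    """Fraction of model steps observed so far, in the modeled order."""
    matched = 0
    for e in filtered:
        if matched < len(model) and (e.obj, e.action) == model[matched]:
            matched += 1
    return matched / len(model)


def track(events, threshold=0.6):
    """Stage 2: report every model whose similarity reaches the threshold."""
    filtered = filter_events(events)
    alarms = {}
    for name, model in MODELS.items():
        score = similarity(filtered, model)
        if score >= threshold:
            alarms[name] = score
    return alarms


log = [
    Event("file", "open", "report.doc"),
    Event("file", "run", "update.exe"),
    Event("registry", "add_entry", "autorun_key"),
]
print(track(log))
```

With the sample log, two of the three modeled steps are observed, so the hypothetical model is reported with a similarity of about 0.67; a real implementation would report richer alarm content, such as the list of incidents mentioned above.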
The first component is related to capturing events from sensors and analyzing them with an expert system that uses a comprehensive ontology defined for the purpose of the Method, called PRONTOlogy. Registered events in the form of XML objects are sent to the PRONTOlogy engine and lifted to add entries to the Knowledge Base. PRONTOlogy describes events registered by system monitors and is able, on the basis of a rule engine and inference with specially defined rules, to classify an event as potentially suspicious, malicious or regular. As a result, markings of the modeled malware in the form of CP-nets are delivered for further analysis.

The main element of the threats tracking component of the Method is PRONTOnet. It provides a formal model of malware behavior and allows suspicious activities to be tracked, potentially assigning them to a class of known malware types or identifying an unknown one. Known exploits can become undetectable by signature-based malware detection tools after their code has been obfuscated (see section 2.2), although their activities can still be easily observed. It also often happens that a new piece of malware is composed of known components taken from other ones. This results in another behavior pattern that can be tracked as a new, not yet identified exploit.

The result of the threats tracking stage is an alert informing about identification of suspicious or malicious events with a certain similarity rate to known malware types. Reaction to the detected attack, which is beyond the scope of this Thesis, can be realized by the Federated Cyber Defence System [JPB+12], which has been developed since 2010 at the Military Communication Institute.

3.3. PRONTO module classification

On the basis of the main features of the Method, according to Figure 3.2, one can classify it as:
– recognizing known patterns of malware behavior [MDL+12],
– a Host-based Intrusion Detection System (HIDS) [VB10],
– having a central module responsible for malware detection [JP12],
– with a malware detection module fed by system logs [FPZ+08],
– reacting passively to detected malware [HBB+07],
– analyzing incidents in real time [WAFS+08].

PRONTO – malware hunting tool – is developed to cooperate with existing signature-based detection methods (e.g. anti-viruses), which do not recognize distinctive features of new or obfuscated malware. PRONTO can also be used in so-called sandboxes in order to trace the development and progress of malicious activities. Thus, it could also be used as a client honeypot [WWCZ10] that waits passively for an attack.

During the attack PRONTO recognizes single incidents that are compared with CP-net models of malware activity. For that reason PRONTO is classified as a known patterns recognition method. It is also possible to adapt this Method to detect anomalies: it was shown that CP-net models were successfully used to detect Denial-of-Service cyber attacks on the Internet's router infrastructure [Hea09].

This work is focused only on detection of malicious activities at the host level and on recognizing incidents coming from a single machine or operating system. Therefore, PRONTO is classified as a Host-based Intrusion Detection System, although a slight modification would allow it to detect unwanted actions at the network level.

The architecture of PRONTO itself is centralized, although the PRONTO module is used to detect malware in a federated system. Modifying the security policy and architecture so as to accept events from other domains, computer systems or machines would allow it to be classified as a federated system.

[Figure 3.2: Features of malware detection methods (based on [Ren11]). Features and their options: Method of detection (anomaly detection, known patterns recognition, hybrid); Protected system (host (HIDS), network (NIDS), hybrid); Architecture (centralized, federated); Source of data (system logs, network traffic, system statistics); Type of reaction (active, passive); Time of analysis (real time, delayed). The legend distinguishes the current classification of PRONTO from its possible use.]

In terms of data sources, for the purpose of the Thesis the PRONTO module is limited to analyzing only system logs (registry modifications, operations on files, running processes, etc.). Extending the source of data to network traffic and system statistics would allow a wider range of information to be searched, probably with a negative effect on efficacy and efficiency. To address this problem, ongoing research at the Military Communication Institute is divided into three parallel tracks:
– malware analysis at the host level (PRONTO),
– detection of attacks in network traffic with the use of machine learning [HJ97], [FCR09],
– analysis of system statistics with the use of Tsallis entropy theory [Tsa88].

Reaction to detected malware is beyond the scope of this work; however, it is foreseen that the Method would be passive in terms of reaction type, because it cannot be used to block an attack before it appears.

The last feature in Figure 3.2 defines the time of analysis. In this case the PRONTO module must be assigned to real-time solutions, because every symptom of malware is analyzed at the time it appears.

4. Ontology

This chapter introduces the theoretical fundamentals of ontology, inference and the use of rules. It defines what an ontology is, presents different types of modeling languages with emphasis on their expressiveness, and introduces the idea of using rules. The chapter concludes with an overview of ontology applications in knowledge and software engineering.

4.1. Ontology definition

The term ontology derives from philosophy where, since ancient times, it has been used to formally describe the surrounding world in terms of entities, their characteristics, hierarchy and relationships. In fact, ontology is still important for philosophers who deal with formal logic and try to understand the basic rules of the world. However, the real value of ontologies lies in their application to knowledge representation and software engineering.

Ontologies have been applied in computer science since the 1970s, when researchers in the field of artificial intelligence understood the power of expert systems and their potential in real-world applications. However, formal definitions of ontology did not appear until the 1990s [Gru93], [Gru95]. According to [AvH03] an ontology is an explicit and formal specification of a conceptualization. In general, an ontology describes some domain of knowledge formally, defining basic concepts, their properties and the relationships among them. With this approach an ontology, by defining a common vocabulary, provides a shared understanding of the meaning of terms both among people and among software agents. It allows hierarchies of ontologies to be defined and existing knowledge to be re-used, supporting interoperability and avoiding re-inventing the wheel.

Ontologies describe artifacts with different levels of detail. They can be simple taxonomies (such as the Yahoo hierarchy), metadata schemes (such as the Dublin Core), as well as logical theories.
Lately, with the invention and rapid expansion of web-based tools and interoperable, platform-independent markup languages, ontologies are very frequently used for the development of the Semantic Web [Hef04], where a significant degree of structure is necessary. In order to achieve this structural complexity, ontologies are usually expressed in a logic-based language, so that detailed, accurate, consistent, sound, and meaningful distinctions can be made among the classes, properties, and relations [Hef04]. This, in turn, allows automated reasoning to be performed, supporting the development of intelligent applications that can work at the human conceptual level (e.g. software agents, decision support, understanding of speech and natural language, knowledge management, automated choices).

However, the application of ontologies can be more trivial. The common understanding of terms and relations that they provide gives strong support for semantic interoperability, the goal of all current system engineers. Systems working in the global network and sharing information from various communities usually rely on exchanging data between parties who have agreed on the definitions beforehand. This approach, however, makes it necessary to update the implementation of interfaces every time the XML Schema changes.

Typically, an ontology consists of a finite list of terms and relationships between them. Its aim is to provide a semantic description of objects and, in the end, to allow facts to be defined and a knowledge base (abbr. KB) to be developed. A KB is composed of two important types of statements:
– TBox, so-called Terminological statements, describing a conceptualization: a set of concepts and properties of these concepts, with the use of a controlled vocabulary,
– ABox, so-called Assertional statements: facts associated with the terminological vocabulary within a knowledge base.

By analogy with object-oriented programming, TBox statements are sometimes associated with object-oriented classes and ABox statements with instances of those classes. Together, they form a knowledge base.

One of the most important traits of ontology application in software engineering is the possibility of inferring knowledge on the basis of facts in the KB. Reasoners (dedicated software programs) can infer new facts on the basis of existing facts, relations among those facts, axioms and assertions. For example, if all professors are faculty members (subclass relationship):
    Prof(x) -> faculty(x)
and faculty members are Staff members (subclass relationship):
    faculty(x) -> Staff(x)
and Marcin Szpyrka is a Professor (individual of the Prof class, an ABox entry):
    Prof(Marcin Szpyrka)
then:
    Prof(x) -> Staff(x) – all Professors are also Staff members (inferred subclass relationship),
    faculty(Marcin Szpyrka) – Marcin Szpyrka is also a faculty member (inferred knowledge),
    Staff(Marcin Szpyrka) – and a Staff member (inferred knowledge).

This very simple example shows that an ontology makes it possible to deduce new facts on the basis of existing knowledge. The expressiveness of semantic models is illustrated with examples in Section 4.2, which presents particular semantic languages.

4.2. Semantic models

Knowledge representation with the use of ontologies can have different expressiveness, which strongly influences the possibility of querying, inferencing and reasoning. For the purpose of knowledge engineering and automatic reasoning, specialized markup languages for semantic modeling are defined. These are:
– RDF – The Resource Description Framework [LS99],
– RDFS – The RDF Schema language [BGM04], and
– OWL – Web Ontology Language [PSH04].

RDF statements are called triples and appear in the form of subject – predicate – object expressions. The subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. For example, the statement "Bartosz Jasiul is the author of this Thesis." is represented in RDF by the triple:
– a subject denoting Bartosz Jasiul,
– a predicate denoting author, and
– an object denoting this Thesis.

RDF is an abstract model with several serialization formats: the most common one is the XML (eXtensible Markup Language) format; Notation 3 (or N3) was introduced by W3C as a non-XML serialization of RDF models designed to be easier to write by hand and, in some cases, easier to follow; a JSON serialization is a proposal. With the use of RDF in XML the above sentence can be written as follows:

    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF %deleted namespaces%>
      <ns:person rdf:about="http://www.wil.waw.pl/phd#Bartosz_Jasiul">
        <ns:firstName>Bartosz</ns:firstName>
        <ns:lastName>Jasiul</ns:lastName>
        <ns:author rdf:resource="http://www.wil.waw.pl/phd#this_Thesis"/>
      </ns:person>
    </rdf:RDF>

RDF defines the following vocabulary in terms of classes:
– rdf:XMLLiteral – the class of XML literal values,
– rdf:Property – the class of properties,
– rdf:Statement – the class of RDF statements,
– rdf:Alt, rdf:Bag, rdf:Seq – containers of alternatives, unordered containers, and ordered containers (rdfs:Container is a super-class of the three),
– rdf:List – the class of RDF Lists,
– rdf:nil – an instance of rdf:List representing the empty list.
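The subject – predicate – object structure of RDF statements described above can be illustrated with a minimal in-memory sketch in which each triple is a Python tuple. No RDF library is used here. The subject and object identifiers follow the listing above (written with underscores, which is an assumption about the original identifiers), while the `NS` namespace is a hypothetical stand-in, because the namespace declarations are elided in the listing.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
NS = "http://example.org/ns#"          # hypothetical; the real namespace is elided
PHD = "http://www.wil.waw.pl/phd#"     # base of the identifiers in the listing

# The typed node element <ns:person rdf:about="..."> also states an rdf:type triple.
TRIPLES = {
    (PHD + "Bartosz_Jasiul", RDF_TYPE, NS + "person"),
    (PHD + "Bartosz_Jasiul", NS + "firstName", "Bartosz"),
    (PHD + "Bartosz_Jasiul", NS + "lastName", "Jasiul"),
    (PHD + "Bartosz_Jasiul", NS + "author", PHD + "this_Thesis"),
}


def match(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# "What is Bartosz Jasiul the author of?"
for s, p, o in match(subject=PHD + "Bartosz_Jasiul", predicate=NS + "author"):
    print(o)
```

Querying by pattern with wildcards is the basic operation on which richer query languages for RDF data are built; a production system would use a dedicated triple store instead of a Python set.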
RDF also defines the following vocabulary in terms of properties:
– rdf:type – an instance of rdf:Property used to state that a resource is an instance of a class,
– rdf:first – the first item in the subject RDF list,
– rdf:rest – the rest of the subject RDF list after rdf:first,
– rdf:value – idiomatic property used for structured values,
– rdf:subject – the subject of the subject RDF statement,
– rdf:predicate – the predicate of the subject RDF statement,
– rdf:object – the object of the subject RDF statement.

RDFS enhances RDF with additional vocabulary, i.e. classes, associated properties and utility properties built on the limited vocabulary of RDF. In terms of classes RDFS defines the following vocabulary:
– rdfs:Resource – the class at the top of the hierarchy. All things described by RDF are resources.
– rdfs:Class – defines a particular group of resources.
– rdfs:Literal – literal values such as strings and integers.
– rdfs:Datatype – the class of datatypes.
– rdf:XMLLiteral – the class of XML literal values.

In terms of properties, which are instances of the class rdf:Property and describe a relation between subject resources and object resources, RDFS defines:
– rdfs:domain of an rdf:Property – the class of the subject in a triple whose predicate is that property.
– rdfs:range of an rdf:Property – the class or datatype of the object in a triple whose predicate is that property.
– rdfs:subClassOf – allows hierarchies of classes to be declared.
– rdfs:subPropertyOf – an instance of rdf:Property that is used to state that all resources related by one property are also related by another.
– rdfs:label – an instance of rdf:Property that can be used to provide a human-readable version of a resource's name.
– rdfs:comment – an instance of rdf:Property that can be used to provide a human-readable description of a resource.
– rdfs:seeAlso – an instance of rdf:Property that is used to indicate a resource that might provide additional information about the subject resource.
– rdfs:isDefinedBy – an instance of rdf:Property that is used to indicate a resource defining the subject resource. This property may be used to indicate an RDF vocabulary in which a resource is described.

RDF/RDFS make it possible to describe facts in the form of triples and to associate simple semantics with identifiers. They allow classes, their hierarchy, properties (with their hierarchy) and simple restrictions on the domain and range of properties to be defined. In this sense, RDF Schema is a simple ontology language. RDF/RDFS do not allow, however, the definition of transitive, unique or inverse properties, of classes as unions or differences of other classes (e.g. Person ⊑ Woman ⊔ Man), of cardinality restrictions, or of disjointness of classes. That is why a more expressive language is necessary for ontology representation and modeling. These requirements are met by OWL, defined as an extension to RDF/RDFS.
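The class-hierarchy semantics that RDFS does provide can be illustrated with a small forward-chaining sketch. It applies two inference patterns to a set of triples until nothing new is derived: transitivity of rdfs:subClassOf, and propagation of rdf:type along rdfs:subClassOf. The short string identifiers and the function name are hypothetical simplifications, the class names reuse the Prof, faculty and Staff example from Section 4.1, and the sketch does not cover the full set of RDFS entailment rules.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

ASSERTED = {
    ("Prof", SUBCLASS, "faculty"),
    ("faculty", SUBCLASS, "Staff"),
    ("Marcin_Szpyrka", TYPE, "Prof"),
}


def rdfs_closure(triples):
    """Derive subclass and type triples until a fixed point is reached."""
    inferred = set(triples)
    while True:
        new = set()
        for (a, p, b) in inferred:
            if p != SUBCLASS:
                continue
            for (c, q, d) in inferred:
                if q == SUBCLASS and c == b:
                    new.add((a, SUBCLASS, d))   # a ⊑ b and b ⊑ d give a ⊑ d
                if q == TYPE and d == a:
                    new.add((c, TYPE, b))       # c : a and a ⊑ b give c : b
        if new <= inferred:
            return inferred
        inferred |= new


closure = rdfs_closure(ASSERTED)
for triple in sorted(closure - ASSERTED):
    print(triple)
```

The three derived triples correspond to the inferred facts of the example in Section 4.1: Prof is a subclass of Staff, and Marcin Szpyrka is both a faculty member and a Staff member.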

OWL stands on top of the XML/XML Schema/RDF/RDFS stack and adds more vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes [MvH04]. In general, OWL is based on experience gained by the authors of the DAML+OIL web ontology language.

In terms of expressiveness OWL is divided into three main types [MvH04]:
– OWL Lite – supports a classification hierarchy and simple constraints, and has the lowest formal complexity. It supports cardinality constraints but permits only cardinality values of 0 or 1.
– OWL DL – based on the description logics paradigm. It includes all OWL language constructs, but they can be used only under certain restrictions. For instance, in OWL DL a class may be a subclass of many classes, but cannot be an instance of another class. This sublanguage provides maximum expressiveness while retaining computational completeness, which means that all conclusions are guaranteed to be computable, and decidability, which means that all computations will finish in finite time.
– OWL Full – which has the richest vocabulary but gives no computational guarantees. For this reason it is unlikely that any reasoning software will be able to support complete reasoning for every feature of OWL Full.

The OWL types are sublanguages of one another in the following sense:
– Every legal OWL Lite ontology is a legal OWL DL ontology: (OWL Lite ⊂ OWL DL).
– Every valid OWL Lite conclusion is a valid OWL DL conclusion: (OWL Lite conclusion ⊂ OWL DL conclusion).
– Every legal OWL DL ontology is a legal OWL Full ontology: (OWL DL ⊂ OWL Full).
– Every valid OWL DL conclusion is a valid OWL Full conclusion: (OWL DL conclusion ⊂ OWL Full conclusion).

According to the requirements adopted in this Thesis, the ontology is to be used for the purpose of decision support.
This is a kind of expert system that, on the basis of the knowledge base, can define which events should be treated as suspicious. For this purpose it is proposed to use description logic (DL), which makes it possible to represent knowledge formally. Typical reasoning tasks for OWL DL are decidable. For this reason DL is used in artificial intelligence for formal reasoning and greatly supports the application of ontologies in computer engineering (e.g. for medical knowledge) and in the Semantic Web [BHS05].

With OWL one can define classes and properties of those classes. Every class is a descendant of owl:Thing. Classes are defined using the owl:Class construct, e.g.

    <owl:Class rdf:ID="MilitaryRanks"/>

We can define two classes as equivalent with owl:equivalentClass:

    <owl:Class rdf:ID="Major">
      <owl:equivalentClass rdf:resource="#OF3"/>
    </owl:Class>

The subsumption relationship among classes is expressed with rdfs:subClassOf:

    <owl:Class rdf:ID="Major">
      <rdfs:subClassOf rdf:resource="#Officer"/>
    </owl:Class>

We can also define disjointness of classes (owl:disjointWith) and collections of objects (owl:oneOf):

    <owl:Class rdf:about="#Major">
      <owl:disjointWith rdf:resource="#Lieutenant_Colonel"/>
    </owl:Class>

Attributes of objects are called properties. These are:
– datatype properties – attributes that specify class features by means of data (XSD datatype),
– object properties – attributes that define relationships between classes (Relations).

Table 4.1: OWL axioms

    Axiom                      DL Syntax        Example
    subClassOf                 C1 ⊑ C2          Human ⊑ Being ⊓ Biped
    equivalentClass            C1 ≡ C2          Man ≡ Human ⊓ Male
    disjointWith               C1 ⊑ ¬C2         Male ⊑ ¬Female
    sameIndividualAs           {x1} ≡ {x2}      {Marcin Szpyrka} ≡ {Prof. Szpyrka}
    differentFrom              {x1} ⊑ ¬{x2}     {Marcin Szpyrka} ⊑ ¬{Bartosz Jasiul}
    subPropertyOf              P1 ⊑ P2          hasPhDStudent ⊑ hasStudent
    equivalentProperty         P1 ≡ P2          Maj. ≡ OF3
    inverseOf                  P1 ≡ P2⁻         hasPhDStudent ≡ hasSupervisor⁻
    transitiveProperty         P⁺ ⊑ P           supervisor⁺ ⊑ supervisor
    functionalProperty         ⊤ ⊑ ≤ 1 P        ⊤ ⊑ ≤ 1 hasSupervisor
    inverseFunctionalProperty  ⊤ ⊑ ≤ 1 P⁻       ⊤ ⊑ ≤ 1 hasPhDStudent⁻

Each property has its domain, which defines the originating class, and its range, which defines the target class. For datatype properties the range is an XSD datatype. For object properties both the domain and the range are classes. They can be the same class (e.g. hasSupervisor(x,y) where x and y are persons).

OWL makes it possible to express hierarchies of both classes and relationships. Therefore object properties can be subject to subsumption. Moreover, properties can be defined as transitive, symmetric and functional (owl:TransitiveProperty, owl:SymmetricProperty, owl:FunctionalProperty). OWL also allows instances of classes to be defined as individuals.

The vocabulary of OWL DL is shown in Tables 4.1 and 4.2. In particular, the axioms of OWL DL are presented in Table 4.1 and the class constructors in Table 4.2.

Table 4.2: OWL class constructors

Constructor | DL Syntax | Example | Modal Syntax
intersectionOf | C1 ⊓ ... ⊓ Cn | Human ⊓ Male | C1 ∧ ... ∧ Cn
unionOf | C1 ⊔ ... ⊔ Cn | Major ⊔ Professor | C1 ∨ ... ∨ Cn
complementOf | ¬C | ¬Major | ¬C
oneOf | {x1} ⊔ ... ⊔ {xn} | {Marcin} ⊔ ... ⊔ {Bartosz} | x1 ∨ ... ∨ xn
allValuesFrom | ∀P.C | ∀hasPhDStudent.Professor | [P]C
someValuesFrom | ∃P.C | ∃hasSupervisor.Major | ⟨P⟩C
maxCardinality | ≤ nP | ≤ 1 hasSupervisor | [P]_{n+1}
minCardinality | ≥ nP | ≥ 3 hasPhDStudent | ⟨P⟩_n

In the simplest case OWL DL corresponds to ALC (Attributive Language with Complements), which makes it possible to build the concepts and relationships between concepts listed in Table 4.3. These constructs are used to build TBox entries, i.e. terminological axioms. An example of this expressiveness is shown in Table 4.4, which presents appropriate ALC axioms, in contrast to the example in Table 4.5. When information about disjointness of classes is added, one can also draw conclusions about the types of individuals (see Table 4.5). OWL DL can have extended expressiveness, which is defined as SHOIN. It makes it possible to define the vocabulary for concepts, individuals and roles as presented in Table 4.6. Moreover, SHOIN provides cardinality restrictions. This makes it possible to use data values as arguments of a particular role and to close a class as well as a particular domain.

Table 4.3: ALC: Building concepts and stating relationships between concepts [HHK+06]

Constructor | Description
C ⊓ D | individuals in C and D
C ⊔ D | individuals in C or D
¬C | individuals not in C
∃R.C | individuals with some relation R to C
∀R.C | individuals with all relations R to C
C ⊑ D | all individuals of C are also in D
C ≡ D | the individuals of C and D are the same

An example of logical reasoning based on DL can be seen in Table 4.4. It is based on the Open World Assumption (abbr. OWA), which states that anything might be true unless it can be proven false. Therefore, everything we do not know is undefined.
This contrasts with the Closed World Assumption, under which everything we do not know is false. Under OWA, before the fact about disjointness of classes was entered in Table 4.5, we could have made a logical mistake.
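The difference between the two assumptions can be sketched with a three-valued query. This is only an illustration in plain Python, not the behaviour of any particular reasoner: the individuals and class names follow Table 4.4, whereas the NEGATIVE set and both function names are hypothetical.

```python
# Facts known to hold; individuals and classes follow Table 4.4.
POSITIVE = {("cisim2013", "Conference"), ("bartoszjasiul", "Person")}
# Facts known not to hold (a hypothetical entry, added for illustration only).
NEGATIVE = {("bartoszjasiul", "Conference")}


def ask_closed_world(individual, cls):
    """CWA: whatever is not stated in the knowledge base is false."""
    return (individual, cls) in POSITIVE


def ask_open_world(individual, cls):
    """OWA: True, False, or None when the knowledge base cannot decide."""
    if (individual, cls) in POSITIVE:
        return True
    if (individual, cls) in NEGATIVE:
        return False
    return None
```

Asking whether cisim2013 is a Person gives False under the closed-world reading but None (undefined) under the open-world reading, because nothing in the knowledge base rules it out.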

Table 4.4: Exemplary terminological axioms – TBox entries

Constructor | Description
Conference ⊑ Event | Every conference is an event.
Conference ⊑ ∀participant.Person | Each participant of a conference is a person.
Person ⊑ Female ⊔ Male | Persons are female or male.
cisim2013 : Conference | CISIM2013 is a conference.
(cisim2013, bartoszjasiul) : participant | Bartosz Jasiul participates in CISIM2013.
bartoszjasiul : Person | Bartosz Jasiul is a person.

Table 4.5: Missing disjointness – TBox entries

Constructor | Description
Conference ⊑ Event | Every conference is an event.
Conference ⊑ ∀participant.Person | Each participant of a conference is a person.
(cisim2013, bartoszjasiul) : participant | Interesting – CISIM2013 participates in Bartosz Jasiul. It is not a contradiction in case CISIM2013 is a person.
cisim2013 : Person | Curiously – CISIM2013 is a person.
What is missing? |
Person ⊑ ¬Event | Person is not an event.
cisim2013 : Conference | CISIM2013 is a conference.

Table 4.6: OWL DL: SHOIN and particular domains (based on [HHK+06])

Concepts
ALC | Boolean operators | ⊓, ⊔, ¬, ∀R, ∃R
N | Number restrictions | ≥ 3 has_phdstudent; ≤ 1 has_supervisor
Q | Qualified number restriction | ≥ 3 has_phdstudent.Professor
O | Nominals | {marcin, bartosz, ula, adam}

Individuals
≈ | Same | bartosz ≈ b_jasiul
≉ | Different | bartosz ≉ marcin

Roles
H | Subrole hierarchy | has_professor ⊑ has_supervisor
I | Inverse roles | has_supervisor⁻ ⊑ has_phdstudent
S | (ALC +) role transitivity | Trans(has_supervisor)
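The effect of the missing disjointness axiom in Tables 4.4 and 4.5 can be reproduced with a very small saturation procedure. The sketch below is a simplification in plain Python, not a complete ALC reasoner: it handles only axioms of the form C ⊑ D and C ⊑ ∀R.D, and the function names and data layout are assumptions made for illustration.

```python
def saturate(types, roles, subclass_of, all_values):
    """Apply C ⊑ D and C ⊑ ∀R.D until no new type assertion appears."""
    types = {ind: set(classes) for ind, classes in types.items()}
    changed = True
    while changed:
        changed = False
        for ind in list(types):
            for cls in list(types[ind]):
                # C ⊑ D: every instance of C is also an instance of D.
                for sup in subclass_of.get(cls, ()):
                    if sup not in types[ind]:
                        types[ind].add(sup)
                        changed = True
                # C ⊑ ∀R.D: every R-successor of an instance of C is a D.
                for role, filler in all_values.get(cls, ()):
                    for (subj, r, obj) in roles:
                        if subj == ind and r == role:
                            if filler not in types.setdefault(obj, set()):
                                types[obj].add(filler)
                                changed = True
    return types


def clashes(types, disjoint):
    """Return individuals that belong to both classes of a disjoint pair."""
    return sorted(ind for ind, classes in types.items()
                  if any(a in classes and b in classes for a, b in disjoint))


SUBCLASS_OF = {"Conference": {"Event"}}
ALL_VALUES = {"Conference": {("participant", "Person")}}
TYPES = {"cisim2013": {"Conference", "Person"}}  # the suspicious assertion
ROLES = {("cisim2013", "participant", "bartoszjasiul")}

INFERRED = saturate(TYPES, ROLES, SUBCLASS_OF, ALL_VALUES)
WITHOUT_DISJOINTNESS = clashes(INFERRED, set())
WITH_DISJOINTNESS = clashes(INFERRED, {("Person", "Event")})
```

Without the axiom Person ⊑ ¬Event the knowledge base stays consistent even though cisim2013 is asserted to be a person. Once the disjoint pair is supplied, cisim2013 is reported as a clash, because it is both a Person and, through Conference ⊑ Event, an Event.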

4.3. Rules

OWL expressiveness is limited, however. Given the DL roles parent, brother and uncle, one cannot describe their exact relationship in OWL, i.e. that someone's uncle is the brother of their parent. TBox statements define properties of entities, but they cannot express conditional statements, e.g. If a Student studies Maths then he is a Maths Student. For this purpose it is recommended to use rules and rule engines, which allow certain facts to be added to the knowledge base on the basis of existing axioms.

Rules have the form of an implication between an antecedent (body) and a consequent (head). Their meaning can be read as: whenever the conditions specified in the antecedent hold, the conditions specified in the consequent must also hold. In a relatively informal human-readable format:

    antecedent(body) -> consequent(head).

Both the antecedent (body) and the consequent (head) may consist of zero or more atoms. An empty antecedent is treated as trivially true (i.e. satisfied by every interpretation), so the consequent must also be satisfied by every interpretation. An empty consequent is treated as trivially false (i.e. not satisfied by any interpretation), so the antecedent must not be satisfied by any interpretation either. When the antecedent and the consequent consist of 1 to n atoms, each of them is a conjunction of the form a1 ∧ ... ∧ an. Variables are indicated using the standard convention of prefixing them with a question mark (e.g. ?x). Using this syntax, one can define a rule asserting that if a child (x1) has a parent (x2) and that parent has a brother (x3), then the brother is an uncle of the child, i.e.:

    hasParent(?x1,?x2) ^ hasBrother(?x2,?x3) -> hasUncle(?x1,?x3).

The rules can be defined using several formal languages, e.g. Jess rule language [Jes], JessML [SWR], RuleML (Rule Markup Language) [BAP+12], SWRL (Semantic Web Rule Language) [HPSB+04a].
Because rules are easy to define and process in SWRL, this language has been selected for use in the Thesis. It uses the human-readable syntax presented above together with an abstract syntax and an XML syntax. It is supported by software components and can be used in real-life scenarios. An example in the abstract syntax would be:

    Implies(Antecedent(hasParent(I-variable(x1) I-variable(x2))
                       hasBrother(I-variable(x2) I-variable(x3)))
            Consequent(hasUncle(I-variable(x1) I-variable(x3)))).

SWRL uses the ontology vocabulary (classes and properties), but it also offers extended capabilities defined in so-called built-ins. These are predicates that take one or more arguments and evaluate to true if the arguments satisfy the predicate. For example, an equal built-in can be defined to accept two arguments and return true if the arguments are the same. A number of core built-ins for common mathematical and string operations are contained in the SWRL Built-in Submission [HPSB+04b]. These are e.g.:
– swrlb:equal,
– swrlb:notEqual,
– swrlb:lessThan,
– swrlb:lessThanOrEqual,
– swrlb:greaterThan,
– swrlb:greaterThanOrEqual.
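How a rule engine derives new facts from the uncle rule and from a comparison built-in can be sketched with a naive forward-chaining loop. The code below is plain Python and does not use Jess or any SWRL library. The tuple-based fact format, the function names, the individuals anna, bob and carl, and the Minor rule are all assumptions made for illustration. Only the rule hasParent ^ hasBrother -> hasUncle and the swrlb predicate names come from the text.

```python
import operator

# A few comparison built-ins, named after the swrlb predicates listed above.
BUILTINS = {
    "swrlb:equal": operator.eq,
    "swrlb:notEqual": operator.ne,
    "swrlb:lessThan": operator.lt,
    "swrlb:greaterThan": operator.gt,
}


def is_var(term):
    """Variables are strings prefixed with a question mark, e.g. ?x."""
    return isinstance(term, str) and term.startswith("?")


def resolve(term, binding):
    """Replace a bound variable by its value; leave constants untouched."""
    return binding.get(term, term) if is_var(term) else term


def match(body, facts, binding=None):
    """Yield every variable binding that satisfies all atoms in body."""
    binding = binding or {}
    if not body:
        yield binding
        return
    (pred, *args), rest = body[0], body[1:]
    if pred in BUILTINS:
        values = [resolve(a, binding) for a in args]
        # A built-in is evaluated only when all of its arguments are bound.
        if not any(is_var(v) for v in values) and BUILTINS[pred](*values):
            yield from match(rest, facts, binding)
        return
    for fact in facts:
        if fact[0] != pred or len(fact) != len(args) + 1:
            continue
        new = dict(binding)
        consistent = True
        for term, value in zip(args, fact[1:]):
            if is_var(term):
                if new.setdefault(term, value) != value:
                    consistent = False
                    break
            elif term != value:
                consistent = False
                break
        if consistent:
            yield from match(rest, facts, new)


def forward_chain(rules, facts):
    """Fire all rules repeatedly until no new fact can be derived."""
    facts = set(facts)
    while True:
        derived = set()
        for body, head in rules:
            for b in match(body, facts):
                derived.add((head[0],) + tuple(resolve(t, b) for t in head[1:]))
        if derived <= facts:
            return facts
        facts |= derived


UNCLE = ([("hasParent", "?x1", "?x2"), ("hasBrother", "?x2", "?x3")],
         ("hasUncle", "?x1", "?x3"))
MINOR = ([("hasAge", "?p", "?a"), ("swrlb:lessThan", "?a", 18)],
         ("Minor", "?p"))  # hypothetical rule using a built-in

FACTS = {("hasParent", "anna", "bob"), ("hasBrother", "bob", "carl"),
         ("hasAge", "anna", 12), ("hasAge", "carl", 40)}
INFERRED = forward_chain([UNCLE, MINOR], FACTS)
```

The set difference INFERRED - FACTS contains exactly the inferred knowledge, which corresponds to the new facts a rule engine would place in its working memory. Because the rules never create new individuals, the loop is guaranteed to terminate on a finite fact base.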

OWL DL combined with SWRL makes it possible to define efficient models with very rich expressiveness. However, in order to remain decidable, SWRL rules must be restricted. The restricted rules are the so-called DL-safe rules, i.e. SWRL rules that apply only to known individuals [HHK+06]. When rule engines are used, new (inferred) knowledge is put into a working memory that stores all facts and axioms. To retrieve it from the knowledge base (KB), a dedicated language is needed: Semantic Query-enhanced Web Rule Language (SQWRL), built on the SWRL rule language [HPSB+04a]. The syntax of SQWRL is similar to that of SWRL. It takes a standard SWRL antecedent and treats it as a query pattern. SQWRL supports basic querying and sorting of the results, and it can also compute some functions (e.g. count individuals with sqwrl:count(?p), calculate an average with sqwrl:avg(?age), etc.). Queries can also use built-ins that define, for instance, comparison functions: swrlb:lessThan, swrlb:greaterThan. The following examples show some of the possibilities of SQWRL:

    Person(?p) ^ hasAge(?p,?a) ^ swrlb:lessThan(?a,18) -> sqwrl:select(?p,?a)
    Person(?p) -> sqwrl:count(?p)
    (hasChild>=1)(?i) -> sqwrl:select(?i).

4.4. Ontology applications

This Section presents ontology-enabled approaches found in related work in different domains. Ontologies have been used for many practical and scientific purposes [UG96], [HKST06], [FBD96], and only a limited selection of them is outlined in this Section. One possible use of ontologies is to provide a common understanding of one domain from different viewpoints. This can be illustrated by the example of establishing requirements, which is important in every project focused on the development of a particular product. It is crucial in the military domain, where meeting rigorous requirements may decide about someone's life.
The problem of perceiving a particular domain described from a single perspective was discussed in technical [MKA04], [HKST06] and scientific documents [LFB96], [Gua98], [Gru93]. Ontology was proposed as a framework for sharing information and knowledge among participants involved in delivering a product to the customer. The domain of user requirements might be perceived differently by the stakeholders involved in designing, prototyping, developing, testing and advertising the expected product. Different understanding of the user domain may lead to an ambiguous and incomplete specification at each level of development [HS06]. An ontology was proposed both to describe requirements specification documents and to formally represent requirements knowledge in the form of a domain model specified with the use of normative or formal languages [WDVP00]. Ontologies were also identified as useful for describing the functionality of various components for their reuse or migration between different domains. The adaptation of ontologies was proposed, e.g., for software engineering repositories [CD00], [MMM98] in order to provide developers with semantic queries to search for components that already exist. In this way they can save time, avoid additional work and improve software quality, in contrast to repositories limited to keyword-based search engines, which suffer from low precision. In a similar way, ontologies may be adapted to different workflow