

AGH University of Science and Technology in Kraków
Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering
PhD Dissertation
Knowledge Formalization Methods for Semantic Interoperability in Rule Bases
Author: Krzysztof Kaczor
Supervisor: Grzegorz J. Nalepa, Ph.D.
Kraków 2014

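Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie (AGH University of Science and Technology in Kraków)
Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering
Doctoral Dissertation
Methods for Formalizing the Description and Translation of Rule-Based Knowledge Bases (Polish title: Metody formalizacji opisu i przekładu baz wiedzy regułowej)
Author: Krzysztof Kaczor
Supervisor: dr hab. inż. Grzegorz J. Nalepa
Kraków 2014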

K. Kaczor. Knowledge Formalization Methods for Semantic Interoperability in Rule Bases.

Ps 138.

Abstract
This dissertation concerns efficient methods for translating knowledge expressed with rules. Rules are a powerful, declarative knowledge representation that is at the same time transparent and easy to understand. Consequently, their field of application continues to expand. In recent years, they have been used in the business domain, where, as so-called business rules, they define or constrain aspects of business operations. They are also used as a complementary knowledge representation method in the Semantic Web and in business process management. This variety of applications has resulted in the emergence of distinct rule representations with different natures, assumptions and expressive power. This significantly limits the possibility of sharing knowledge across representations, because naive translation methods lead to semantic mismatch. This problem, known since the era of classic expert systems, is referred to as the rule interoperability problem.
The main goal of the work presented in this dissertation is to provide an interoperability method for production rules that preserves knowledge semantics. In other words, after the knowledge base is translated, the target system allows the same conclusions to be inferred as the original one. To reach this goal, the dissertation makes several original contributions. First, it provides an in-depth analysis of selected production rule languages in terms of the elements that determine their expressiveness. This analysis is followed by the definition of a formalized model of production rule representation based on attributive logic. The model precisely defines the interpretation of all selected elements and identifies those that have the same semantics. Finally, an interoperability method for the selected languages is defined in terms of this model. To show the correctness of the proposed method, both a theoretical and a practical evaluation are provided. The theoretical evaluation is based on the operational semantics of rules and compares this semantics before and after translation. The practical evaluation presents the translation of selected use case examples and compares the results of their execution. Both evaluations confirm that knowledge translated with the proposed method allows the same conclusions to be inferred as in the original system.

Summary (Streszczenie)
This doctoral dissertation concerns efficient methods for the interchange of rule-based knowledge. Rules, as a declarative knowledge representation method, offer high expressiveness while remaining transparent and easy to interpret. For this reason, they are applied in many new areas. In recent years, rules have been used in the business environment to define how an enterprise operates, where they are called business rules. They are also used as a complementary knowledge representation method in the Semantic Web initiative and in business processes. The great variety of rule applications has led to the emergence of many separate representations that differ in nature, rest on divergent assumptions and have varying expressiveness. Because of these differences, translating knowledge between such representations is not a trivial task, and the use of naive (e.g. syntactic) translation algorithms most often leads to a semantic mismatch between the knowledge before and after translation. This problem was already recognized in the era of classic expert systems and is called the rule-based knowledge interchange problem. The main subject of the dissertation is the development of a knowledge interchange method for production rules that takes knowledge semantics into account. In other words, a translation performed with such a method should yield knowledge that allows the same conclusions to be drawn as the source knowledge. The research described in this dissertation produced a number of results that constitute an original contribution to solving the considered problem. First, a detailed analysis of selected rule languages was carried out to identify the elements that determine their expressiveness. This analysis preceded the definition of a formalized knowledge representation model based on attributive logic.
This model made it possible to precisely determine the interpretation of the identified elements and to recognize those that have the same semantics. This work was key to developing an effective knowledge interchange method, which was built on the basis of the obtained model. To demonstrate the correctness of the proposed approach, a theoretical and a practical evaluation of the defined method were carried out. The theoretical evaluation compares the operational semantics of rules before and after knowledge translation. The practical evaluation shows the translation of the knowledge base of an example rule-based system into the selected languages and compares the results obtained after running them. Finally, the results of this analysis are discussed; they confirm the correctness of the proposed approach.

Contents
1 Introduction
1.1. Motivation, Scope and Research Problem
1.2. Goal and Plan of the Work
1.3. Original Contribution
1.4. Exclusions
2 State of the Art in Rule Representation
2.1. Knowledge Representation with Rules
2.1.1. Selected Knowledge Representation Methods
2.1.2. Expert Systems
2.1.3. Production Systems Shells
2.1.4. Business Rules Approach
2.1.5. Rules on Semantic Web
2.1.6. Rules in Software Engineering
2.2. Formalization of Rules
2.2.1. Propositional Logic
2.2.2. First-Order Predicate Calculus
2.2.3. Common Logic
2.2.4. Description Logics
2.2.5. Attributive Logic
2.2.6. F-Logic
2.2.7. Modal Logics
2.3. Knowledge Engineering Processes
2.3.1. Problem Identification
2.3.2. Knowledge Acquisition
2.3.3. Knowledge Modeling
2.3.4. Inference Process
2.3.5. Knowledge Verification and Validation

2.3.6. Knowledge Interoperability
2.4. Rule Interoperability Methods
2.4.1. Knowledge Interchange Format
2.4.2. Rule Interchange Format
2.4.3. Production Rule Representation
2.4.4. Rule Markup Language
2.4.5. REWERSE Rule Markup Language
3 Languages for Production Rules
3.1. Important Features of Rule Languages
3.2. Polish Liability Insurance Use Case Example
3.3. CLIPS
3.4. Jess
3.5. Drools
3.6. XTT2
3.7. Comparison of Rule Languages
3.7.1. CLIPS versus Jess
3.7.2. CLIPS versus Drools
3.7.3. Jess versus Drools
3.7.4. All versus XTT2
3.8. Existing Approaches to Translation of the Selected Rule Languages
4 Model of Production Rule Representation
4.1. Multilevel Approach to Rule Interoperability
4.2. Definition of the Formalized Model
4.2.1. Data Types and Objects
4.2.2. Facts
4.2.3. System State and Trajectory
4.2.4. Variables
4.2.5. Taxonomy of Formulae and Operators
4.2.6. Semantics of Formulae and Operators
4.2.7. Rules
4.2.8. Modules
4.2.9. Knowledge Base
4.3. Summary

5 Model-based Knowledge Translation
5.1. Identification of Semantically Equivalent Features
5.2. Translation of Rule Base Structure
5.2.1. Modules
5.2.2. Rule Level Features
5.2.3. Submodules
5.3. Summary
6 Evaluation of the Approach
6.1. Evaluation Procedure
6.2. Definition and Translation of the Formal Model of the PLI Use Case
6.2.1. Definitions of Types
6.2.2. Initial State
6.2.3. Definitions of Rules
6.2.4. Modules Definitions
6.3. Implementation of Translation Tool
6.4. Evaluation Results
6.4.1. Summary of the Evaluation Procedure
6.4.2. Identified Challenges
6.4.3. Achieved Goals of the Thesis
7 Concluding Remarks and Future Work
7.1. Summary
7.2. Future Work
A Syntax of Rule Languages
A.1. XML Schema of the Model Concrete Syntax
A.2. CLIPS Syntax in BNF
A.2.1. Data Types
A.2.2. Variables and Expressions
A.2.3. Constructs
A.2.4. deffacts Construct
A.2.5. deftemplate Construct
A.2.6. Fact Specification
A.2.7. defrule Construct
A.2.8. defglobal Construct

A.2.9. deffunction Construct
A.2.10. defgeneric Construct
A.2.11. defmethod Construct
A.2.12. defclass Construct
A.2.13. defmessage-handler Construct
A.2.14. definstances Construct
A.2.15. defmodule Construct
A.2.16. Constraint Attributes
A.3. Jess Syntax
A.3.1. deffacts Construct
A.3.2. deffunction Construct
A.3.3. defglobal Construct
A.3.4. defmodule Construct
A.3.5. defquery Construct
A.3.6. defrule Construct
A.3.7. deftemplate Construct
B Complete Models of the Selected Use Cases
B.1. Model of PLI Use Case
B.1.1. Formal Model of System
B.1.2. Model in CLIPS
B.1.3. Model in Jess
B.1.4. Model in Drools
B.1.5. Model in XTT2
B.2. Model of UserV Use Case
B.2.1. Formal Model of System
Bibliography

List of Abbreviations
AI – Artificial Intelligence
AL – Attributive Logic
ALSV(FD) – Attributive Logic with Set of Values over Finite Domain
BP – Business Process
BPM – Business Process Management
BR – Business Rule
BRA – Business Rules Approach
CEP – Complex Event Processing
CL – Common Logic
CLIF – Common Logic Interchange Format
CWA – Closed World Assumption
DL – Description Logic
DRL – Drools Rule Language
DSS – Decision Support Systems
ES – Expert System
FOL – First-Order Logic
HalVA – HeKatE Verification and Analysis
HaDEs – HeKatE Design Environment
HeaRT – HeKatE RunTime
HeKatE – Hybrid Knowledge Engineering Project
HMR – HeKatE Meta Representation
HQEd – HeKatE Qt Editor
KIF – Knowledge Interchange Format
KRR – Knowledge Representation and Reasoning
LHS – Left Hand Side
MOF – Meta-Object Facility
NAF – Negation As Failure
NLP – Natural Language Processing
OCL – Object Constraint Language
OMG – Object Management Group
OWA – Open World Assumption
OWL – Web Ontology Language

PL – Propositional Logic
PRR – Production Rule Representation
RBS – Rule-Based Systems
RDF – Resource Description Framework
RDFS – RDF Schema
RHS – Right Hand Side
R2ML – REWERSE Rule Markup Language
RIF – Rule Interchange Format
RuleML – Rule Markup Language
SBVR – Semantics of Business Vocabulary and Business Rules
SKE – Semantic Knowledge Engineering
SW – Semantic Web
SWRL – Semantic Web Rule Language
UML – Unified Modeling Language
XTT2 – eXtended Tabular Trees, Version 2

Chapter 1
Introduction
This chapter introduces the reader to the dissertation. It is divided into four sections. Section 1.1 briefly describes the background of the research problem, explains why it is considered an important issue in Computer Science, and defines the scope of the thesis. Section 1.2 specifies the main goal of the research and presents the steps taken to reach it. Section 1.3 highlights the results of the thesis that are considered original contributions. Section 1.4 discusses issues that are deliberately not addressed in this dissertation.
1.1. Motivation, Scope and Research Problem
Artificial Intelligence [141] is the field of Computer Science that studies and designs intelligent systems. Since the 1970s, a number of successful paradigms for engineering such systems have been developed. The most important of them focus on decision support with Expert Systems (ES) [51, 83]. Among the existing types of expert systems, rule-based ones have proved the most successful. This is mainly because rules allow for a powerful, declarative specification of knowledge that is at the same time easy to represent and understand. The application field of rules is still expanding. In recent years, they have been used in the business domain, where, as so-called Business Rules (BR) [156], they define or constrain aspects of business operations. They are also used as a complementary knowledge representation method in the Semantic Web (SW) [4] and in Business Process Management (BPM) [42]. This variety of rule applications has resulted in the emergence of distinct rule representations with different natures, assumptions and expressive power. The first attempt to classify rule types was made by Ligęza in [83]. Currently, many classifications can be found in the literature.
An exemplary one, provided by the RuleML organization, is depicted in Figure 1.1. Together with the different representations, disparate tools for modeling, verifying and performing inference in particular representations have been developed.
Figure 1.1: RuleML classification of rules [123].
However, because representations differ, rules expressed in one representation may have different semantics in another one. This significantly limits the possibility of sharing knowledge across representations, as naive¹ translation methods lead to semantic mismatch. This problem has been known since classic expert systems and still remains unsolved. It is referred to as the rule interoperability problem, and this thesis proposes a solution to it in the context of production rules.
¹ Methods that take only syntax into account and neglect the semantics.
Research on the interoperability problem is still an active area [161, 160, 31]. The most commonly used approaches develop an intermediate rule representation format with well-defined semantics and sufficient expressiveness to capture rules from different representations. Using this format, interchange between two representations is performed in two steps: first, the knowledge is translated from the source representation into the intermediate format; then, it is translated from this format into the second (target) representation. However, three major challenges must be overcome in order to define an efficient interoperability method:
• Diversity of requirements – existing rule representations have been applied in many areas and therefore provide different knowledge representations, reasoning capabilities and expressive power. For example, rules used in Decision Support Systems (DSS) [72, 58] usually adopt the Closed World Assumption (CWA) [93] and are represented as production rules allowing for non-monotonic reasoning in First-Order Logic (FOL). In turn, rules used in the SW follow the Open World Assumption (OWA) and are based on Description Logics [7], allowing only for monotonic reasoning. Such differences are crucial and cannot be neglected by an interoperability method, since in general they have a significant impact on knowledge semantics.

• Rigorous definition of semantics – according to [74], interoperability requires a formal definition of knowledge semantics. Formalized semantics must be provided both by the rule representations and by the rule interoperability methods. Otherwise, informal semantics may lead to ambiguities during translation and, ultimately, to semantic mismatch.
• Trade-off between expressiveness and tool support – on the one hand, a more expressive method can support translation between more diverse representations; on the other hand, supporting tools for it are more difficult to implement. In general, according to [158], due to the great diversity of rule representations, no single interoperability method is able to support interchange between all of them. Instead, many modern approaches support only selected representations or provide dialects dedicated to clusters of similar rule languages.
The work presented in this dissertation addresses the first challenge by focusing only on knowledge-based systems that share the same assumptions. Therefore, the thesis considers only rule-based systems that use production rules and forward chaining inference and adopt the closed world assumption. This selection was made because of the wide practical applicability of this type of knowledge-based systems². In turn, the last two challenges constitute the motivation for this thesis, as addressing them may bring several benefits, e.g. an underlying formal model of knowledge representation for rule languages that exist only as programming solutions, a unified definition of their semantics, or automated translation methods that preserve the original semantics.
1.2. Goal and Plan of the Work
The main goal of the work presented in this thesis is to provide an interoperability method for production rule representations that preserves knowledge semantics during translation.
Knowledge semantics is understood as the meaning of the entire knowledge base, not only as the semantics of its individual elements (rules, facts, etc.). Two rule bases expressed in different representations are assumed to have the same semantics³ if, for a given initial state, both production systems infer the same conclusion. The goal of this work is assumed to be reachable in three steps:
1. Definition of a formalized and generalized model of production rule representation.
2. Model-based formulation of the rule base semantics provided by the considered representations.
3. Translation of the model-based rule base into the selected rule representations.
² Currently, many different implementations of such systems are available (see Section 2.1.2 and Chapter 3), and, what is more, they are used in real-life applications.
³ The semantics of a knowledge base corresponds in fact to the so-called operational semantics as presented in [168] and describes the changes of a fact base after rule application, considered from the user's perspective.
These steps are discussed in detail in the remaining part of the thesis, which is organized as follows. Chapter 2 provides an overview of the existing methods and tools dedicated to rule-based knowledge representation.

It covers the different application areas of rules in Section 2.1, the formalization of rule-based knowledge representation in Section 2.2, the complete knowledge engineering process in the context of rules in Section 2.3, and knowledge translation in Section 2.4. Selected rule representations are discussed in Chapter 3. In Section 3.1, this chapter identifies a set of general aspects (features) of production rule languages that play a crucial role from the point of view of this thesis, as they have an impact on rule base semantics. Short characteristics of each of the selected representations and tools are provided in Sections 3.3 to 3.6, based on the use case introduced in Section 3.2. Section 3.8, the last section of this chapter, presents the state of the art concerning existing rule interoperability methods that involve the considered rule representations. The features of rule languages identified in Chapter 3 are the principal aspects that must be taken into account by the rule representation model. This model is defined in Chapter 4, which corresponds to step 1 above. The chapter starts with an overview of the proposed approach in Section 4.1, while its remaining part, Sections 4.2 to 4.3, defines the formalized production rule representation model. The model is crucial for the definition of a rule interoperability method. It is used in Chapter 5 to accomplish the two remaining steps. Section 5.1 corresponds mainly to step 2 and aligns the semantically equivalent elements of the model and the rule representations. This section also touches on step 3, as semantically equivalent elements can be interchanged directly.
In turn, Section 5.2 deals only with step 3 and describes how to translate the remaining aspects of the rule base while preserving their semantics. The evaluation of the proposed approach is presented in Chapter 6. Section 6.1 introduces the procedure describing how the evaluation was performed. Section 6.2 describes the most important results of translating the rule base from the model into the selected representations, using the PLI case study. This section focuses mainly on how the translated knowledge is processed within each representation and what the results of inference are. Section 6.3 discusses issues related to the practical implementation of the tool supporting the proposed approach. The most important results of the evaluation are summarized in Section 6.4. Chapter 7 concludes the dissertation with conclusions and a short summary of the presented work.
1.3. Original Contribution
The approach to rule interoperability discussed in this dissertation is superior to the state of the art in several aspects. The following six issues are considered the original contribution of this thesis:
1. Extension of Attributive Logic – This work significantly extends the attributive logic model originally proposed by Ligęza in [85] (see Section 2.2.5 for details). Important extensions towards production rule systems include an object-oriented type system with polymorphism and truth maintenance⁴. For the full definition of the model, see Section 4.2.
2. Identification of rule language features related to knowledge semantics – In order to define efficient interoperability methods for the considered rule languages, the elements of these languages that determine knowledge semantics must be identified. Therefore, Chapter 3 presents an analysis of the selected languages in terms of such features.
3. Multilevel semantics of rule base – The proposed model provides a rigorous definition of knowledge semantics. Moreover, the proposed approach captures the semantics of the whole rule base, whereas existing approaches focus on single rules. Therefore, rule base semantics can be treated as an additional abstraction level of knowledge base semantics that an interoperability method must take into account. All the considered abstraction levels are discussed in detail in Section 4.1.
4. Formalization of rule language semantics – Existing methods for rule interoperability are usually based on some formalism that precisely defines their semantics. However, these methods cannot be efficiently applied to representations with informal semantics, due to the risk of ambiguities and semantic mismatch after translation. Unfortunately, many production rule representations do not provide any underlying formalism, only a rule-based programming language. Hence, Section 5.1 provides a precise definition of the semantics of the selected rule languages in terms of the model.
5. Definition of semantics-preserving model-based translation – Thanks to the unified logic-based formalization of the rule languages and the rule representation model, a knowledge translation method that preserves semantics was developed. This method is dedicated to production rule systems that use CWA and forward chaining inference. For more details concerning this method, see Chapter 5.
6. Evaluation of translated cases – In order to evaluate the proposed approach, two non-trivial use cases were expressed in terms of the model and later translated into the selected rule languages. Chapter 6 selects one of these use cases and describes how the translation was performed. Moreover, it presents the results of executing the obtained models in all selected rule languages and compares these results with other methods.
Besides these six elements, the proposed work introduces other aspects that are novel in the context of production rule interoperability, e.g. the proposed model defines the concept of a dynamic constraint and provides a precise and flexible definition of the truth maintenance mechanism.
⁴ This dissertation extends the ALSV(FD) logic, which was invented for the formalization of rule-based systems. This logic was selected because of the intuitiveness and simplicity of its notation and semantics. However, more sophisticated logics, like those presented in [32], [41] and [45], can also be considered.

1.4. Exclusions

Rule-based technologies involve many different aspects, ranging from formalisms of rule representation, through inference modes, to logical verification issues. This thesis is focused on rule interoperability methods, which are mainly related to the way knowledge is represented. Therefore, aspects related to modeling [102] or verification [91] of rule bases are not significant in this context and are not discussed. Moreover, this dissertation also omits problems of knowledge processing, including the type of inference algorithm [11] and conflict set resolution strategies [52]. As mentioned, it is focused on forward chaining systems, as the backward chaining inference mode is hardly applicable to production rules.

The formalization of knowledge representation is a very important issue in the context of rule interoperability. Currently, a variety of such formalisms based on different logics exist. Therefore, it is practically impossible to provide an efficient interoperability method for all of them [158]. This thesis is focused on attributive logic-based representation. In particular, it does not consider modal, fuzzy or temporal logics, nor does it support inconsistent knowledge. Thus, the proposed interoperability approach cannot be applied to technologies like Semantics of Business Vocabulary and Business Rules (SBVR) [121], Complex Event Processing (CEP) [86], etc.

Another issue of knowledge representation concerns different types of rules. The classification proposed by the RuleML organization (see Figure 1.1) introduces five primary classes of rules [123]. Due to the widest application area of production rules and their specific nature, this thesis takes only them into account.

The work described in this dissertation is supported by the SaMURaI research project, funded from NCN (National Science Center) resources for science according to decision no. DEC2011/03/N/ST6/00886⁵. The author was also involved in a number of other projects that allowed him to gain knowledge of and experience with rule-based systems:
• 2008-2009 – HeKatE (MNiSW N516 024 32/2878): Hybrid Knowledge Engineering⁶.
• 2009-2011 – Rebit (POIG 1.3.1): Business and Technological Rules Management⁷.
• 2009-2015 – INDECT (FP7-218086, FP7: Collaboration/Security): Intelligent Information System Supporting Observation, Searching, and Detection for Security of Citizens in Urban Environment⁸.
• 2010-2012 – BIMLOQ (MNiSW N516 422338): Business Models Optimization for Quality⁹.
• 2011-2012 – PARNAS (NCN N516 481240): Tools for Inference Control and System Quality Analysis for Modularized Rule-Based Systems¹⁰.

⁵ See: http://home.agh.edu.pl/~kk/doku.php?id=others:samurai:start
⁶ See: http://hekate.ia.agh.edu.pl
⁷ See: http://www.rebit.zarz.agh.edu.pl
⁸ See: http://indect-project.eu
⁹ See: http://bimloq.ia.agh.edu.pl
¹⁰ See: http://parnas.ia.agh.edu.pl

Chapter 2. State of the Art in Rule Representation

Within Artificial Intelligence (AI), the development of Decision Support Systems (DSS) has always been one of the primary goals. To support human decision processes, an AI system has to be able to represent human knowledge as well as use it to make decisions. Originally, DSS mostly used symbolic knowledge representation methods for storing knowledge, which was then processed with formalized logic-based reasoning techniques. Section 2.1 briefly discusses the existing knowledge representation techniques. It focuses mainly on rule-based knowledge representation and discusses different fields of rule application, such as Expert Systems, the Business Rules Approach, the Semantic Web and Software Engineering.

A very important issue related to symbolic rule representation concerns the ability to process rules automatically. Rules encoded in a given rule language must have precisely defined semantics in order to be processed automatically. In this context, the formalization of the rule representation plays a crucial role because it provides an unequivocal interpretation of the rule semantics. Currently, many different logic-based formalisms are used for rule representation. The most important ones are described in Section 2.2.

Rule interoperability is one of many tasks in the whole Knowledge Engineering (KE) process, which defines the steps for developing a knowledge-based system. Section 2.3 briefly describes all the steps and indicates where the rule interoperability problem is located in this process. Finally, Section 2.4 discusses the rule interoperability problem in detail.

2.1. Knowledge Representation with Rules

2.1.1. Selected Knowledge Representation Methods

Knowledge is a theoretical or practical understanding of a subject or a domain [114]. Knowledge can be possessed by people as well as stored in computer systems.
If knowledge is intended to be processed by machines, it must have a well-defined representation and semantics. Frames, first proposed by Minsky in the 1970s, are used to capture and represent knowledge in a frame-based expert system. A frame is a data structure holding typical knowledge about a particular object or concept, which is described by a collection of slots. Frame-based representation has many advantages. First of all, frames are suitable for visual representation, which makes them more transparent and intuitive. Moreover, frame-based knowledge modeling corresponds to the object-oriented programming paradigm, which offers a natural way of representing the real world within a computer system by using objects. In turn, the main disadvantage of this knowledge representation is related to the inheritance mechanism, which allows unrestrained overwriting of inherited properties.

Semantic Networks (SN) are a classic AI knowledge representation technique introduced in the 1960s by Quillian. Initially, this representation was developed as a way of representing human memory and language understanding. Since then, SN have been applied to many problems involving knowledge representation. An SN consists of nodes and labeled edges which form a directed graph. Nodes usually refer to physical objects, situations or concepts, whereas edges (also called links) express relationships. SN are a very expressive knowledge representation that can be presented visually. On the other hand, the relations are limited to binary ones and their names are not standardized.

Besides frames and semantic networks, rules are another classic knowledge representation method. They have been known since the 1970s and have proved to be one of the most successful KR methods in AI. Rules are commonly used because they are transparent and easy to understand. A simple rule can be written using an if... then... statement, which can be divided into two parts: the conditional part (if...) and the conclusion (a.k.a. decision) part (then...).
The straightforward interpretation of such a rule is very intuitive: if the conditions in the first part are satisfied, then the conclusions in the second part must also be satisfied. The wide use of rules has led to the development of many specific rule-based representations. They differ in rule syntax, in the way rules are processed, etc. In general, the main difference between them is related to the expressiveness of a specific language.

Rules can also be considered the origin of another knowledge representation called Decision Tables (DTs) [152]. DTs provide a tabular form for grouping and expressing rules and have a compact representation suitable for visualization. In a decision table each row corresponds to one rule. Such a representation allows for a transparent presentation of a set of rules. Hence, Decision Tables are commonly used in different types of systems supporting the decision making process, like Drools, Matlab, etc. The Decision Tables representation technique was intensively developed by Jan Vanthienen [154, 152, 153, 155, 151].

Decision Trees constitute another method for rule-based knowledge representation. Decision Trees have the same structure as trees understood as a data structure; however, their interpretation is different. The nodes of a Decision Tree contain conditional expressions, while edges correspond to the values of these expressions. The final decision determined by a Decision Tree is specified in its leaves. The main advantage of Decision Trees is that they constitute a convenient method for the visual representation of a decision process. The Decision Tree-based visual representation facilitates understanding the decision process, tracing it step by step and explaining an obtained decision. In turn, the main disadvantages of this representation are the redundancy of nodes and the danger of a combinatorial explosion of the number of nodes.

Logics provide formalized knowledge representation methods. Due to the formalization, they have well-defined semantics and expressive power. Moreover, they allow for the development of a clear framework enabling uniform knowledge modeling, well-defined automated processing and verifiable theoretical characteristics. Logics are very often used to formalize rule-based representations. Among the existing ones, Propositional Logic, First-Order Logic and Description Logics are the most commonly used.

For each of the mentioned knowledge representation methods, tools allowing for inference and drawing conclusions were developed. In general, there are two main inference strategies: Forward Chaining (Data Driven), which allows for drawing conclusions from what is currently known, and Backward Chaining (Goal Driven), which allows for proving statements in terms of the current knowledge [114]. Some of the mentioned representations are more suitable for forward chaining, while others are more suitable for backward chaining. These representations were widely used in Artificial Intelligence for building domain-specific decision support systems called expert systems.

2.1.2. Expert Systems

Expert Systems (ESs) [51, 64, 81] are one of the most successful fields of AI; they emerged in the 1970s. They are a very efficient way of building DSS in a well-defined domain (medicine, science, finance, etc.) and are intended to help a human expert solve problems that cannot be easily solved due to their complexity or size.
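As a minimal sketch of the goal-driven (Backward Chaining) strategy mentioned in Section 2.1.1, the Python fragment below proves a goal by looking for rules whose conclusion matches it and then recursively proving their premises. The rule representation (a set of premises and a single conclusion), the function name `prove` and the car-diagnosis facts are hypothetical and serve only as an illustration; they are not taken from any particular expert system.

```python
# Hypothetical rule base: each rule is (set of premises, single conclusion).
RULES = [
    ({"lights_dim", "engine_silent"}, "battery_flat"),
    ({"battery_flat"}, "recharge_battery"),
]


def prove(goal, facts, rules, visited=frozenset()):
    """Backward chaining: `goal` holds if it is a known fact, or if some rule
    concludes `goal` and all of that rule's premises can be proved in turn."""
    if goal in facts:
        return True
    if goal in visited:  # goal already being proved on this path: avoid infinite recursion
        return False
    visited = visited | {goal}
    return any(
        conclusion == goal and all(prove(p, facts, rules, visited) for p in premises)
        for premises, conclusion in rules
    )


print(prove("recharge_battery", {"lights_dim", "engine_silent"}, RULES))  # True
print(prove("recharge_battery", {"lights_dim"}, RULES))                   # False
```

The `visited` guard is included because a naive goal-driven search would otherwise recurse forever on cyclic rules such as a → b and b → a.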
The architecture of an ES consists of two main components: the knowledge base and the inference engine. The knowledge base stores the acquired knowledge using a selected representation method (rules, frames, etc.). In turn, the inference engine makes this knowledge useful because it allows for processing it and thereby solving problems formulated by the user. Among the existing knowledge representation methods, frames and rules were the most widely used in ESs. Therefore, rule-based tools were developed in a variety of domains [51], e.g.:
• Chemistry: CRYSALIS, TQMSTUNE, CLONER, MOLGEN, SECS, SPEX.
• Electronics: ACE, IN-ATE, NDS, EURISKO, PALLADIO, REDESIGN, CADHELP, SOPHIE.
• Medicine: PUFF, VM, ABEL, AI/COAG, CADUCEUS, ANNA, BLUE BOX, ONCOCIN, GUIDON.
• Engineering: REACTOR, DELTA, STEAMER.
• Geology: DIPMETER, LITHO, MUD, PROSPECTOR.
• Computer Science: PTRANS, BDS, XCON, XSEL, XSITE, TIMM.

It was observed that most of these tools were based on a similar architecture. Based on this observation, a more general approach to building such systems was developed. This approach provides a framework for quickly building rule-based decision support systems, which implements the generic (i.e. knowledge-independent) architecture elements. An application of this approach to production rule representation is known as Production System Shells [51].

2.1.3. Production Systems Shells

Production System Shells are frameworks that allow a knowledge engineer to build a complete rule-based decision support system efficiently. The only action performed by a knowledge engineer is to encode the knowledge using the provided rule language. After that, the system is ready to perform the inference process in order to answer queries. This approach notably reduces the time needed to implement such systems. Their architecture (see Figure 2.1) provides all necessary mechanisms, which work independently of the knowledge. It consists of four main components:

Knowledge base – constitutes a repository for storing knowledge in the form of rules and facts. Initially, the repository is empty and does not contain any knowledge. During system implementation, the knowledge is provided by a knowledge engineer using a dedicated syntax. Sometimes this component is divided into two subcomponents, one containing only facts and the other only rules.

Inference engine – provides an algorithm for processing the encoded knowledge with respect to the inference task and the requested goal. Such an algorithm must be generic and independent of the knowledge domain.

Explanation mechanism – allows a knowledge engineer to check how the inference algorithm drew a given conclusion. This is a very useful feature, especially when the inference process involves a large number of facts and rules. This feature also has an impact on knowledge quality, because it allows for discovering errors in the knowledge base.

User interface – provides the way the system interacts with a user. It allows for using the mentioned components, defining problems that must be solved by the system, etc. One of the most important parts of this component is the rule language used for knowledge encoding.

Production system shells use production rules [119] as the knowledge representation. The conclusion part of a production rule contains actions that are performed when the conditional part is satisfied. Usually, performing an action has an impact on the knowledge base by producing (adding) or removing some information. In turn, any change made in the knowledge base must be taken into account by the inference process, regardless of the inference mode.
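The interplay described above, in which firing a rule may add or remove facts and every change must be taken into account before the next rule is selected, can be illustrated with a naive forward chaining (recognize-act) loop. The sketch below is hypothetical: the `Rule` structure, the choice of the first matching rule as the conflict resolution strategy and the heater example are assumptions made for illustration, not the mechanism of any particular shell.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    conditions: frozenset            # facts that must all be in working memory
    add: frozenset = frozenset()     # facts asserted by the action
    remove: frozenset = frozenset()  # facts retracted by the action


def forward_chain(rules, facts, max_cycles=100):
    """Naive recognize-act loop: match, select one rule, act, repeat."""
    memory = set(facts)
    trace = []
    for _ in range(max_cycles):
        # Match: rules whose conditions hold and whose action would change memory.
        conflict_set = [
            r for r in rules
            if r.conditions <= memory and (not r.add <= memory or r.remove & memory)
        ]
        if not conflict_set:
            break  # nothing left to fire: a fixpoint has been reached
        rule = conflict_set[0]  # conflict resolution (assumed): textual order
        memory = (memory - rule.remove) | rule.add  # act: retract, then assert
        trace.append(rule.name)
    return memory, trace


HEATER_RULES = [
    Rule("heat_on", frozenset({"cold"}), add=frozenset({"heater_on"})),
    Rule("warm_up", frozenset({"cold", "heater_on"}),
         add=frozenset({"warm"}), remove=frozenset({"cold"})),
    Rule("heat_off", frozenset({"warm", "heater_on"}), remove=frozenset({"heater_on"})),
]

print(forward_chain(HEATER_RULES, {"cold"}))
# ({'warm'}, ['heat_on', 'warm_up', 'heat_off'])
```

Retracting `cold` in `warm_up` is what stops `heat_on` from matching again. Because this loop re-matches every rule against the whole working memory in each cycle, it corresponds to the brute-force approach of early rule-based systems, which algorithms such as Rete were designed to avoid.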

Figure 2.1: Architecture of a Rule-based Expert System Shell [114]. (Elements shown: domain expert, knowledge engineer and programmer involved in knowledge acquisition and analysis; database; knowledge base with facts and IF-THEN rules; inference engine; explanation facilities; user interface; user.)

Production system shells provide two modes of inference, forward chaining and backward chaining, of which forward chaining is the basic strategy. Efficient implementation of the inference algorithm was a challenge in early rule-based systems, where brute-force algorithms were used. This challenge was overcome by the Rete algorithm [46], proposed by Forgy in the 1970s. This class of decision support systems proved to be one of the most successful branches of AI. They have been known and developed since the 1980s. There were many different implementations, among which OPS5 [20] was the crucial one, while CLIPS [51] was the most widely known.

OPS5. Official Production System (OPS5) [20] was the first computer program implementing the idea of problem-solving processes using a set of condition-action rules as the knowledge representation and a forward chaining inference engine based on the Rete algorithm [46]. It was implemented in LISP in the late 1970s by Charles Forgy. Thanks to OPS5, rules became the dominant knowledge representation method in expert systems, and OPS5 became popular among expert system developers. Additionally, in some versions of the language, an invoked action may create a new rule, which makes the system capable of learning.

CLIPS. C Language Integrated Production System (CLIPS) [51, 135] was developed in 1984 at NASA's Johnson Space Center. The main goal of this tool was to facilitate building expert systems, with high portability, low cost and easy integration with external systems in mind. The inference and representation capabilities of CLIPS are similar to, but more powerful than, those of OPS5. Currently, the CLIPS rule language is a multiparadigm programming language that provides support for rule-based, object-oriented and procedural programming. The wide spread and acceptance of the CLIPS tool resulted in the development of its new incarnation in the form of Jess [47]. Thanks to the Java language, Jess can be used for building web-based software that provides reasoning capabilities.

Rule-based systems allow for representing knowledge in a declarative way and, what is more, they closely reflect the natural process of thinking and reasoning. This is why rule-based technologies are still one of the most commonly used knowledge representation methods. Nevertheless, the classic form of rule-based systems, known from the last century, is not sufficient because of its programming nature. Rule-based tools like CLIPS are more suitable for people with technical skills. This is why a new approach to using rule-based tools has been introduced. This approach is called the Business Rules Approach and is more accessible to non-experts.

2.1.4. Business Rules Approach

Nowadays, rules are mainly used in business understood in a broad sense. In this context, rules are used for defining logical aspects of the business, which involve making decisions, defining behavior in a given situation, and specifying regulations or limitations. Rules used in this context are called Business Rules (BR) [2, 156, 57]. There is no single, precise definition of BR in the literature. In [23] a BR is described as a statement that defines or constrains some aspect of the business. BRs are intended to assert business structure or to control or influence the behavior of the business. The use of rules for describing how the business works is called the Business Rules Approach (BRA) [100, 138, 156]. BRA is a methodology – and possibly a special technology – by which one can capture, challenge, publish, automate, and change rules from a strategic business perspective.
The result is a business rules system, an automated system in which the rules are separated, logically and physically, from other aspects of the system and shared across data stores, user interfaces and applications [156].

Within classic production rule systems, like those described in Section 2.1.3, only production rules were used. The use of rules in the business context requires distinguishing different types of rules. This is why BRA provides four types of rules:

Production rules correspond to classic production rules, well known from classic rule-based systems, e.g. If it's raining then the playground is wet.

Derivation rules are statements that allow for generating new knowledge based on what is currently known, e.g. Each Female Australian is a Person who was born in Country 'Australia' and has Gender 'Female'.

Event-Condition-Action rules are very similar to production rules but, besides the conditional part, they provide an event part, which defines an event that triggers the evaluation of the rule's conditional part, e.g. If it stops raining and it is a weekend then I go play ball.

Constraints can be considered rules without a conclusion part; they are statements that must always be true, e.g. A Person has one date of birth.

In general, the main goal of BRA is to provide a clear, transparent and precise method for business description that can be easily understood and applied by non-technical people. This is why in BRA the rule representation, as well as the language for expressing rules, plays a crucial role. On the one hand, they should be easy to read and understand; on the other hand, they must allow knowledge to be expressed very precisely. Finding a compromise between these two requirements is not a trivial task, because satisfying the first one usually makes the second harder to meet. Currently, several rule representations have been developed for BRA. Usually, they abstract from the rule language and provide a rule representation model that can be expressed in several ways, e.g. using logical expressions, graphs, controlled language or natural language. One of the most common and widely used is Semantics of Business Vocabulary and Business Rules (SBVR) [122].

Semantics of Business Vocabulary and Business Rules

As mentioned, providing an efficient rule representation for BRA is not a trivial task. This is why, in 2003, the Object Management Group (OMG) issued the Business Semantics of Business Rules (BSBR) Request For Proposal. In response, Semantics of Business Vocabulary and Business Rules (SBVR) [120] was developed. Currently, SBVR is an adopted OMG standard language for the declarative description of a business, and it is also an integral part of the OMG Model Driven Architecture (MDA). SBVR allows business people to define the policies and rules by which they run their business in their own language, in terms of the things they deal with in the business, and to capture those rules in a way that is clear, unambiguous and readily translatable into other representations [120].
SBVR is intended to define the meaning of concepts and rules regardless of the languages or notations used to state them. This is achieved by providing a rule representation metamodel that entirely abstracts from knowledge processing, methods of inference and ways of modeling. Nevertheless, the SBVR proposal describes a method for expressing SBVR-based knowledge using an English-based controlled natural language. The proposal also considers other ways of expressing SBVR, like RuleSpeak.

Business Rules Management Systems

Business Rules Management Systems (BRMS) are computer systems intended to provide complete support for the business logic of a given business. They provide appropriate solutions for:
• knowledge storing, which very often takes the form of a centralized repository,
• knowledge modeling, by providing modeling methods that are appropriate for business people,
• knowledge management, by implementing user interfaces that allow a knowledge engineer to modify the knowledge repository,

• knowledge processing, which makes the knowledge usable in a practical way.

What is more, they support a complete knowledge life-cycle, including knowledge deployment within a company (see Figure 2.2). Currently, there are several implementations of such systems; however, they are usually very expensive. One such system is the IBM product called ILog¹. There are also BRMS tools that are free to use. One of the most commonly known free systems is Drools² [19].

Figure 2.2: Knowledge life-cycle within BRMS [115].

Drools introduces the Business Logic integration Platform, which provides a unified and integrated platform for Rules, Workflow and Event Processing. It consists of several projects, among which the most important are:

Drools Expert – a dedicated forward-chaining rule engine for Drools-based knowledge representation. It consists of a set of Java classes providing a programming interface for building applications capable of reasoning. It also supports the syntax of the Drools Rule Language (DRL), which provides a native way of encoding rules.

Drools Guvnor – also called the Business Rules Manager; it provides a centralized rule repository that allows for modeling data structures, rules and decision tables through a web-based user interface. It also supports a domain-specific language, which allows rules to be specified in a language close to natural language.

Drools Fusion – a Drools module that supports event processing and temporal reasoning. It supports the Complex Event Processing (CEP) [86] concept, which deals with processing multiple events with the goal of identifying the meaningful events within the event cloud.

Apart from rules, Drools also integrates workflow-based process modeling with the help of Business Process Management (BPM) [149, 77, 80]. Workflows can be designed using the Business Process Modeling Notation (BPMN) [125] and then executed with the help of a dedicated workflow engine.

¹ See: http://www-01.ibm.com/software/websphere/ilog
² See: http://www.jboss.org/drools/
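To make the distinction between the four BRA rule types more concrete, the sketch below encodes the examples given earlier in this section (the wet playground, the Female Australian, playing ball at the weekend and the single date of birth) as plain Python functions. The `Person` class, the state dictionary and all function names are hypothetical; the sketch only shows how the types differ in what triggers them and what they produce, and it does not reflect SBVR, DRL or any BRMS API.

```python
from dataclasses import dataclass


@dataclass
class Person:  # hypothetical business object
    name: str
    birth_country: str
    gender: str
    birth_dates: tuple


# Production rule: if the condition holds, perform an action that changes the state.
def raining_rule(state: dict) -> None:
    if state.get("raining"):
        state["playground_wet"] = True


# Derivation rule: derives new knowledge (membership in a concept) from known facts.
def is_female_australian(p: Person) -> bool:
    return p.birth_country == "Australia" and p.gender == "Female"


# ECA rule: the condition is evaluated only when the triggering event occurs.
def on_event(event: str, state: dict, actions: list) -> None:
    if event == "stopped_raining" and state.get("weekend"):
        actions.append("go play ball")


# Constraint: a statement that must always hold; returns the names of violators.
def check_one_birth_date(people) -> list:
    return [p.name for p in people if len(p.birth_dates) != 1]


ann = Person("Ann", "Australia", "Female", ("1990-01-01",))
print(is_female_australian(ann))   # True
print(check_one_birth_date([ann])) # []
```

Only the production and ECA rules change anything: the derivation rule computes a new fact from existing ones, and the constraint merely reports violations.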

The Business Rules Approach is currently one of the most important areas where rules are applied. BRA uses rules to provide a declarative specification of the regulations that exist in a business, which can be used by business people. Besides BRA, rules also play an important role in the Semantic Web initiative, where they allow reasoning tasks to be performed on a new abstraction level.

2.1.5. Rules on Semantic Web

The Semantic Web (SW) is a worldwide initiative inspired by the vision presented in [9]. The main idea of the Semantic Web is to represent the meaning of data stored on the Internet in a standardized form that machines can interpret and process. This would enable more advanced searching and planning mechanisms performed automatically. Semantic Web technologies include a set of knowledge representation standards, each with different goals and expressive power. They work in a layered architecture, on consecutive levels of abstraction (see Figure 2.3).

Figure 2.3: Semantic Web stack [18].

One of the general-purpose languages for representing information on the Web is the Resource Description Framework (RDF) [92]. Using RDF, information can be described by statements about resources in the form of triples (subject, predicate, object). Each resource is described and identified by a URI. Nevertheless, describing resources alone is not enough. Hence, on a higher level of abstraction, the formal definitions of relations between resource classes are described with the help of ontologies. The main formalism for ontologies is Description Logics (DL) [6]. DLs use fragments of FOL to express relationships among concepts and individuals in the conceived universe. The Web Ontology Language is called OWL [150].

Rules on the SW are an active research area. On the one hand, the motivation for using rules lies in the possibility of augmenting the knowledge representation. On the other hand, the methods dedicated to the SW are not sufficient for modeling behavior, which may be defined by means of rules. This is why there is an increasing number of approaches that allow for using rules on the SW. Two such approaches are the Semantic Web Rule Language (SWRL) [61, 157, 126] and OWL 2 RL.
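A minimal sketch of RDF-style triples and a Horn-like rule over them is given below. Plain strings stand in for URIs, and the rule hasParent(?x, ?y) ∧ hasBrother(?y, ?z) → hasUncle(?x, ?z), the usual example used to illustrate SWRL, is hand-coded as a set comprehension that joins the two patterns on the shared variable ?y. The data and the function name `apply_uncle_rule` are hypothetical, and no RDF or SWRL library is used.

```python
# Triples (subject, predicate, object); plain strings stand in for URIs.
triples = {
    ("Ann", "hasParent", "Bob"),
    ("Dora", "hasParent", "Bob"),
    ("Bob", "hasBrother", "Carl"),
}


def apply_uncle_rule(kb):
    """hasParent(?x, ?y) AND hasBrother(?y, ?z) -> hasUncle(?x, ?z).
    Returns kb extended with every triple the rule derives."""
    derived = {
        (x, "hasUncle", z)
        for (x, p1, y) in kb if p1 == "hasParent"
        for (y2, p2, z) in kb if p2 == "hasBrother" and y2 == y
    }
    return kb | derived


for t in sorted(apply_uncle_rule(triples) - triples):
    print(t)
# ('Ann', 'hasUncle', 'Carl')
# ('Dora', 'hasUncle', 'Carl')
```

Such a rule only adds triples; unlike the production rules discussed in Section 2.1.3, it never retracts information, which matches the monotonic character of Datalog-style rules.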

Semantic Web Rule Language (SWRL) [61] is based on a combination of the OWL DL and OWL Lite sublanguages of the OWL Web Ontology Language with the Unary/Binary Datalog RuleML sublanguages of the Rule Markup Language. SWRL extends the set of OWL axioms to include Horn-like rules, which are similar to rules in the Prolog or Datalog languages. In fact, SWRL rules are Datalog rules with unary predicates for describing classes and data types, binary predicates for properties, and some special built-in n-ary predicates that can be used to manipulate data values. SWRL is based on a high-level abstract syntax and a model-theoretic semantics built on the same Description Logics foundation as OWL. It can be serialized using an XML syntax based on RuleML. This syntax is supported by several tools, like Protégé³, Bossam⁴, Hoolet⁵, Pellet⁶, KAON2⁷ and RacerPro⁸.

OWL 2 RL⁹ is a syntactic subset (also called a profile) of OWL 2 that is amenable to implementation using rule-based technologies, together with a partial axiomatization of the OWL 2 RDF-based semantics in the form of first-order implications that can be used as the basis for such an implementation. OWL 2 RL is aimed at applications that require scalable reasoning in return for some restriction on expressive power. These restrictions are designed to avoid the need to infer the existence of individuals that are not explicitly present in the knowledge base, and to avoid the need for nondeterministic reasoning. This is achieved by restricting the use of constructs to certain syntactic positions.

2.1.6. Rules in Software Engineering

Rules in SQL. SQL is the paradigm-setting language for databases. It provides several constructs for expressing various kinds of rules: constraints, derivation rules and reaction rules. In SQL databases, integrity rules may occur in various places: most notably at the level of attribute definitions in the form of CHECK¹⁰, which allows for specifying a wide range of integrity rules for tables, such as ranges of values and lists of values; at the level of table definitions in the form of CONSTRAINTs¹¹; and at the database schema level in the form of ASSERTIONs [161]. In turn, derivation rules may occur in the form of VIEWs¹², which define a derived table by means of a query, whereas reaction rules may occur in the form of TRIGGERs, which define a reaction in response to update events of a certain type.

³ See: http://protege.cim3.net/cgi-bin/wiki.pl?SWRLTab
⁴ See: http://bossam.wordpress.com
⁵ See: http://owl.man.ac.uk/hoolet
⁶ See: http://pellet.owldl.com
⁷ See: http://kaon2.semanticweb.org
⁸ See: http://www.racer-systems.com/products/racerpro/index.phtml
⁹ See: http://www.w3.org/TR/rif-owl-rl
¹⁰ See: http://www.w3schools.com/sql/sql_check.asp
¹¹ See: http://www.w3schools.com/sql/sql_constraints.asp
¹² See: http://www.w3schools.com/sql/sql_view.asp
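The three kinds of SQL rules can be demonstrated with Python's built-in `sqlite3` module. The schema below (an `account` table, a `premium_account` view and an `audit_log` table filled by a trigger) is a hypothetical example. SQLite is used only because it ships with Python; it does not implement schema-level ASSERTIONs, so only CHECK constraints, VIEWs and TRIGGERs are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (
        id      INTEGER PRIMARY KEY,
        owner   TEXT NOT NULL,
        balance INTEGER NOT NULL CHECK (balance >= 0)  -- integrity rule (CHECK)
    );

    -- Derivation rule: a derived table defined by a query (VIEW)
    CREATE VIEW premium_account AS
        SELECT id, owner FROM account WHERE balance >= 10000;

    -- Reaction rule: reacts to UPDATE events on the balance column (TRIGGER)
    CREATE TABLE audit_log (account_id INTEGER, old_balance INTEGER, new_balance INTEGER);
    CREATE TRIGGER log_balance_change AFTER UPDATE OF balance ON account
    BEGIN
        INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
    END;
""")

conn.execute("INSERT INTO account VALUES (1, 'Ann', 500)")
conn.execute("INSERT INTO account VALUES (2, 'Bob', 20000)")
conn.execute("UPDATE account SET balance = 12000 WHERE id = 1")  # fires the trigger

try:
    conn.execute("UPDATE account SET balance = -1 WHERE id = 2")
except sqlite3.IntegrityError as err:
    print("rejected:", err)  # the CHECK constraint blocks the update

print(conn.execute("SELECT owner FROM premium_account ORDER BY id").fetchall())
# [('Ann',), ('Bob',)]
print(conn.execute("SELECT * FROM audit_log").fetchall())
# [(1, 500, 12000)]
```

The rejected update leaves no entry in `audit_log`, because the CHECK violation aborts the statement before the AFTER UPDATE trigger can take effect.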

(29) 17. 2.2. Formalization of Rules. Rules in UML/OCL. The Unified Modeling Language (UML) [124, 128] may be viewed as the. paradigm-setting language for software and information systems modeling. UML allows for rules specification by using Object Constraint Language (OCL for short) [136]. OCL is a complementary part of the UML specification providing a dedicated language for rules definitions that are applied to UML models. OCL allows for expressing integrity constraints as invariants in a formal way. It also allows to include derived attributes, classes or associations in a class diagram. The derived concepts are defined by means of derivation rules [36].. Rules in Software Testing Software testing process is an important activity in the software engineering process. There is a large number of types of tests corresponding to the phases of the software lifecycle. One of them are black-box testing techniques that do not take an internal structure of the system into account but are based on the system specification. Among black-box techniques, the decision tables based technique can be distinguished [109]. In this technique decision tables (DTs) are used for testing system response for a given input. The content of DT consists of rules and corresponds to the possible combinations of the values of tested attributes [142]. Each rule in DT defines input for a system as its premise and expected system response as its conclusion. The rule is satisfied when system response is the same as assumed in the DT. This section provides a general overview of the different methods for knowledge representation that were developed over the years. Nevertheless, this thesis is focused on the rule-based knowledge representations and this is why, majority of the mentioned methods are no longer considered. According to this section, there are also many different knowledge representations that use rules. 
However, many of them are merely programming solutions that provide only a rule language without any underlying formal model. As a result, the semantics of such representations may be unclear or ambiguous, and thus efficient knowledge interoperability involving these representations becomes impossible. This is why the next section elaborates on different methods of rule formalization that allow for a precise definition of semantics.

2.2. Formalization of Rules
Currently, there exist different rule representations that are used in multiple systems in various domains. On the one hand, all of these representations must provide a syntax for knowledge encoding. On the other hand, some of these systems are critical because they make decisions that may have an impact on people's health or safety. Due to this fact, such a syntax must have precisely defined semantics in order to be unambiguous. Therefore, informal rule representations can cause many problems:
1. Processing problem – due to the ambiguity, the encoded knowledge can be processed differently from what the knowledge engineer intended.

2. Validation problem – there is no possibility to check whether the developed knowledge base meets user requirements [147].
3. Verification problem – the lack of precise semantics makes it impossible to check whether the knowledge base contains errors, e.g. on the logical level [147].
4. Interoperability problem – knowledge translation involving such a representation becomes impossible because its ambiguous semantics cannot be aligned with the semantics of other representations.

Clearly defined semantics is crucial for the unequivocal interpretation of rules. In turn, the unequivocal interpretation of rules is one of the necessary conditions that must be satisfied in order to overcome the above-mentioned problems. Moreover, it also brings the following advantages:
• cross-platform knowledge development,
• merging of existing rule bases,
• use of different modeling and verification tools.
One of the most commonly used methods for ensuring the unequivocal interpretation of rules is to provide an underlying formal model. Such a model is usually based on certain logic-based formalisms. This section provides a short overview of the logic-based formalisms that are commonly used in this context.

2.2.1. Propositional Logic
Propositional Logic (PL) is possibly the simplest logical system with respect to both syntax and semantics. It does not allow the use of individual variables, terms, or quantifiers. Thanks to this simplicity, it is a practically useful language for representing rule-based systems. What is more, automated reasoning performed on PL-based rule representations is decidable and very efficient. This simplicity is achieved at the cost of expressiveness, which in the case of PL is poor and insufficient for real-world systems. A detailed discussion of the expressiveness, syntax, and semantics of PL can be found in [85].
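The decidability of PL reasoning can be illustrated with a brute-force entailment check: a formula over n propositional symbols has only 2^n valuations, so examining all of them always terminates. The sketch below assumes formulas are encoded as hypothetical Python predicates over a valuation, and uses a rule of the form p1 ∧ p2 → h as the example. It is exponential in n, so it is not how efficient PL reasoners work in practice; for rule bases made of Horn clauses, forward chaining decides entailment in linear time.

```python
from itertools import product


def entails(symbols, premises, goal):
    """Check whether the premises entail the goal in propositional logic.

    Formulas are functions from a valuation (dict: symbol -> bool) to bool.
    There are only 2 ** len(symbols) valuations, so the loop always
    terminates, which is why entailment in PL is decidable.
    """
    for values in product([False, True], repeat=len(symbols)):
        valuation = dict(zip(symbols, values))
        if all(p(valuation) for p in premises) and not goal(valuation):
            return False  # counter-model: premises true, goal false
    return True


# Hypothetical rule p1 ∧ p2 → h, encoded as ¬(p1 ∧ p2) ∨ h
def rule(v):
    return not (v["p1"] and v["p2"]) or v["h"]


def fact_p1(v):
    return v["p1"]


def fact_p2(v):
    return v["p2"]


def goal_h(v):
    return v["h"]


symbols = ["p1", "p2", "h"]
print(entails(symbols, [rule, fact_p1, fact_p2], goal_h))  # True
print(entails(symbols, [rule, fact_p1], goal_h))           # False
```

Without the fact p2, the valuation p1 = true, p2 = false, h = false satisfies every premise but not the goal, so the second call returns False.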
A rule in PL is commonly represented using a Horn clause, which constitutes an important form of knowledge representation in rule-based systems. A simple Horn clause $\psi$ can be written as follows:
$$\psi\colon \neg p_1 \lor \neg p_2 \lor \dots \lor \neg p_n \lor h \qquad (2.1)$$
A Horn clause may contain at most one positive literal. Based on formula (2.1), any Horn clause containing a positive literal $h$ can be transformed into the form of a rule $r$ (by De Morgan's laws, $\neg p_1 \lor \dots \lor \neg p_n \equiv \neg(p_1 \land \dots \land p_n)$, and $\neg a \lor b \equiv a \to b$), i.e.:
$$r\colon p_1 \land p_2 \land \dots \land p_n \to h$$
where:
• $\psi \equiv r$,
• $p_i$ and $h$ are propositional symbols,
• $p_1 \land p_2 \land \dots \land p_n = LHS(r)$ is called the Left Hand Side of the rule $r$, or its Conditional Part, and

• $h = RHS(r)$ is called the Right Hand Side of the rule $r$, or its Conclusion/Decision Part.
In order to assign a meaning to the propositional symbols $p_i$ and $h$, the following notation is used:
$$p_i \overset{\text{def}}{=} \text{'definition'}$$
Let us consider the example rule "who is my boss", which says "Tom is my boss if he is the manager of the department in which I work". First of all, we must define the meaning of the propositional symbols:
$$\begin{aligned}
p_1 &\overset{\text{def}}{=} \text{'Tom is the manager of the department of Computer Science'}\\
p_2 &\overset{\text{def}}{=} \text{'I work in the department of Computer Science'}\\
h &\overset{\text{def}}{=} \text{'Tom is my boss'}
\end{aligned}$$
Now the rule can be defined:
$$r\colon p_1 \land p_2 \to h$$
It is important to notice that each propositional symbol can be assigned only one unique meaning. The simplicity and efficient reasoning capabilities of PL have led to its application in the area of rule-based systems. Such systems use PL as the internal knowledge representation, which allows knowledge to be expressed in a declarative way. What is more, PL is suitable for the visual representation and modeling of knowledge: a rule base can be displayed in the form of binary decision tables, binary decision trees, binary decision graphs, etc. Another very important application of PL is related to efficient reasoning in more expressive logics, e.g. first-order logic. In this approach, called instantiation-based automated reasoning [79], propositions expressed in the more expressive logic are reduced to propositions expressed in PL. Such a reduced model consists of many more propositions; however, the reasoning capabilities become significantly improved. This approach is applied in several existing reasoners. One of the best known and most efficient is iProver [78]. Propositional logic is often called zeroth-order logic because it does not allow the use of quantifiers. Quantifiers can be used in first-order logic, which is discussed in the following section.

2.2.2. First-Order Predicate Calculus
First-Order Logic (FOL) is one of the most popular logical systems. In computer science, FOL is mainly used for the formalization of programs and their components, and as a basic knowledge representation language in logic programming and AI. FOL adopts most of the features of PL. However, its expressive power is much higher and allows complex knowledge to be expressed. This high expressiveness is mainly obtained through the use of individual variables, terms, and quantifiers. For example, boss(john, me) can be interpreted as "john is my boss", where john and me are constant symbols, while boss is a predicate that defines a kind of relation between its arguments. A detailed discussion of the expressiveness, syntax, and semantics of FOL can be found in [85].

Thanks to the improved expressiveness of FOL, rules can express much more complex knowledge in a more precise way. In general, a FOL rule is represented as a Horn clause in the following way:
$$\psi\colon \neg p_1 \lor \neg p_2 \lor \dots \lor \neg p_n \lor h \qquad (2.2)$$
where $h$ is a literal (either positive or negative). Similarly to the case of PL, using the definition of implication and De Morgan's laws, such a clause can be written in the form of a rule:
$$p_1 \land p_2 \land \dots \land p_n \to h \qquad (2.3)$$
where $p_1, p_2, \dots, p_n$ and $h$ are literals. Let us write the "who is my boss" example using FOL-based notation:
$$\mathit{works\_in\_department}(Y, D) \land \mathit{is\_manager}(X, D) \to \mathit{boss}(X, Y) \qquad (2.4)$$
where:
• works_in_department, is_manager, and boss are predicates, which can be defined as follows:
$$\begin{aligned}
\mathit{works\_in\_department}(X, D) &\overset{\text{def}}{=} \text{'X works in department D'}\\
\mathit{is\_manager}(X, D) &\overset{\text{def}}{=} \text{'X is a manager of department D'}\\
\mathit{boss}(X, Y) &\overset{\text{def}}{=} \text{'X is a boss of Y'}
\end{aligned}$$
• $X$, $Y$, $D$ are variables (or, in general, terms).
Using FOL, this example can be written much more precisely than in PL. It is worth noticing that the above rule can be applied to any facts that belong to the appropriate relations, unlike in PL, where the rule is defined for only three specific facts (me, tom, computerScience).

Among the different programming languages and paradigms, Logic Programming constitutes one of the most important ideas. This paradigm consists of the direct application of a subset of FOL for the declarative encoding of knowledge and the application of a specific resolution-based theorem proving strategy for inference [85]. Currently, the two most common languages that allow pure logic programming are DATALOG [137] and PROLOG [118, 17]. Both languages are declarative and use Horn rules as a knowledge representation. Nevertheless, DATALOG is syntactically a subset of PROLOG.
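Unlike the propositional version, rule (2.4) can be applied to any set of facts. The following sketch, with hypothetical fact sets, evaluates it bottom-up in the spirit of DATALOG: the rule is applied to the stored relations and produces every boss(X, Y) fact they support, with the shared variable D acting as a join condition. Since (2.4) is not recursive, a single application already reaches the fixpoint; a recursive rule base would require repeating this step until no new facts appear.

```python
# Facts stored as sets of tuples, one set per predicate (hypothetical data).
works_in_department = {("me", "computerScience"),
                       ("ann", "computerScience"),
                       ("bob", "mathematics")}
is_manager = {("tom", "computerScience"), ("eve", "mathematics")}


def apply_boss_rule(works, manages):
    """Derive all boss(X, Y) facts from rule (2.4):
    works_in_department(Y, D) ∧ is_manager(X, D) → boss(X, Y).

    The variable D, shared by both premises, becomes an equality
    (join) condition between the two relations.
    """
    return {(x, y)
            for (y, d_works) in works
            for (x, d_manages) in manages
            if d_works == d_manages}


boss = apply_boss_rule(works_in_department, is_manager)
print(sorted(boss))  # [('eve', 'bob'), ('tom', 'ann'), ('tom', 'me')]
```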
The PROLOG syntax for rules is as follows:
h :- p1, p2, ..., pn.
which corresponds to the rule defined by formula (2.3). The rule from the "who is my boss" example can easily be modeled in PROLOG in the following way:
boss(X,Y) :- worksInDepartment(Y,D), isManager(X,D).
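PROLOG, in contrast to the bottom-up style typical of DATALOG, answers queries top-down: it starts from a goal such as ?- boss(X, me). and resolves it against the program clauses, solving subgoals left to right, depth-first. The following sketch mimics that strategy for the rule above in Python. It is a simplified illustration under stated assumptions, not a PROLOG implementation: terms are flat constants or variables, variables are strings starting with an uppercase letter (as in PROLOG), facts are tried before rules, there is no occurs check, and the fact base is hypothetical.

```python
import itertools

# Hypothetical fact base.
FACTS = [
    ("worksInDepartment", "me", "computerScience"),
    ("worksInDepartment", "bob", "mathematics"),
    ("isManager", "tom", "computerScience"),
    ("isManager", "eve", "mathematics"),
]
# boss(X,Y) :- worksInDepartment(Y,D), isManager(X,D).
RULES = [
    (("boss", "X", "Y"),
     [("worksInDepartment", "Y", "D"), ("isManager", "X", "D")]),
]

_fresh = itertools.count()


def is_var(term):
    return term[:1].isupper()


def walk(term, subst):
    # Follow variable bindings until reaching a constant or an unbound variable.
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst):
    """Unify two flat atoms; return an extended substitution or None."""
    if a[0] != b[0] or len(a) != len(b):
        return None
    subst = dict(subst)
    for x, y in zip(a[1:], b[1:]):
        x, y = walk(x, subst), walk(y, subst)
        if x == y:
            continue
        if is_var(x):
            subst[x] = y
        elif is_var(y):
            subst[y] = x
        else:
            return None  # two different constants
    return subst


def rename(atom, suffix):
    # Give rule variables fresh names so they cannot clash with query variables.
    return (atom[0],) + tuple(t + suffix if is_var(t) else t for t in atom[1:])


def solve(goals, subst):
    """Prove the goals left to right, depth-first, yielding every solution."""
    if not goals:
        yield subst
        return
    first, rest = goals[0], goals[1:]
    for fact in FACTS:
        s = unify(first, fact, subst)
        if s is not None:
            yield from solve(rest, s)
    for head, body in RULES:
        suffix = f"_{next(_fresh)}"
        s = unify(first, rename(head, suffix), subst)
        if s is not None:
            yield from solve([rename(b, suffix) for b in body] + rest, s)


# Query: ?- boss(X, me).
answers = [walk("X", s) for s in solve([("boss", "X", "me")], {})]
print(answers)  # ['tom']
```

The rule head binds Y to me, which turns worksInDepartment(Y, D) into a lookup for the department of me; only then is isManager(X, D) solved, so facts about other departments are never combined with it.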