An Attempt to Knowledge Systematization of Opinion Mining Approaches


JAROSŁAW WĄTRÓBSKI

Summary

Opinions are central to almost all human activities because they are key influencers of our behaviour. When making a decision, we frequently want to know the opinions of others. Businesses and organizations likewise want to learn consumer or public opinions about their products and services. Current solutions for opinion mining and sentiment analysis are evolving rapidly, typically by reducing the amount of human effort needed to classify comments. This paper analyses methodological approaches dedicated to selected opinion mining problems and supports the proper selection among them. It presents the state of the art and preliminary results concerning opinion mining approaches, offering general insights and information about the selected approaches in the context of an identified set of attributes.

Keywords: opinion mining, sentiment analysis, methods and tools for opinion mining

Introduction

Modern enterprises and organizations struggle with the enormous amount of information generated by different processes [6,19]. Managing such information is a daunting task that demands considerable effort. Moreover, the information is gathered from both internal and external sources [11,18], and an organization needs to cope with structured, semi-structured and unstructured data [16,27]. Decision making, however, is based mainly on knowledge acquired from semi-structured and unstructured data [15,22].

The World Wide Web is growing rapidly, offering a wide spectrum of business activities. Many products and services are offered on the Internet, and customers can share their opinions on the Web. A problem arises as the number of opinions grows. A similar situation occurs when an organization wants to monitor results and reviewers' opinions [9,14]. Although the number of opinions is enormous, the main problem concerns the extraction of valuable and complete knowledge [21]. Analysing customers' behaviour and activities requires a large amount of data on customers' purchasing patterns [20,32], which in turn needs to be analysed. To do this, heterogeneous data presented in various forms needs to be transformed into a consistent format.

Given the large volume of information available on the Web and the variety of methods and tools that support opinion mining processes, this paper surveys selected approaches and summarizes their main features. Section 1 discusses previous and existing approaches to opinion mining. Section 2 provides a comparative analysis of selected opinion mining methods and tools. Conclusions close the paper.


1. Literature review

Opinion mining is a crucial task for both companies and individuals. Companies often want to know customers' feedback on the products and services they offer in order to make future decisions, whereas individuals may take the opinions of other individuals into account [29]. Subsequent decisions are thus made on the basis of the gathered opinions. This has promoted tools and methods that support marketing intelligence through opinion mining [31]. According to the literature, opinion mining (OM) is a highly active research field that combines natural language processing (NLP), computational linguistics and text analysis techniques with the aim of extracting various kinds of added-value and informational elements from users' opinions [1, 17]. However, current opinion mining approaches are hampered by a number of drawbacks, such as the absence of semantic relations between concepts in feature search processes or the lack of advanced mathematical methods in sentiment analysis processes [33].

Previous works concentrated on analysing the opinion mining process using a framework derived from a critical analysis of existing research in opinion mining [2]. Moreover, an architecture that uses a multi-dimensional model to integrate customers' characteristics and their comments about products was proposed by [32]. This approach first identifies the entities; the sentiments present in the customers' reviews are then transformed into an attribute table using a 7-point polarity scale (–3 to 3). Further work applied ontology-based approaches to movie reviews [34]. The authors exploited a fine-grained approach to opinion mining that used the ontology structure as an essential part of the feature extraction process by taking into account the relations between concepts. A similar ontology-based approach was provided by [4], where the authors investigated how a domain ontology can guide the identification of the most relevant discourse relations between elementary discourse units. Alternatively, other computational approaches developed linguistic and cognitive models of opinion, using psycholinguistic theories of emotions to analyse how opinions are lexically expressed in texts [1, 24, 30]. Similar work based on the elaboration of linguistic resources was provided by [26,28], who applied corpus-based and dictionary-based approaches to extract, automatically or semi-automatically, opinion-bearing terms and expressions together with their sentiment orientation. The aggregation of local opinions in order to compute the overall orientation of a document or sentence was developed by [3, 7, 8, 25, 28]. Other research concentrated on feature-based opinion mining [5, 7, 14, 23]. These approaches rely on opinions expressed about the features of an object or product, and consequently on the extraction and summarization of those features.
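The 7-point polarity scale and the aggregation of local opinions mentioned above can be illustrated with a minimal sketch. The lexicon entries, the simple tokenization, and the averaging and rounding rules are assumptions made purely for illustration; they are not taken from [32] or from any other cited system.

```python
# Hypothetical opinion lexicon on the 7-point scale (-3 .. 3); entries are illustrative only.
LEXICON = {"excellent": 3, "good": 2, "fine": 1, "poor": -2, "terrible": -3}


def clamp(value, low=-3, high=3):
    """Keep a score inside the 7-point polarity range."""
    return max(low, min(high, value))


def sentence_polarity(sentence):
    """Average the scores of known opinion words; 0 if none are found."""
    tokens = sentence.lower().replace(",", " ").replace(".", " ").split()
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    if not scores:
        return 0
    return clamp(round(sum(scores) / len(scores)))


def attribute_table(reviews):
    """Aggregate (entity, sentence) pairs into one mean polarity per entity."""
    per_entity = {}
    for entity, sentence in reviews:
        per_entity.setdefault(entity, []).append(sentence_polarity(sentence))
    return {entity: sum(v) / len(v) for entity, v in per_entity.items()}


reviews = [
    ("battery", "The battery is excellent."),
    ("battery", "Battery life is good."),
    ("screen", "The screen is terrible."),
]
table = attribute_table(reviews)
```

Here each sentence receives a local score and the per-entity mean plays the role of the aggregated orientation; a real system would need entity detection, negation handling and a far richer lexicon.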

Some works in the field of opinion mining and sentiment analysis exploited an association rule mining algorithm to discover product features [14]. Another approach, developed by [23], proposed the system OPINE, which extracts only nominal groups whose frequency is above an experimentally determined threshold, using PMI (Point-wise Mutual Information) computed between each of these nouns and meronymy expressions associated with the product. However, the main limitation of these approaches is that they extract a great many features and lack any organization of them.
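PMI compares how often two expressions co-occur with how often they would co-occur if they were independent: PMI(x, y) = log2( p(x, y) / (p(x) p(y)) ). The sketch below computes it from raw counts and keeps the candidates above a cut-off. The counts, the candidate nouns, the meronymy pattern and the threshold are invented for illustration and are not values used by OPINE.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Point-wise Mutual Information (base 2) estimated from co-occurrence counts."""
    if count_xy == 0:
        return float("-inf")  # the two expressions never co-occur
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


# Hypothetical corpus statistics (all numbers are assumptions).
TOTAL = 10_000        # number of text windows in the corpus
PATTERN_COUNT = 200   # occurrences of a meronymy pattern such as "of scanner"
candidates = {
    # noun: (co-occurrences with the pattern, occurrences of the noun)
    "cover": (40, 100),
    "weekend": (1, 500),
}

THRESHOLD = 1.0  # assumed cut-off; in practice it would be tuned experimentally
features = [
    noun
    for noun, (c_xy, c_x) in candidates.items()
    if pmi(c_xy, c_x, PATTERN_COUNT, TOTAL) > THRESHOLD
]
```

With these made-up counts only "cover" passes the cut-off, because it appears with the pattern far more often than chance would predict.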

The literature also includes works using feature taxonomies. For example, predefined taxonomies and semantic similarity measures were adopted to automatically extract classic features of a product and calculate their closeness to predefined concepts in the taxonomy [5]. A dedicated system, PULSE, for analysing large amounts of text stored in a database was developed by [12]. Another solution, developed by [13], extracts information about services, aggregates the sentiments expressed about every aspect, and produces a summary.

Further work applied ontology-based approaches, in particular using a domain ontology, built manually [34] or semi-automatically [7, 10], to guide the feature extraction phase. In addition, [7] developed the system OMINE, which offers a mechanism for ontology enrichment using a domain glossary of specific terms (e.g. jargon words, abbreviations and acronyms). A later extension of the work proposed by [34] added concepts to the ontology using a corpus-based method that relies on sentences containing a conjunction together with an already recognized concept. Likewise, support for polarity mining was proposed by [35], where the authors manually built an ontology for movie reviews and incorporated it into the polarity classification task, which significantly improved performance over a standard baseline.

2. A comparative analysis of selected opinion mining approaches

Many existing approaches emphasize the role of opinion mining and the challenges related to it. According to their main application, they can be divided into the following subsets: machine learning approaches, lexicon-based approaches, and ontology-based approaches. Machine learning can be subdivided into unsupervised and supervised learning; supervised learning includes decision tree classifiers, linear classifiers (support vector machines and neural networks), rule-based classifiers, and probabilistic classifiers (naive Bayes, Bayesian network, maximum entropy). The lexicon-based approach includes the dictionary-based approach and the corpus-based approach (statistical and semantic). Figure 1 presents the classification of the selected opinion mining approaches.


Figure 1. General schema of opinion mining approaches
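The classification described in the text can also be written down as a nested structure, which makes it easy to look up where a given technique sits. The sketch below encodes only the categories named in the preceding paragraph; the dictionary layout and the helper names `path_to` and `leaves` are illustrative choices, not part of the original schema.

```python
# Taxonomy of opinion mining approaches as described in the text.
# An empty dict marks a node with no further subdivision.
TAXONOMY = {
    "machine learning": {
        "unsupervised learning": {},
        "supervised learning": {
            "decision tree classifiers": {},
            "linear classifiers": {
                "support vector machines": {},
                "neural networks": {},
            },
            "rule-based classifiers": {},
            "probabilistic classifiers": {
                "naive Bayes": {},
                "Bayesian network": {},
                "maximum entropy": {},
            },
        },
    },
    "lexicon-based approaches": {
        "dictionary-based approach": {},
        "corpus-based approach": {"statistical": {}, "semantic": {}},
    },
    "ontology-based approaches": {},
}


def path_to(name, tree=TAXONOMY, prefix=()):
    """Return the path from the root to the named node, or None if it is absent."""
    for key, children in tree.items():
        current = prefix + (key,)
        if key == name:
            return current
        found = path_to(name, children, current)
        if found:
            return found
    return None


def leaves(tree=TAXONOMY):
    """List all nodes that have no children."""
    result = []
    for key, children in tree.items():
        result.extend(leaves(children) if children else [key])
    return result
```

For example, `path_to("naive Bayes")` walks from "machine learning" through "supervised learning" and "probabilistic classifiers" down to the classifier itself.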

This schema was the basis for further elaboration of the selected opinion mining approaches. An example set of 10 methods was prepared: naive Bayes classifier, Bayesian network, SVM, classifier of maximum entropy, rule-based classifier, neural network, kNN, decision trees, ontology-based approach, and lexicon-based approach. Based on the specifications of the selected approaches, the set of attributes was unified and adapted to opinion mining. The final set contains 8 criteria: Aim, Algorithm/Classifier, Learning time, Speed of learning, Tolerance for missing values, Resistance for noise, Overfitting, and Explanation. Each attribute has a set of acceptable values, presented below:

• The Aim attribute states, in brief, the main purpose of a given approach.

• The Algorithm/Classifier attribute describes the mechanisms, theorems and classifiers used by a given solution. Due to the heterogeneity of the selected approaches, this attribute takes various values.

• The Learning time attribute describes the time needed for learning and takes one of 4 possible values: small, medium, big and very big. The information about the learning time of the considered approaches is derived from the scientific literature.

• The Speed of learning attribute describes how quickly an approach learns. It takes the following values: very short, short, medium, and long, where very short is the most desirable. If the information is lacking or cannot be measured, the approach receives the value: not applicable.


• The Tolerance for missing values attribute describes how well an approach copes with missing elements. The defined set of values is: small, medium, big and very big.

• The Resistance for noise attribute describes the probable level of resistance to noise, with the values: small, medium and big. If the information is lacking or cannot be measured, the approach receives the value: not applicable.

• The Overfitting attribute describes the level of excessive learning. The following values are possible: small, medium and big. Lacking information or impossibility of measurement is recorded as not applicable.

• The Explanation attribute refers to the level of explainability of a given approach. The following values are possible: small, medium, and big. Lacking or incomplete information is recorded as not applicable.
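The attribute definitions above amount to a small schema: two free-text fields and six fields with a closed set of values. A minimal sketch of such a schema follows, assuming a dataclass representation; the class name `Approach`, the field names and the `invalid_fields` helper are hypothetical and serve only to show how an entry could be checked against the admissible values listed above.

```python
from dataclasses import dataclass

NA = "not applicable"

# Admissible values per attribute, as listed in the attribute descriptions.
ALLOWED = {
    "learning_time": {"small", "medium", "big", "very big"},
    "speed_of_learning": {"very short", "short", "medium", "long", NA},
    "tolerance_for_missing_values": {"small", "medium", "big", "very big"},
    "resistance_for_noise": {"small", "medium", "big", NA},
    "overfitting": {"small", "medium", "big", NA},
    "explanation": {"small", "medium", "big", NA},
}


@dataclass(frozen=True)
class Approach:
    name: str
    aim: str        # free text
    algorithm: str  # free text
    learning_time: str
    speed_of_learning: str
    tolerance_for_missing_values: str
    resistance_for_noise: str
    overfitting: str
    explanation: str

    def invalid_fields(self):
        """Return the names of attributes whose value is outside the admissible set."""
        return [
            field
            for field, allowed in ALLOWED.items()
            if getattr(self, field).lower() not in allowed
        ]


# Example entry; the wording of the aim is paraphrased for illustration.
example = Approach(
    name="kNN",
    aim="Locate the nearest instances and assign the class.",
    algorithm="kNN",
    learning_time="Medium",
    speed_of_learning="Very short",
    tolerance_for_missing_values="Small",
    resistance_for_noise="Small",
    overfitting="Small",
    explanation="Big",
)
```

Calling `example.invalid_fields()` returns an empty list when every value belongs to its admissible set, which gives a quick consistency check before entries are compared.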

The final classification contains the set of 10 approaches dedicated to opinion mining and the set of 8 attributes with their assigned values. Table 1 shows the complete set of approaches [1, 5, 10, 13, 17, 19, 23, 33, 34, 35].

Table 1. Final classification of selected opinion mining approaches

| Criteria | Naive Bayes classifier | Bayesian network | SVM | Classifier of maximum entropy | Rule-based classifier |
|---|---|---|---|---|---|
| Aim | Calculation of the probability of occurrence of a class based on words in a document. | Reasoning about belonging to a class based on the network. | Determination of the hyperplane with the largest possible class separation margin. | Converting a set of labeled objects to the form of vectors used to calculate the weight of a concrete feature. | Creating a data space based on user rule sets. |
| Algorithm/Classifier | Bayes' theorem | Bayes' theorem | Support vector machine | Feature-based classifier | List of user rules |
| Learning time | Small | Small | Very big | Very big | Medium |
| Speed of learning | Very short | Very short | Long | Medium | Medium |
| Tolerance for missing values | Very big | Big | Very big | Big | Very big |
| Resistance for noise | Very big | Medium | Medium | Medium | Medium |
| Overfitting | Big | Big | Medium | Big | Small |

| Criteria | Neural network | kNN | Decision trees | Ontology-based approach | Lexicon-based approach |
|---|---|---|---|---|---|
| Aim | Calculating probabilities based on weights assigned to neurons. | Locating the instances nearest to the query and determining the class. | Hierarchical distribution of training data. | Classification on the basis of the ontology. | Specifying the polarity of specific words or the whole text. |
| Algorithm/Classifier | Perceptrons with specific weights | kNN algorithm | C4.5 | WordNet, NLP | WordNet |
| Learning time | Big | Medium | Medium | Big | Medium |
| Speed of learning | Long | Very short | Short | Not applicable | Not applicable |
| Tolerance for missing values | Very big | Small | Very big | Big | Very big (manual) / Medium (automatic) |
| Resistance for noise | Small | Small | Big | Not applicable | Not applicable |
| Overfitting | Small | Small | Medium | Not applicable | Not applicable |
| Explanation | Small | Big | Medium | Not applicable | Not applicable |

This table presents the selected, most popular algorithms applied to opinion mining problems. Identifying the set of differentiating criteria and the corresponding features of the individual approaches made it possible to construct the taxonomy. It is clearly visible that the attribute values vary widely. Thus, the table can be treated as guidance on how to choose a method for a given decision situation. Although the results are preliminary and the scope is limited to the 10 most popular approaches, the presented taxonomy emphasizes the need to analyse each opinion mining problem carefully and, on that basis, to select a method adequate to the given decision situation and the conditions of the scientific experiment.
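Using Table 1 as guidance for method selection can be sketched as a simple filter over its ordinal attributes. The sketch below encodes three rows of the table (Learning time, Resistance for noise, Overfitting); the ordering of the levels, the rule that "not applicable" never satisfies a requirement, and the function name `select` are assumptions made for illustration.

```python
# Assumed ordering of the qualitative levels used in Table 1.
LEVELS = {"small": 0, "medium": 1, "big": 2, "very big": 3}

# (learning time, resistance for noise, overfitting) as given in Table 1;
# None stands for "not applicable".
TABLE = {
    "Naive Bayes classifier": ("small", "very big", "big"),
    "Bayesian network": ("small", "medium", "big"),
    "SVM": ("very big", "medium", "medium"),
    "Classifier of maximum entropy": ("very big", "medium", "big"),
    "Rule-based classifier": ("medium", "medium", "small"),
    "Neural network": ("big", "small", "small"),
    "kNN": ("medium", "small", "small"),
    "Decision trees": ("medium", "big", "medium"),
    "Ontology-based approach": ("big", None, None),
    "Lexicon-based approach": ("medium", None, None),
}


def select(max_learning_time, min_noise_resistance, max_overfitting):
    """Return the methods that meet all three requirements."""
    chosen = []
    for name, (learning, noise, overfit) in TABLE.items():
        if noise is None or overfit is None:
            continue  # "not applicable" cannot be compared, so it is skipped
        if (
            LEVELS[learning] <= LEVELS[max_learning_time]
            and LEVELS[noise] >= LEVELS[min_noise_resistance]
            and LEVELS[overfit] <= LEVELS[max_overfitting]
        ):
            chosen.append(name)
    return chosen


# Decision situation: moderate training budget, noisy data, limited overfitting.
shortlist = select("medium", "medium", "medium")
```

For this decision situation the shortlist contains the rule-based classifier and decision trees; other requirement profiles give different shortlists, which is the kind of case-by-case analysis the taxonomy is meant to support.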

3. Conclusions

Owing to the immense popularity of social media and of e-business driven by reviews written on e-commerce websites, the role of opinion mining is increasing and brings growing benefits for gaining market advantage. Opinion mining is thus becoming more important than before, because whenever we need to make a decision, we want to know others' points of view. This paper investigated the analysis and proper selection of a methodological approach dedicated to a selected opinion mining problem. It presented the state of the art and preliminary results concerning opinion mining approaches. The results comprise a comparative analysis of selected methods and tools applicable in opinion extraction processes and, on this basis, provide general insights and information about the selected approaches in the context of the identified set of attributes.


Bibliography

[1] Asher N., Benamara F., Mathieu Y.Y., Appraisal of Opinion Expressions in Discourse, Lingvisticæ Investigationes, John Benjamins Publishing Company, Amsterdam, Vol. 32:2.

[2] Binali H., Potdar V., Wu C., A state of the art opinion mining and its application domains, IEEE International Conference on Industrial Technology, Gippsland, VIC, 2009, pp. 1–6.

[3] Bo P., Lee L., Shivakumar V., Thumbs up? Sentiment Classification using Machine Learning Techniques, Proceedings of EMNLP 2002.

[4] Cadilhac A., Benamara F., Aussenac-Gilles N., Ontolexical resources for feature-based opinion mining: a case-study, 2010.

[5] Carenini G., Raymond T. Ng, Zwart E., Extracting Knowledge from Evaluative Text, In Proceedings of the 3rd international conference on Knowledge capture, 2005.

[6] Castellanos M., Wang D.U., Processing and DW2.0 in Operational Business Intelligence Information Systems, Lecture Notes in Computer Science, pp. 33–45.

[7] Cheng X., Xu F., Fine-grained Opinion Topic and Polarity Identification, In Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), Marrakech, Morocco 2008.

[8] Choi Y., Cardie C., Riloff E., Patwardhan S., Identifying sources of opinions with conditional random fields and extraction patterns, In Proceedings of HLT/EMNLP 2005.

[9] Eirinaki P., Singh J., Feature-based opinion mining and ranking, Journal of Computer and System Sciences, 78(4), 2011, pp. 1175–1184.

[10] Feiguina O., Résumé automatique des commentaires de Consommateurs. Mémoire présenté à la Faculté des études supérieures en vue de l'obtention du grade de M.Sc. en informatique, Département d'informatique et de recherche opérationnelle, Université de Montréal 2006.

[11] Felden C., Chamoni P., Execution towards a Business Process Intelligence Processing, 47(6), pp. 195–206.

[12] Gamon M., Aue A., Corston O.S., Ringger E., Pulse: Mining Customer Opinions from Free Text, In Proceedings of International symposium on intelligent data analysis N°6, Madrid 2005.

[13] Goldensohn B. et al., Building a Sentiment Summarizer for Local Service Reviews, WWW2008 Workshop: Natural Language Processing Challenges in the Information Explosion Era (NLPIX 2008), 2008.

[14] Hu M., Liu B., Mining and summarizing customer reviews, in Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, New York, USA, 2004, pp. 168–177.

[15] Kantardzic M., Data Mining: Concepts, Models, Methods, and Algorithms, ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics. Wiley-IEEE Press, 2002.

[16] Konys A., Wątróbski J., Różewski P., Approach to Practical Ontology Design for Supporting COTS Component Selection Processes, ACIIDS 2013 – A. Selamat et al. (Eds.): ACIIDS 2013, Part II, LNAI 7803, Springer, Heidelberg, 2013, 245–255.

[17] Konys A., A Tool Supporting Mining Based Approach Selection to Automatic Ontology Construction, IADIS Journal on Computer Science and Information Systems, 2015, pp. 3–


[18] Konys A., A Framework for Analysis of Ontology-Based Data Access, in: Computational Collective Intelligence, 8th International Conference, ICCCI 2016, Part II, Nguyen, N.-T., Iliadis, L., Manolopoulos, Y., Trawiński, B. (Eds.), Lecture Notes in Computer Science, Springer International Publishing, 2016, pp. 397–408.

[19] Konys A., An Ontology-Based Knowledge Modelling for a Sustainability Assessment Domain, Sustainability 2018, 10, 300.

[20] Lau R.Y.K. et al., Automatic Domain Ontology Extraction for Context-Sensitive Opinion Mining, ICIS 2009 Proceedings. Paper 35. 2009.

[21] Lejeune M. A. M., Measuring the Impact of Data Mining on Churn Management, Internet Research, ABI/INFORM Global, vol. 11, no. 5. Bradford, 2001, pp. 375–388.

[22] Negash S., Gray P., Business intelligence (Chapter 45). In: F. Burstein & C. Holsapple (eds.) Handbook of decision support systems 2. Springer Link 2008, 175–193.

[23] Popescu A.M., Etzioni O., Extracting Product Features and Opinions from Reviews, In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing 2005.

[24] Read J., Hope D., Carroll J., Annotating Expressions of Appraisal in English, The Linguistic Annotation Workshop, ACL 2007.

[25] Soo-Min K., Hovy E., Extracting Opinions, Opinion Holders, and Topics Expressed in Online News Media Text, In Proceedings of ACL/COLING Workshop on Sentiment and Subjectivity in Text, Sydney, Australia 2006.

[26] Strapparava C., Valitutti A., WordNet-Affect: an Affective Extension of WordNet, Proceedings of LREC 04, 2004.

[27] Sukumaran S., Sureka S., Integrating Structured and Unstructured Data Using Text Tagging and Annotation, Business Intelligence Journal 2006, 11(2), pp. 8–16.

[28] Turney P.D., Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews, Proceedings of 2006 International Conference on Intelligent User Interfaces (IUI06).

[29] Wei W., Gulla J.A., Sentiment Learning on Product Reviews via Sentiment Ontology Tree, Proceedings of the Association for Computational Linguistics (ACL), pp. 404–413, 2010.

[30] Wiebe J., Wilson T., Cardie C., Language Res Eval, 2005, 39: 165.

[31] Xu R. et al., Learning Knowledge from Relevant Webpage for Opinion Analysis, in Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Sydney, Australia, pp. 307–313. 2008.

[32] Yaakub R. M., Li Y., Feng Y., Integration of Opinion into Customer Analysis Model, in Proceedings of Eighth IEEE International Conference on e-Business Engineering 2011, pp. 90–95.

[33] Zhang Q., Segall R. S., Web Mining: a Survey of Current Research, Techniques, and Software, 2008.

[34] Wątróbski J., Jankowski J., Knowledge management in MCDA domain, in 2015 Federated Conference on Computer Science and Information Systems (FedCSIS). IEEE, 2015, pp. 1445–1450.

[35] Zhou L., Chaovalit P., Ontology-supported polarity mining, J. Am. Soc. Inf. Sci. Technol. 59, 1, January 2008, 98–110.


ELEMENTS OF KNOWLEDGE SYSTEMATIZATION IN THE AREA OF OPINION MINING

Summary

Consumer opinion research is an interesting and dynamically developing research trend. This results in the intensive development of specialized data analysis methods and techniques. Their use in opinion analysis or sentiment analysis supports the decision-maker by reducing the effort required to analyse the collected data. The article presents an attempt to systematize and categorize the methodological approaches used in the area of opinion mining, and also contains a set of guidelines needed to choose an appropriate method correctly for the analysed problem.

Keywords: opinion mining, knowledge management, machine learning

Jarosław Wątróbski

University of Szczecin

Faculty of Economics and Management
ul. Mickiewicza 64, 71-101 Szczecin, Poland
e-mail: jwatrobski@wneiz.pl