
Stream Mining on Opinionated Texts


Academic year: 2021


(1)

Stream Mining on Opinionated Texts

Learning Classifiers with minimal Human Aid

Myra Spiliopoulou

Joint work with:

(2)

?

(3)

Opinions from booking.com

(4)

Opinions from TripAdvisor

• on EA Embassy Prague Hotel (27.04.2014)

3/5

The location of the hotel is ok, not a long walk to everything you wanna reach. We asked for a double bed which was two singel bed pushed together. Some of the staff was really friendly, but we got a little bit questioned by a girl when we didnt wanna use the hotels cab company, and instead have a cab by taxameter (which was so much cheaper) the breakfast is good and they have alot to choose from!

3/5

Stayed here for 3 nights on a trip with colleagues for some R&R over a long weekend recently... Felt more like a large guest house than a hotel with doors locked at night and a need to ring the bell to get back in late at night! This place is around a 15 minute walk from the city centre with no views and a rather austere feel....rooms large but not that welcoming. Reception quite cold in attitude and the dining room is far from warm, engaging or anything else... Adequate for purpose but only just!

4/5

Had a great time at this hotel, I arrived at 4 am and the management was very kind even prepared some tea for me while I wait for breakfast at 7:00. The room was great, with AC, nice decor and great beds. Washroom was very nice and clean. Breakfast was delicious with local breads and cakes. Management was always helpful and spoke english. 15 min. Walk from old town, but I kind of like it, very lose to good restaurants. The bus station is just a few blocks away and underground as well. Cant ask for more.

(5)

Opinions from TripAdvisor

• on Hotel Masowiecki in Warsaw (14.05.2014)

3/5 May 2012

Hotel located on a busy street, opposite a pub - means noise from traffic and drunken people's shouts until late night, not really helpful if one wants to sleep. Rooms rather basic, with old tube-type tv and alarm clock radio. No double beds, twin beds only, however pretty comfortable. Bathroom clean, nice breakfast.

4/5 Sept. 2012

Location's ideal - Halfway between the Old Town Square and Laziewecki Park, although both are a good 15 minute walk. Place is impeccably clean and great value. Other reviews have mentioned the late night noise, however my room was at the rear of the hotel and I had no problems whatsoever (even with my window fully open).

3/5 Aug. 2013

Stayed 2 nights at €25 a night in very spacious and functional twin room on L3 with wash basin. Excellent location and a 4 minute walk to university end of Main Street. Very clean, a good style of hotel and reception staff were efficient. Lift a little on the slow side.

... ...

(6)

Opinions in Netflix

Y. Koren, "Collaborative Filtering with Temporal Dynamics", KDD'09 & CACM'10

Drift of the target concept for all products

(7)

Opinion stream mining problem

Given a set of opinionated documents:

• How to disentangle the different, ad hoc aspects discussed in the sentences of the documents and

• associate a dominant polarity to each aspect?

When the set becomes a stream:

• How to monitor the polarity of each aspect,

(8)

Agenda

1. Derive the polarity of a piece of text and learn a polarity classifier over a stream of texts

2. Learn a polarity stream classifier with only an initial seed of labeled texts

3. Derive the polarity of an aspect, then learn and monitor aspects and their polarities

(9)

1

(10)

Propagating labels from texts to words and back

Training seed S:

• perfect location
• expensive breakfast
• expensive breakfast, fair rooms
• perfect location, fair price
• fair parking facilities

Vocabulary of S:

• perfect: (2, 0)
• expensive: (0, 2)
• fair: (2, 1)

Texts scored with the vocabulary of S:

• expensive sauna: (0, 2)
• expensive, unpleasant place: (0, 2)
• fair but cold rooms: (2, 1)
• fair but expensive lodging: (2, 3)
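
The counts on this slide can be reproduced with a small sketch. It assumes that each pair means (positive texts, negative texts) and that the seed texts carry the polarity labels shown below; the extracted slide preserves neither, so both are inferred from the counts. The names `SEED`, `build_vocabulary` and `score_text` are illustrative only.

```python
import re
from collections import defaultdict

# Assumed labels: inferred from the word counts on the slide, not stated there.
SEED = [
    ("perfect location", "pos"),
    ("expensive breakfast", "neg"),
    ("expensive breakfast, fair rooms", "neg"),
    ("perfect location, fair price", "pos"),
    ("fair parking facilities", "pos"),
]
SENTIMENT_WORDS = {"perfect", "expensive", "fair"}  # the words tracked on the slide


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(seed, tracked):
    """Texts -> words: per word, count the positive and negative texts containing it."""
    counts = defaultdict(lambda: [0, 0])
    for text, label in seed:
        for word in set(tokenize(text)) & tracked:
            counts[word][0 if label == "pos" else 1] += 1
    return {word: tuple(c) for word, c in counts.items()}


def score_text(text, vocabulary):
    """Words -> text: sum the (positive, negative) counts of the known words."""
    pos = neg = 0
    for word in set(tokenize(text)):
        p, n = vocabulary.get(word, (0, 0))
        pos += p
        neg += n
    return pos, neg


vocabulary = build_vocabulary(SEED, SENTIMENT_WORDS)
print(vocabulary)
print(score_text("fair but expensive lodging", vocabulary))
```

Words that are not in the vocabulary, such as "sauna" or "cold", contribute nothing to the score, which is why "fair but cold rooms" receives the same counts as "fair" alone.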

(11)

Propagating labels from texts to words and back

Multinomial Naive Bayes:

• The probability of polarity class C for text (sentence) d is

• where the probability of observing word w in C is on
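
The formulas of this slide are not preserved in the text. As a hedged illustration, the sketch below uses the textbook form of Multinomial Naive Bayes, which may differ from the slide's notation: the score of class C for text d is log P(C) plus the sum of log P(w|C) over the words w of d, with P(w|C) estimated by Laplace (add-one) smoothing. The smoothing choice, the seed labels and the function names are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def train_mnb(labeled_texts):
    """Collect class frequencies, per-class word counts and the vocabulary."""
    class_docs = Counter()
    word_counts = {}
    vocabulary = set()
    for text, label in labeled_texts:
        class_docs[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocabulary.add(word)
    return class_docs, word_counts, vocabulary


def classify(text, model):
    """Return the class with the highest log-posterior score for the text."""
    class_docs, word_counts, vocabulary = model
    n_docs = sum(class_docs.values())
    best_class, best_score = None, -math.inf
    for c in class_docs:
        total = sum(word_counts[c].values())
        score = math.log(class_docs[c] / n_docs)  # log prior P(C)
        for word in tokenize(text):
            if word in vocabulary:  # unseen words are skipped
                # Laplace-smoothed estimate of P(w | C)
                score += math.log((word_counts[c][word] + 1) / (total + len(vocabulary)))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Seed labels are assumed, as they are not preserved on the slides.
seed = [
    ("perfect location", "pos"),
    ("expensive breakfast", "neg"),
    ("expensive breakfast, fair rooms", "neg"),
    ("perfect location, fair price", "pos"),
    ("fair parking facilities", "pos"),
]
model = train_mnb(seed)
print(classify("expensive sauna", model))
```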

(12)

Taking the age of the words into account

(13)

Selectively forgetting the past with TrIP

1. TrIP's window strategy selectively removes objects from the window (SSDBM'10)

2. TrIP deletes a node of the decision tree, if it has not

(14)

Impact of TrIP window strategy on performance

• Four TrIP classifiers on the PKDD 1999 dataset

[Figure: Area Under the Curve (0 to 1) against Timepoint (0 to 80) for the classifiers FIN1-DR, FIN2-DR, FIN3-DR and REF-DR]

(15)

Propagating age from texts to words

• Age of a text d that arrived at t_d

• Impact of text age on the words
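
The ageing formulas are not preserved on this slide. One common choice, used here purely as an assumption, is an exponential decay: a text that arrived at t_d has weight exp(-λ (t - t_d)) at time t, and each word inherits the summed weights of the texts it occurs in. The decay rate `LAMBDA` and the function names are hypothetical.

```python
import math
import re
from collections import defaultdict

LAMBDA = 0.1  # hypothetical decay rate; not taken from the slides


def age_weight(t_now, t_d, lam=LAMBDA):
    """Weight of a text that arrived at t_d, observed at t_now (assumed exponential decay)."""
    return math.exp(-lam * (t_now - t_d))


def aged_word_counts(texts, t_now, lam=LAMBDA):
    """Propagate text age to words: each occurrence counts with its text's weight.

    texts is an iterable of (text, label, t_d) triples.
    """
    counts = defaultdict(float)
    for text, label, t_d in texts:
        weight = age_weight(t_now, t_d, lam)
        for word in re.findall(r"[a-z]+", text.lower()):
            counts[(word, label)] += weight
    return dict(counts)


stream = [
    ("expensive breakfast", "neg", 0),   # old text
    ("expensive sauna", "neg", 10),      # text that has just arrived
]
print(aged_word_counts(stream, t_now=10))
```

Under this assumption a word seen only in old texts fades gradually instead of being dropped at a hard window boundary.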

(16)

Impact of backward adaptation on classifier performance

(17)

2

(18)

Adding useful texts to the seed S

• Effect of adding a text d to the initial seed S:

• based on how d affects what we know on the polarity of a

(19)

Forward adaptation by adding useful texts to S

• A text is useful if its usefulness exceeds a threshold α in (-1, 0].

• By adding a useful text to S, new words are added to V_S.

V_S = {friendly, sensible, expensive}

d = {expensive, noisy}
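
The thresholding step can be sketched as follows. The definition of usefulness is not preserved on the slides, so the sketch takes the usefulness score as an input instead of computing it; `forward_adapt` and the default `alpha` are hypothetical.

```python
def forward_adapt(seed_vocabulary, text_words, usefulness, alpha=-0.5):
    """Add the words of a text to V_S if the text is useful.

    usefulness is supplied by the caller because its definition is not
    preserved in the slides; alpha must lie in (-1, 0] as stated there.
    Returns the (possibly extended) vocabulary and whether the text was added.
    """
    if not -1 < alpha <= 0:
        raise ValueError("alpha must lie in (-1, 0]")
    if usefulness > alpha:
        return set(seed_vocabulary) | set(text_words), True
    return set(seed_vocabulary), False


v_s = {"friendly", "sensible", "expensive"}
d = {"expensive", "noisy"}
print(forward_adapt(v_s, d, usefulness=-0.1))  # useful: "noisy" joins V_S
print(forward_adapt(v_s, d, usefulness=-0.8))  # not useful: V_S is unchanged
```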

(20)

Impact of forward adaptation on classifier performance

(21)

3

(22)

Aspects and polarity

Initial batch S:

• perfect location
• expensive breakfast
• expensive breakfast, fair rooms
• perfect location, fair price
• fair parking facilities

Sentiment Vocabulary:

• perfect: (2, 0)
• expensive: (0, 2)
• fair: (2, 1)

Aspect Vocabulary:

• location
• breakfast
• rooms
• price
• parking
• facilities

(23)

Learning aspects & their polarity

1. Specify aspect vocabulary

2. Learn a set of clusters

3. Define aspects as cluster centroids

4. Within each cluster

   1) Specify sentiment vocabulary from the initial global seed

   2) Learn a within-cluster polarity classifier on the texts

5. Derive aspect polarity
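
The last step above can be illustrated with a deliberately simplified sketch. It does not run a clustering algorithm: every aspect word acts as its own cluster, and the aspect polarity is the majority label of the texts mentioning it. The text labels are assumed (inferred from the word counts on the slides), and `aspect_polarity` is an illustrative name.

```python
import re
from collections import Counter, defaultdict

ASPECT_VOCABULARY = {"location", "breakfast", "rooms", "price", "parking", "facilities"}

# Assumed labels: not preserved on the slides.
BATCH = [
    ("perfect location", "pos"),
    ("expensive breakfast", "neg"),
    ("expensive breakfast, fair rooms", "neg"),
    ("perfect location, fair price", "pos"),
    ("fair parking facilities", "pos"),
]


def aspect_polarity(labeled_texts, aspect_vocabulary):
    """Majority vote over the labels of the texts that mention each aspect word."""
    votes = defaultdict(Counter)
    for text, label in labeled_texts:
        words = set(re.findall(r"[a-z]+", text.lower()))
        for aspect in words & aspect_vocabulary:
            votes[aspect][label] += 1
    return {aspect: c.most_common(1)[0][0] for aspect, c in votes.items()}


print(aspect_polarity(BATCH, ASPECT_VOCABULARY))
```

With text-level labels, "rooms" comes out negative although it is described as "fair", which is one motivation for learning a separate polarity classifier within each cluster.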

(24)

Aspect & polarity learning on a stream

At each timepoint:

1. Update aspect vocabulary

2. Adapt the clusters

3. Update the aspects (cluster centroids)

4. Within each cluster

   1) Update sentiment vocabulary

   2) Adapt the within-cluster polarity classifier

5. Derive aspect polarity

(25)

Aspect & polarity learning on a stream [Neurocomp J.'14]

At each timepoint:

1. Update aspect vocabulary

2. Adapt the clusters

3. Update the aspects (cluster centroids)

4. Within each cluster

   1) Update sentiment vocabulary

   2) Adapt the within-cluster polarity classifier

5. Derive aspect polarity through within-cluster majority voting

bag of words

1. Two-level hierarchy of clusters

2. Backward & forward adaptation

bag of words, bag of adjectives

Version with semi-supervised incremental learner SUBMITTED

(26)

4

(27)

Tasks

1. Derive the polarity of a piece of text and learn a polarity classifier over a stream of texts

2. Learn a polarity stream classifier with only an initial seed of labeled texts

3. Derive the polarity of an aspect, then

(28)

Achievements

• Semisupervised stream classification with backward adaptation & forward adaptation

• Propagating labels and weights from the original texts to the words and to the aspects

• Stream clustering algorithm with selective remembering

reducing the weight of old words

incorporating new texts and words

(29)

Open issues

• Building the aspect vocabulary and the sentiment vocabulary

• Quality guarantees when expanding the initial seed

• Evaluating aspect and polarity learning simultaneously

We extend both the training set and

(30)

Acknowledgements

• KMD team: Max Zimmermann, Georg Krempl, Pawel Matuszyk, Tommy Hielscher, Zaigham Siddiqui

• Co-author: Eirini Ntoutsi

• and IMPRINT "Incremental Mining for Perennial Objects" (2011-2014)

(31)

Publications

• Z. F. Siddiqui and M. Spiliopoulou. Tree Induction over Perennial Objects. SSDBM 2010, LNCS 6187, Heidelberg, May-June 2010.

• Z. F. Siddiqui and M. Spiliopoulou. Classification rule mining for a stream of perennial objects. In Proc. of the 5th Int. Symposium on Rules: Research Based, Industry Focused (RuleML'11), Barcelona, Spain, July 2011.

• M. Zimmermann, E. Ntoutsi, and M. Spiliopoulou. Extracting opinionated (sub)features from a stream of product reviews. DS 2013, LNCS 8140, Singapore, Oct. 2013.

• M. Zimmermann, E. Ntoutsi, and M. Spiliopoulou. Adaptive semi supervised opinion classifier with forgetting mechanism. SAC 2014.

• M. Zimmermann, E. Ntoutsi, and M. Spiliopoulou. Discovering and monitoring product features and the opinions on them with opinstream. Neurocomputing, 2014 (accepted 4/2014).

(32)

Thank you very much!

Questions?
