
Reviewer’s opinion on Ph.D. dissertation authored by Maciej Piernik entitled: Pattern-based Clustering and Classification of XML Data


Academic year: 2021


Prof. Yannis Manolopoulos
Department of Informatics
Aristotle University
54124 Thessaloniki, Greece
manolopo@csd.auth.gr

19/5/2015

Reviewer’s opinion

on Ph.D. dissertation authored by Maciej Piernik

entitled:

Pattern-based Clustering and Classification of XML Data

1. Problem and its impact

The thesis focuses on XML mining, and particularly on classification and clustering. The thesis is well written and organized; some minor issues pointed out in the “Other remarks” section could be taken into account to improve readability. The thesis is based on a solid bibliography that has been developed over recent years, and it offers some nice scientific aspects with practical meaning as well. This means that the results of the thesis are immediately applicable in systems that manipulate XML documents and require the execution of data mining tasks such as clustering and classification.

2. Contribution

The main contribution of the thesis is a framework for XML clustering and classification that is based on patterns. Although this approach has been used before in the data mining literature, the author did a good job of adapting this technique to XML data and applying it successfully to clustering and classification. In addition, I note that the candidate’s publications (2 journal papers and 1 conference paper) have appeared in very decent/prestigious fora.

In some cases the contribution of the thesis is not very clear. For example, on page 75, Section 6.4.5, there is a comment saying that “… we cannot significantly state that our algorithm performs better that the others.” Although this is evident from Table 6.10, the author should explain this behaviour.

Moreover, for the thesis to be self-contained, the reasons why the algorithms given in [CTT05] and [HST+05] are not applicable to datasets db11, db2 and db3 must be clearly reported.

In addition, the experimental results reported in Sections 7 and 8 offer only a limited comparison of the runtime of the algorithms. In the era of big data, accuracy is still important, but the time the algorithms require to run must also be checked, since it can serve as a trade-off against accuracy. Although the thesis reports many accuracy results, it gives no actual running times in the places where it compares against other approaches. One example is Table 6.10; the same applies to Table 7.1 and all results reported in Chapter 8.

3. Correctness

The thesis is based on a solid mathematical background, and the statements and propositions given are correct. Lastly, the manuscript is entirely acceptable in terms of presentation (style, language, and structure).


4. Knowledge of the candidate

The candidate did a very good job with respect to the overall and the per-chapter organization of the thesis. A minor comment is that the bibliography could probably be extended further; in particular, I would expect more citations on classification and clustering in Section 3 (state-of-the-art).

Nevertheless, my impression is that the candidate has a very good knowledge of the field, and this is visible throughout the thesis.

5. Other remarks

Page 2: “Structure of XML documents ...” → “The structure of XML documents ...”

Page 3: “... and wide applicability ...” → “... and the wide applicability ...”

Page 3: “... is to cluster the documents prior to them being saved”, please rephrase.

Page 4: “... various applications in many bioinformatics tasks ...”, please rephrase.

Page 8: “To sum up, the main objectives ...”. It is better to start each of the next sentences with “to propose ...”, “to validate ...”, “to analyse …”, etc.

Finally, a table of symbols may be useful to the reader.

6. Conclusion

Taking into account what I have presented above and the requirements imposed by Article 13 of the Act of 14 March 2003 of the Polish Parliament on the Academic Degrees and the Academic Title (with amendments)¹, my evaluation of the dissertation according to the three basic criteria is the following:

A. Does the dissertation present an original solution to a scientific problem? (the selected option is marked with X)

[X] Definitely YES   [ ] Rather yes   [ ] Hard to say   [ ] Rather no   [ ] Definitely NO

B. After reading the dissertation, would you agree that the candidate has general theoretical knowledge and understanding of the discipline of Computing, and particularly the area of Software Engineering?

[X] Definitely YES   [ ] Rather yes   [ ] Hard to say   [ ] Rather no   [ ] Definitely NO

C. Does the dissertation support the claim that the candidate is able to conduct scientific work?

[X] Definitely YES   [ ] Rather yes   [ ] Hard to say   [ ] Rather no   [ ] Definitely NO

Summarizing, the present Ph.D. thesis contains an original contribution in the specific field of Processing XML Data. In my opinion the reviewed thesis fulfills the requirements imposed by the above Article 13, and I recommend that the dissertation be distinguished for its quality.

Signature

      

¹ http://www.nauka.gov.pl/g2/oryginal/2013_05/b26ba540a5785d48bee41aec63403b2c.pdf
