SUBJECT OBJECTIVES

The student who has completed the course should:

C1 Know application areas and design principles of OLAP (Online Analytical Processing) systems.

C2 Be able to design ETL (Extract-Transform-Load) processes, multidimensional databases, and multidimensional cubes using a selected development platform (such as MS SQL Server Integration Services (SSIS) and Analysis Services (SSAS)).

C3 Know the purpose and application areas of the most prominent methods of data mining (such as predictive modelling, clustering, association rules mining, time series analysis, text mining) in business and scientific problems.

C4 Know the most important statistical and/or machine learning algorithms used in data mining, and specific methods of distributed analysis of big data (MapReduce model).

C5 Know the data mining methodology (such as CRISP-DM or SEMMA).

C6 Be able to implement the data mining process using a selected data mining tool (such as SAS Enterprise Miner).

C7 Be able to fine-tune predictive models in order to meet the requirements in terms of sensitivity / specificity.

C8 Be capable of self-managed learning of new developments in data mining / business intelligence.

SUBJECT EDUCATIONAL EFFECTS

relating to knowledge:

PEK_W01 Knows application areas and design principles of data warehouse and OLAP systems

PEK_W02 Knows the specific requirements for analytical databases and the most prominent models of analytical systems (relational – ROLAP, multidimensional – MOLAP, hybrid – HOLAP)

PEK_W03 Knows design principles of ETL (Extract-Transform-Load) processes

PEK_W04 Knows application areas of most prominent methods of data mining in business and science (including predictive modelling, clustering, association rules mining, Web mining)

PEK_W05 Knows the most important algorithms employed by various methods of data mining

PEK_W06 Knows the data mining methodology applied in business problems (e.g. CRISP-DM or SEMMA)

relating to skills:

PEK_U01 Is able to design a multidimensional reporting system based on data warehouse/OLAP technologies

PEK_U02 Is able to design and implement ETL processes for integration of disparate data sources using MS SQL Server Integration Services

PEK_U03 Is able to implement multidimensional database and multidimensional cubes using MS SQL Server Analysis Services

PEK_U04 Is able to perform analysis of requirements of a business intelligence system and to select appropriate methods of data mining

PEK_U05 Is able to implement data mining process in a selected data mining tool (e.g. in SAS Enterprise Miner)

PEK_U06 Is able to fine-tune predictive models in terms of sensitivity / specificity measures

relating to social competences:

PEK_K01 Is able to continue self-managed learning regarding new methods and tools in data mining and business intelligence

PROGRAMME CONTENT

Form of classes – lecture | Number of hours
Lec 1 Purpose, application areas, important concepts, architecture of data warehouse / Online Analytical Processing (OLAP) systems | 2
Lec 2,3 Analytical database design – data warehouse schemes: relational (ROLAP), multidimensional (MOLAP), hybrid (HOLAP). Data aggregation in multidimensional databases (MDDB). MDX – multidimensional database query language | 4
Lec 4 Purpose and application areas of most important methods of data mining in business and science (methods of predictive modelling, clustering, association rules mining, time series analysis). Web mining. | 2
Lec 5 Predictive modelling algorithms: regression. Fundamentals of the statistical learning theory. Goodness-of-fit in regression. Feature selection in regression. | 2
Lec 6 Predictive modelling algorithms: classification. Theoretical foundation, Bayes classifier, Bayes error. Discriminant analysis, nonparametric classifiers, logistic regression. | 2
Lec 7 Linear methods in classification: perceptron algorithm, neural networks | 2
Lec 8 Decision trees | 2
Lec 9 Support Vector Machine classifier | 2
Lec 10 Measures of predictive performance, ROC curve; dimensionality reduction. PCA algorithm, regularization methods (Lasso, ElasticNet) | 2
Lec 11 Clustering algorithms: kNN, hierarchical algorithms, vector quantization, SOM | 2
Lec 12 Algorithms for association rules mining | 2
Lec 13,14 Methods of Text Mining, selection of features from text documents, TF-IDF measure, NLP methods used in Text Mining | 4
Lec 15 Analysis of massive data in MapReduce model (Apache Spark, MLlib library), applications and algorithms. | 2
Total hours | 30

Form of classes – laboratory | Number of hours
Lab 1,2 Introduction to MS SQL Server Integration Services (SSIS) and Analysis Services (SSAS) | 4
Lab 3,4 Design and implementation of the ETL process in MS SSIS | 4
Lab 5,6 Design and implementation of the multidimensional model of data – fact and dimension tables, OLAP cubes – implementation in MS SSAS. Deployment of the OLAP cubes in Analysis Services database engine. | 4
Lab 7 Preparation of documentation of the designed business intelligence system (documentation of the ETL/data warehouse/OLAP reporting layer) | 2
Lab 8,9 Introduction/tutorial to the SAS Enterprise Miner tool | 4
Lab 10,11 Implementation in SAS Enterprise Miner of the data mining process for the task of classification. Estimation of predictive performance of different types of models (decision trees, neural networks, logistic regression, memory-based learning, etc.) – sensitivity, specificity, ROC curves. | 4
Lab 12 Fine tuning predictive models: dimensionality reduction / feature selection, PCA | 2
Lab 13 Fine tuning complexity vs simplicity settings of the fitted predictive models | 2
Lab 14 Empirical evaluation of the methods of metalearning – boosting, bagging, model ensembles | 2
Lab 15 Fine tuning predictive models: target profiles for nonsymmetrical costs of errors, transformation of variables, etc. | 2
Total hours | 30

TEACHING TOOLS USED

N1. Lecture, PowerPoint presentations, handouts

N2. Laboratory classes

N3. Consultations

N4. Self-study – preparation for the laboratory classes

N5. Self-study – preparation for the final test

EVALUATION OF SUBJECT LEARNING OUTCOMES ACHIEVEMENT

Evaluation (F – forming (during semester), P – concluding (at semester end)) | Learning outcomes code | Way of evaluating learning outcomes achievement
F1 | PEK_U01 – PEK_U06, PEK_K01 | Evaluation of the laboratory tasks; discussion with student regarding results obtained
F2 | PEK_W01 – PEK_W06 | Final test (written)

P = 0.5*(F1+F2), provided that F1>2 and F2>2
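The rule above can be illustrated with a minimal Python sketch. The function name concluding_grade is hypothetical, and the syllabus does not say what grade results when the condition fails; returning None in that case is an assumption made only for illustration.

```python
from typing import Optional


def concluding_grade(f1: float, f2: float) -> Optional[float]:
    """Compute P = 0.5*(F1+F2), provided that F1 > 2 and F2 > 2.

    Assumption: when either forming grade is not above 2, the syllabus
    gives no value for P, so this sketch returns None.
    """
    if f1 > 2 and f2 > 2:
        return 0.5 * (f1 + f2)
    return None


if __name__ == "__main__":
    print(concluding_grade(4.0, 5.0))  # 4.5
    print(concluding_grade(2.0, 5.0))  # None (condition F1 > 2 not met)
```

For example, laboratory grade F1 = 4.0 and test grade F2 = 5.0 give P = 4.5, while F1 = 2.0 leaves P undetermined under this reading of the rule.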

PRIMARY AND SECONDARY LITERATURE

PRIMARY LITERATURE:

[1] J. Han, M. Kamber, Data Mining: Concepts and Techniques, Third Edition, Elsevier 2012 (or Second Edition, 2006)

[2] Z. Markov, D. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley

[3] H. Maciejewski, Application programming: Data mining and data warehousing, PWR 2011

[4] J. Leskovec, A. Rajaraman, J. Ullman, Mining of Massive Datasets, 2014.

SECONDARY LITERATURE:

[1] T. Hastie, R. Tibshirani, J. H. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition, Springer 2011 -- http://www-stat.stanford.edu/~tibs/ElemStatLearn/

[2] D. Larose, Data Mining Methods and Models, Wiley

[3] Data mining / business intelligence portal: http://www.kdnuggets.com/

SUBJECT SUPERVISOR (NAME AND SURNAME, E-MAIL ADDRESS)

Henryk Maciejewski, henryk.maciejewski@pwr.edu.pl

Attachment no. 5 to ZW 8/2020

Attachment no. … to the study programme

FACULTY OF ELECTRONICS

SUBJECT CARD

Name of subject in Polish: Badania naukowe, metody, zasady, realizacja 1

Name of subject in English: Research Skills and Methodologies 1

Main field of study: Informatyka techniczna (Technical Computer Science)

Specialization:

Level and form of studies: 2nd level, full-time

Kind of subject: obligatory

Subject code: INEA15009

Group of courses: YES

Lecture Classes Laboratory Project Seminar

Number of hours of organized classes in University (ZZU): 15 30 15

Number of hours of total student workload (CNPS): 30 60 30

Form of crediting: crediting with grade, crediting with grade, crediting with grade

For group of courses mark (X) final course: X

Number of ECTS points: 4

including number of ECTS points for practical classes (P): - 1 2 1

including number of ECTS points corresponding to classes that require direct participation of lecturers or other academics (BU): 1 2 1

PREREQUISITES RELATING TO KNOWLEDGE, SKILLS AND SOCIAL COMPETENCES

SUBJECT OBJECTIVES

C1 Acquiring the ability to present the results of research carried out within a complex project.

C2 Acquiring the ability to carry out a comparative analysis of selected properties of algorithms solving a decision problem, based on simulation studies.

C3 Gaining experience in research activity, in particular in planning experiments, designing an experimentation system, and applying a creative, innovative approach to achieving the stated research goal.

SUBJECT EDUCATIONAL EFFECTS