SUBJECT OBJECTIVES
The student who has completed the course should:
C1 Know application areas and design principles of OLAP (Online Analytical Processing) systems.
C2 Be able to design ETL (Extract-Transform-Load) processes, multidimensional databases, multidimensional cubes using a selected development platform (such as MS SQL Server Integration Services (SSIS) and Analysis Services (SSAS)).
C3 Know the purpose and application areas of the most prominent methods of data mining (such as predictive modelling, clustering, association rules mining, time series analysis, text mining) in business and scientific problems.
C4 Know the most important statistical and/or machine learning algorithms used in data mining, and specific methods of distributed analysis of big data (MapReduce model).
C5 Know the data mining methodology (such as CRISP-DM or SEMMA).
C6 Be able to implement the data mining process using a selected data mining tool (such as SAS Enterprise Miner).
C7 Be able to fine-tune predictive models in order to meet the requirements in terms of sensitivity / specificity.
C8 Be capable of self-managed learning of new developments in data mining / business intelligence.
SUBJECT EDUCATIONAL EFFECTS
relating to knowledge:
PEK_W01 Knows application areas and design principles of data warehouse and OLAP systems
PEK_W02 Knows specific requirements for analytical databases and the most prominent models of analytical systems (relational – ROLAP, multidimensional – MOLAP, hybrid – HOLAP)
PEK_W03 Knows design principles of ETL (Extract-Transform-Load) processes
PEK_W04 Knows application areas of most prominent methods of data mining in business and science (including predictive modelling, clustering, association rules mining, Web mining)
PEK_W05 Knows the most important algorithms employed by various methods of data mining
PEK_W06 Knows the data mining methodology applied in business problems (e.g. CRISP-DM or SEMMA)
relating to skills:
PEK_U01 Is able to design a multidimensional reporting system based on data warehouse/OLAP technologies
PEK_U02 Is able to design and implement ETL processes for integration of disparate data sources using MS SQL Server Integration Services
PEK_U03 Is able to implement multidimensional database and multidimensional cubes using MS SQL Server Analysis Services
PEK_U04 Is able to perform analysis of requirements of a business intelligence system and to select appropriate methods of data mining
PEK_U05 Is able to implement data mining process in a selected data mining tool (e.g. in SAS Enterprise Miner)
PEK_U06 Is able to fine-tune predictive models in terms of sensitivity / specificity measures
relating to social competences:
PEK_K01 Is able to continue self-managed learning regarding new methods and tools in data mining and business intelligence
PROGRAMME CONTENT
Form of classes – lecture Number of hours
Lec 1 Purpose, application areas, important concepts, architecture of data warehouse / Online Analytical Processing (OLAP) systems 2
Lec 2,3 Analytical database design – data warehouse schemes: relational (ROLAP), multidimensional (MOLAP), hybrid (HOLAP). Data aggregation in multidimensional databases (MDDB). MDX – multidimensional database query language 4
Lec 4 Purpose and application areas of most important methods of data mining in business and science (methods of predictive modelling, clustering, association rules mining, time series analysis). Web mining. 2
Lec 5 Predictive modelling algorithms: regression. Fundamentals of the statistical learning theory. Goodness-of-fit in regression. Feature selection in regression. 2
Lec 6 Predictive modelling algorithms: classification. Theoretical foundation, Bayes classifier, Bayes error. Discriminant analysis, nonparametric classifiers, logistic regression. 2
Lec 7 Linear methods in classification: perceptron algorithm, neural networks 2
Lec 8 Decision trees 2
Lec 9 Support Vector Machine classifier 2
Lec 10 Measures of predictive performance, ROC curve; dimensionality reduction. PCA algorithm, regularization methods (Lasso, ElasticNet) 2
Lec 11 Clustering algorithms: kNN, hierarchical algorithms, vector quantization, SOM 2
Lec 12 Algorithms for association rules mining 2
Lec 13,14 Methods of Text Mining, selection of features from text documents, TF-IDF measure, NLP methods used in Text Mining 4
Lec 15 Analysis of massive data in MapReduce model (Apache Spark, MLlib library), applications and algorithms. 2
Total hours 30
Form of classes – laboratory Number of hours
Lab 1,2 Introduction to MS SQL Server Integration Services (SSIS) and Analysis Services (SSAS) 4
Lab 3,4 Design and implementation of the ETL process in MS SSIS 4
Lab 5,6 Design and implementation of the multidimensional model of data – fact and dimension tables, OLAP cubes – implementation in MS SSAS. Deployment of the OLAP cubes in Analysis Services database engine. 4
Lab 7 Preparation of documentation of the designed business intelligence system (documentation of the ETL/data warehouse/OLAP reporting layer) 2
Lab 8,9 Introduction/tutorial to the SAS Enterprise Miner tool 4
Lab 10,11 Implementation in SAS Enterprise Miner of the data mining process for the task of classification. Estimation of predictive performance of different types of models (decision trees, neural networks, logistic regression, memory-based learning, etc.) – sensitivity, specificity, ROC curves. 4
Lab 12 Fine tuning predictive models: dimensionality reduction / feature selection, PCA 2
Lab 13 Fine tuning complexity vs simplicity settings of the fitted predictive models 2
Lab 14 Empirical evaluation of the methods of metalearning – boosting, bagging, model ensembles 2
Lab 15 Fine tuning predictive models: target profiles for nonsymmetrical costs of errors, transformation of variables, etc. 2
Total hours 30
TEACHING TOOLS USED
N1. Lecture, PowerPoint presentations, handouts
N2. Laboratory classes
N3. Consultations
N4. Self-study – preparation for the laboratory classes
N5. Self-study – preparation for the final test
EVALUATION OF SUBJECT LEARNING OUTCOMES ACHIEVEMENT
Evaluation (F – formative (during semester), P – summative (at semester end))   Learning outcomes code   Way of evaluating learning outcomes achievement
F1   PEK_U01 – PEK_U06, PEK_K01   Evaluation of the laboratory tasks; discussion with the student regarding the results obtained
F2   PEK_W01 – PEK_W06   Final test (written)
P = 0.5*(F1+F2), provided that F1>2 and F2>2
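The grading rule above can be sketched in a few lines of Python. This is only an illustration: the function name `final_grade` is hypothetical, and returning `None` when the condition fails is an assumption, since the syllabus does not say what grade results when F1 or F2 is 2 or lower. No rounding is applied, because none is specified.

```python
from typing import Optional


def final_grade(f1: float, f2: float) -> Optional[float]:
    """Return P = 0.5*(F1+F2) when F1 > 2 and F2 > 2, otherwise None.

    f1: grade for the laboratory tasks (F1)
    f2: grade for the written final test (F2)
    None is an assumed placeholder for "no concluding grade computed".
    """
    if f1 > 2 and f2 > 2:
        return 0.5 * (f1 + f2)
    return None


# Example: laboratory grade 4.0 and test grade 5.0
print(final_grade(4.0, 5.0))
# Example: the condition F1 > 2 is not met
print(final_grade(2.0, 5.0))
```

Both partial grades must exceed 2, so a strong laboratory result does not compensate for a failed final test, and vice versa.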
PRIMARY AND SECONDARY LITERATURE
PRIMARY LITERATURE:
[1] J. Han, M. Kamber, Data Mining: Concepts and Techniques, Third Edition, Elsevier 2012 (or Second Edition, 2006)
[2] Z. Markov, D. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley
[3] H. Maciejewski, Application programming: Data mining and data warehousing, PWR 2011
[4] J. Leskovec, A. Rajaraman, J. Ullman, Mining of Massive Datasets, 2014.
SECONDARY LITERATURE:
[1] T. Hastie, R. Tibshirani, J. H. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition, Springer 2011 -- http://www-stat.stanford.edu/~tibs/ElemStatLearn/
[2] D. Larose, Data Mining Methods and Models, Wiley
[3] Data mining / business intelligence portal: http://www.kdnuggets.com/
SUBJECT SUPERVISOR (NAME AND SURNAME, E-MAIL ADDRESS)
Henryk Maciejewski, henryk.maciejewski@pwr.edu.pl
Attachment no. 5 to ZW 8/2020 Attachment no. … to the programme of studies
FACULTY OF ELECTRONICS
SUBJECT CARD
Name of subject in Polish: Badania naukowe, metody, zasady, realizacja 1
Name of subject in English: Research Skills and Methodologies 1
Main field of study: Applied Computer Science (Informatyka techniczna)
Specialization:
Level and form of studies: 2nd level, full-time
Kind of subject: obligatory
Subject code INEA15009
Group of courses: YES
Lecture Classes Laboratory Project Seminar
Number of hours of organized classes in University (ZZU) 15 30 15
Number of hours of total student workload (CNPS) 30 60 30
Form of crediting: crediting with grade, crediting with grade, crediting with grade
For group of courses mark (X) final course X
Number of ECTS points 4
including number of ECTS points for practical (P) classes - 1 2 1
including number of ECTS points for classes requiring direct participation of teachers or other persons conducting the classes (BU) 1 2 1
PREREQUISITES RELATING TO KNOWLEDGE, SKILLS AND SOCIAL COMPETENCES
SUBJECT OBJECTIVES
C1 Acquiring the ability to present the results of research carried out as part of a comprehensive project.
C2 Acquiring the ability to carry out a comparative analysis of selected properties of algorithms solving a decision problem, based on simulation studies.
C3 Gaining experience in research activity, in particular in planning experiments, designing an experimentation system, and applying a creative, innovative approach to achieving the stated research goal.
SUBJECT LEARNING OUTCOMES