DATA BASE AND BIG DATA ANALYTICS
Academic Year 2020/2021 - 1st Year
- Databases: Simone Palazzo
- Big Data Analytics: Orazio Tomarchio
Taught classes: 80 hours
Term / Semester: 1st and 2nd
Learning Objectives
- Databases
The course covers the fundamental concepts of management and design of database systems.
Topics include data models (relational); query languages (SQL); implementation techniques of database management systems (index structures and query processing); and NoSQL databases.
The learning objectives are: a) To understand and use the main technologies for database management; b) To design a relational or non-relational database from a conceptual, logical and physical perspective; c) To use the SQL language to write efficient queries over large datasets; and d) To create and query large-scale datasets.
- Big Data Analytics
This module covers the fundamental concepts of management and design of a business intelligence system. Topics include data models for building a data warehouse; ETL (extract, transform and load) functionalities; OLAP analysis; basic data mining; reporting and interactive dashboards; and the evolution of BI architectures for large datasets. The module also covers techniques and algorithms for data visualization and exploratory analysis based on principles and techniques from graphic design, perceptual psychology and cognitive science. It is aimed at students who will use visualization in their data analytics work. The learning objectives are as follows:
Knowledge and understanding
- To understand the most important methodologies and techniques used by industries to analyse data in order to support the decision process
- To understand the main methodologies to design a data warehouse
- To understand the main methodologies to transform data into sources of knowledge through visual representation
Applying knowledge and understanding
- To be able to apply methodologies and techniques to analyse data.
- To be able to design a data warehouse.
- To be able to build reports and data analyses and organize them into interactive dashboards.
Course Structure
- Databases
Lectures, hands-on exercises, paper reading, student presentations and seminars.
Should teaching be carried out in mixed mode or remotely, it may be necessary to introduce changes with respect to previous statements, in line with the programme planned and outlined in the syllabus.
- Big Data Analytics
The main teaching methods are as follows:
- Lectures, to provide theoretical and methodological knowledge of the subject;
- Hands-on exercises, to provide “problem solving” skills and to apply design methodology;
- Laboratories, to learn and test the usage of related tools
Should teaching be carried out in mixed mode or remotely, it may be necessary to introduce changes with respect to previous statements, in line with the programme planned and outlined in the syllabus.
Required Prerequisites
- Databases
Basic programming skills
- Big Data Analytics
- Basic knowledge of database systems
- Basic knowledge of SQL
Attendance of Lessons
- Databases
Strongly recommended. Attending and actively participating in the classroom activities will contribute positively towards the overall assessment of the oral exam.
- Big Data Analytics
Strongly recommended. Attending and actively participating in the classroom activities will contribute positively towards the overall assessment of the oral exam.
Detailed Course Content
- Databases
1) Models and Languages for Database Management
- Fundamentals of Database Management Systems (DBMS)
- Relational Model: basic concepts, integrity constraints and keys.
- SQL language: data definition, data modification, queries, views, transactions.
- NoSQL databases: MongoDB
2) Querying and processing big data
- Apache Spark SQL with Python
- Datasets and DataFrames
- Examples of data analysis with Spark SQL
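As a rough illustration of the SQL topics listed in part 1 above (data definition, integrity constraints, data modification, transactions, views and queries), the sketch below uses Python's built-in `sqlite3` module. The `student` table, the `top_student` view and all data are hypothetical and are not taken from the course material; the DBMS used in class may differ.

```python
import sqlite3

# In-memory database; schema and rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Data definition: a table with a primary key and a CHECK integrity constraint.
cur.execute("""
    CREATE TABLE student (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        grade INTEGER CHECK (grade BETWEEN 18 AND 30)
    )
""")

# Data modification in a transaction: "with conn" commits on success
# and rolls back if an exception is raised inside the block.
with conn:
    cur.executemany(
        "INSERT INTO student (id, name, grade) VALUES (?, ?, ?)",
        [(1, "Ada", 30), (2, "Edgar", 24), (3, "Grace", 28)],
    )

# The second insert violates the CHECK constraint, so the whole
# transaction (including the first insert) is rolled back.
try:
    with conn:
        cur.execute("INSERT INTO student VALUES (4, 'Bob', 27)")
        cur.execute("INSERT INTO student VALUES (5, 'Eve', 99)")
except sqlite3.IntegrityError:
    pass

# View: a named query that can be queried like a table.
cur.execute(
    "CREATE VIEW top_student AS "
    "SELECT name, grade FROM student WHERE grade >= 28"
)

# Queries against the view and the base table.
top = cur.execute("SELECT name FROM top_student ORDER BY grade DESC").fetchall()
count = cur.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(top, count)
```

After the failed transaction the table still holds only the three rows inserted first, which is the atomicity property the transactions topic refers to.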
- Big Data Analytics
1. Introduction to Business Intelligence and Big Data Analytics (6 hours)
- Goal and rationale of BI systems
- The value of knowledge - data driven decision making
- The structure and evolution of BI and Big Data analytics systems
- OLAP vs OLTP
- Data warehouse and Business intelligence
- Advanced tools and platforms for BI and analytics
2. Data models for data warehouse (10 hours)
- Conceptual modeling
- Dimensions and facts
- Multi-dimensional data model
- Conceptual, logical and physical design
3. BI Architecture (8 hours)
- ETL (extract, transform and load) functionalities
- OLAP analysis
- OLAP query
- Reporting and Interactive Dashboard
- Overview on commercial and open-source BI platforms
4. Data Visualization (16 hours)
- Introduction to Visualization
- Data Visualization fundamentals: Visual Perception and Preattentive Attributes
- Charts and standard views: relevance, appropriateness and best practices
- Use of colors in data visualization
- Dashboard Design
- Advanced and innovative tools for data visualization: the Tableau platform
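To make the "dimensions and facts" and "OLAP query" topics above more concrete, here is a minimal sketch of a star schema and a roll-up, again using Python's built-in `sqlite3` module. The table names (`dim_date`, `dim_product`, `fact_sales`), the helper `revenue_by` and the data are all hypothetical; a real data warehouse would typically run on a dedicated platform rather than SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables describe the context of each measurement.
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);

    -- The fact table stores the measure (revenue) keyed by the dimensions.
    CREATE TABLE fact_sales (
        date_id    INTEGER REFERENCES dim_date(date_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        revenue    REAL
    );

    INSERT INTO dim_date VALUES (1, 2020, 11), (2, 2020, 12), (3, 2021, 1);
    INSERT INTO dim_product VALUES
        (1, 'Laptop', 'Hardware'), (2, 'Mouse', 'Hardware'), (3, 'IDE licence', 'Software');
    INSERT INTO fact_sales VALUES
        (1, 1, 1000.0), (1, 2, 20.0), (2, 3, 200.0), (3, 1, 1200.0), (3, 3, 200.0);
""")


def revenue_by(*levels):
    """Aggregate revenue at the given dimension levels.

    `levels` are trusted column names (hypothetical helper; never build
    SQL this way from user input).
    """
    cols = ", ".join(levels)
    query = f"""
        SELECT {cols}, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_date    AS d ON d.date_id = f.date_id
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY {cols}
        ORDER BY {cols}
    """
    return conn.execute(query).fetchall()


# Finer view: revenue per year and product category.
by_year_category = revenue_by("d.year", "p.category")
# Roll-up: drop the product dimension and aggregate per year only.
by_year = revenue_by("d.year")
print(by_year_category)
print(by_year)
```

Dropping a column from the `GROUP BY` clause is the roll-up operation; adding one back corresponds to a drill-down.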
Textbook Information
- Databases
- R. Elmasri and S. Navathe, "Fundamentals of Database Systems", 7th Edition, Pearson, 2016.
- B. Chambers, M. Zaharia, "Spark: The Definitive Guide", O'Reilly, 2018.
- Instructor’s notes
- Big Data Analytics
- [GoRi] Golfarelli, Rizzi. Data Warehouse Design: Modern Principles and Methodologies, McGraw Hill
- [Dash] Steve Wexler, Jeffrey Shaffer, Andy Cotgreave. The Big Book of Dashboards: Visualizing Your Data Using Real-World Business Scenarios. Wiley (2017)
- [Few1] Stephen Few. Show Me the Numbers: Designing Tables and Graphs to Enlighten, 2nd edition, Analytics Press (2012)
- [Few2] Stephen Few. Information Dashboard Design: Displaying Data for At-a-Glance Monitoring, 2nd edition, O’Reilly Media (2013)
- [Notes] Instructor’s notes (published on Studium and/or the Microsoft Teams platform)
Course Planning
Databases | |||
Subjects | Text References | ||
---|---|---|---|
1 | Introduction to databases: Concepts and Architecture | Book 1 - Chapter 1 and 2 | |
2 | Relational Data Model | Book 1 - Chapter 5 | |
3 | Basic SQL: data definition, queries, and update statements | Book 1 - Chapter 6 + Notes | |
4 | Advanced SQL: Complex Queries, Triggers, Views | Book 1 - Chapter 7 + Notes | |
5 | Query processing and optimization | Book 1 - Chapter 18 and 19 | |
6 | NOSQL Databases and Big Data Storage Systems | Book 1 - Chapter 24 + Notes | |
7 | Active, Temporal, Spatial, Multimedia, and Deductive Databases | Book 1 - Chapter 26 | |
8 | Getting started with Spark SQL for Data Processing | Book 2 - Chapter 1 and 2 + Notes | |
9 | Spark SQL for Data Exploration | Book 2 - Chapter 3 + Notes | |
10 | Spark SQL for Learning Applications | Book 2 - Chapter 6 and 10 + Notes | |
11 | Multimedia benchmarks for bias identification and analysis | Research paper list on course web site | |
Big Data Analytics | |||
Subjects | Text References | ||
1 | Introduction to Big Data Analytics. | [Notes] | |
2 | Business intelligence: introduction, fundamental concepts and architectures | [Notes] [GoRi] Chap. 1 | |
3 | The structure and evolution of BI and Big Data analytics systems | [Notes] | |
4 | Data models for data warehouse: conceptual modeling and design | [GoRi] Chap. 2-6 | |
5 | Multi-dimensional data model | [GoRi] Chap. 5 | |
6 | Data models for data warehouse: logical modeling and design | [GoRi] Chap. 8-9 | |
7 | ETL (extract, transform and load) process | [GoRi] Chap. 10 [Notes] | |
8 | OLAP analysis and query | [GoRi] Chap. 7 [Notes] | |
9 | Introduction to Data Visualization. Visual Perception and Preattentive Attributes | [Dash] Chap. 1 [Few2] Chap. 4 | |
10 | Charts and standard views: relevance, appropriateness and best practices | [Few1] | |
11 | Use of colors in data visualization | [Dash] Chap. 1 | |
12 | Advanced and innovative tools for data visualization: the Tableau platform | [Notes] | |
13 | Dashboard design principles. Exploratory vs. Explanatory dashboards. | [Few2] | |
14 | Data visualization: infographics and storytelling | [Few2] | |
Learning Assessment
Learning Assessment Procedures
- Databases
Written exam with SQL and NoSQL exercises.
Learning assessment may also be carried out online, should the conditions require it.
- Big Data Analytics
The final exam consists of
- a project work aiming at assessing the capabilities in developing a BI system including the analysis and the visualization of relevant information,
- an oral exam that will consist of the discussion of the project work.
Assessment criteria include: depth of analysis, adequacy, quality and correctness of the proposed solutions to the project work, ability to justify and critically evaluate the adopted solutions, clarity.
The mark for the Big Data Analytics module will account for 50% of the total grade for the entire course.
Learning assessment may also be carried out online, should the conditions require it.
Examples of frequently asked questions and / or exercises
- Databases
- Implement a query using relational algebra
- Implement a query in SQL
- Implement a query in MongoDB
- Define the entity-relationship model for a given scenario
- Big Data Analytics
Examples of questions and exercises are available on the Studium platform and/or the Microsoft Teams platform