
Big Data Applications
Code
200145
Academic unit
NOVA Information Management School
Credits
7.5
Teacher in charge
Teaching language
Portuguese. If there are Erasmus students, classes will be taught in English.
Objectives
The Big Data landscape is continuously evolving as new technologies emerge and existing technologies mature. This is a comprehensive course covering Spark and key elements of the Hadoop Ecosystem used in developing end-to-end applications for processing Big Data efficiently.
Students who complete this course will understand key Spark and Hadoop concepts, and they will learn to apply Spark and Hadoop tools in developing applications for solving the types of problems faced by enterprises and research institutions today.
Prerequisites
Basic programming experience in Python and basic familiarity with the Linux command line are preferable. Basic knowledge of SQL is helpful; prior knowledge of Hadoop is not required.
Subject matter
CUC1. Introduction to Hadoop
- Introduction to Hadoop and the Hadoop Ecosystem
- Hadoop Architecture and HDFS
CUC2. Importing and Modeling Structured Data
- Importing Relational Data with Apache Sqoop
- Introduction to Impala and Hive
- Modeling and Managing Data with Impala and Hive
- Data Formats
- Data File Partitioning
CUC3. Ingesting Streaming Data
- Capturing Data with Apache Flume
CUC4. Distributed Data Processing with Spark
- Spark Basics
- Working with RDDs in Spark
- Aggregating Data with Pair RDDs
- Writing and Deploying Spark Applications
- Parallel Processing in Spark
- Spark RDD Persistence
- Common Patterns in Spark Data Processing
- Spark SQL and DataFrames
Bibliography
Hadoop: The Definitive Guide. Tom White. O'Reilly, 2014; Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset. Michael Frampton.
Teaching method
The course is based mainly on lectures and practical classes. The practical sessions include presentation of concepts and methodologies, worked examples, and discussion and interpretation of results.
Evaluation method
1st term and 2nd term
- elective group project (40%)
- exam (60%)
Courses
- PostGraduate in Information Analysis and Management
- PostGraduate Information Systems and Technologies Management
- PostGraduate in Knowledge Management and Business Intelligence
- PostGraduate in Information Systems Governance
- PostGraduate in Marketing Research and CRM
- PostGraduate in Marketing Intelligence
- PostGraduate in Enterprise Information Systems
- PostGraduate Digital Marketing and Analytics
- PostGraduate in Digital Enterprise Management
- PostGraduate Marketing Research and CRM
- PostGraduate in Information Systems and Technologies Management
- PostGraduate Risk Analysis and Management
- PostGraduate in Smart Cities
- PostGraduate in Information Management and Business Intelligence in Healthcare