NOVA Information Management School

Big Data Applications

Code

200145

Academic unit

NOVA Information Management School

Credits

7.5

Teacher in charge

Teaching language

Portuguese. If Erasmus students are enrolled, classes are taught in English.

Objectives

The Big Data landscape is continuously evolving as new technologies emerge and existing technologies mature. This is a comprehensive course covering Spark and key elements of the Hadoop Ecosystem used in developing end-to-end applications for processing Big Data efficiently.

Students who complete this course will understand key Spark and Hadoop concepts, and they will learn to apply Spark and Hadoop tools in developing applications for solving the types of problems faced by enterprises and research institutions today.

 

Prerequisites

Basic programming experience in Python and basic familiarity with the Linux command line are preferable. Basic knowledge of SQL is helpful; prior knowledge of Hadoop is not required.

Subject matter

CUC1. Introduction to Hadoop

  • Introduction to Hadoop and the Hadoop Ecosystem
  • Hadoop Architecture and HDFS

CUC2. Importing and Modeling Structured Data

  • Importing Relational Data with Apache Sqoop
  • Introduction to Impala and Hive
  • Modeling and Managing Data with Impala and Hive
  • Data Formats
  • Data File Partitioning

CUC3. Ingesting Streaming Data

  • Capturing Data with Apache Flume

CUC4. Distributed Data Processing with Spark

  • Spark Basics
  • Working with RDDs in Spark
  • Aggregating Data with Pair RDDs
  • Writing and Deploying Spark Applications
  • Parallel Processing in Spark
  • Spark RDD Persistence
  • Common Patterns in Spark Data Processing
  • Spark SQL and DataFrames

Bibliography

Hadoop: The Definitive Guide. Tom White. O'Reilly, 2014.

Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset. Michael Frampton.

Teaching method

The course is based mainly on lectures and practical classes. The practical sessions include presentation of concepts and methodologies, worked examples, and discussion and interpretation of results.

Evaluation method

1st term and 2nd term
 - Elective group project (40%)
 - Exam (60%)
