Learn how Amazon EMR integrates with open source projects like Apache Hive, Hue, and HBase and with AWS services like AWS Glue and AWS Lake Formation. The course covers the components of data collection, ingestion, cataloging, storage, and processing in the context of Spark and Hadoop. You will learn how to use EMR notebooks to support analytics and machine learning workloads. You will also learn to apply best practices for security, performance, and cost management to Amazon EMR operations.
Module A: Overview of Data Analytics and the Data Pipeline
Module 1: Introduction to Amazon EMR
Module 2: Data analysis pipeline with Amazon EMR: Ingestion and storage
Module 3: Powerful batch data analysis with Apache Spark on Amazon EMR
Module 4: Processing and analyzing batch data with Amazon EMR and Apache Hive
Module 5: Serverless data processing
Module 6: Security and monitoring of Amazon EMR clusters
Module 7: Designing batch data analysis solutions
Module B: Developing modern data architectures on AWS
This course includes presentations, interactive demos, hands-on exercises, discussions, and class exercises.
This course is aimed at the following job roles:
We recommend that participants in this course have the following prerequisites:
The training is delivered in cooperation with an authorized training partner. This partner collects and processes data under its own responsibility. Please review the partner's privacy policy.