Building Batch Data Analytics Solutions on AWS

Online

1 day

German

€ 790,-

plus VAT.

€ 940,10

incl. VAT.

Booking number

36413

Venue

Online

1 appointment

Become a certified Machine Learning Engineer

This course is part of the certified Master Class "Machine Learning Engineer". If you book the entire Master Class, you save over 15 percent compared to booking this individual module.

In-house training

In-house training just for your employees - exclusive and effective.

In this course, you will learn how to build batch data analytics solutions using Amazon EMR, a managed enterprise-class Apache Spark and Apache Hadoop service.

Contents

Learn how Amazon EMR integrates with open source projects like Apache Hive, Hue, and HBase and with AWS services like AWS Glue and AWS Lake Formation. The course covers the components of data collection, ingestion, cataloging, storage, and processing in the context of Spark and Hadoop. You will learn how to use EMR notebooks to support analytics and machine learning workloads. You will also learn to apply best practices for security, performance, and cost management to Amazon EMR operations.

Module A: Overview of Data Analytics and the Data Pipeline

Use cases for data analytics
Using the data pipeline for data analytics

Module 1: Introduction to Amazon EMR

Using Amazon EMR in analytics solutions
Architecture of Amazon EMR clusters
Interactive demo 1: Starting an Amazon EMR cluster
Cost management strategies

Module 2: Data analysis pipeline with Amazon EMR: Ingestion and storage

Storage optimization with Amazon EMR
Techniques for data transfer

Module 3: High-performance batch data analytics with Apache Spark on Amazon EMR

Use cases for Apache Spark on Amazon EMR
Why Apache Spark on Amazon EMR?
Spark concepts
Interactive demo 2: Connecting to an EMR cluster and running Scala commands with the Spark shell
Transformation, processing and analysis
Use of notebooks with Amazon EMR
Practice Lab 1: Low-latency data analysis with Apache Spark on Amazon EMR

Module 4: Processing and analyzing batch data with Amazon EMR and Apache Hive

Using Amazon EMR with Hive to process batch data
Transformation, processing and analysis
Practice Lab 2: Batch data processing with Amazon EMR and Hive
Introduction to Apache HBase on Amazon EMR

Module 5: Serverless data processing

Serverless data processing, transformation and analytics
Using AWS Glue with Amazon EMR workloads
Practice Lab 3: Orchestrating data processing in Spark with AWS Step Functions

Module 6: Security and monitoring of Amazon EMR clusters

Securing Amazon EMR clusters
Interactive demo 3: Client-side encryption with EMRFS
Monitoring and troubleshooting Amazon EMR clusters
Demo: Checking the history of Apache Spark clusters

Module 7: Designing batch data analysis solutions

Use cases for batch data analytics
Activity: Designing a workflow for batch data analysis

Module B: Developing modern data architectures on AWS

Modern data architectures

Your benefit

Compare the features and benefits of data warehouses, data lakes and modern data architectures
Design and implement a batch data analytics solution
Identify and apply appropriate techniques, including compression, to optimize data storage
Select and provision suitable options for ingesting, transforming and storing data
Choose suitable instance and node types, clusters, auto scaling and network topology for a specific business use case
Understand how data storage and processing impact the analytics and visualization mechanisms required to gain actionable business insights
Secure data at rest and in transit
Monitor analytics workloads to identify and resolve issues
Apply best practices for cost management

Trainer

Milo Fels

Methods

This course includes presentations, interactive demos, hands-on exercises, discussions and class exercises.

Final examination

Recommended for

This course is aimed at the following job roles:

Data analytics

We recommend that participants in this course have the following prerequisites:

At least one year of experience managing open-source data frameworks such as Apache Spark or Apache Hadoop

Venue

Zoom
