
Building Batch Data Analytics Solutions on AWS

Online
1 day
German
€ 790,-
plus VAT.
€ 940,10
incl. VAT.
Booking number
36413
Venue
Online
2 dates
Become a certified Machine Learning Engineer
This course is part of the certified Master Class "Machine Learning Engineer". If you book the entire Master Class, you save over 15 percent compared to booking this individual module.
In-house training
In-house training exclusively for your employees: tailored and effective.
In this course, you will learn how to build batch data analytics solutions using Amazon EMR, a managed enterprise-class Apache Spark and Apache Hadoop service.
Contents

Learn how Amazon EMR integrates with open source projects like Apache Hive, Hue, and HBase and with AWS services like AWS Glue and AWS Lake Formation. The course covers the components of data collection, ingestion, cataloging, storage, and processing in the context of Spark and Hadoop. You will learn how to use EMR notebooks to support analytics and machine learning workloads. You will also learn to apply best practices for security, performance, and cost management to Amazon EMR operations.

Module A: Overview of Data Analytics and the Data Pipeline

  • Use cases for data analytics
  • Using the data pipeline for data analytics

 

Module 1: Introduction to Amazon EMR

  • Use of Amazon EMR in analytics solutions
  • Architecture of Amazon EMR clusters
  • Interactive demo 1: Starting an Amazon EMR cluster
  • Cost management strategies

 

Module 2: Data analytics pipeline with Amazon EMR: Ingestion and storage

  • Storage optimization with Amazon EMR
  • Techniques for data transfer

 

Module 3: High-performance batch data analytics with Apache Spark on Amazon EMR

  • Use cases for Apache Spark on Amazon EMR
  • Why Apache Spark on Amazon EMR?
  • Spark concepts
  • Interactive demo 2: Connect to an EMR cluster and run Scala commands with the Spark shell
  • Transformation, processing and analysis
  • Use of notebooks with Amazon EMR
  • Practice Lab 1: Low-latency data analysis with Apache Spark on Amazon EMR

 

Module 4: Processing and analyzing batch data with Amazon EMR and Apache Hive

  • Using Amazon EMR with Hive to process batch data
  • Transformation, processing and analysis
  • Practice Lab 2: Batch data processing with Amazon EMR and Hive
  • Introduction to Apache HBase on Amazon EMR

 

Module 5: Serverless data processing

  • Serverless data processing, transformation and analytics
  • Using AWS Glue with Amazon EMR workloads
  • Practice Lab 3: Orchestrating data processing in Spark with AWS Step Functions

 

Module 6: Security and monitoring of Amazon EMR clusters

  • Securing EMR clusters
  • Interactive demo 3: Client-side encryption with EMRFS
  • Monitoring and troubleshooting Amazon EMR clusters
  • Demo: Reviewing Apache Spark cluster history

 

Module 7: Designing batch data analysis solutions

  • Use cases for batch data analytics
  • Activity: Designing a workflow for batch data analysis

 

Module B: Developing modern data architectures on AWS

  • Modern data architectures
Your benefit
  • Comparing the features and benefits of data warehouses, data lakes, and modern data architectures
  • Designing and implementing a batch data analytics solution
  • Identifying and applying appropriate techniques, including compression, to optimize data storage
  • Selecting and deploying suitable options for ingesting, transforming, and storing data
  • Choosing suitable instance and node types, clusters, auto scaling, and network topology for a specific business use case
  • Understanding how data storage and processing affect the analytics and visualization mechanisms needed to gain actionable business insights
  • Securing data at rest and in transit
  • Monitoring analytics workloads to identify and resolve issues
  • Applying best practices for cost management
Trainer
Milo Fels
Methods

This course includes presentations, interactive demos, hands-on exercises, discussions and class exercises.

Final examination
Recommended for

This course is aimed at the following job roles:

  • Data analytics

We recommend that participants in this course have the following prerequisites:

  • At least one year of experience with managing open source data frameworks such as Apache Spark or Apache Hadoop
Start dates and details

19.9.2025
Online
Places available
Guaranteed to take place

21.11.2025
Online
Places available
Guaranteed to take place

The training is carried out in cooperation with an authorized training partner.

This partner collects and processes data under its own responsibility. Please take note of its privacy policy.

 

Do you have questions about the training?
Call us at +49 761 595 33900, write to us at service@haufe-akademie.de, or use the contact form.