
Data Engineering on Microsoft Azure (DP-203)

Online
3 days
German
€ 1.890,-
plus VAT.
€ 2.249,10
incl. VAT.
Booking number
33788
Venue
Online
2 dates
Become a certified
Machine Learning Engineer
This course is part of the certified Master Class "Machine Learning Engineer". If you book the entire Master Class, you save over 15 percent compared to booking this individual module.
To the Master Class
In-house training
In-house training for your employees only - exclusive and effective.
Inquiries
This training takes place in an intensive format where you have full-day sessions with our MCT experts.
Contents

The content of this intensive training is derived from the exam "DP-203: Data Engineering on Microsoft Azure". 


Module 1: Exploring compute and storage options for data engineering workloads

This module provides an overview of the Azure compute and storage technology options available to data engineers building analytical workloads. It teaches methods for structuring the data lake and optimizing files for exploration, streaming, and batch workloads. Course participants will learn how to organize the data lake into data refinement levels when transforming files through batch and stream processing. They will then learn how to create indexes for their datasets (such as CSV, JSON, and Parquet files) and use them to accelerate queries and workloads.

Lessons

  • Introduction to Azure Synapse Analytics
  • Describing Azure Databricks
  • Introduction to Azure Data Lake Storage
  • Describing the Delta Lake architecture
  • Working with data streams using Azure Stream Analytics

Lab: Exploring compute and storage options for data engineering workloads

  • Combining streaming and batch processing with a single pipeline
  • Organizing the data lake into levels of file transformation
  • Indexing data lake storage to accelerate queries and workloads
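To give a concrete feel for the data-lake layering the lab covers, here is a minimal Python sketch of hive-style partition paths, a common way to organize lake zones so query engines can prune files. Zone and dataset names are illustrative, not part of the course materials.

```python
from datetime import date

# Illustrative zone names; real naming conventions (raw/bronze/silver/gold, etc.) vary by team.
ZONES = ("raw", "enriched", "curated")

def partition_path(zone: str, dataset: str, day: date) -> str:
    """Build a hive-style partition path (year=/month=/day=), as commonly
    used to help engines skip irrelevant files when querying a data lake."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return f"{zone}/{dataset}/year={day.year}/month={day.month:02d}/day={day.day:02d}"

print(partition_path("raw", "telemetry", date(2025, 6, 23)))
# raw/telemetry/year=2025/month=06/day=23
```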

 

Module 2: Executing interactive queries using serverless SQL pools from Azure Synapse Analytics

In this module, course participants will learn how to work with files stored in data lakes and external file sources using T-SQL statements executed from a serverless SQL pool in Azure Synapse Analytics. The course participants query Parquet files stored in a data lake and CSV files stored in an external data store. Next, they create Azure Active Directory security groups and enforce access to files in the data lake via Role-Based Access Control (RBAC) and Access Control Lists (ACLs).

Lessons

  • Getting to know serverless SQL pool functions in Azure Synapse
  • Querying data in the lake with serverless SQL pools from Azure Synapse
  • Creating metadata objects in Azure Synapse serverless SQL pools
  • Securing data and managing users in Azure Synapse serverless SQL pools

Lab: Executing interactive queries using serverless SQL pools

  • Querying Parquet data with serverless SQL pools
  • Creating external tables for Parquet and CSV files
  • Creating views with serverless SQL pools
  • Protecting access to data in a data lake when using serverless SQL pools
  • Configuring Data Lake security with Role-Based Access Control (RBAC) and Access Control Lists (ACLs)
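The interplay of RBAC and ACLs in the last lab can be pictured with a toy Python model: a read request succeeds if either a coarse-grained RBAC role or a fine-grained POSIX-style ACL grants it. This is only a sketch of the idea; the real ADLS Gen2 authorization engine has more rules.

```python
def can_read(user_groups: set, rbac_readers: set, file_acl: dict) -> bool:
    """Toy authorization check: RBAC role assignment at container scope
    is evaluated first; otherwise fall back to the per-file ACL entries."""
    if user_groups & rbac_readers:  # coarse-grained: role granted to one of the user's groups
        return True
    # fine-grained: any group with an 'r' bit in the file's ACL
    return any("r" in file_acl.get(g, "") for g in user_groups)

analysts = {"sales-analysts"}
print(can_read(analysts, {"data-engineers"}, {"sales-analysts": "r-x"}))  # True (via ACL)
print(can_read(analysts, {"data-engineers"}, {"hr": "rwx"}))              # False
```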

 

Module 3: Data exploration and transformation in Azure Databricks

In this module, participants will learn how to use various Apache Spark DataFrame methods to explore and transform data in Azure Databricks. Course participants will learn how to execute standard DataFrame methods to explore and transform data. They will also learn how to perform advanced tasks such as removing duplicate data, editing date/time values, renaming columns, and aggregating data.

Lessons

  • Describing Azure Databricks
  • Reading and writing data in Azure Databricks
  • Working with DataFrames in Azure Databricks
  • Working with advanced DataFrame methods in Azure Databricks

Lab: Data exploration and transformation in Azure Databricks

  • Using DataFrames in Azure Databricks to explore and filter data
  • Caching a DataFrame for faster subsequent queries
  • Removing duplicate data
  • Editing date/time values
  • Removing and renaming DataFrame columns
  • Aggregating data stored in a DataFrame
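The lab tasks above (deduplicate, parse date/time values, rename columns, aggregate) can be illustrated with a plain-Python sketch on a list of records. The data and column names are made up; in the course itself these steps are performed with Spark DataFrame methods such as dropDuplicates and groupBy.

```python
from datetime import datetime
from collections import defaultdict

rows = [
    {"Trip_ID": "a1", "city": "Basel",  "ts": "2025-06-23 08:15", "fare": 12.0},
    {"Trip_ID": "a1", "city": "Basel",  "ts": "2025-06-23 08:15", "fare": 12.0},  # exact duplicate
    {"Trip_ID": "b2", "city": "Zurich", "ts": "2025-06-23 09:40", "fare": 20.5},
]

# Drop exact duplicates (comparable to DataFrame.dropDuplicates()).
seen, deduped = set(), []
for r in rows:
    key = tuple(sorted(r.items()))
    if key not in seen:
        seen.add(key)
        deduped.append(r)

# Parse date/time strings and rename the "Trip_ID" column to "trip_id".
for r in deduped:
    r["ts"] = datetime.strptime(r["ts"], "%Y-%m-%d %H:%M")
    r["trip_id"] = r.pop("Trip_ID")

# Aggregate: total fare per city (comparable to groupBy("city").sum("fare")).
totals = defaultdict(float)
for r in deduped:
    totals[r["city"]] += r["fare"]

print(dict(totals))  # {'Basel': 12.0, 'Zurich': 20.5}
```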

 

Module 4: Exploring, transforming and loading data in the data warehouse using Apache Spark

In this module, participants will learn how to explore, transform, and load data stored in a data lake into a relational data store. Course participants will explore Parquet and JSON files and use techniques to query and transform JSON files with hierarchical structures. They will then use Apache Spark to load data into the data warehouse and join Parquet data in the data lake with data in the dedicated SQL pool.

Lessons

  • Fundamentals of big data development with Apache Spark in Azure Synapse Analytics
  • Collecting data with Apache Spark notebooks in Azure Synapse Analytics
  • Transforming data with dataframes in Apache Spark pools in Azure Synapse Analytics
  • Integrating SQL and Apache Spark pools in Azure Synapse Analytics

Lab: Exploring, transforming and loading data in the data warehouse using Apache Spark

  • Performing data exploration in Synapse Studio
  • Collecting data with Spark notebooks in Azure Synapse Analytics
  • Transforming data with data frames in Spark pools in Azure Synapse Analytics
  • Integrating SQL and Spark pools in Azure Synapse Analytics
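A central idea in this module is handling JSON with hierarchical structures before loading it into a relational store. As a language-neutral sketch, here is a small Python function that flattens nested objects into dotted column names; the sample document is invented for illustration.

```python
import json

def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested JSON objects into dotted column names, similar in
    spirit to exploding hierarchical fields before a relational load."""
    out = {}
    for k, v in obj.items():
        name = f"{prefix}{k}"
        if isinstance(v, dict):
            out.update(flatten(v, name + "."))  # recurse into nested objects
        else:
            out[name] = v
    return out

doc = json.loads('{"id": 7, "customer": {"name": "Contoso", "address": {"city": "Bern"}}}')
print(flatten(doc))
# {'id': 7, 'customer.name': 'Contoso', 'customer.address.city': 'Bern'}
```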

 

Module 5: Capturing and loading data in the data warehouse

In this module, course participants will learn how to ingest data into the data warehouse using T-SQL scripts and Synapse Analytics integration pipelines. Course participants will learn how to load data into dedicated Synapse SQL pools with PolyBase and COPY using T-SQL. In addition, they will learn how to use workload management together with a copy activity in an Azure Synapse pipeline for petabyte-scale data ingestion.

Lessons

  • Using best practices for loading data into Azure Synapse Analytics
  • Data acquisition in the petabyte range with Azure Data Factory

Lab: Capturing and loading data in the data warehouse

  • Executing captures in the petabyte range with Azure Synapse pipelines
  • Importing data with PolyBase and COPY using T-SQL
  • Using best practices for loading data into Azure Synapse Analytics
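The T-SQL COPY statement mentioned in the lab has a compact shape. As a rough sketch, the Python helper below assembles such a statement; the table name and storage URL are hypothetical placeholders, and the real statement supports many more WITH options than shown here.

```python
def copy_into(table: str, source_url: str, file_type: str = "PARQUET") -> str:
    """Assemble a minimal T-SQL COPY statement for a dedicated SQL pool.
    Only the FILE_TYPE option is included in this sketch."""
    return (
        f"COPY INTO {table}\n"
        f"FROM '{source_url}'\n"
        f"WITH (FILE_TYPE = '{file_type}')"
    )

# Hypothetical table and storage account, for illustration only.
sql = copy_into("dbo.Trips", "https://myaccount.blob.core.windows.net/data/trips/*.parquet")
print(sql)
```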

 

Module 6: Transforming data with Azure Data Factory or Azure Synapse pipelines

In this module, course participants will learn how to create data integration pipelines to collect data from multiple data sources, transform data using mapping data flows, and move data to one or more data sinks.

Lessons

  • Data integration with Azure Data Factory or Azure Synapse pipelines
  • Large-scale transformation without code with Azure Data Factory or Azure Synapse pipelines

Lab: Transforming data with Azure Data Factory or Azure Synapse pipelines

  • Executing transformations without code at scale with Azure Synapse pipelines
  • Creating a data pipeline to import poorly formatted CSV files
  • Creating mapping data flows
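To make the "poorly formatted CSV" lab item tangible, here is a small Python sketch of typical cleanup steps: skipping blank lines, trimming stray whitespace, and normalizing header casing. The sample data is invented; in the course this kind of cleanup is done with mapping data flows rather than hand-written code.

```python
import csv, io

# An intentionally messy CSV: inconsistent header casing, stray whitespace, a blank line.
raw = "Name ; CITY \nAda ; Basel\n\n Grace; Zurich \n"

cleaned = []
header = None
for row in csv.reader(io.StringIO(raw), delimiter=";"):
    if not row or all(not c.strip() for c in row):
        continue                          # skip blank lines
    cells = [c.strip() for c in row]      # trim stray whitespace
    if header is None:
        header = [c.lower() for c in cells]   # normalize header casing
        continue
    cleaned.append(dict(zip(header, cells)))

print(cleaned)
# [{'name': 'Ada', 'city': 'Basel'}, {'name': 'Grace', 'city': 'Zurich'}]
```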

 

Module 7: Orchestrating data movement and transformation in Azure Synapse pipelines

In this module, participants will learn how to create linked services and orchestrate data movement and transformation using notebooks in Azure Synapse pipelines.

Lessons

  • Orchestrating data movement and transformation in Azure Data Factory

Lab: Orchestrating data movement and transformation in Azure Synapse pipelines

  • Integrating data from notebooks with Azure Data Factory or Azure Synapse pipelines

 

Module 8: End-to-end security with Azure Synapse Analytics

In this module, course participants will learn how to secure a Synapse Analytics workspace and its supporting infrastructure. Course participants will examine the SQL Active Directory administrator, manage IP firewall rules, manage secrets with Azure Key Vault, and access those secrets through a Key Vault-linked service and pipeline activities. They will also learn how to implement column-level security, row-level security, and dynamic data masking when using dedicated SQL pools.

Lessons

  • Protecting a data warehouse database in Azure Synapse Analytics
  • Configuring and managing secrets in Azure Key Vault
  • Implementing compliance controls for confidential data

Lab: End-to-end security with Azure Synapse Analytics

  • Protecting the supporting Azure Synapse Analytics infrastructure
  • Protecting the Azure Synapse Analytics workspace and managed services
  • Protecting data in the Azure Synapse Analytics workspace
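Of the features this module covers, dynamic data masking is easy to picture: non-privileged readers get obfuscated values while the stored data stays unchanged. A minimal Python sketch of the idea follows; the actual feature is configured in T-SQL (ADD MASKED WITH ...), not implemented by hand like this.

```python
def mask_email(value: str) -> str:
    """Obfuscate an e-mail address, keeping only the first character and the domain."""
    local, _, domain = value.partition("@")
    return local[0] + "****@" + domain

def read_column(values: list, privileged: bool) -> list:
    """Return raw values to privileged readers, masked values to everyone else."""
    return values if privileged else [mask_email(v) for v in values]

emails = ["ada@contoso.com"]
print(read_column(emails, privileged=False))  # ['a****@contoso.com']
print(read_column(emails, privileged=True))   # ['ada@contoso.com']
```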

 

Module 9: Supporting Hybrid Transactional Analytical Processing (HTAP) with Azure Synapse Link

In this module, course participants will learn how Azure Synapse Link enables seamless connectivity between an Azure Cosmos DB account and a Synapse workspace. Participants will learn how to enable and configure Synapse Link and then query the Azure Cosmos DB analytical store using Apache Spark and serverless SQL pools.

Lessons

  • Designing hybrid transactional and analytical processing using Azure Synapse Analytics
  • Configuring Azure Synapse Link with Azure Cosmos DB
  • Querying Azure Cosmos DB with Apache Spark pools
  • Querying Azure Cosmos DB with serverless SQL pools

Lab: Supporting Hybrid Transactional Analytical Processing (HTAP) with Azure Synapse Link

  • Configuring Azure Synapse Link with Azure Cosmos DB
  • Querying Azure Cosmos DB with Apache Spark for Azure Synapse Analytics
  • Querying Azure Cosmos DB with serverless SQL pool for Azure Synapse Analytics

 

Module 10: Stream processing in real time with Stream Analytics

In this module, course participants will learn how to process streaming data with Azure Stream Analytics. The course participants collect vehicle telemetry data in event hubs and then process this data in real time using various window functions in Azure Stream Analytics. The data is output to Azure Synapse Analytics. Finally, course participants will learn how to scale the Stream Analytics job to increase throughput.

Lessons

  • Enabling reliable messaging for big data applications using Azure Event Hubs
  • Working with data streams using Azure Stream Analytics
  • Capturing data streams with Azure Stream Analytics

Lab: Stream processing in real time with Stream Analytics

  • Using Stream Analytics to process real-time data from Event Hubs
  • Using Stream Analytics window functions to create aggregates and output to Synapse Analytics
  • Scaling the Azure Stream Analytics job to increase throughput through partitioning
  • Re-partitioning the stream input to optimize parallelization
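The window functions used in this lab assign events to time intervals before aggregating. As a language-neutral sketch, here is a tumbling window (fixed, non-overlapping intervals) in plain Python over invented vehicle-telemetry events; in the course itself this is expressed in the Stream Analytics query language.

```python
from collections import defaultdict

# Events as (event_time_in_seconds, vehicle_speed) pairs — sample data for illustration.
events = [(3, 50), (12, 70), (14, 90), (27, 40)]
WINDOW = 10  # tumbling window size in seconds

counts, sums = defaultdict(int), defaultdict(float)
for t, speed in events:
    window_start = (t // WINDOW) * WINDOW  # each event falls into exactly one window
    counts[window_start] += 1
    sums[window_start] += speed

# Average speed per 10-second window.
averages = {w: sums[w] / counts[w] for w in counts}
print(averages)  # {0: 50.0, 10: 80.0, 20: 40.0}
```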

 

Module 11: Creating a stream processing solution with event hubs and Azure Databricks

In this module, course participants will learn how to capture and process streaming data at scale with Event Hubs and Spark Structured Streaming in Azure Databricks. Course participants will learn about the key features and uses of Structured Streaming. They will implement sliding windows to aggregate data blocks and apply watermarks to remove stale data. Finally, they will connect to Event Hubs to read and write streams.

Lessons

  • Processing streaming data with Structured Streaming in Azure Databricks

Lab: Creating a stream processing solution with event hubs and Azure Databricks

  • Exploring the key features and uses of Structured Streaming
  • Streaming data from a file and writing it to a distributed file system
  • Using sliding windows to aggregate data blocks instead of all data
  • Applying watermarks to remove obsolete data
  • Connecting to Event Hubs for read and write streams
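The watermark idea from this lab can be stated in a few lines: events that lag too far behind the newest event time seen are dropped so streaming state stays bounded. A minimal Python sketch of that rule follows; the sample batch and the delay value are invented, and in the course this is done with Structured Streaming's withWatermark.

```python
def apply_watermark(batch: list, max_event_time: int, delay: int) -> list:
    """Keep only events whose event time is no older than
    max_event_time - delay; older events are considered stale."""
    threshold = max_event_time - delay
    return [e for e in batch if e["event_time"] >= threshold]

batch = [{"event_time": 100}, {"event_time": 62}, {"event_time": 95}]
kept = apply_watermark(batch, max_event_time=100, delay=30)
print(kept)  # [{'event_time': 100}, {'event_time': 95}]
```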
Your benefits
  • Exploring compute and storage options for data engineering workloads in Azure
  • Designing and implementing the serving layer
  • Understanding data technology considerations
  • Executing interactive queries with serverless SQL pools
  • Exploring, transforming and loading data into the data warehouse with Apache Spark
  • Performing data exploration and transformation in Azure Databricks
  • Importing and loading data into the data warehouse
  • Transforming data with Azure Data Factory or Azure Synapse pipelines
  • Integrating data from notebooks with Azure Data Factory or Azure Synapse pipelines
  • Optimizing query performance with dedicated SQL pools in Azure Synapse
  • Analyzing and optimizing data warehouse storage
  • Supporting hybrid transactional analytical processing (HTAP) with Azure Synapse Link
  • Implementing end-to-end security with Azure Synapse Analytics
  • Performing real-time stream processing with Stream Analytics
  • Creating a stream processing solution with Event Hubs and Azure Databricks
  • Creating reports using the Power BI integration with Azure Synapse Analytics
  • Performing integrated machine learning processes in Azure Synapse Analytics

This intensive training prepares you for:
Exam "DP-203: Data Engineering on Microsoft Azure" for the
certification "Microsoft Certified: Azure Data Engineer Associate"

Trainers
Michael Schulz
Philippe Moser
Methods

This course is delivered as live online training, led by a trainer who supervises the participants throughout. Theory and practice are taught with live demonstrations and practical exercises. The video conferencing software Zoom is used.

Final examination
Recommended for

The primary target audience for this course is data professionals, data architects and business intelligence experts who want to learn about data engineering and building analytical solutions with data platform technologies on Microsoft Azure. The secondary audience is data analysts and data scientists who work with analytical solutions built on Microsoft Azure.

Requirements

Successful participants begin this course with knowledge of cloud computing and core data concepts, as well as professional experience with data solutions.

The basic knowledge acquired in the following course is recommended:

  • Microsoft Azure Data Fundamentals
Start dates and details

Form of learning


23.06.2025
Online
Places available
Guaranteed to run

15.09.2025
Online
Places available
Guaranteed to run

This training is conducted in cooperation with the authorized training organization Digicomp Academy AG.
For the purpose of conducting the training, participants' data will be transmitted to Digicomp and processed there under its own responsibility.
Please take note of the corresponding privacy policy.

Do you have questions about training?
Call us on +49 761 595 33900, write to us at service@haufe-akademie.de, or use the contact form.