
Data Engineering on Microsoft Azure (DP-203)

Online
3 days
German
€ 1.890,-
plus VAT.
€ 2.249,10
incl. VAT.
Booking number
33788
Venue
Online
2 dates
Become a certified
Machine Learning Engineer
This course is part of the certified Master Class "Machine Learning Engineer". If you book the entire Master Class, you save over 15 percent compared to booking this individual module.
To the Master Class
In-house training
In-house training for your employees only - exclusive and effective.
Inquiries
This training takes place in an intensive format where you have full-day sessions with our MCT experts.
Contents

The content of this intensive training is derived from the exam "DP-203: Data Engineering on Microsoft Azure". 


Module 1: Exploring compute and storage options for data engineering workloads

This module provides an overview of the Azure compute and storage technology options available to data engineers building analytical workloads. It teaches methods for structuring the data lake and optimizing files for exploration, streaming, and batch workloads. Course participants will learn how to organize the data lake into data refinement levels when transforming files through batch and stream processing. They will then learn how to create indexes for their datasets (such as CSV, JSON, and Parquet files) and use them to accelerate queries and workloads.

Lessons

  • Introduction to Azure Synapse Analytics
  • Describing Azure Databricks
  • Introduction to Azure Data Lake Storage
  • Describing the Delta Lake architecture
  • Working with data streams using Azure Stream Analytics

Lab: Exploring compute and storage options for data engineering workloads

  • Combining streaming and batch processing with a single pipeline
  • Organizing the data lake into levels of file transformation
  • Indexing data lake storage to accelerate queries and workloads
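To give a concrete feel for the data-lake layering the lab covers, here is a minimal Python sketch of hive-style partition paths, a common way to organize lake zones so query engines can prune files. Zone and dataset names are illustrative, not part of the course materials.

```python
from datetime import date

# Illustrative zone names; real naming conventions (raw/bronze/silver/gold, etc.) vary by team.
ZONES = ("raw", "enriched", "curated")

def partition_path(zone: str, dataset: str, day: date) -> str:
    """Build a hive-style partition path (year=/month=/day=), as commonly
    used to help engines skip irrelevant files when querying a data lake."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return f"{zone}/{dataset}/year={day.year}/month={day.month:02d}/day={day.day:02d}"

print(partition_path("raw", "telemetry", date(2025, 6, 23)))
# raw/telemetry/year=2025/month=06/day=23
```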

 

Module 2: Executing interactive queries using serverless SQL pools from Azure Synapse Analytics

In this module, course participants will learn how to work with files stored in data lakes and external file sources using T-SQL statements executed from a serverless SQL pool in Azure Synapse Analytics. The course participants query Parquet files stored in a data lake and CSV files stored in an external data store. Next, they create Azure Active Directory security groups and enforce access to files in the data lake via Role-Based Access Control (RBAC) and Access Control Lists (ACLs).

Lessons

  • Getting to know serverless SQL pool functions in Azure Synapse
  • Querying data in the lake with serverless SQL pools from Azure Synapse
  • Creating metadata objects in Azure Synapse serverless SQL pools
  • Securing data and managing users in Azure Synapse serverless SQL pools

Lab: Executing interactive queries using serverless SQL pools

  • Querying Parquet data with serverless SQL pools
  • Creating external tables for Parquet and CSV files
  • Creating views with serverless SQL pools
  • Protecting access to data in a data lake when using serverless SQL pools
  • Configuring Data Lake security with Role-Based Access Control (RBAC) and Access Control Lists (ACLs)
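The interplay of RBAC and ACLs in the last lab can be pictured with a toy Python model: a read request succeeds if either a coarse-grained RBAC role or a fine-grained POSIX-style ACL grants it. This is only a sketch of the idea; the real ADLS Gen2 authorization engine has more rules.

```python
def can_read(user_groups: set, rbac_readers: set, file_acl: dict) -> bool:
    """Toy authorization check: RBAC role assignment at container scope
    is evaluated first; otherwise fall back to the per-file ACL entries."""
    if user_groups & rbac_readers:  # coarse-grained: role granted to one of the user's groups
        return True
    # fine-grained: any group with an 'r' bit in the file's ACL
    return any("r" in file_acl.get(g, "") for g in user_groups)

analysts = {"sales-analysts"}
print(can_read(analysts, {"data-engineers"}, {"sales-analysts": "r-x"}))  # True (via ACL)
print(can_read(analysts, {"data-engineers"}, {"hr": "rwx"}))              # False
```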

 

Module 3: Data exploration and transformation in Azure Databricks

In this module, participants will learn how to use various Apache Spark DataFrame methods to explore and transform data in Azure Databricks. Course participants will learn how to execute standard DataFrame methods to explore and transform data. They will also learn how to perform advanced tasks such as removing duplicate data, editing date/time values, renaming columns, and aggregating data.

Lessons

  • Describing Azure Databricks
  • Reading and writing data in Azure Databricks
  • Working with DataFrames in Azure Databricks
  • Working with advanced DataFrame methods in Azure Databricks

Lab: Data exploration and transformation in Azure Databricks

  • Using DataFrames in Azure Databricks to explore and filter data
  • Caching a DataFrame for faster subsequent queries
  • Removing duplicate data
  • Editing date/time values
  • Removing and renaming DataFrame columns
  • Aggregating data stored in a DataFrame
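The lab tasks above (deduplicate, parse date/time values, rename columns, aggregate) can be illustrated with a plain-Python sketch on a list of records. The data and column names are made up; in the course itself these steps are performed with Spark DataFrame methods such as dropDuplicates and groupBy.

```python
from datetime import datetime
from collections import defaultdict

rows = [
    {"Trip_ID": "a1", "city": "Basel",  "ts": "2025-06-23 08:15", "fare": 12.0},
    {"Trip_ID": "a1", "city": "Basel",  "ts": "2025-06-23 08:15", "fare": 12.0},  # exact duplicate
    {"Trip_ID": "b2", "city": "Zurich", "ts": "2025-06-23 09:40", "fare": 20.5},
]

# Drop exact duplicates (comparable to DataFrame.dropDuplicates()).
seen, deduped = set(), []
for r in rows:
    key = tuple(sorted(r.items()))
    if key not in seen:
        seen.add(key)
        deduped.append(r)

# Parse date/time strings and rename the "Trip_ID" column to "trip_id".
for r in deduped:
    r["ts"] = datetime.strptime(r["ts"], "%Y-%m-%d %H:%M")
    r["trip_id"] = r.pop("Trip_ID")

# Aggregate: total fare per city (comparable to groupBy("city").sum("fare")).
totals = defaultdict(float)
for r in deduped:
    totals[r["city"]] += r["fare"]

print(dict(totals))  # {'Basel': 12.0, 'Zurich': 20.5}
```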

 

Module 4: Exploring, transforming and loading data in the data warehouse using Apache Spark

In this module, participants will learn how to explore, transform, and load data stored in a data lake into a relational data store. Course participants will explore Parquet and JSON files and use techniques to query and transform JSON files with hierarchical structures. They will then use Apache Spark to load data into the data warehouse and join Parquet data in the data lake with data in the dedicated SQL pool.

Lessons

  • Fundamentals of big data development with Apache Spark in Azure Synapse Analytics
  • Collecting data with Apache Spark notebooks in Azure Synapse Analytics
  • Transforming data with dataframes in Apache Spark pools in Azure Synapse Analytics
  • Integrating SQL and Apache Spark pools in Azure Synapse Analytics

Lab: Exploring, transforming and loading data in the data warehouse using Apache Spark

  • Performing data exploration in Synapse Studio
  • Collecting data with Spark notebooks in Azure Synapse Analytics
  • Transforming data with data frames in Spark pools in Azure Synapse Analytics
  • Integrating SQL and Spark pools in Azure Synapse Analytics
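A central idea in this module is handling JSON with hierarchical structures before loading it into a relational store. As a language-neutral sketch, here is a small Python function that flattens nested objects into dotted column names; the sample document is invented for illustration.

```python
import json

def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested JSON objects into dotted column names, similar in
    spirit to exploding hierarchical fields before a relational load."""
    out = {}
    for k, v in obj.items():
        name = f"{prefix}{k}"
        if isinstance(v, dict):
            out.update(flatten(v, name + "."))  # recurse into nested objects
        else:
            out[name] = v
    return out

doc = json.loads('{"id": 7, "customer": {"name": "Contoso", "address": {"city": "Bern"}}}')
print(flatten(doc))
# {'id': 7, 'customer.name': 'Contoso', 'customer.address.city': 'Bern'}
```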

 

Module 5: Capturing and loading data in the data warehouse

In this module, course participants will learn how to ingest data into the data warehouse using T-SQL scripts and Synapse Analytics integration pipelines. Course participants will learn how to load data into dedicated Synapse SQL pools with PolyBase and COPY using T-SQL. In addition, they will learn how to use workload management together with a copy activity in an Azure Synapse pipeline for petabyte-scale data ingestion.

Lessons

  • Using best practices for loading data into Azure Synapse Analytics
  • Data acquisition in the petabyte range with Azure Data Factory

Lab: Capturing and loading data in the data warehouse

  • Executing captures in the petabyte range with Azure Synapse pipelines
  • Importing data with PolyBase and COPY using T-SQL
  • Using best practices for loading data into Azure Synapse Analytics
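The T-SQL COPY statement mentioned in the lab has a compact shape. As a rough sketch, the Python helper below assembles such a statement; the table name and storage URL are hypothetical placeholders, and the real statement supports many more WITH options than shown here.

```python
def copy_into(table: str, source_url: str, file_type: str = "PARQUET") -> str:
    """Assemble a minimal T-SQL COPY statement for a dedicated SQL pool.
    Only the FILE_TYPE option is included in this sketch."""
    return (
        f"COPY INTO {table}\n"
        f"FROM '{source_url}'\n"
        f"WITH (FILE_TYPE = '{file_type}')"
    )

# Hypothetical table and storage account, for illustration only.
sql = copy_into("dbo.Trips", "https://myaccount.blob.core.windows.net/data/trips/*.parquet")
print(sql)
```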

 

Module 6: Transforming data with Azure Data Factory or Azure Synapse pipelines

In this module, course participants will learn how to create data integration pipelines to collect data from multiple data sources, transform data using mapping data flows, and move data to one or more data sinks.

Lessons

  • Data integration with Azure Data Factory or Azure Synapse pipelines
  • Large-scale transformation without code with Azure Data Factory or Azure Synapse pipelines

Lab: Transforming data with Azure Data Factory or Azure Synapse pipelines

  • Executing transformations without code at scale with Azure Synapse pipelines
  • Creating a data pipeline to import poorly formatted CSV files
  • Creating mapping data flows
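To make the "poorly formatted CSV" lab item tangible, here is a small Python sketch of typical cleanup steps: skipping blank lines, trimming stray whitespace, and normalizing header casing. The sample data is invented; in the course this kind of cleanup is done with mapping data flows rather than hand-written code.

```python
import csv, io

# An intentionally messy CSV: inconsistent header casing, stray whitespace, a blank line.
raw = "Name ; CITY \nAda ; Basel\n\n Grace; Zurich \n"

cleaned = []
header = None
for row in csv.reader(io.StringIO(raw), delimiter=";"):
    if not row or all(not c.strip() for c in row):
        continue                          # skip blank lines
    cells = [c.strip() for c in row]      # trim stray whitespace
    if header is None:
        header = [c.lower() for c in cells]   # normalize header casing
        continue
    cleaned.append(dict(zip(header, cells)))

print(cleaned)
# [{'name': 'Ada', 'city': 'Basel'}, {'name': 'Grace', 'city': 'Zurich'}]
```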

 

Module 7: Orchestrating data movement and transformation in Azure Synapse pipelines

In this module, participants will learn how to create linked services and orchestrate data movement and transformation using notebooks in Azure Synapse pipelines.

Lessons

  • Orchestrating data movement and transformation in Azure Data Factory

Lab: Orchestrating data movement and transformation in Azure Synapse pipelines

  • Integrating data from notebooks with Azure Data Factory or Azure Synapse pipelines

 

Module 8: End-to-end security with Azure Synapse Analytics

In this module, course participants will learn how to secure a Synapse Analytics workspace and its supporting infrastructure. Course participants will examine the SQL Active Directory administrator, manage IP firewall rules, manage secrets with Azure Key Vault, and access those secrets through a Key Vault-linked service and pipeline activities. They will also learn how to implement column-level security, row-level security, and dynamic data masking when using dedicated SQL pools.

Lessons

  • Protecting a data warehouse database in Azure Synapse Analytics
  • Configuring and managing secrets in Azure Key Vault
  • Implementing compliance controls for confidential data

Lab: End-to-end security with Azure Synapse Analytics

  • Protecting the supporting Azure Synapse Analytics infrastructure
  • Protecting the Azure Synapse Analytics workspace and managed services
  • Protecting data in the Azure Synapse Analytics workspace
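Of the features this module covers, dynamic data masking is easy to picture: non-privileged readers get obfuscated values while the stored data stays unchanged. A minimal Python sketch of the idea follows; the actual feature is configured in T-SQL (ADD MASKED WITH ...), not implemented by hand like this.

```python
def mask_email(value: str) -> str:
    """Obfuscate an e-mail address, keeping only the first character and the domain."""
    local, _, domain = value.partition("@")
    return local[0] + "****@" + domain

def read_column(values: list, privileged: bool) -> list:
    """Return raw values to privileged readers, masked values to everyone else."""
    return values if privileged else [mask_email(v) for v in values]

emails = ["ada@contoso.com"]
print(read_column(emails, privileged=False))  # ['a****@contoso.com']
print(read_column(emails, privileged=True))   # ['ada@contoso.com']
```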

 

Module 9: Supporting Hybrid Transactional Analytical Processing (HTAP) with Azure Synapse Link

In this module, course participants will learn how Azure Synapse Link enables seamless connectivity between an Azure Cosmos DB account and a Synapse workspace. Participants will learn how to enable and configure Synapse Link and then query the Azure Cosmos DB analytical store using Apache Spark and serverless SQL pools.

Lessons

  • Designing hybrid transactional and analytical processing using Azure Synapse Analytics
  • Configuring Azure Synapse Link with Azure Cosmos DB
  • Querying Azure Cosmos DB with Apache Spark pools
  • Querying Azure Cosmos DB with serverless SQL pools

Lab: Supporting Hybrid Transactional Analytical Processing (HTAP) with Azure Synapse Link

  • Configuring Azure Synapse Link with Azure Cosmos DB
  • Querying Azure Cosmos DB with Apache Spark for Azure Synapse Analytics
  • Querying Azure Cosmos DB with serverless SQL pool for Azure Synapse Analytics

 

Module 10: Stream processing in real time with Stream Analytics

In this module, course participants will learn how to process streaming data with Azure Stream Analytics. The course participants collect vehicle telemetry data in event hubs and then process this data in real time using various window functions in Azure Stream Analytics. The data is output to Azure Synapse Analytics. Finally, course participants will learn how to scale the Stream Analytics job to increase throughput.

Lessons

  • Enabling reliable messaging for big data applications using Azure Event Hubs
  • Working with data streams using Azure Stream Analytics
  • Capturing data streams with Azure Stream Analytics

Lab: Stream processing in real time with Stream Analytics

  • Using Stream Analytics to process real-time data from Event Hubs
  • Using Stream Analytics window functions to create aggregates and output to Synapse Analytics
  • Scaling the Azure Stream Analytics job to increase throughput through partitioning
  • Re-partitioning the stream input to optimize parallelization
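The window functions used in this lab assign events to time intervals before aggregating. As a language-neutral sketch, here is a tumbling window (fixed, non-overlapping intervals) in plain Python over invented vehicle-telemetry events; in the course itself this is expressed in the Stream Analytics query language.

```python
from collections import defaultdict

# Events as (event_time_in_seconds, vehicle_speed) pairs — sample data for illustration.
events = [(3, 50), (12, 70), (14, 90), (27, 40)]
WINDOW = 10  # tumbling window size in seconds

counts, sums = defaultdict(int), defaultdict(float)
for t, speed in events:
    window_start = (t // WINDOW) * WINDOW  # each event falls into exactly one window
    counts[window_start] += 1
    sums[window_start] += speed

# Average speed per 10-second window.
averages = {w: sums[w] / counts[w] for w in counts}
print(averages)  # {0: 50.0, 10: 80.0, 20: 40.0}
```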

 

Module 11: Creating a stream processing solution with event hubs and Azure Databricks

In this module, course participants will learn how to capture and process streaming data at scale with Event Hubs and Spark Structured Streaming in Azure Databricks. Course participants will learn about the key features and uses of Structured Streaming. They will implement sliding windows to aggregate data blocks and apply watermarks to remove stale data. Finally, they will connect to Event Hubs to read and write streams.

Lessons

  • Processing streaming data with Structured Streaming in Azure Databricks

Lab: Creating a stream processing solution with event hubs and Azure Databricks

  • Exploring the key features and uses of Structured Streaming
  • Streaming data from a file and writing it to a distributed file system
  • Using sliding windows to aggregate data blocks instead of all data
  • Applying watermarks to remove obsolete data
  • Connecting to Event Hubs for read and write streams
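The watermark idea from this lab can be stated in a few lines: events that lag too far behind the newest event time seen are dropped so streaming state stays bounded. A minimal Python sketch of that rule follows; the sample batch and the delay value are invented, and in the course this is done with Structured Streaming's withWatermark.

```python
def apply_watermark(batch: list, max_event_time: int, delay: int) -> list:
    """Keep only events whose event time is no older than
    max_event_time - delay; older events are considered stale."""
    threshold = max_event_time - delay
    return [e for e in batch if e["event_time"] >= threshold]

batch = [{"event_time": 100}, {"event_time": 62}, {"event_time": 95}]
kept = apply_watermark(batch, max_event_time=100, delay=30)
print(kept)  # [{'event_time': 100}, {'event_time': 95}]
```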
Your benefits
  • Exploring compute and storage options for data engineering workloads in Azure
  • Designing and implementing the serving layer
  • Understanding data technology considerations
  • Executing interactive queries with serverless SQL pools
  • Exploring, transforming and loading data into the data warehouse with Apache Spark
  • Performing data exploration and transformation in Azure Databricks
  • Importing and loading data into the data warehouse
  • Transforming data with Azure Data Factory or Azure Synapse pipelines
  • Integrating data from notebooks with Azure Data Factory or Azure Synapse pipelines
  • Optimizing query performance with dedicated SQL pools in Azure Synapse
  • Analyzing and optimizing data warehouse storage
  • Supporting hybrid transactional analytical processing (HTAP) with Azure Synapse Link
  • Implementing end-to-end security with Azure Synapse Analytics
  • Performing real-time stream processing with Stream Analytics
  • Creating a stream processing solution with Event Hubs and Azure Databricks
  • Creating reports using the Power BI integration with Azure Synapse Analytics
  • Performing integrated machine learning processes in Azure Synapse Analytics

This intensive training prepares you for:
Exam "DP-203: Data Engineering on Microsoft Azure" for the
certification "Microsoft Certified: Azure Data Engineer Associate"

Trainers
Michael Schulz
Philippe Moser
Methods

This course is delivered as live online training, led by a trainer who supervises the participants throughout. Theory and practice are taught with live demonstrations and practical exercises. The video conferencing software Zoom is used.

Final examination
Recommended for

The primary target audience for this course is data professionals, data architects and business intelligence experts who want to learn about data engineering and building analytical solutions with data platform technologies on Microsoft Azure. The secondary audience is data analysts and data scientists who work with analytical solutions built on Microsoft Azure.

Requirements

Successful participants begin this course with knowledge of cloud computing and core data concepts, as well as professional experience with data solutions.

The basic knowledge acquired in the following course is recommended:

  • Microsoft Azure Data Fundamentals
Start dates and details

Form of learning


23.06.2025
Online
Places available
Guaranteed to run

15.09.2025
Online
Places available
Guaranteed to run

This training is conducted in cooperation with the authorized training organization Digicomp Academy AG.
For the purpose of conducting the training, participants' data will be transmitted to Digicomp and processed there under its own responsibility.
Please take note of the corresponding privacy policy.

Do you have questions about training?
Call us on +49 761 595 33900, write to us at service@haufe-akademie.de, or use the contact form.