The content of this intensive training is derived from the exam "DP-203: Data Engineering on Microsoft Azure".
Module 1: Exploring compute and storage options for data engineering workloads
This module provides an overview of the Azure compute and storage technology options available to data engineers building analytical workloads. It teaches methods for structuring the data lake and optimizing files for exploration, streaming, and batch workloads. Course participants will learn how to organize the data lake into levels of data refinement as files are transformed through batch and stream processing. They will then learn how to create indexes for their datasets (such as CSV, JSON, and Parquet files) and use them to accelerate queries and workloads. A brief sketch of this file optimization follows the lab listing below.
Lessons
Lab: Exploring compute and storage options for data engineering workloads
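The following PySpark snippet is a minimal sketch of that idea: it reads raw CSV drops from a hypothetical "raw" zone of the lake and rewrites them as date-partitioned Parquet in a "curated" zone so that downstream queries can prune files. All paths, container names, and column names are illustrative assumptions, not part of the course material.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("lake-optimization").getOrCreate()

# Hypothetical zones: "raw" holds CSV drops, "curated" holds query-optimized Parquet.
raw_path = "abfss://raw@mydatalake.dfs.core.windows.net/sales/*.csv"
curated_path = "abfss://curated@mydatalake.dfs.core.windows.net/sales"

df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(raw_path))

# Partitioning by a derived date column lets exploration, streaming, and batch
# consumers skip irrelevant files instead of scanning the whole dataset.
(df.withColumn("sale_date", F.to_date("sale_timestamp"))
   .write
   .mode("overwrite")
   .partitionBy("sale_date")
   .parquet(curated_path))
```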
Module 2: Executing interactive queries using serverless SQL pools from Azure Synapse Analytics
In this module, course participants will learn how to work with files stored in data lakes and external file sources using T-SQL statements executed from a serverless SQL pool in Azure Synapse Analytics. Course participants will query Parquet files stored in a data lake as well as CSV files stored in an external data store. Next, they will create Azure Active Directory security groups and enforce access to files in the data lake via role-based access control (RBAC) and access control lists (ACLs). A brief sketch of such a query follows the lab listing below.
Lessons
Lab: Executing interactive queries using serverless SQL pools
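Because a serverless SQL pool exposes a standard SQL endpoint, the OPENROWSET pattern taught in this module can be driven from any SQL client. The sketch below uses Python with pyodbc purely for illustration; the workspace name, storage URL, and authentication mode are assumptions.

```python
import pyodbc

# Hypothetical serverless ("on-demand") endpoint of a Synapse workspace.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myworkspace-ondemand.sql.azuresynapse.net;"
    "Database=master;Authentication=ActiveDirectoryInteractive;"
)

# T-SQL executed by the serverless pool: query Parquet files in place.
sql = """
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/curated/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS rows;
"""

for row in conn.cursor().execute(sql):
    print(row)
```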
Module 3: Data exploration and transformation in Azure Databricks
In this module, participants will learn how to use various Apache Spark DataFrame methods to explore and transform data in Azure Databricks. Course participants will learn how to execute standard DataFrame methods to explore and transform data. They will also learn how to perform advanced tasks such as removing duplicate data, manipulating date and time values, renaming columns, and aggregating data. A brief sketch of these transformations follows the lab listing below.
Lessons
Lab: Data exploration and transformation in Azure Databricks
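As a minimal sketch of those DataFrame methods, assuming a Databricks notebook with an active spark session and a hypothetical trips dataset:

```python
from pyspark.sql import functions as F

# Hypothetical mounted path and column names; `spark` is the session a
# Databricks notebook provides automatically.
df = spark.read.parquet("/mnt/raw/trips")

cleaned = (
    df.dropDuplicates(["trip_id"])                             # remove duplicate rows
      .withColumnRenamed("tpep_pickup_datetime", "pickup_ts")  # rename a column
      .withColumn("pickup_date", F.to_date("pickup_ts"))       # derive a date value
)

# Aggregate: average fare per day.
daily = cleaned.groupBy("pickup_date").agg(F.avg("fare_amount").alias("avg_fare"))
daily.show()
```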
Module 4: Exploring, transforming and loading data in the data warehouse using Apache Spark
In this module, participants will learn how to explore, transform, and load data from a data lake into a relational data store. Course participants will explore Parquet and JSON files and use techniques to query and transform JSON files with hierarchical structures. They will then use Apache Spark to load data into the data warehouse and join Parquet data in the data lake with data in the dedicated SQL pool. A brief sketch follows the lab listing below.
Lessons
Lab: Exploring, transforming and loading data in the data warehouse using Apache Spark
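A minimal sketch of the hierarchical-JSON part, assuming order documents that each carry an array of line items. The final write uses the synapsesql method of the Synapse Spark connector, which is available in Synapse Spark pools (a JDBC write would be the fallback elsewhere); all names are hypothetical.

```python
from pyspark.sql import functions as F

# Hypothetical nested JSON in the data lake: one document per order,
# each with a line_items array.
orders = spark.read.json("abfss://raw@mydatalake.dfs.core.windows.net/orders/*.json")

# Flatten the hierarchy: explode the array so there is one row per line item.
items = (
    orders.select("order_id", F.explode("line_items").alias("item"))
          .select("order_id",
                  F.col("item.sku").alias("sku"),
                  F.col("item.quantity").alias("quantity"))
)

# Load the flattened rows into a dedicated SQL pool table (assumed name).
items.write.synapsesql("sqlpool01.dbo.OrderItems")
```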
Module 5: Capturing and loading data in the data warehouse
In this module, course participants will learn how to ingest data into the data warehouse using T-SQL scripts and Synapse Analytics integration pipelines. Course participants will learn how to load data into dedicated Synapse SQL pools with PolyBase and COPY using T-SQL. They will also learn how to use workload management along with a copy activity in an Azure Synapse pipeline for petabyte-scale data ingestion. A brief sketch of a COPY statement follows below.
Lessons
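A minimal sketch of the COPY statement the module covers, driven here from Python with pyodbc for illustration; the endpoint, table, storage URL, and credential choice are assumptions.

```python
import pyodbc

# Hypothetical dedicated SQL pool endpoint and database.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myworkspace.sql.azuresynapse.net;"
    "Database=sqlpool01;Authentication=ActiveDirectoryInteractive;"
)

# COPY INTO ingests Parquet files from the data lake at high throughput.
copy_sql = """
COPY INTO dbo.Sales
FROM 'https://mydatalake.dfs.core.windows.net/curated/sales/*.parquet'
WITH (
    FILE_TYPE = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Managed Identity')
);
"""

conn.cursor().execute(copy_sql)
conn.commit()
```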
Module 6: Transforming data with Azure Data Factory or Azure Synapse pipelines
In this module, course participants will learn how to create data integration pipelines that ingest data from multiple data sources, transform data using mapping data flows, and move data into one or more data sinks. A brief sketch of triggering such a pipeline follows below.
Lessons
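The pipeline and its mapping data flow are authored visually, but a run can be triggered programmatically. The sketch below uses the azure-mgmt-datafactory SDK; subscription, resource group, factory, pipeline, and parameter names are all hypothetical.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Hypothetical identifiers; the pipeline (source, mapping data flow, sink)
# is assumed to exist already in the factory.
SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"
adf = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

run = adf.pipelines.create_run(
    resource_group_name="rg-data",
    factory_name="adf-demo",
    pipeline_name="TransformSalesPipeline",
    parameters={"inputFolder": "raw/sales"},
)
print(f"Started pipeline run {run.run_id}")
```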
Module 7: Orchestrating data movement and transformation in Azure Synapse pipelines
In this module, participants will learn how to create linked services and orchestrate data movement and transformation using notebooks in Azure Synapse pipelines. A brief sketch of chaining notebooks follows the lab listing below.
Lessons
Lab: Orchestrating data movement and transformation in Azure Synapse pipelines
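Inside a Synapse notebook, the built-in utilities can invoke another notebook, which is the building block for the orchestration this module wires into pipelines. A minimal sketch, with hypothetical notebook path and parameters:

```python
from notebookutils import mssparkutils

# Run a child notebook with a timeout (seconds) and parameters; the
# child's exit value is returned to the caller.
result = mssparkutils.notebook.run(
    "/transform_orders",
    600,
    {"run_date": "2023-01-01"},
)
print(result)
```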
Module 8: End-to-end security with Azure Synapse Analytics
In this module, course participants will learn how to protect a Synapse Analytics workspace and its supporting infrastructure. Course participants will observe the SQL Active Directory admin, manage IP firewall rules, manage secrets with Azure Key Vault, and access those secrets through a Key Vault-linked service and pipeline activities. Course participants will learn how to implement column-level security, row-level security, and dynamic data masking when using dedicated SQL pools. A brief sketch of these protections follows the lab listing below.
Lessons
Lab: End-to-end security with Azure Synapse Analytics
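A minimal sketch of the column protections, driven from Python with pyodbc for illustration: one statement adds a dynamic data mask, and a predicate function plus security policy implement row-level security. Table, schema, and column names are hypothetical.

```python
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myworkspace.sql.azuresynapse.net;"
    "Database=sqlpool01;Authentication=ActiveDirectoryInteractive;"
)
cursor = conn.cursor()

# Dynamic data masking: non-privileged readers see a masked email address.
cursor.execute("""
ALTER TABLE dbo.Customers
ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');
""")

# Row-level security: a schema-bound predicate function plus a security
# policy restrict each salesperson to their own rows.
cursor.execute("CREATE SCHEMA Security;")
cursor.execute("""
CREATE FUNCTION Security.fn_salesFilter(@SalesRep AS sysname)
    RETURNS TABLE
    WITH SCHEMABINDING
AS
    RETURN SELECT 1 AS result WHERE @SalesRep = USER_NAME();
""")
cursor.execute("""
CREATE SECURITY POLICY SalesFilterPolicy
ADD FILTER PREDICATE Security.fn_salesFilter(SalesRep) ON dbo.Sales
WITH (STATE = ON);
""")
conn.commit()
```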
Module 9: Supporting Hybrid Transactional Analytical Processing (HTAP) with Azure Synapse Link
In this module, course participants will learn how Azure Synapse Link enables seamless connectivity between an Azure Cosmos DB account and a Synapse workspace. Participants will learn how to enable and configure Synapse Link and then query the Azure Cosmos DB analytical store using Apache Spark and serverless SQL pools. A brief sketch follows the lab listing below.
Lessons
Lab: Supporting Hybrid Transactional Analytical Processing (HTAP) with Azure Synapse Link
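A minimal sketch of the Spark side, assuming a Synapse Spark pool where the cosmos.olap source reads the analytical store of a Synapse Link-enabled container; the linked-service and container names are hypothetical.

```python
# Reads the Cosmos DB analytical store without touching the transactional
# store, so the aggregation puts no load on the operational workload.
df = (spark.read
      .format("cosmos.olap")
      .option("spark.synapse.linkedService", "CosmosDbLinked")
      .option("spark.cosmos.container", "Orders")
      .load())

df.groupBy("status").count().show()
```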
Module 10: Stream processing in real time with Stream Analytics
In this module, course participants will learn how to process streaming data with Azure Stream Analytics. Course participants will ingest vehicle telemetry data into Event Hubs and then process it in real time using various window functions in Azure Stream Analytics, writing the output to Azure Synapse Analytics. Finally, course participants will learn how to scale the Stream Analytics job to increase throughput. A brief sketch follows the lab listing below.
Lessons
Lab: Stream processing in real time with Stream Analytics
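A minimal sketch of both halves, with hypothetical names throughout: Python sends a telemetry event to Event Hubs with the azure-eventhub package, and the Stream Analytics query (which would be entered in the job's query editor, not run from Python) aggregates speeds over tumbling windows.

```python
import json
from azure.eventhub import EventHubProducerClient, EventData

# Hypothetical namespace, key, and hub name.
producer = EventHubProducerClient.from_connection_string(
    "Endpoint=sb://mynamespace.servicebus.windows.net/;"
    "SharedAccessKeyName=send;SharedAccessKey=<key>",
    eventhub_name="vehicle-telemetry",
)

batch = producer.create_batch()
batch.add(EventData(json.dumps({"vin": "V123", "speed": 87, "region": "west"})))
producer.send_batch(batch)
producer.close()

# Stream Analytics query (not Python): average speed per region over
# 30-second tumbling windows, routed to an Azure Synapse Analytics output.
ASA_QUERY = """
SELECT region, AVG(speed) AS avg_speed, System.Timestamp() AS window_end
INTO SynapseOutput
FROM TelemetryInput TIMESTAMP BY EventEnqueuedUtcTime
GROUP BY region, TumblingWindow(second, 30)
"""
```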
Module 11: Creating a stream processing solution with Event Hubs and Azure Databricks
In this module, course participants will learn how to ingest and process streaming data at scale with Event Hubs and Spark Structured Streaming in Azure Databricks. Course participants will learn about the key features and uses of Structured Streaming. They will implement sliding windows to aggregate data over blocks of time and apply watermarks to remove stale data. Finally, course participants will connect to Event Hubs to read and write streams. A brief sketch follows the lab listing below.
Lessons
Lab: Creating a stream processing solution with Event Hubs and Azure Databricks
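A minimal sketch of the streaming read with a sliding window and watermark. It reads Event Hubs through its Kafka-compatible endpoint so that Spark's built-in Kafka source suffices; namespace, hub, schema, and window sizes are assumptions.

```python
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

# `spark` is the session a Databricks notebook provides; the connection
# string placeholder must be replaced with a real Event Hubs value.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "mynamespace.servicebus.windows.net:9093")
       .option("subscribe", "vehicle-telemetry")
       .option("kafka.security.protocol", "SASL_SSL")
       .option("kafka.sasl.mechanism", "PLAIN")
       .option("kafka.sasl.jaas.config",
               'org.apache.kafka.common.security.plain.PlainLoginModule required '
               'username="$ConnectionString" password="<connection-string>";')
       .load())

schema = (StructType()
          .add("vin", StringType())
          .add("speed", DoubleType())
          .add("ts", TimestampType()))

events = (raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
             .select("e.*"))

# The watermark drops events more than 2 minutes late; the sliding window
# (5 minutes wide, advancing every minute) aggregates overlapping blocks.
agg = (events
       .withWatermark("ts", "2 minutes")
       .groupBy(F.window("ts", "5 minutes", "1 minute"), "vin")
       .agg(F.avg("speed").alias("avg_speed")))

query = agg.writeStream.outputMode("append").format("console").start()
```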
This intensive training prepares you for:
Exam: "DP-203: Data Engineering on Microsoft Azure"
Certification: "Microsoft Certified: Azure Data Engineer Associate"
This course consists of live training led by a trainer who supervises the participants. Theory and practice are taught through live demonstrations and practical exercises. The video conferencing software Zoom is used.
The primary audience for this course is data professionals, data architects, and business intelligence experts who want to learn about data engineering and building analytical solutions with data platform technologies on Microsoft Azure. The secondary audience is data analysts and data scientists who work with analytical solutions built on Microsoft Azure.
Requirements
Successful participants begin this course with knowledge of cloud computing and core data concepts, as well as professional experience with data solutions.
The basic knowledge acquired in the following course is recommended:
This training is conducted in cooperation with the authorized training organization Digicomp Academy AG.
For the purpose of conducting the training, participants' data will be transmitted to Digicomp Academy AG and processed there under its own responsibility.
Please take note of the corresponding privacy policy.