

1. Get started with Azure Databricks
Azure Databricks is a cloud service that provides a scalable platform for data analysis using Apache Spark.
2. Understanding the Architecture of Azure Databricks
This module describes the hierarchical architecture of Azure Databricks, covering the separation of the control and compute layers, the account hierarchy, and various storage options, including Unity Catalog managed storage.
3. Understanding Azure Databricks Integrations
Learn how Azure Databricks integrates with various Microsoft services, such as Fabric, Power BI, and Copilot Studio, to deliver end-to-end solutions for data engineering, analytics, and AI.
4. Select and configure compute resources in Azure Databricks
Learn how to select and configure compute options in Azure Databricks to optimize them for different workloads, manage performance settings and access permissions, and secure serverless and classic compute resources.
5. Creating and Organizing Objects in Unity Catalog
This module covers how to use Unity Catalog's three-tier namespace (catalogs, schemas, and objects) to organize data resources, create tables and volumes, and set up AI/BI Genie spaces to improve data discoverability.
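To make the three-tier namespace concrete, here is a minimal pure-Python sketch of composing and parsing a fully qualified object name (illustrative only; this is not the Databricks SDK, and the names used are hypothetical):

```python
# Minimal sketch of Unity Catalog's three-tier naming: catalog.schema.object.
# Illustrative only; not Databricks API code.
from dataclasses import dataclass

@dataclass(frozen=True)
class ObjectName:
    catalog: str
    schema: str
    name: str

    def qualified(self) -> str:
        # Fully qualified name as referenced in SQL: catalog.schema.object
        return f"{self.catalog}.{self.schema}.{self.name}"

    @classmethod
    def parse(cls, fqn: str) -> "ObjectName":
        parts = fqn.split(".")
        if len(parts) != 3:
            raise ValueError("expected catalog.schema.object")
        return cls(*parts)

# Hypothetical example names:
sales = ObjectName("main", "retail", "orders")
print(sales.qualified())                                 # main.retail.orders
print(ObjectName.parse("main.retail.orders") == sales)   # True
```

In actual SQL on Databricks you would reference such an object directly, e.g. `SELECT * FROM main.retail.orders`.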
6. Securing Unity Catalog Objects
Learn how to secure Unity Catalog objects using centralized governance and security features such as access control, granular permissions, row/column filtering, and data access authentication via service principals.
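Row filtering and column masking restrict what each caller sees of the same table. In Unity Catalog these are defined as SQL functions bound to a table; the pure-Python sketch below (hypothetical helper names, not the actual feature) only illustrates the semantics:

```python
# Illustrative sketch of row filtering and column masking semantics.
# Hypothetical names; real Unity Catalog row filters and column masks
# are SQL UDFs attached to a table, not Python functions.
rows = [
    {"region": "EU", "email": "a@example.com", "amount": 10},
    {"region": "US", "email": "b@example.com", "amount": 20},
]

def row_filter(row, user_regions):
    # Keep only rows the caller is entitled to see.
    return row["region"] in user_regions

def mask_email(row, can_see_pii):
    # Redact the email column for callers without PII access.
    masked = dict(row)
    if not can_see_pii:
        masked["email"] = "***"
    return masked

visible = [mask_email(r, can_see_pii=False)
           for r in rows if row_filter(r, {"EU"})]
print(visible)  # [{'region': 'EU', 'email': '***', 'amount': 10}]
```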
7. Governance of Unity Catalog Objects
This section covers core governance procedures in Unity Catalog, including implementing fine-grained access control, tracking data lineage, configuring audit logging, and sharing data securely, so you can monitor and manage your data assets.
8. Designing and Implementing Data Modeling with Azure Databricks
This module focuses on effective data modeling in Azure Databricks using Unity Catalog and covers designing ingestion logic, selecting tools and formats, implementing partitioning and clustering, and managing slowly changing dimensions.
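To illustrate what "slowly changing dimensions" means in practice, here is a minimal pure-Python sketch of Type 2 handling (close the current version of a record, insert a new one). On Azure Databricks this is typically implemented with MERGE on Delta tables, which the sketch deliberately does not show:

```python
# Minimal SCD Type 2 sketch: when a tracked attribute changes, the
# current dimension row is closed out and a new current row is added.
def apply_scd2(dim, key, new_attrs, as_of):
    for row in dim:
        if row["key"] == key and row["current"]:
            if row["attrs"] == new_attrs:
                return dim  # no change, keep history as-is
            row["current"] = False   # close the old version
            row["end_date"] = as_of
    dim.append({"key": key, "attrs": new_attrs,
                "start_date": as_of, "end_date": None, "current": True})
    return dim

# Hypothetical dimension data:
dim = [{"key": 1, "attrs": {"city": "Berlin"},
        "start_date": "2024-01-01", "end_date": None, "current": True}]
apply_scd2(dim, 1, {"city": "Hamburg"}, "2025-06-01")
print(len(dim))  # 2: one closed historical row, one current row
```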
9. Load data into Unity Catalog
Discover comprehensive data loading techniques in Azure Databricks for loading data into Unity Catalog tables, including managed connectors, custom code, SQL batch loading, streaming ingestion, Auto Loader, and orchestration with Lakeflow Spark Declarative Pipelines.
10. Clean, transform, and load data into Unity Catalog
This module covers core data engineering techniques for cleaning and transforming raw data, including data quality profiling, resolving inconsistent values, filtering, aggregation, and combining and reshaping records, as well as loading transformed data using append, overwrite, and merge strategies.
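The three loading strategies named above can be sketched in pure Python over a small keyed table (illustrative only; on Databricks these correspond to Delta write modes and MERGE INTO):

```python
# Illustrative append / overwrite / merge semantics over a keyed table.
def append(table, new_rows):
    # Add rows without touching existing ones.
    return table + new_rows

def overwrite(table, new_rows):
    # Replace the entire table contents.
    return list(new_rows)

def merge(table, new_rows, key):
    # Upsert: update rows with matching keys, insert the rest.
    by_key = {r[key]: r for r in table}
    for r in new_rows:
        by_key[r[key]] = r
    return list(by_key.values())

# Hypothetical example data:
base = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
incoming = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]
print(merge(base, incoming, "id"))
# [{'id': 1, 'v': 'a'}, {'id': 2, 'v': 'B'}, {'id': 3, 'v': 'c'}]
```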
11. Implementing and Managing Data Quality Constraints with Azure Databricks
This session explores strategies for maintaining high data quality in Azure Databricks, with a focus on implementing validation checks, enforcing schemas, managing schema drift, and using pipeline expectations for data integrity.
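Pipeline expectations pair a constraint with an action taken on violating rows (for example, drop the row or fail the pipeline). The following pure-Python sketch mimics the drop/fail semantics only; it is not the actual Lakeflow expectations API:

```python
# Sketch of expectation semantics: each rule is (name, predicate, action).
# "drop" removes violating rows; "fail" aborts processing.
def apply_expectations(rows, rules):
    kept = []
    for row in rows:
        ok = True
        for name, predicate, action in rules:
            if not predicate(row):
                if action == "fail":
                    raise ValueError(f"expectation {name!r} violated: {row}")
                if action == "drop":
                    ok = False
        if ok:
            kept.append(row)
    return kept

# Hypothetical rules and data:
rules = [("non_null_id", lambda r: r["id"] is not None, "drop"),
         ("positive_qty", lambda r: r["qty"] > 0, "drop")]
data = [{"id": 1, "qty": 5}, {"id": None, "qty": 3}, {"id": 2, "qty": -1}]
print(apply_expectations(data, rules))  # [{'id': 1, 'qty': 5}]
```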
12. Designing and Implementing Data Pipelines with Azure Databricks
Learn how to use notebooks and Lakeflow Spark Declarative Pipelines to design and implement robust data pipelines in Azure Databricks, covering topics such as orchestration, error handling, and task logic.
13. Implementing Lakeflow Jobs with Azure Databricks
This module focuses on implementing Lakeflow jobs in Azure Databricks, guiding you through creating jobs, configuring triggers and schedules, setting up alerts, and managing automatic restarts to ensure reliable execution of data pipelines.
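The "automatic restarts" mentioned above amount to bounded retries of a failed task run. A minimal retry-with-limit sketch (illustrative only; real Lakeflow Jobs express this as a retry setting in the job configuration, not Python code):

```python
# Sketch of bounded retry: rerun a task up to max_retries extra times
# before giving up, mirroring a job's automatic-restart setting.
def run_with_retries(task, max_retries):
    attempts = 0
    while True:
        attempts += 1
        try:
            return task(), attempts
        except Exception:
            if attempts > max_retries:
                raise

# Hypothetical task that fails twice, then succeeds:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = run_with_retries(flaky, max_retries=3)
print(result)  # ('ok', 3)
```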
14. Implementing Development Lifecycle Processes in Azure Databricks
This module covers the implementation of development lifecycle processes in Azure Databricks using Git repositories for version control and Databricks Asset Bundles for infrastructure-as-code deployments, including branching workflows, testing, and CLI-based deployment.
15. Monitoring, Troubleshooting, and Optimizing Workloads in Azure Databricks
Learn how to monitor, troubleshoot, and optimize data workloads in Azure Databricks to ensure reliability and cost efficiency. You’ll analyze cluster usage, diagnose Spark jobs, optimize performance, and forward logs to Azure Log Analytics.
Requirements:
This course consists of live, instructor-led training in which a trainer supervises the participants. Theory and practice are taught through live demonstrations and hands-on exercises. The video-conferencing software Zoom is used.
Prepare for the "Microsoft Certified: Azure Databricks Data Engineer Associate (beta)" exam with this course.
This course is designed for data engineers who have a basic knowledge of data analysis concepts, a fundamental understanding of cloud storage, and familiarity with the principles of data organization.
The training is conducted in collaboration with an authorized training partner. This partner collects and processes data under its own responsibility. Please review the relevant privacy policy.
