Azure Synapse Apache Spark Pools: Data engineering
edX
Course
Intermediate
Free to Audit
Certificate

Microsoft

This course provides training on using Apache Spark within Azure Synapse Analytics for data engineering. You'll learn how to create and manage Spark pools and apply them to process and transform data for pipelines, focusing on practical data engineering workflows within the Azure cloud environment.

1 hr/week · 1 week · English · 167 enrolled

About this Course

In this course, you will learn data engineering with Apache Spark in Azure Synapse Analytics, including Delta Lake and data visualization. You will master the core features and capabilities of Apache Spark for large-scale data processing and analytics within the Azure Synapse Analytics environment: configuring Spark pools, using notebooks to run code that loads, analyzes, and visualizes data from a data lake, understanding how Spark works in a distributed environment, and using dataframes and Spark SQL for data manipulation.

You will be introduced to Delta Lake, an open-source storage layer that brings ACID transactions to Apache Spark. You will learn to create and use Delta Lake tables, including updating them, querying previous versions (time travel), and using them as sources and sinks for streaming data. You will also explore how to define tables in the Spark metastore and query them using SQL.

Finally, you will transform data using Spark: loading data into dataframes, restructuring it, and saving it in formats such as Parquet. The course covers partitioning data for optimization and filtering partitioned data in queries, along with using SQL to query and transform data and visualizing data within Spark notebooks using built-in charts and Python libraries such as Matplotlib.

What You'll Learn

  • Use Apache Spark in Azure Synapse Analytics for data engineering
  • Master core Apache Spark features for large-scale data processing
  • Configure Spark pools and use notebooks to run code
  • Understand how Spark works in a distributed environment
  • Use dataframes and Spark SQL for data manipulation
  • Create and use Delta Lake tables, including updating and querying
  • Define tables and query them using SQL
  • Transform data using Spark, including loading and restructuring
  • Partition data for optimization and filter partitioned data
  • Visualize data within Spark notebooks using built-in charts and libraries

Topics

Apache Parquet
Data Lakes
Workflow Management
SQL (Programming Language)
Python (Programming Language)
Microsoft Azure
Azure Synapse Analytics
Apache Spark
Data Engineering
Data Processing

Course Info

Platform: edX
Level: Intermediate
Pacing: Unknown
Certificate: Available
Price: Free to Audit

Skills

Apache Parquet
Data Lakes
Workflow Management
SQL
Python
Microsoft Azure
Azure Synapse Analytics
Apache Spark
Data Engineering
