Azure Synapse Apache Spark Pools: Data engineering
edX
Course
Intermediate
Free to Audit
Certificate

Microsoft

This course provides training on using Apache Spark within Azure Synapse Analytics for data engineering. You'll learn how to create and manage Spark pools and apply them to process and transform data for pipelines, focusing on practical data engineering workflows within the Azure cloud environment.

1 hr/week · 1 week · English · 167 enrolled

About this Course

In this course, you will learn data engineering with Apache Spark in Azure Synapse Analytics, including Delta Lake and data visualization. You will master the core features and capabilities of Apache Spark for large-scale data processing and analytics within the Azure Synapse Analytics environment: configuring Spark pools, using notebooks to run code that loads, analyzes, and visualizes data from a data lake, understanding how Spark works in a distributed environment, and using dataframes and Spark SQL for data manipulation.

You will be introduced to Delta Lake, an open-source storage layer that brings ACID transactions to Apache Spark. You will learn to create and use Delta Lake tables, including updating them, querying previous versions (time travel), and using them as sources and sinks for streaming data. You will also explore how to define tables in the Spark metastore and query them using SQL.

Finally, you will transform data using Spark: loading data into dataframes, restructuring it, and saving it in formats such as Parquet. The course covers partitioning data for optimization and filtering partitioned data in queries, along with using SQL to query and transform data and visualizing data within Spark notebooks using built-in charts and Python libraries such as Matplotlib.

What You'll Learn

  • Use Apache Spark in Azure Synapse Analytics for data engineering
  • Master core Apache Spark features for large-scale data processing
  • Configure Spark pools and use notebooks to run code
  • Understand how Spark works in a distributed environment
  • Use dataframes and Spark SQL for data manipulation
  • Create and use Delta Lake tables, including updating and querying
  • Define tables and query them using SQL
  • Transform data using Spark, including loading and restructuring
  • Partition data for optimization and filter partitioned data
  • Visualize data within Spark notebooks using built-in charts and libraries

Topics

Apache Parquet
Data Lakes
Workflow Management
SQL (Programming Language)
Python (Programming Language)
Microsoft Azure
Azure Synapse Analytics
Apache Spark
Data Engineering
Data Processing

Course Info

Platform: edX
Level: Intermediate
Pacing: Unknown
Certificate: Available
Price: Free to Audit

Skills

Apache Parquet
Data Lakes
Workflow Management
SQL
Python
Microsoft Azure
Azure Synapse Analytics
Apache Spark
Data Engineering
