Building Batch Data Pipelines on Google Cloud

Google Cloud

For developers responsible for designing data processing pipelines and architectures.

5 hrs/week, 1 week, English, 263 enrolled

About this Course

Data pipelines typically fall under one of three paradigms: Extract-Load (EL), Extract-Load-Transform (ELT), or Extract-Transform-Load (ETL). This course describes which paradigm to use, and when, for batch data. It also covers several Google Cloud technologies for data transformation, including BigQuery, Spark on Dataproc, pipeline graphs in Cloud Data Fusion, and serverless data processing with Dataflow. Learners get hands-on experience building data pipeline components on Google Cloud using Qwiklabs.

What You'll Learn

  • Review the different methods of data loading (EL, ELT, and ETL) and when to use each
  • Run Hadoop on Dataproc, leverage Cloud Storage, and optimize Dataproc jobs
  • Build your data processing pipelines using Dataflow
  • Manage data pipelines with Data Fusion and Cloud Composer

Prerequisites

  • To benefit from this course, participants should have completed “Google Cloud Big Data and Machine Learning Fundamentals” or have equivalent experience. Participants should also have:
      • Basic proficiency with a common query language such as SQL
      • Experience with data modeling and ETL (extract, transform, load) activities
      • Experience developing applications in a common programming language such as Python
      • Familiarity with machine learning and/or statistics

Instructors

Google Cloud Training

Course Team

Course Info

Platform: edX
Level: Beginner
Pacing: Unknown
Certificate: Available
Price: Free to Audit
