Master Scalable Data Engineering with Cutting-Edge Tools
Course Highlights:
Learn to design, build, and optimize data engineering pipelines that can handle complex, large-scale datasets. Prepare for demanding data roles by mastering advanced techniques in this comprehensive training.
Module 1: Queues and Databases-RabbitMQ and MySQL (6 hours)
- Video: Meet your instructor: Alfredo Deza (1 minute, Preview module)
- Video: About this course (2 minutes)
- Reading: Connect with your instructor (10 minutes)
- Reading: Meet your instructor: Noah Gift (10 minutes)
- Reading: Course structure and discussion etiquette (10 minutes)
- Video: Introduction (1 minute)
- Video: Overview of Queues (5 minutes)
- Video: What is Celery? (3 minutes)
- Reading: Key Terms (10 minutes)
- Reading: Introduction to Celery (10 minutes)
- Video: Use cases for RabbitMQ (3 minutes)
- Reading: Using RabbitMQ with Docker (10 minutes)
- Reading: External lab: Start RabbitMQ in a development environment (10 minutes)
- Video: Overview of a Flask and Celery application (3 minutes)
- Video: Summary (1 minute)
- Quiz: Introduction to RabbitMQ and Flask (30 minutes)
- Video: Introduction (0 minutes)
- Video: Configuring Celery with Flask (4 minutes)
- Video: Connecting Celery with RabbitMQ (5 minutes)
- Reading: Key Terms (10 minutes)
- Reading: Build a web app by using Python and Flask (10 minutes)
- Reading: Background tasks with Celery (10 minutes)
- Video: Defining a Celery task in Flask (3 minutes)
- Video: Fire and forget task in Flask (2 minutes)
- Video: Retrieve values from asynchronous tasks (3 minutes)
- Reading: External lab: Add a new Celery task for RabbitMQ (10 minutes)
- Video: Summary (1 minute)
- Quiz: RabbitMQ with Celery and Flask (30 minutes)
- Video: MySQL Overview (2 minutes)
- Reading: Key Terms (10 minutes)
- Reading: Getting Started with MySQL (10 minutes)
- Video: MySQL from Terminal (3 minutes)
- Video: Archive and Drop Database (5 minutes)
- Video: Import external database Sakila (7 minutes)
- Video: Modify database Sakila (4 minutes)
- Video: Bash pipelines with MySQL (5 minutes)
- Video: MySQL to Python Standard Library Web Server (4 minutes)
- Ungraded Lab: Linux Hacking with MySQL (60 minutes)
- Quiz: Quiz-MySQL for Data Engineering (30 minutes)
- Reading: Lesson Reflection (10 minutes)
- Discussion Prompt: Meet and greet (optional) (10 minutes)
- Quiz: Queues and Databases - Final week quiz (30 minutes)
Module 2: Optimizing Workflow Management at Scale with Apache Airflow (5 hours)
- Video: Introduction (1 minute, Preview module)
- Video: What is Apache Airflow? (6 minutes)
- Reading: Key Terms (10 minutes)
- Reading: What is Apache Airflow (10 minutes)
- Video: Installing Apache Airflow from PyPI (5 minutes)
- Video: Using Apache Airflow with Docker (6 minutes)
- Reading: Exploring the Airflow User Interface (10 minutes)
- Reading: External lab: Install Apache Airflow (10 minutes)
- Video: Exploring the Airflow UI (6 minutes)
- Quiz: Quiz-Installing Apache Airflow (30 minutes)
- Reading: Lesson Reflection (10 minutes)
- Video: Introduction (0 minutes)
- Video: Exploring directed acyclic graphs (DAG) (10 minutes)
- Reading: Key Terms (10 minutes)
- Reading: External lab: Create a DAG (10 minutes)
- Video: Creating a DAG (7 minutes)
- Video: Running a backfill (4 minutes)
- Reading: Architecture overview (10 minutes)
- Video: Testing and validation (7 minutes)
- Video: Summary (0 minutes)
- Quiz: Quiz-Apache Airflow Fundamentals (30 minutes)
- Reading: Lesson Reflection (10 minutes)
- Video: Introduction (1 minute)
- Video: Identifying a task to build a DAG (4 minutes)
- Reading: Key Terms (10 minutes)
- Reading: External Lab: Build a data pipeline for census data (10 minutes)
- Video: Retrieving remote data (4 minutes)
- Video: Cleaning and normalizing data (4 minutes)
- Video: Inspecting the UI for results (4 minutes)
- Reading: Build Data Pipelines with Apache Airflow (10 minutes)
- Video: Summary (1 minute)
- Reading: Lesson Reflection (10 minutes)
- Quiz: Quiz-Creating a pipeline (30 minutes)
- Quiz: Final Week Quiz-Optimizing Workflow Management at Scale with Apache Airflow (30 minutes)
Module 3: Achieving Scalability with Vector, Graph, and Key/Value Databases (5 hours)
- Video: Picking the proper database (3 minutes, Preview module)
- Video: What are vector databases and how they work (2 minutes)
- Reading: Key Terms (10 minutes)
- Reading: What is a Vector Database? (10 minutes)
- Video: Implementing Semantic search (4 minutes)
- Video: Quickstart Qdrant (3 minutes)
- Reading: External Lab: Run Quickstart of Qdrant (10 minutes)
- Video: Qdrant Rust Client (3 minutes)
- Reading: External Lab: Extend Semantic Search (10 minutes)
- Video: Vector Database Architectures (2 minutes)
- Video: Hands-on lab: Enhance Semantic Search (3 minutes)
- Reading: Jaccard index (10 minutes)
- Quiz: Quiz-Introduction to Vector Databases (30 minutes)
- Reading: Lesson Reflection (10 minutes)
- Video: Graph data models and database concepts (2 minutes)
- Reading: Key Terms (10 minutes)
- Reading: Rust CLI with Clap (10 minutes)
- Video: Introduction to Amazon Neptune (2 minutes)
- Reading: External Lab: Rust Graph CLI Tool (10 minutes)
- Video: Graph algorithms: UFC graph centrality in Rust (4 minutes)
- Video: Kosaraju Community Detection in Graphs (4 minutes)
- Video: Shortest Path with Graphs (3 minutes)
- Reading: Amazon Neptune (10 minutes)
- Video: Key Components of Rust CLI Tool (1 minute)
- Video: Lab Walkthrough: Building a Rust Graph CLI Tool (2 minutes)
- Quiz: Quiz-Introduction to Graph Databases (30 minutes)
- Reading: Lesson Reflection (10 minutes)
- Quiz: Final Quiz-Achieving Scalability with Vector, Graph, and Key/Value Databases (30 minutes)
- Ungraded Lab: Social Media Recommender (60 minutes)
Module 4: Real-world Advanced Data Engineering Projects (5 hours)
- Video: Learn AWS CloudShell for Dynamo Development (4 minutes, Preview module)
- Video: Learn AWS CodeCatalyst for Dynamo Development (5 minutes)
- Reading: Key Terms (10 minutes)
- Reading: Amazon CodeCatalyst (10 minutes)
- Video: Leveraging AWS CodeWhisperer for Dynamo Development (4 minutes)
- Video: Create a Table with CLI (1 minute)
- Video: Populate a Table With Batching Records (1 minute)
- Video: Query a Table with Records (2 minutes)
- Reading: External Lab: Extended DynamoDB (10 minutes)
- Video: Project Walkthrough (2 minutes)
- Quiz: Quiz-Building a solution with DynamoDB with the AWS CLI (30 minutes)
- Reading: Lesson Reflection (10 minutes)
- Video: Introduction (1 minute)
- Video: Overview of pipeline requirements (3 minutes)
- Reading: Key Terms (10 minutes)
- Reading: Quick start for SQLAlchemy (10 minutes)
- Video: Using SQLAlchemy with Pandas (6 minutes)
- Reading: Explore and analyze data with Python (10 minutes)
- Video: Persisting data in a task (6 minutes)
- Video: Reviewing the results (4 minutes)
- Video: Summary (1 minute)
- Quiz: Quiz-Persisting data through a multi-task DAG with Pandas (30 minutes)
- Reading: Lesson Reflection (10 minutes)
- Reading: Recommended Next Steps (10 minutes)
- Quiz: Final Quiz-Advanced Data Engineering (30 minutes)
- Ungraded Lab: Jupyter Sandbox (60 minutes)
- Ungraded Lab: VS Code Sandbox (60 minutes)
Who can take this course?
Unfortunately, learners residing in one or more of the following countries or regions will not be able to register for this course: Iran, Cuba and the Crimea region of Ukraine. While edX has sought licenses from the U.S. Office of Foreign Assets Control (OFAC) to offer our courses to learners in these countries and regions, the licenses we have received are not broad enough to allow us to offer this course in all locations. edX truly regrets that U.S. sanctions prevent us from offering all of our courses to everyone, no matter where they live.