In-Depth Course on Amazon Redshift, Redshift Serverless, Integration with EMR, AWS Step Functions, AWS Lambda and more
- A Computer Science or IT degree, or 1 to 2 years of IT experience
- Ability to write SQL Queries using any Relational or Data Warehouse or MPP Database
- Basic Linux Skills with ability to run commands using Terminal
- Basic Programming using Python is desired, even though it is not mandatory for most of the course
Amazon Redshift is one of the key AWS Services used in building Data Warehouses or Data Marts to serve reports and dashboards for business users. As part of this course, you will learn Amazon Redshift by going through all of its important features for building Data Warehouses or Data Marts.
We have covered features such as Federated Queries, Redshift Spectrum, Integration with Python, AWS Lambda Functions, Integration of Redshift with EMR, and End-to-End Pipeline using AWS Step Functions.
Here is the detailed outline of the course.
- First, we will understand how to Get Started with Amazon Redshift using AWS Web Console. We will see how to create a cluster, how to connect to the cluster, and also how to run the queries using a Web-based query editor. We will also go ahead and create a Database and tables in the Redshift Cluster. Once we set up a Database and tables, we will also go through the details related to CRUD Operations against tables in Databases in Redshift Cluster.
- Once we have the databases and tables in Redshift Cluster, it is time for us to understand how to get data into the tables in Redshift Cluster. One of the common approaches we use to get data into the Redshift cluster is Copying Data from S3 into Redshift Tables. We will go through the step-by-step process of copying the data into Redshift tables from S3 using the COPY command.
- Python is one of the prominent programming languages for building Data Engineering or ETL Applications. It is extensively used to build ETL Jobs that get data into Database Tables in Redshift Cluster. Once we understand how to get data from S3 to Redshift tables using the COPY command, we will learn how to Develop Python-based Data Engineering or ETL Applications using Redshift Cluster. We will learn how to perform CRUD operations and also how to run COPY commands using Python-based programs.
- Once we understand how to build applications using Redshift Cluster, we will go through some of the key concepts used while creating Redshift Tables with Distkeys and Sortkeys.
- We can also connect to remote databases such as Postgres and run queries directly on the remote database tables using Redshift Federated Queries. We can also run queries on top of the Glue or Athena Catalog using Redshift Spectrum. You will learn how to leverage Redshift Federated Queries and Spectrum to process data in remote Database tables or S3 without copying the data.
- You will also get an overview of Amazon Redshift Serverless as part of Getting Started with Amazon Redshift Serverless.
- Once you learn Amazon Redshift Serverless, you will deploy a Pipeline in which a Spark Application running on an AWS EMR Cluster loads the data it processes into Redshift.
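To give a flavor of the table-creation and data-loading topics in the outline above, here is a minimal Python sketch, not taken from the course material. It only builds the SQL text for a table with a Distkey and Sortkey and for a COPY from S3, and then runs statements through any DB-API 2.0 connection. The table name `orders`, the bucket `example-bucket`, the IAM role ARN, and the helper names are all hypothetical placeholders, and the CSV options are assumptions about the input files.

```python
from typing import Sequence, Tuple


def build_create_table(table: str,
                       columns: Sequence[Tuple[str, str]],
                       distkey: str,
                       sortkeys: Sequence[str]) -> str:
    """Return a CREATE TABLE statement with a distribution key and a sort key."""
    column_names = {name for name, _ in columns}
    missing = ({distkey} | set(sortkeys)) - column_names
    if missing:
        raise ValueError(f"unknown key columns: {sorted(missing)}")
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE {table} (\n  {cols}\n)\n"
        f"DISTKEY ({distkey})\n"
        f"SORTKEY ({', '.join(sortkeys)});"
    )


def build_copy_from_s3(table: str, s3_uri: str, iam_role_arn: str) -> str:
    """Return a COPY statement that loads CSV files (with a header row) from S3."""
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS CSV\n"
        f"IGNOREHEADER 1;"
    )


def run_statements(conn, statements: Sequence[str]) -> None:
    """Execute statements in order on a DB-API 2.0 connection, then commit."""
    cur = conn.cursor()
    try:
        for statement in statements:
            cur.execute(statement)
        conn.commit()
    finally:
        cur.close()


if __name__ == "__main__":
    # Hypothetical table, bucket, and role; replace with your own values.
    ddl = build_create_table(
        "orders",
        [("order_id", "INT"), ("order_date", "DATE"), ("order_status", "VARCHAR(30)")],
        distkey="order_id",
        sortkeys=["order_date"],
    )
    copy_sql = build_copy_from_s3(
        "orders",
        "s3://example-bucket/orders/",
        "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    )
    print(ddl)
    print(copy_sql)
```

To run the statements against a real cluster, pass `run_statements` a connection opened with a DB-API driver of your choice along with the two generated strings. Keeping the SQL builders separate from the execution step makes it possible to inspect the statements before anything touches the cluster.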
Who this course is for:
- University Students who want to learn AWS Redshift for Data Warehousing
- Aspiring Data Engineers and Data Scientists who want to learn about AWS Redshift for Data Warehousing
- Experienced Application Developers who would like to explore AWS Redshift for Data Warehousing
- Experienced Data Engineers who want to build end-to-end data pipelines using Python around Data Marts created using AWS Redshift
- Any IT Professional who is keen to deep dive into AWS Redshift for Data Warehousing on AWS