This course introduces you to the concepts, strategies, and best practices for designing a cloud-based data warehousing solution using Amazon Redshift, the petabyte-scale data warehouse in AWS. We will demonstrate how to collect, store, and prepare data for the data warehouse using other AWS services such as Amazon DynamoDB, Amazon EMR, Amazon Kinesis Firehose, and Amazon Simple Storage Service (Amazon S3). We will also explain how you can use business intelligence (BI) tools to analyze your data.
This course is additionally enriched with content from the Building Data Analytics Solutions using Amazon Redshift course
Module 1: Introduction to Data Warehousing
- Relational databases
- Data warehousing concepts
- The overlap between data warehousing and big data
- Overview of data management in AWS
- Practical Lab 1: Introduction to Amazon Redshift
Module 2: Introduction to Amazon Redshift
- Conceptual overview
- Use cases from practice
- Interactive demo 1: Tour of the Amazon Redshift console
- Practical Lab 2: Starting an Amazon Redshift Cluster
- RA3 Nodes and AQUA architecture
- Amazon Redshift ML
Module 3: Starting clusters
- Structure of the cluster
- Connecting to the cluster
- Access control
- Database security
- Load data
- Practical Lab 3: Optimizing database schemas
- Optional Lab: Starting an Amazon Redshift Cluster
Module 4: Designing the database schema
- Schemas and data types
- Column-by-column compression
- Styles of data distribution
- Methods for data sorting
- Practical Lab 3: Optimizing database schemas
Module 5: Identification of data sources
- Overview of data sources
- Amazon S3
- Amazon DynamoDB
- Amazon EMR
- Amazon Kinesis data fire pants
- AWS Lambda database loader for Amazon Redshift
- Redshift Data API
- SUPER Data Type
- Interactive Demo 2: Connecting your Amazon Redshift cluster to a Jupyter notebook with Data API
- Interactive demo 3: Analyzing semi-structured data with the SUPER data type
- Practical Lab 4: Loading real-time data into an Amazon Redshift database
Module 6: Loading data
- Preparing data
- Data warehousing on AWS
- Loading data with COPY
- Maintaining tables
- Simultaneous write operations
- Troubleshooting for charging problems
- Practical exercise 5: Loading data with the COPY command
Module 7: Writing queries and performance tuning
- Amazon Redshift SQL
- User-defined functions (UDFs)
- Factors that influence query performance
- The EXPLAIN command and query plans
- Workload management (WLM)
- Interactive Demo 4: Applying mixed workload management on Amazon Redshift
- Practical Lab 6: Configuring workload management
Module 8: Amazon Redshift Spectrum
- Amazon Redshift Spectrum
- Configuring data for Amazon Redshift Spectrum
- Amazon Redshift Spectrum queries
- Data transformation
- Data sharing
- Exercise Lab 2: Data analysis with Amazon Redshift Spectrum
- Exercise Lab 3: Data transformation and retrieval in Amazon Redshift
- Practical Lab 7: Using Amazon Redshift Spectrum
Module 9: Maintenance of clusters
- Audit logging
- Performance monitoring
- Events and notifications
- Practical exercises 8: Auditing and monitoring clusters
- Resizing clusters
- Backing up and restoring clusters
- Resource labeling and limits and restrictions
- Practical Lab 9: Saving, restoring and resizing clusters
- Optional: Data analysis and visualization
Module 10: Analyzing and visualizing data
- Performance of visualizations
- Creating dashboards
- Amazon QuickSight editions and functions