Course Outline
Introduction
- Overview of deep learning scaling challenges
- Overview of DeepSpeed and its features
- DeepSpeed vs. other distributed deep learning libraries
Getting Started
- Setting up the development environment
- Installing PyTorch and DeepSpeed
- Configuring DeepSpeed for distributed training
DeepSpeed Optimization Features
- DeepSpeed training pipeline
- ZeRO (memory optimization)
- Activation (gradient) checkpointing
- Pipeline parallelism
Scaling Models with DeepSpeed
- Basic scaling using DeepSpeed
- Advanced scaling techniques
- Performance considerations and best practices
- Debugging and troubleshooting techniques
Advanced DeepSpeed Topics
- Advanced optimization techniques
- Using DeepSpeed with mixed precision training
- DeepSpeed on different hardware (e.g. NVIDIA and AMD GPUs)
- DeepSpeed with multiple training nodes
Integrating DeepSpeed with PyTorch
- Integrating DeepSpeed with PyTorch workflows
- Using DeepSpeed with PyTorch Lightning
Troubleshooting
- Debugging common DeepSpeed issues
- Monitoring and logging
Summary and Next Steps
- Recap of key concepts and features
- Best practices for using DeepSpeed in production
- Further resources for learning more about DeepSpeed
Requirements
- Intermediate knowledge of deep learning principles
- Experience with PyTorch or similar deep learning frameworks
- Familiarity with Python programming
Audience
- Data scientists
- Machine learning engineers
- Developers
Testimonials (3)
I really liked the end, where we took the time to play around with ChatGPT. The room was not set up well for this: instead of one large table, a couple of small ones would have let us get into small groups and brainstorm.
Nola - Laramie County Community College
Course - Artificial Intelligence (AI) Overview
Working from first principles in a focused way, and moving to applying case studies within the same day
Maggie Webb - Department of Jobs, Regions, and Precincts
Course - Artificial Neural Networks, Machine Learning, Deep Thinking
That it applied real company data. The trainer had a very good approach, making trainees participate and compete.