LLM/SLM Engineering
Building and Scaling Production Language Models
Duration: 9 Weeks, 2 sessions per week
Time Needed: 2 hours lecture per session
Assignments: 4+ hours hands-on per week
Project: Required project presentation at the end of the course
👋 Welcome! Explore the wonders of large language models. By the end of this course, you will be able to build and deploy LLM applications with confidence, equipped with the knowledge of how LLMs work: from tokens and embeddings to fine-tuning your own models and using agents to build production-grade applications.
✨ The course concludes with a Live Demo Day where you’ll showcase your innovative projects!
🧑‍💻 Your LLM Journey
- Foundations: Begin with the transformer model, exploring each layer as we build GPT from scratch. Master concepts like positional encoding, self-attention, and multi-head attention. Get ready to tinker with the mechanics to uncover the “magic” behind LLMs.
- Enabling LLMs: Learn the best practices for prompt engineering, retrieval-augmented generation, fine-tuning, and agent integration. By this stage, you’ll understand how to seamlessly incorporate LLMs into your applications and identify the best approach for your use case.
- Deployment & Monitoring: Discover how to evaluate, deploy, and monitor LLM performance. Explore setups like on-premises deployments, Hugging Face Spaces, and monitoring tools such as Weights & Biases (WandB) and LangFuse.
🧑‍🤝‍🧑 Learning Activities and Support
- Every week, you will go through assignments and group discussions.
- One-on-one office hours will also be available in case you need more TLC.
🚀 Get ready for an interactive and exciting learning experience!
Part 1: The Foundations
- Introduction
- AI Today and Tomorrow
- Overview of AI Engineering
- What’s SOTA (State of the Art)
- The Transformer Architecture
- Encoder and Decoder Blocks
- BART, BERT, GPT-style
- Write an Encoder-Decoder Transformer Model from scratch
- Explain each component along the way
- Tokenization: Text to Tokens
- Positional Encoding: Understanding the Order in Context
- Normalization, Dropout and Residual Connections
- Embedding Layers: Words to Numbers
- Attention Layers: Self-attention and Multi-head attention
- Feedforward Network Layers: Firing the engines
- LayerNorms: Stabilizing the network
- Projection Layer: Shaping the output
- Temperature: From precision to creativity
- Decoding Methods: Top-K, Top-P, and more
- Guardrails: Controlling the outputs
- Loss Functions and Cross-Entropy Loss: The model’s compass
- RLHF (Reinforcement Learning from Human Feedback)
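To give a taste of what you will build in this part, here is a minimal sketch of the scaled dot-product attention at the heart of the self-attention layer. This is an illustration with toy dimensions using NumPy, not the full multi-head implementation written in class:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core of the self-attention layer covered in Part 1.

    Q, K, V: (seq_len, d_k) arrays of query, key, and value vectors.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise similarity, scaled
    # numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output is a weighted sum of the values

# Toy example: 3 tokens, 4-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V = x
print(out.shape)  # (3, 4)
```

In the course you will extend this into multi-head attention, add masking, and wrap it in the full transformer block.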
Part 2: Enabling LLMs
The second part focuses on best practices for integrating large language models (LLMs) into applications, beginning with Retrieval-Augmented Generation (RAG) and advanced techniques for improving context relevance. It then covers fine-tuning with parameter-efficient adaptation and quantization, including hands-on practice with small language models (SLMs). Finally, it turns to agents, covering function calling, LangGraph, and reasoning-action frameworks for enhanced performance and functionality.
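The retrieval step of RAG described above can be sketched in a few lines. In this toy version, the bag-of-words `embed` and `retrieve` helpers are hypothetical stand-ins for a real embedding model and vector database:

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; a real system uses a neural embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=2):
    """Rank documents by similarity to the query: the 'R' in RAG."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "Transformers use self-attention over token embeddings.",
    "RAG augments a prompt with retrieved context.",
    "Fine-tuning adapts model weights to a task.",
]
context = retrieve("augment the prompt with retrieved context", docs, k=1)
prompt = f"Context: {context[0]}\n\nQuestion: How does RAG work?"
```

The retrieved passage is prepended to the prompt (the "augmentation"), grounding the model's answer in your own data; production systems replace the toy similarity search with a vector database.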
- Why Context Matters?
- Retrieval Augmented Generation
- Introduction to Vector Databases
- RAG Evaluation (RAGAS)
- LangChain and LlamaIndex RAG
- Fusion, Reciprocal Rank Fusion
- RAPTOR and other variants of RAG
- Your First RAG application
- Understanding parameter-efficient methods and low-rank adaptation
- Quantization
- Fine-tuning an SLM on Hugging Face
- Function calling
- LangGraph
- The Reasoning-Action (ReAct) Agents
- Innovation and Ideation
- Industry Use Cases
- Brainstorming Techniques and Strategies
Part 3: Productionizing LLM Apps
The final part of the program focuses on evaluating and optimizing LLM applications, covering benchmarks, monitoring performance, and scaling effectively. Participants will learn practical techniques like caching prompts, managing requests, and creating efficient data pipelines to build scalable applications. It also includes deploying open-source solutions, measuring performance, and fine-tuning hardware for efficiency. The program culminates with a demo day, where learners showcase their projects to industry experts and the public, followed by a certification ceremony.
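Prompt caching, one of the scaling techniques mentioned above, can be sketched as a hash-keyed lookup that skips repeated model calls. Here `call_model` is a hypothetical stand-in for a real LLM API request:

```python
import hashlib

def call_model(prompt):
    """Hypothetical stand-in for an expensive LLM API request."""
    call_model.count += 1
    return f"response to: {prompt}"
call_model.count = 0  # track how many real calls were made

_cache = {}

def cached_completion(prompt):
    """Serve a cached response when the exact prompt has been seen before."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(prompt)
    return _cache[key]

first = cached_completion("Summarize RAG in one line.")
second = cached_completion("Summarize RAG in one line.")  # served from cache
```

The same pattern applies to embedding caching; production systems add eviction policies and often semantic (similarity-based) rather than exact-match keys.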
- Massive Text Embedding Benchmark (MTEB)
- Monitoring and Visibility: Efficient inference, scaling, and tracking performance
- Semantic Chunking
- Prompt and Embedding Caching
- Request Queues
- Data pipelines
- Building Scalable RAG application
- Measuring with LangFuse
- Hugging Face Text Generation
- GPT-Generated Unified Format (GGUF)
- Rightsizing the GPU
- Demo Day Rehearsal and Feedback
- Public Presentation and Demo Day
- Graduation and Certification Ceremony
Jumpstart your AI Career Now! Learn the state of the art in AI and start building.
Price: P 50,000.00
Sign up here now!
Note: A skill assessment will be conducted prior to acceptance in this program.
Scholarships and discounts available for highly qualified applicants.