LLM/SLM Engineering
Building and Scaling Production Language Models
Duration: 9 Weeks, 2 sessions per week
Time Needed: 2 hours lecture per session
Assignments: 4+ hours hands-on per week
Project: Required project presentation at the end of the course
👋 Welcome! Explore the wonders of large language models. By the end of this course, you will be able to build and deploy LLM applications with confidence, equipped with the knowledge of how LLMs work: from tokens and embeddings to fine-tuning your own models and using agents to build production-grade applications.
✨ The course concludes with a Live Demo Day where you’ll showcase your innovative projects!
🧑‍💻 Your LLM Journey
- Foundations: Begin with the transformer model, exploring each layer as we build GPT from scratch. Master concepts like positional encoding, self-attention, and multi-head attention. Get ready to tinker with the mechanics to uncover the “magic” behind LLMs.
- Enabling LLMs: Learn the best practices for prompt engineering, retrieval-augmented generation, fine-tuning, and agent integration. By this stage, you’ll understand how to seamlessly incorporate LLMs into your applications and identify the best approach for your use case.
- Deployment & Monitoring: Discover how to evaluate, deploy, and monitor LLM performance. Explore setups like on-premises deployments, Hugging Face Spaces, and monitoring tools such as Weights & Biases (WandB) and LangFuse.
🧑‍🤝‍🧑 Learning Activities and Support
- Every week, you will go through assignments and group discussions.
- One-on-one office hours will also be available in case you need more TLC.
🚀 Get ready for an interactive and exciting learning experience!
Part 1: The Foundations
- Introduction
- AI Today and Tomorrow
- Overview of AI Engineering
- What’s SOTA (State of the Art)
- The Transformer Architecture
- Encoder and Decoder Blocks
- BART, BERT, GPT-style
- Write an Encoder-Decoder Transformer Model from scratch
- Explain each component along the way
- Tokenization: Text to Tokens
- Positional Encoding: Understanding the Order in Context
- Normalization, Dropout and Residual Connections
- Embedding Layers: Words to Numbers
- Attention Layers: Self-attention and Multi-head attention
- Feedforward Network Layers: Firing the engines
- LayerNorms: Stabilizing the network
- Projection Layer: Shaping the output
- Temperature: From precision to creativity
- Decoding Methods: Top-K, Top-P, and more
- Guardrails: Controlling the outputs
- Loss Functions and Cross-Entropy Loss: The model’s compass
- RLHF (Reinforcement Learning from Human Feedback)
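To give a taste of what you will build in this part, here is a minimal sketch of the scaled dot-product attention at the heart of the self-attention layer. This is an illustration with toy dimensions using NumPy, not the full multi-head implementation written in class:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core of the self-attention layer covered in Part 1.

    Q, K, V: (seq_len, d_k) arrays of query, key, and value vectors.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise similarity, scaled
    # numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output is a weighted sum of the values

# Toy example: 3 tokens, 4-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V = x
print(out.shape)  # (3, 4)
```

In the course you will extend this into multi-head attention, add masking, and wrap it in the full transformer block.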
Part 2: Enabling LLMs
The second part focuses on best practices for integrating large language models (LLMs) into applications, beginning with Retrieval-Augmented Generation (RAG) and advanced techniques for improving context relevance. It then covers fine-tuning with parameter-efficient adaptation and quantization, including hands-on practice with small language models (SLMs). Finally, it turns to agents, covering function calling, LangGraph, and reasoning-action frameworks for enhanced performance and functionality.
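The retrieval step of RAG described above can be sketched in a few lines. In this toy version, the bag-of-words `embed` and `retrieve` helpers are hypothetical stand-ins for a real embedding model and vector database:

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; a real system uses a neural embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=2):
    """Rank documents by similarity to the query: the 'R' in RAG."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "Transformers use self-attention over token embeddings.",
    "RAG augments a prompt with retrieved context.",
    "Fine-tuning adapts model weights to a task.",
]
context = retrieve("augment the prompt with retrieved context", docs, k=1)
prompt = f"Context: {context[0]}\n\nQuestion: How does RAG work?"
```

The retrieved passage is prepended to the prompt (the "augmentation"), grounding the model's answer in your own data; production systems replace the toy similarity search with a vector database.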
- Why Context Matters?
- Retrieval Augmented Generation
- Introduction to Vector Databases
- RAG Evaluation (RAGAS)
- LangChain and LlamaIndex RAG
- Fusion, Reciprocal Rank Fusion
- RAPTOR and other variants of RAG
- Your First RAG application
- Understanding parameter-efficient methods and low-rank adaptation
- Quantization
- Fine-tuning an SLM on Hugging Face
- Function calling
- LangGraph
- The Reasoning-Action (ReAct) Agents
- Innovation and Ideation
- Industry Use Cases
- Brainstorming Techniques and Strategies
Part 3: Productionizing LLM Apps
The final part of the program focuses on evaluating and optimizing LLM applications, covering benchmarks, monitoring performance, and scaling effectively. Participants will learn practical techniques like caching prompts, managing requests, and creating efficient data pipelines to build scalable applications. It also includes deploying open-source solutions, measuring performance, and fine-tuning hardware for efficiency. The program culminates with a demo day, where learners showcase their projects to industry experts and the public, followed by a certification ceremony.
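Prompt caching, one of the scaling techniques mentioned above, can be sketched as a hash-keyed lookup that skips repeated model calls. Here `call_model` is a hypothetical stand-in for a real LLM API request:

```python
import hashlib

def call_model(prompt):
    """Hypothetical stand-in for an expensive LLM API request."""
    call_model.count += 1
    return f"response to: {prompt}"
call_model.count = 0  # track how many real calls were made

_cache = {}

def cached_completion(prompt):
    """Serve a cached response when the exact prompt has been seen before."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(prompt)
    return _cache[key]

first = cached_completion("Summarize RAG in one line.")
second = cached_completion("Summarize RAG in one line.")  # served from cache
```

The same pattern applies to embedding caching; production systems add eviction policies and often semantic (similarity-based) rather than exact-match keys.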
- Massive Text Embedding Benchmark (MTEB)
- Monitoring and Visibility: Efficient inference, scaling, and tracking performance
- Semantic Chunking
- Prompt and Embedding Caching
- Request Queues
- Data pipelines
- Building Scalable RAG application
- Measuring with LangFuse
- Hugging Face Text Generation
- GPT-Generated Unified Format (GGUF)
- Rightsizing the GPU
- Demo Day Rehearsal and Feedback
- Public Presentation and Demo Day
- Graduation and Certification Ceremony
Jumpstart your AI Career Now! Learn the state of the art in AI and start building.
Price: P 50,000.00
Sign up here now!
Note: A skill assessment will be conducted prior to acceptance in this program.
Scholarships and discounts available for highly qualified applicants.