My Notes
Engineering notes and deep dives I've written on LLMs, inference, RAG, and agentic systems — things I actually use and build with.
LLM Fine-Tuning Handbook
Engineering reference covering SFT, LoRA, PEFT, and production fine-tuning workflows.
LLM/VLM Quantization & Inference Engineering
Handbook on quantization methods and formats (GPTQ, AWQ, GGUF) and inference optimization.
AWQ — Complete Guide
Deep dive into Activation-aware Weight Quantization: theory, calibration, and deployment.
vLLM KV Cache — Field Notes
Practical notes on vLLM's KV cache internals, PagedAttention, and throughput tuning.
Local LLM Scaling Analysis
What isn't scaling in local LLMs — bottlenecks, hardware ceilings, and real-world limits.
DFlash — Deep Dive Notes
Notes on FlashAttention variants and memory-efficient attention mechanisms.
RAG Deep Dive — Part 1
Comprehensive RAG study guide covering retrieval, chunking, reranking, and evaluation.
RAG Deep Dive — Part 2
Advanced RAG patterns: hybrid search, agentic retrieval, and production architecture.
LLM Memory Systems
Engineering notes on short-term, long-term, and episodic memory in LLM-based systems.
Agentic AI vs MCP
Engineering notes comparing agentic AI architectures and the Model Context Protocol.
Multi-Agent AI Systems — Production Guide
Building production-grade multi-agent systems: orchestration, reliability, and tooling.
Deep Learning Training Handbook
Engineer's handbook for DL training: optimizers, schedulers, mixed precision, and debugging.
Claude Internals — Deep Notes
Deep notes on Claude's architecture, training approach, and Constitutional AI.
Self-Hosting LLMs on Edge Hardware
Complete infrastructure guide for self-hosting LLMs on edge hardware — LiteLLM, vLLM, and local deployment patterns.
14 notes · Updated regularly