Knowledge Base

My Notes

Engineering notes and deep dives I've written on LLMs, inference, RAG, and agentic systems — things I actually use and build with.

Training

LLM Fine-Tuning Handbook

Engineering reference covering SFT, LoRA, PEFT, and production fine-tuning workflows.

Inference

LLM/VLM Quantization & Inference Engineering

Handbook on quantization methods and formats (GPTQ, AWQ, GGUF) and inference optimization.

Inference

AWQ — Complete Guide

Deep dive into Activation-aware Weight Quantization: theory, calibration, and deployment.

Inference

vLLM KV Cache — Field Notes

Practical notes on vLLM's KV cache internals, paged attention, and throughput tuning.

Inference

Local LLM Scaling Analysis

What isn't scaling in local LLMs — bottlenecks, hardware ceilings, and real-world limits.

Inference

DFlash — Deep Dive Notes

Notes on flash attention variants and memory-efficient attention mechanisms.

RAG

RAG Deep Dive — Part 1

Comprehensive RAG study guide covering retrieval, chunking, reranking, and evaluation.

RAG

RAG Deep Dive — Part 2

Advanced RAG patterns: hybrid search, agentic retrieval, and production architecture.

Agents

LLM Memory Systems

Engineering notes on short-term, long-term, and episodic memory in LLM-based systems.

Agents

Agentic AI vs MCP

Engineering notes comparing agentic AI architectures and the Model Context Protocol.

Agents

Multi-Agent AI Systems — Production Guide

Building production-grade multi-agent systems: orchestration, reliability, and tooling.

Training

Deep Learning Training Handbook

Engineer's handbook for DL training: optimizers, schedulers, mixed precision, and debugging.

Models

Claude Internals — Deep Notes

Deep notes on Claude's architecture, training approach, and Constitutional AI.

Inference

Self-Hosting LLMs on Edge Hardware

Complete infrastructure guide for self-hosting LLMs on edge hardware — LiteLLM, vLLM, and local deployment patterns.

14 notes · Updated regularly