Sahil Chachra
“Architecting Intelligence, From the Ground Up.”
AI Architect · Building AI Platforms That Scale
Most engineers pick a lane. I build the stack end-to-end — from LLM orchestration and Multi-agent reasoning to VLM-powered perception systems running in real time at scale.
In Progress
Now
What I'm building, thinking about, and poking at right now.
Shipping
Real-time VLM surveillance pipeline
Multi-camera asyncio system where raw model events pass through a sliding-window persistence gate before they touch the backend — temporal confidence filtering as the anti-hallucination layer.
Cross-platform utility apps
Building Windows, macOS, and Linux desktop apps that onboard customers into the BLUE ecosystem — paired with contributions to a patented Intel VAAPI video compression pipeline.
MLX-quantized models for Apple Silicon
Publishing every quantization variant (affine 4/5/6/8-bit, mixed-bit, MX FP4/FP8) of Granite 4.1 8B and Hy-MT2 to HuggingFace — with benchmark reports comparing quality and throughput against FP16 baselines on M5 Pro.
Thinking About
Failure modes under degraded visual conditions
VLMs hallucinate more in low-light and noisy frames. What does a deterministic noise-rejection layer look like when the model itself is the noisy signal?
Where the edge–cloud inference boundary actually sits
Not a binary choice — it's a latency/cost/accuracy curve. I'm mapping where that curve bends for real-time multi-camera workloads.
Exploring
Temporal reasoning across frames
Single-frame inference misses context that spans seconds. Exploring how to pass temporal state to models that weren't designed for it.
Where mixed-bit beats uniform quantization
For translation vs. reasoning workloads, the optimal bit-width allocation differs significantly. Mapping when mixed4_6 is worth the complexity over straight 4-bit.
Work
Projects
Things I've built, shipped, and open-sourced.
Most teams bolt safety on after deployment. This is a drop-in proxy that intercepts every LLM call — running configurable safety checks, content filters, and policy guardrails before the request reaches the model. Open-source, production-grade, zero changes to your existing LLM integration.
Evaluating VLMs in production requires more than accuracy scores — you need throughput, latency, and cost curves too. Self-hosted benchmarking platform: upload image datasets, register HuggingFace or local models, run GPU-accelerated evaluation via vLLM, and compare the full performance profile on a live leaderboard.
LLMs have a refusal problem — they refuse too much or not enough. This project maps the refusal decision boundary, then reshapes it through targeted fine-tuning. The goal: principled control over refusal behavior without degrading general capability.
When you merge two models each fine-tuned on a different domain, do you get a smarter generalist or a confused compromise? This project maps the answer empirically — testing weight interpolation, SLERP, and task vectors across domain-specialized checkpoints.
Career
Experience
Where I've built things that matter.
Joined a pre-seed startup building vision analytics for physical retail spaces. An AI stack existed — but it was expensive to run and didn't scale economically. My job: rebuild the pipeline to run on cheaper infrastructure while supporting far more cameras, inference throughput, and VLM API calls simultaneously.
- ▸Built a real-time multi-camera VLM pipeline from scratch — asyncio-native, with a sliding-window persistence gate that filters hallucinations before they reach the backend, cutting false alert rates without adding latency.
- ▸Contributing to a patented hardware-accelerated video compression algorithm (Intel VAAPI) and building cross-platform utility applications (Windows, macOS, Linux) that onboard customers into the BLUE ecosystem.
- ▸Profiled the full inference stack across quantization levels and GPU/edge targets — found and eliminated the bottlenecks preventing real-time SLA compliance.
Founding AI Engineer
Stealth Startup
AI Platform for Factories · Pre-Seed Stage
First engineer hired at a pre-seed startup building AI for factory floors. The domain was manufacturing — CAD files, machine catalogs, compliance specs, and factory-floor constraints. The job was to make an LLM reason reliably across all of it, at production scale, with no prior art to follow.
- ▸Designed a 3-tier, 27-agent orchestration framework — role-based CrewAI agents, async direct-LLM agents, and an 11-stage DAG task executor — decomposing complex manufacturing tasks into verifiable, structured sub-problems.
- ▸Built a 7-stage document enrichment pipeline (extract → plan → classify → enrich → normalize → cross-doc → assemble) with conditional enrichers and concurrency semaphores handling 5 document modalities in parallel.
- ▸Integrated bidirectional MES sync with Odoo (JSON-RPC 2.0), connecting AI-generated factory plans to live shop floor data for planned-vs-actual dashboards — the first time those two worlds talked to each other.
Visiting AI Mentor
CurrentFreelanceMesa School of Business
PG & UG AI Hackathons · Neos Kosmos Technologies
Not everything I do is full-time. At Mesa I mentor student teams during intensive AI hackathons — helping founders and postgrads go from idea to working prototype using modern AI tooling, without needing deep engineering backgrounds.
- ▸Guided teams across PG and UG cohorts through hands-on builds using Google AI Studio, Relevance AI, and n8n — covering prompt design, agent workflows, and no-code/low-code AI automation.
- ▸Helped non-technical teams translate business problems into AI-powered prototypes within hackathon time constraints.
Senior AI Engineer
Avathon
Computer Vision AI Platform · Series D
Three years scaling a computer vision platform from early contracts to enterprise deployments across hundreds of cameras. I went from training models to owning system architecture — and eventually added LLMs to a stack that was purely CV when I joined.
- ▸Scaled a computer vision platform to 600+ cameras across enterprise clients, maintaining production reliability across 30+ active use cases simultaneously.
- ▸Built a person re-ID pipeline that tracked individual customer journeys across camera zones — measuring dwell time and staff engagement patterns that didn't exist in any other data source.
- ▸Engineered an LLM + RAG layer that translated natural language requirements directly into structured CV pipeline configurations, cutting new use-case deployment cycles by 20%.
- ▸Upgraded the platform with LLaVA and Qwen VLMs, achieving up to 95% accuracy improvement across 350 cameras.
Deep Learning Engineer
Tata Consultancy Services
IT Services & Consulting · Fortune 500
Started my career applying deep learning to real automotive and mobility problems. The fundamentals I built here — training discipline, production mindset, and understanding hardware constraints — shaped everything that followed.
- ▸Accelerated data annotation pipelines using deep learning models for the Smart Mobility Group, cutting manual labeling time on automotive datasets.
- ▸Integrated trained CV models into customer-facing products across mobility and automotive domains — one of the first times I shipped ML to real users.
- ▸Awarded the Technical Excellence Award for outstanding contributions to the organization.
HuggingFace
Open Models
I quantize open-source LLMs into every MLX variant and publish them for the community — so anyone on Apple Silicon can run frontier models locally without writing a line of quantization code.
14+
Models published
3
Base models ported
7
Quant formats per model
M5 Pro
Benchmarked on
Quantized with mlx-bench — a model-agnostic pipeline for MLX quantization and benchmarking. Point it at any HuggingFace repo, get every variant and a side-by-side perf/quality report.
Capabilities
Skills
The tools and technologies I work with every day.
Drag to scrub · Hover to highlight
Medium
Writing
Thoughts on AI, engineering craft, and what it takes to build at the frontier.
Get In Touch
Let's Build Something
Open for AI consulting, platform architecture reviews, and genuinely interesting problems. If you're building at the frontier of AI, I'd love to hear about it.



