AI & ML Insights

AI & ML Insights by Krishil Agrawal.

Deep dives into real-world machine learning systems, AI architectures, and engineering challenges — written for engineers who build, not just read.

16+ Technical deep dives
100% Production-focused
0 Beginner tutorials
2026 Current & up-to-date
Featured · Agentic AI · GenAI / LLMs · Deep Dive

The AI Solopreneur Stack: 7 Tools to Run a One-Person Business

One person. Zero employees. Full-scale output. Here's the exact AI stack — LLMs, agents, automation, content, and revenue tools — that lets a single operator run a real business in 2025.

18 min read · April 2026
All Articles · 15 articles
Agentic AI · System Design · GenAI / LLMs

MCP vs A2A — The 2 Protocols Every AI Developer Needs to Know

MCP connects your agent to tools. A2A connects your agent to other agents. Those are two very different problems — and confusing them will wreck your architecture before you write a single line of code.

12 min read · April 2026
Security · Deep Dive · Agentic AI

Why Agentic AI Is Still Broken: 5 Security Failures Killing Real Deployments

Agentic AI promises autonomy, but prompt injection, tool misuse, and broken trust chains are silently killing deployments. Here's what's really broken and how to fix it.

10 min read · April 2026
GenAI / LLMs · System Design

Scaling RAG Systems: From Naive Retrieval to Agentic Chunking

80% of RAG failures happen at the chunking layer, not the LLM. Here's how to move from fixed-size splitting to intelligent, context-aware chunking.

14 min read · April 2026
GenAI / LLMs · System Design · Agentic AI

GraphRAG vs Vector RAG — When Relationships Beat Similarity

GraphRAG beats Vector RAG in 4 specific scenarios. Learn when entity relationships outperform semantic similarity — with diagrams, examples, and code.

12 min read · April 2026
GenAI / LLMs · Agentic AI · Deep Dive

LLMs Don't Have Memory. So How Do They Remember?

Every LLM starts completely fresh. No memory of you, your preferences, or your last conversation. So how do AI assistants seem to remember anything? Here's the complete engineering answer.

15 min read · April 2026
Machine Learning · GenAI / LLMs · Deep Dive

Vectorization vs Embeddings — The Difference Every ML Engineer Must Understand

Both convert text to numbers. Both produce vectors. And yet they solve fundamentally different problems — and using them interchangeably will break your models in ways that are very hard to debug.

14 min read · April 2026
Deep Learning · Machine Learning · GenAI / LLMs · Deep Dive

How Embedding Models Work Under the Hood

What actually happens inside an embedding model? Tokenization, lookup tables, positional encoding, multi-head attention, pooling, contrastive training — every layer explained with diagrams and code.

16 min read · April 2026
GenAI / LLMs · Machine Learning · System Design · Deep Dive

Vector RAG vs Vectorless RAG — The Complete Production Guide

Vector RAG uses embeddings and semantic search. Vectorless RAG uses BM25, keyword search, and structured queries. Neither wins every time. Here's the engineering framework for choosing — and combining — both.

15 min read · April 2026
GenAI / LLMs · Machine Learning · Deep Dive

The Hidden Complexity of RAG — From Beginner Surface to Builder Depth

The RAG iceberg: above the waterline is a basic pipeline. Below it are reranking, query reformulation, hallucination detection, PII masking, latency vs accuracy tradeoffs, and much more. Here's the full map.

18 min read · April 2026
System Design · Case Studies · Deep Dive

YouTube System Design — Complete Architecture

A complete technical breakdown of how YouTube serves 2.5 billion users — from raw video upload to personalized playback at planetary scale.

22 min read · May 2026
GenAI / LLMs · System Design · Deep Dive

RAG in Production: The Complete Engineering Guide

Every question you have while building a RAG system answered — chunking, retrieval, evaluation, failure modes, vector DBs, and lessons from Morgan Stanley, Perplexity, Notion, and Cursor.

22 min read · April 2025
System Design · GenAI / LLMs · Deep Dive

The End of the LLM Monolith: Why Enterprises Are Routing 80% of Traffic to SLMs

Throwing a frontier model at a simple data extraction task isn't innovation — it's burning money. How semantic routers and SLMs are cutting API costs by 85% in 2025.

16 min read · May 2025
System Design · GenAI / LLMs · Deep Dive

"MCP is Dead": Why Context Bloat is Killing Your Agents

In 2026, the hottest take on Tech Twitter was that the Model Context Protocol (MCP) is dead. The reality is much harsher: MCP isn't dead, your server design just sucks. How the corsair.dev introspection pattern fixes context bloat.

14 min read · May 2026
System Design · GenAI / LLMs · Deep Dive

The 3 Paradigms of AI Coding: Cursor vs. Antigravity vs. VS Code AI

We are no longer just arguing about which LLM is smarter. We are arguing about where the AI should live: inside an extension, baked into a custom editor fork, or running as an autonomous system process.

16 min read · May 2026
Security · GenAI / LLMs · Deep Dive

The Claude Mythos Paradox: When the Safest AI Becomes the Most Dangerous

What if the AI that passes every safety test is also the most dangerous AI ever built? Claude Mythos taught us that aligned and safe are not the same thing.

18 min read · May 2026
Why Read

What Makes These Articles Different

System-Level Thinking

Every article covers the full engineering stack — not just the model API, but the retrieval layer, the memory architecture, and the deployment constraints.

Production Constraints First

Topics are chosen based on failure modes in real systems — the gaps between demos and deployments that most tutorials never address.

Grounded in Research

When numbers appear — token costs, latency figures, accuracy deltas — they come from cited sources and real benchmarks, not intuition.

Stay Sharp

New AI Engineering deep-dives — every month.

RAG systems. Agentic architectures. LLM deployment patterns. No filler.