AI & ML Engineering Blogs | Krishil Agrawal

Agentic AI · System Design · GenAI / LLMs

MCP vs A2A — The 2 Protocols Every AI Developer Needs to Know

MCP connects your agent to tools. A2A connects your agent to other agents. Those are two very different problems — and confusing them will wreck your architecture before you write a single line of code.

12 min read·April 2026

Security · Deep Dive · Agentic AI

Why Agentic AI Is Still Broken: 5 Security Failures Killing Real Deployments

Agentic AI promises autonomy, but prompt injection, tool misuse, and broken trust chains are silently killing deployments. Here's what's really broken and how to fix it.

10 min read·April 2026

GenAI / LLMs · System Design

Scaling RAG Systems: From Naive Retrieval to Agentic Chunking

80% of RAG failures happen at the chunking layer, not the LLM. Here's how to move from fixed-size splitting to intelligent, context-aware chunking.

14 min read·April 2026

GenAI / LLMs · System Design · Agentic AI

GraphRAG vs Vector RAG — When Relationships Beat Similarity

GraphRAG beats Vector RAG in 4 specific scenarios. Learn when entity relationships outperform semantic similarity — with diagrams, examples, and code.

12 min read·April 2026

GenAI / LLMs · Agentic AI · Deep Dive

LLMs Don't Have Memory. So How Do They Remember?

Every LLM starts completely fresh. No memory of you, your preferences, or your last conversation. So how do AI assistants seem to remember anything? Here's the complete engineering answer.

15 min read·April 2026

Machine Learning · GenAI / LLMs · Deep Dive

Vectorization vs Embeddings — The Difference Every ML Engineer Must Understand

Both convert text to numbers. Both produce vectors. And yet they solve fundamentally different problems — and using them interchangeably will break your models in ways that are very hard to debug.

14 min read·April 2026

Deep Learning · Machine Learning · GenAI / LLMs · Deep Dive

How Embedding Models Work Under the Hood

What actually happens inside an embedding model? Tokenization, lookup tables, positional encoding, multi-head attention, pooling, contrastive training — every layer explained with diagrams and code.

16 min read·April 2026

GenAI / LLMs · Machine Learning · System Design · Deep Dive

Vector RAG vs Vectorless RAG — The Complete Production Guide

Vector RAG uses embeddings and semantic search. Vectorless RAG uses BM25, keyword search, and structured queries. Neither wins every time. Here's the engineering framework for choosing — and combining — both.

15 min read·April 2026

GenAI / LLMs · Machine Learning · Deep Dive

The Hidden Complexity of RAG — From Beginner Surface to Builder Depth

The RAG iceberg: above the waterline is a basic pipeline. Below it are reranking, query reformulation, hallucination detection, PII masking, latency vs accuracy tradeoffs, and much more. Here's the full map.

18 min read·April 2026

System Design · Case Studies · Deep Dive

YouTube System Design — Complete Architecture

A complete technical breakdown of how YouTube serves 2.5 billion users — from raw video upload to personalized playback at planetary scale.

22 min read·May 2026

GenAI / LLMs · System Design · Deep Dive

RAG in Production: The Complete Engineering Guide

Every question you have while building a RAG system answered — chunking, retrieval, evaluation, failure modes, vector DBs, and lessons from Morgan Stanley, Perplexity, Notion, and Cursor.

22 min read·April 2025

System Design · GenAI / LLMs · Deep Dive

The End of the LLM Monolith: Why Enterprises Are Routing 80% of Traffic to SLMs

Throwing a frontier model at a simple data extraction task isn't innovation — it's burning money. How semantic routers and SLMs are cutting API costs by 85% in 2025.

16 min read·May 2025

System Design · GenAI / LLMs · Deep Dive

"MCP is Dead": Why Context Bloat is Killing Your Agents

In 2026, the hottest take on Tech Twitter was that the Model Context Protocol (MCP) is dead. The reality is much harsher: MCP isn't dead, your server design just sucks. How the corsair.dev introspection pattern fixes context bloat.

14 min read·May 2026

System Design · GenAI / LLMs · Deep Dive

The 3 Paradigms of AI Coding: Cursor vs. Antigravity vs. VS Code AI

We are no longer just arguing about which LLM is smarter. We are arguing about where the AI should live: inside an extension, baked into a custom editor fork, or running as an autonomous system process.

16 min read·May 2026

Security · GenAI / LLMs · Deep Dive

The Claude Mythos Paradox: When the Safest AI Becomes the Most Dangerous

What if the AI that passes every safety test is also the most dangerous AI ever built? Claude Mythos taught us that aligned and safe are not the same thing.

18 min read·May 2026

AI & ML Insights by Krishil Agrawal.

The AI Solopreneur Stack: 7 Tools to Run a One-Person Business

MCP vs A2A — The 2 Protocols Every AI Developer Needs to Know

Why Agentic AI Is Still Broken: 5 Security Failures Killing Real Deployments

Scaling RAG Systems: From Naive Retrieval to Agentic Chunking

GraphRAG vs Vector RAG — When Relationships Beat Similarity

LLMs Don't Have Memory. So How Do They Remember?

Vectorization vs Embeddings — The Difference Every ML Engineer Must Understand

How Embedding Models Work Under the Hood

Vector RAG vs Vectorless RAG — The Complete Production Guide

The Hidden Complexity of RAG — From Beginner Surface to Builder Depth

YouTube System Design — Complete Architecture

RAG in Production: The Complete Engineering Guide

The End of the LLM Monolith: Why Enterprises Are Routing 80% of Traffic to SLMs

"MCP is Dead": Why Context Bloat is Killing Your Agents

The 3 Paradigms of AI Coding: Cursor vs. Antigravity vs. VS Code AI

The Claude Mythos Paradox: When the Safest AI Becomes the Most Dangerous

What Makes These Articles Different

System-Level Thinking

Production Constraints First

Grounded in Research

New AI Engineering deep-dives every month.
