I was looking at a massive enterprise RAG pipeline the other day and a terrifying thought crossed my mind: what if we just built the world’s most efficient data leak engine?

We developers get so obsessed with search quality—tweaking embedding models, messing with chunk sizes—that we often completely forget about data isolation. We just dump everything into a single, monolithic vector index. PDFs, SharePoint sites, Confluence pages. It all goes into the blender. And then I realized… if HR termination policies, financial forecasts, and public marketing copy all live in the exact same vector space, my AI agent isn’t a feature. It’s a massive security liability waiting to blow up in my face.

The Problem: Assuming the LLM Knows Who Is Asking

Let’s play out a scenario I was stressing over. Imagine an intern logs into the internal company chatbot and types, “What are the CEO’s bonus targets for Q4?”

The semantic search algorithm does exactly what I built it to do. It finds the most mathematically relevant text. It happily yanks the highly confidential executive compensation document out of the vector database and hands it to the LLM. The LLM, eager to please, summarizes it into a neat little bulleted list for the intern.

Why did this happen? Because I fundamentally misunderstood how LLMs work in a pipeline. The LLM has absolutely no concept of “who” is asking the question. It only knows the text I fed it. I used to think I could just slap a system prompt on it like, “Do not reveal financial data to junior employees.” But let’s be real—LLMs are so easily jailbroken it’s not even funny. I can’t rely on a text prompt to enforce cryptographic security boundaries.

My Fix: Hard RBAC at the Retrieval Layer with Azure AD B2C

I finally figured out that security can’t live in the prompt. It has to exist at the retrieval layer, completely separated from the LLM’s reasoning engine, and it has to be enforced by hard cryptographic tokens.

Building Identity-Aware Retrieval

Here is how I wired it up to stop losing sleep. By integrating Azure AD B2C (or Microsoft Entra ID), I force every single request hitting my AI API to carry a JWT (JSON Web Token). This token actually contains the user’s specific group claims, department, and role.
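To make this concrete, here's a hypothetical claims payload after the token is validated. The field names follow Entra ID conventions, but the exact shape depends on your app registration, so treat this as an illustration:

```python
# Hypothetical shape of a validated Entra ID token's claims for an intern.
# The actual payload depends on the app registration and optional claims config.
decoded_token = {
    "name": "Jane Intern",
    "roles": ["Employee"],
    "groups": ["All_Employees", "Marketing"],  # group names or object IDs
}

user_groups = decoded_token.get("groups", [])
# No "Executives" claim here, so no executive-only chunk can ever match her query.
```

The crucial point: these claims are signed by the identity provider, not typed by the user, so the retrieval layer can trust them.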

Then, when I build out the vector database (whether I’m using Azure AI Search, Pinecone, or Cosmos DB), I tag every single chunk of text with an Access Control List (ACL) in the metadata.
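A sketch of what that tagging step might look like at indexing time. The `allowed_groups` field name is my own choice; in Azure AI Search it would be defined as a filterable string collection in the index schema:

```python
# Sketch: attach an ACL to each chunk before pushing it into the index.
# "allowed_groups" is an assumed field name (a filterable string collection).
def tag_chunk(chunk_id, text, allowed_groups):
    return {
        "id": chunk_id,
        "content": text,
        "allowed_groups": allowed_groups,  # the ACL the query filter matches against
    }

docs = [
    tag_chunk("fin-q4-comp", "Executive compensation targets...", ["Executives"]),
    tag_chunk("mkt-launch", "Public launch announcement copy...", ["All_Employees"]),
]
# search_client.upload_documents(documents=docs)  # actual indexing call
```

The ACL travels with the chunk, so no matter how the content gets re-embedded or re-ranked later, the permission metadata stays attached to it.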

This is what my actual query logic started looking like:

import jwt
from azure.search.documents import SearchClient

# search_client is assumed to be a configured SearchClient instance.
# The JWKS endpoint serves the tenant's public signing keys; PyJWT's
# PyJWKClient fetches and caches them for signature verification.
jwks_client = jwt.PyJWKClient(
    "https://login.microsoftonline.com/<tenant-id>/discovery/v2.0/keys"
)

def secure_rag_query(user_query, auth_header):
    # 1. Cryptographically validate the Azure AD JWT against the tenant's
    # public keys. If this fails, they don't even get to talk to the AI.
    token = auth_header.removeprefix("Bearer ")
    signing_key = jwks_client.get_signing_key_from_jwt(token)
    decoded_token = jwt.decode(
        token,
        signing_key.key,
        algorithms=["RS256"],
        audience="<api-client-id>",  # your API's app registration client ID
    )
    user_groups = decoded_token.get("groups", [])

    # 2. Build an OData filter for the vector search based on who they
    # actually are, e.g. ["Marketing", "All_Employees"]
    group_filters = " or ".join(
        f"allowed_groups/any(g: g eq '{group}')" for group in user_groups
    )

    # 3. Execute the identity-aware vector search
    search_results = search_client.search(
        search_text=user_query,
        filter=group_filters,  # This is the hard infrastructure filter that saves the day!
        top=5,
    )
    return generate_llm_response(search_results)
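For that intern from earlier, the filter-building step in the function above produces an OData expression like this (standalone so you can see the exact string):

```python
# The OData filter generated for a user with two group claims.
user_groups = ["Marketing", "All_Employees"]
group_filters = " or ".join(
    f"allowed_groups/any(g: g eq '{g}')" for g in user_groups
)
print(group_filters)
# allowed_groups/any(g: g eq 'Marketing') or allowed_groups/any(g: g eq 'All_Employees')
```

Any chunk whose `allowed_groups` collection contains none of the user's groups simply never comes back from the search service. One caveat: since group names are interpolated into the filter string, use group object IDs (GUIDs) rather than free-text names if there's any chance a name could contain a quote character.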

Why This Actually Lets Me Sleep at Night

If that intern isn’t cryptographically verified as a member of the “Executive” group, my vector database filters the CEO’s compensation document out at query time, during the retrieval phase. The text never even makes it into the LLM’s context window. And the LLM can’t leak what it doesn’t know.

By shifting the security burden down to the infrastructure layer, Azure AD B2C becomes a hard gate in front of retrieval. It plugs the data leaks and gives me an actual Zero Trust AI architecture instead of a polite request in a system prompt.

Related Reading: If you’re building this stuff out, you might also want to see how I’m pairing databases with Azure security in Managing State: Redis vs Cosmos DB, or why I realized I had to use the right Azure environments in Why Enterprises Are Ditching Native OpenAI for Azure.
