[{"content":"How Cloudflare helps you democratize AI access for your teams, developers, and applications — without losing visibility, security, or your budget.\nTL;DR — In part one of this series, I covered the AI traffic hitting your infrastructure from the outside — crawlers, training bots, the broken value exchange between content creators and AI platforms. This post is the mirror image: the AI your organization reaches out to, every day, across every team — and why that consumption is far less controlled than most organizations realize. Shadow AI usage, fragmented provider accounts, API keys scattered across codebases, and no centralized data governance are the norm, not the exception. The real challenge isn\u0026rsquo;t getting your teams to use AI — they already are. It\u0026rsquo;s building the infrastructure layer that makes that adoption sustainable, secure, and cost-controlled. Cloudflare\u0026rsquo;s AI Gateway, combined with the broader Cloudflare developer platform, is that layer. Not because of what it says on a product page, but because of what I see in the field.\nIn my previous post, I wrote about the AI traffic hitting your infrastructure from the outside — crawlers harvesting your content, training bots eroding your cache performance, and the tools Cloudflare is building to protect and monetize that exposure. The data that opened that post is worth holding onto here: 32% of all traffic across Cloudflare\u0026rsquo;s network is now automated, driven in large part by AI systems crawling the web at scale.\nThat number describes AI as something happening to you. But the same explosion of AI activity is also happening inside your organization — your teams, your developers, your applications, all reaching out to AI providers every day. Same trend. 
Two sides of the same wall.\nThis post is about that inward-facing side: the AI your organization reaches out to, and why that consumption is far less controlled than most organizations realize.\nThe Conversation I Keep Having It usually starts with: \u0026ldquo;We want to build an AI strategy.\u0026rdquo;\nA few questions in, the real picture comes out. I was in a conversation with a financial services company recently — mid-sized, serious security posture, the kind of team that has a process for everything. I asked their CISO how many active AI API keys existed across their environment. He looked at his security architect. The security architect looked at the lead developer. Nobody knew. Not even approximately. They had a formal AI policy. They had no idea what was actually happening underneath it.\nThat\u0026rsquo;s not an outlier. That\u0026rsquo;s the majority. One team expensing OpenAI on personal cards. Another team sharing a corporate API key over Slack. Developers hardcoding credentials directly into applications. Marketing using a third-party AI tool that bypassed procurement entirely. And somewhere in all of this, sensitive data — customer records, internal financials, source code — flowing into external models with no logging, no controls, no way to know what left and when.\nThis is not dysfunction. It is the entirely predictable result of genuinely useful technology arriving faster than organizations can govern it. People don\u0026rsquo;t wait for IT approval when a tool saves them two hours a day.\nThe uncomfortable truth: by the time most organizations decide they need an AI strategy, they already have an AI reality. The question is no longer how to start — it\u0026rsquo;s how to get control of something that started without them.\nShadow AI Is the New Shadow IT — And It Moves Faster A decade ago, the same pattern played out with SaaS. Employees adopted Dropbox, Google Docs, and Slack before IT sanctioned them. 
It took years for the security and compliance implications to fully surface.\nShadow AI is the same problem, but the risk profile is different in a way that matters. When someone used an unsanctioned SaaS tool, the concern was mostly contractual — data residency, liability, vendor terms. When someone pastes a customer contract or internal financial projection into a public AI model to get a quick summary, the exposure is immediate and irreversible. There is no \u0026ldquo;retrieve the data\u0026rdquo; option once it has been processed externally. And unlike a file in someone\u0026rsquo;s personal Dropbox, you may never even know it happened.\nI\u0026rsquo;ll be direct: I think most organizations are closer to an AI-related security incident than they realize. Not because AI is inherently dangerous, but because the gap between \u0026ldquo;we have a policy\u0026rdquo; and \u0026ldquo;we have technical controls that enforce that policy\u0026rdquo; is enormous right now — and that gap is where incidents happen. A policy document doesn\u0026rsquo;t stop a developer from pasting production credentials into ChatGPT to debug a problem at 11pm. Infrastructure does.\nThere\u0026rsquo;s another dimension here that connects back to something I described in part one. AI crawlers break CDN cache architectures because their access patterns are fundamentally unpredictable — bursty, broad, non-sequential, nothing like the human traffic those systems were designed for. Internal AI consumption has exactly the same property. Developers don\u0026rsquo;t call AI models in smooth, forecastable flows. They experiment in spikes, launch features that hit a model thousands of times overnight, chain multiple calls in ways that compound costs non-linearly. Static budget allocations and acceptable-use policies were not designed for this. 
You need infrastructure that enforces rules dynamically, at the point of execution.\nWhich brings me to the build-vs-buy question, because it comes up constantly. The first instinct when organizations decide to close the governance gap is to build something — proxy the AI traffic internally, write middleware that logs requests, stand up a key vault, add DLP scanning. I\u0026rsquo;ve seen teams spend three to six months doing this reasonably well. And then spend ongoing time maintaining it as provider APIs change, new models appear, and usage patterns shift in ways nobody anticipated.\nThat\u0026rsquo;s plumbing. It doesn\u0026rsquo;t appear in any roadmap. It consumes senior engineering time that could be building product. The honest question is: what is the actual value of owning this infrastructure yourself?\nWhat Cloudflare Built — And Why It\u0026rsquo;s Different Cloudflare\u0026rsquo;s AI Gateway is the answer to a specific question: what does the control plane for enterprise AI consumption look like if you build it the way you\u0026rsquo;d build internet infrastructure — once, at scale, for everyone?\nThe starting point is a single endpoint connecting to more than 350 models across the major providers: Anthropic, OpenAI, Google, Groq, xAI, and others. Your applications and tools point at one URL. Behind it, AI Gateway handles provider connectivity, request translation, and routing. Switching models or adding a fallback becomes configuration, not a code change.\nBut the endpoint is just the entry point. What matters is what sits around it.\nThe first thing AI Gateway gives you — and I want to be clear that this alone is often worth the conversation — is full observability over your AI consumption. Every request and response, logged. Usage broken down by application, model, provider. Real-time cost tracking. Before you can control costs, enforce security policies, or make intelligent routing decisions, you need to actually know what is happening. 
Who is calling what model, how often, with what payloads, at what latency.\nThis sounds basic. It is almost universally absent.\nI\u0026rsquo;ve shown this view to customers who genuinely did not know three of their applications were calling AI models in production. The reaction when that dashboard loads for the first time is consistent — it\u0026rsquo;s somewhere between relief and alarm. Relief because there\u0026rsquo;s finally a picture. Alarm because of what the picture shows.\nOnce you have visibility, the billing picture usually produces its own strong reaction. Managing separate credit accounts across providers — each with its own dashboard, billing cycle, and top-up process — is one of those operational burdens that feels manageable until you\u0026rsquo;re doing it across five providers with a dozen teams. AI Gateway consolidates everything into a single Cloudflare account balance and one monthly invoice. You load credits once. Your finance team sees one line. Your procurement team has one contract.\nMore importantly: you cannot set organizational AI spending limits when the spending is scattered across systems with no common view. Unified billing is not convenience — it\u0026rsquo;s the prerequisite for cost governance.\nThe Zero Trust Approach to API Keys Here is a pattern that should concern every security team: the API key that lives in a .env file, committed to a repository, shared with the whole engineering team, never rotated, originally created by a developer who left the company eight months ago.\nThis is not hypothetical. It is the default state of API key management in most development organizations — not because developers are careless, but because the tooling to do better has historically been friction-heavy enough that corners get cut under delivery pressure.\nAI Gateway\u0026rsquo;s integration with Cloudflare Secrets Store applies Zero Trust principles directly to this problem. 
Provider keys are stored centrally, encrypted, and referenced by name rather than value in code. Developers can use a key without ever seeing its actual content. Role-based access controls separate those who manage keys — typically a security or platform team — from those who consume them. Every key operation is audit-logged.\nWhen a developer leaves, you rotate one key in one place and it propagates automatically to every application referencing it. When a key is suspected of compromise, you have an audit trail that tells you exactly where and when it was used.\nThis is Zero Trust applied to your AI supply chain — the same principle that governs access to your internal applications: verify identity, enforce least privilege, log everything — applied to the credentials powering your AI features.\nThere is a deliberate parallel here to what I described in part one. Web Bot Auth solves the identity problem for external AI agents by replacing spoofable headers with cryptographic proof — you can only trust a bot\u0026rsquo;s behavior if you can trust who it claims to be. The same logic applies internally: you can only govern your organization\u0026rsquo;s AI consumption if you know, with certainty, which application used which key, when, and under whose authority. Secrets Store and audit logging are how that certainty is established on the outbound side.\nThe Data You Don\u0026rsquo;t Want Leaving the Building The data leakage question is the one that makes security and compliance teams most anxious — and it should. An employee asks AI to summarize a customer contract. A developer debugs code that contains production credentials using a public model. A support agent pastes a ticket with PII into an AI tool to draft a response faster. None of these feel like security incidents in the moment. All of them potentially are.\nIn each case, the failure mode is invisible. No error message. No alert. 
No indication that anything sensitive just left the building.\nAI Gateway\u0026rsquo;s firewall with Data Loss Prevention scanning intercepts this at the infrastructure layer — scanning requests against DLP profiles before they reach any model, and scanning responses on the way back. Sensitive categories can be flagged or blocked. Every match is logged with the profile that triggered and the action taken.\nI want to be honest about the limits here: DLP is not magic. Custom patterns need to be configured thoughtfully, and there will always be edge cases that slip through. But there is a significant difference between \u0026ldquo;we have no visibility into what data is going to AI models\u0026rdquo; and \u0026ldquo;we have a scanning layer that catches the most common and most serious exposure patterns.\u0026rdquo; The latter is a defensible security posture. The former isn\u0026rsquo;t.\nRouting Intelligence — Cost Control Without Limiting Access One of the most practically valuable capabilities in AI Gateway is Dynamic Routes: the ability to define traffic routing logic based on request attributes, user segments, spending thresholds, or traffic splits between models.\nIn practice, this answers the question I hear most often after organizations get over their initial concerns about cost: \u0026ldquo;How do we give everyone access to AI without the spending scaling out of control?\u0026rdquo;\nThe answer isn\u0026rsquo;t to restrict access — it\u0026rsquo;s to route intelligently. Internal productivity tools and experimentation get routed to capable but cost-efficient models. Customer-facing features requiring the highest output quality get access to premium models. Free-tier users in your product get rate-limited or directed to a lighter model. High-volume batch workloads get throttled when they don\u0026rsquo;t need to be real-time.\nNone of this requires engineering once it\u0026rsquo;s configured. It\u0026rsquo;s policy, not code. 
And it means you can say yes to broad AI adoption — across teams, use cases, and user tiers — while maintaining real control over where the budget actually flows.\nYou can also use Dynamic Routes to run model comparisons in production: split 10% of traffic to a newer model, compare output quality and cost, and promote or roll back based on real data rather than benchmarks. This kind of experimentation infrastructure would take meaningful engineering effort to build internally. Here it is a configuration.\nResilience — Because Production Can\u0026rsquo;t Depend on a Single Provider The last piece is the one that only becomes urgent after something goes wrong. AI providers experience outages. Model performance degrades after updates. Latency spikes under load. If your production application has a hard dependency on one model endpoint, you find out how important fallbacks are when a customer-facing feature stops responding at 2pm on a busy Tuesday.\nDynamic Routes handles fallback logic as a native capability. Define your primary model, your fallback, and the conditions for switching. Cloudflare manages the execution at the edge. Your application continues to function. The provider disruption gets absorbed at the infrastructure layer before it reaches your users.\nFor anyone building AI-powered features in production, this transforms reliability from a custom engineering project into a configuration decision.\nWhat Democratization Actually Requires \u0026ldquo;Democratizing AI\u0026rdquo; gets used loosely. Usually it means giving everyone access — a chatbot here, an API key there. That\u0026rsquo;s the easy part, and it\u0026rsquo;s not enough.\nAccess without governance doesn\u0026rsquo;t last. The first security incident, unexpected invoice, or compliance finding triggers a blanket ban — and you\u0026rsquo;re back where you started, except now you\u0026rsquo;ve burned organizational trust in AI tools on top of it. I\u0026rsquo;ve seen this play out. 
The technology gets banned not because it was wrong for the organization, but because the infrastructure to run it safely wasn\u0026rsquo;t in place when it mattered.\nReal democratization means broad access and the controls that make that access sustainable. Not restricting who can use AI, but governing how it\u0026rsquo;s used — automatically, at the infrastructure layer, regardless of which team or tool or model is involved.\nAI Gateway, combined with Workers AI for running models natively on Cloudflare\u0026rsquo;s network, is that infrastructure layer. And Dynamic Workers closes the loop in a way worth naming explicitly: in part one I described it as the secure execution environment for AI agents acting on the open web. The same technology serves your internal use case — when your developers want AI to generate and run code inside your own applications, Dynamic Workers is what makes that safe. One product, both directions. That\u0026rsquo;s what a platform looks like.\nThe Bottom Line The AI consumption problem is not a future problem. It exists right now, in most organizations, usually just below the surface. The CISO who doesn\u0026rsquo;t know how many API keys are active. The developer who hardcoded credentials three months ago and has since moved to another team. The finance team about to be surprised by an invoice.\nCloudflare\u0026rsquo;s answer is infrastructure that solves this at the layer where it can actually be enforced — the network — rather than relying on policy documents and developer discipline. It works whether or not your teams follow best practices, whether or not they remember to check the provider dashboard, whether or not the model they\u0026rsquo;re using today still exists next quarter.\nIn part one, I covered the AI traffic coming at your infrastructure. Here I\u0026rsquo;ve covered the AI your organization reaches out to. 
Together they represent the two sides of the same challenge — and the same platform answering both.\nThe infrastructure exists on both sides. The question is whether you\u0026rsquo;re using it.\nI\u0026rsquo;m a Solutions Engineer at Cloudflare. This post reflects my own field perspective and synthesis of publicly available Cloudflare product information — it does not represent an official Cloudflare position.\n← Part 1: The Internet Was Built for Humans. AI Didn\u0026rsquo;t Get the Memo. References: AI Gateway — August 2025 · Workers AI · Cloudflare Secrets Store · Dynamic Routes · Dynamic Workers ","permalink":"https://blog.macharpe.com/posts/ai-consumption-scale-cloudflare-gateway/","summary":"\u003cp\u003e\u003cem\u003eHow Cloudflare helps you democratize AI access for your teams, developers, and applications — without losing visibility, security, or your budget.\u003c/em\u003e\u003c/p\u003e\n\u003chr\u003e\n\u003cp\u003e\u003cstrong\u003eTL;DR —\u003c/strong\u003e In \u003ca href=\"/posts/ai-internet-infrastructure-cloudflare/\"\u003epart one\u003c/a\u003e\n of this series, I covered the AI traffic hitting your infrastructure from the outside — crawlers, training bots, the broken value exchange between content creators and AI platforms. This post is the mirror image: the AI your organization reaches \u003cem\u003eout\u003c/em\u003e to, every day, across every team — and why that consumption is far less controlled than most organizations realize. Shadow AI usage, fragmented provider accounts, API keys scattered across codebases, and no centralized data governance are the norm, not the exception. The real challenge isn\u0026rsquo;t getting your teams to use AI — they already are. It\u0026rsquo;s building the infrastructure layer that makes that adoption sustainable, secure, and cost-controlled. Cloudflare\u0026rsquo;s AI Gateway, combined with the broader Cloudflare developer platform, is that layer. 
Not because of what it says on a product page, but because of what I see in the field.\u003c/p\u003e","title":"AI Is Everywhere in Your Organization. Is Anyone Actually in Control?"},{"content":" Disclaimer: This post reflects my own synthesis and perspective on publicly available Cloudflare research and announcements — it does not represent an official Cloudflare position.\nTL;DR — AI crawlers now represent a structural threat to how the web creates and distributes value. They consume content at massive scale, send little traffic back, and are quietly degrading CDN performance for real users. This post covers how Cloudflare is responding — not just with bot controls, but with a coherent platform: cryptographic bot identity (co-authored as an IETF standard), content monetization via Pay Per Crawl, token-efficient delivery for agents, a pub/sub AI Index to replace blind crawling, AI-aware cache architecture, and a secure execution layer for agentic code. Each piece reinforces the others. Together they represent Cloudflare\u0026rsquo;s answer to the question: what should the AI-era internet actually look like?\nI spend a lot of time talking with customers about AI. Not just about AI security, or how to block bad bots — but about a deeper, more structural question that most organizations haven\u0026rsquo;t fully confronted yet: what happens to your business when AI becomes the dominant way people discover information online?\nThat shift is no longer hypothetical. It is already underway. And at Cloudflare, we have a front-row seat to witness it — at a scale no one else can match.\nThe Crawl-to-Click Gap Nobody Talks About Let me start with a number that should make every publisher, SaaS company, and content-driven business pause: 32% of all traffic across Cloudflare\u0026rsquo;s global network is now automated. 
Search engine crawlers, uptime monitors, ad networks — and increasingly, AI assistants sweeping the web to fuel their knowledge bases and power their answers.\nFor decades, this worked on a kind of gentleman\u0026rsquo;s agreement. Search engines crawled your site, indexed it, and sent humans back to you. Clicks meant ad revenue. The whole SEO industry was built around that loop.\nSource: https://radar.cloudflare.com/traffic#bot-vs-human (03/04/2026)\nAI broke the loop quietly.\nWhen a user asks an AI chatbot a question that used to go to Google, they often get a complete answer — and never click through to the original source. The AI platform extracted the value. The publisher that created the content got nothing.\nCloudflare data confirms the scale of this: AI training crawlers account for nearly 80% of all AI bot traffic we see — far outpacing search-related activity. They don\u0026rsquo;t come for a quick index. They sweep entire websites, going deep into long-tail pages that human visitors rarely touch, harvesting training data for the next generation of models. And they send almost no traffic back in return.\nThis is what we call the crawl-to-click gap. It is real, it is growing, and ignoring it is no longer an option.\nSource: https://radar.cloudflare.com/ai-insights#crawl-purpose (03/04/2026)\nIt Gets Worse: AI Is Breaking Your Infrastructure Too There\u0026rsquo;s a less visible — but increasingly urgent — dimension to this problem: AI crawlers are degrading CDN cache performance for human users.\nTraditional cache architectures were designed around how humans browse the web. Popular pages get cached at the edge. Less popular content gets served from further back. The system self-tunes over time. It works beautifully for the traffic it was designed for.\nAI crawlers behave nothing like that. Rather than concentrating on popular pages, they methodically scan entire sites — including content that human visitors almost never request. 
Over 90% of pages ingested in large training jobs are unique by content. AI crawlers don\u0026rsquo;t share browser sessions or benefit from cached resources the way human visitors do. They can run multiple independent instances in parallel, each appearing to the CDN as a brand-new visitor.\nThe result is a systematic destruction of the cache efficiency that human traffic depends on.\nReal-world consequences are already documented: Wikipedia recorded a 50% jump in multimedia bandwidth consumption driven by AI bots bulk-scraping images for training datasets. Developer platforms like SourceHut and Read the Docs experienced degraded response times and unexpected bandwidth spikes from bots repeatedly downloading large files. In nearly every case, the only available mitigation was a blunt one — block all AI traffic, forfeiting any potential benefit.\nThe industry needs smarter answers. Cloudflare is building them.\nThe Trust Problem Underneath Everything Here\u0026rsquo;s something that gets overlooked in most conversations about AI bots: before you can control, monetize, or even correctly classify a bot, you have to be able to trust that it is who it says it is.\nToday\u0026rsquo;s bot identification methods are surprisingly fragile. A crawler can announce itself via its User-Agent header — but that header is trivially spoofed. Any actor can claim to be GPTBot or ClaudeBot. The alternative is to validate IP address ranges published by the crawler\u0026rsquo;s operator — but IP ranges are brittle. They change as cloud infrastructure evolves, are shared across multiple services on the same provider, and fall apart entirely when traffic routes through VPNs or privacy proxies.\nThis isn\u0026rsquo;t a theoretical vulnerability. It means that even a well-intentioned blocking or monetization policy can be defeated by anyone willing to forge a header.\nCloudflare\u0026rsquo;s answer is Web Bot Auth — a cryptographic identity framework for bots and agents. 
Instead of trusting declarations that can be faked, the proposal requires agents to prove who they are by signing their requests using a public/private key pair. The underlying mechanism is RFC 9421 (HTTP Message Signatures), a published internet standard for cryptographic request authentication.\nThe way it works: an agent generates an Ed25519 key pair and hosts its public key at a discoverable URL — its Signature-Agent directory. When making a request, the agent signs the target domain, includes a validity window and a key identifier, and attaches a web-bot-auth tag to declare the purpose. Cloudflare, acting as a reverse proxy, validates the signature against the published public key and confirms or rejects the bot\u0026rsquo;s identity.\nNo shared secrets. No IP list maintenance. No spoofable headers. Cryptographic proof.\nWhat makes this particularly significant is where Cloudflare sits in this story: not as an adopter of someone else\u0026rsquo;s standard, but as a co-author of new ones. Cloudflare researchers have filed Internet Drafts at the IETF — the body that defines how the internet works — specifically proposing the web-bot-auth architecture and the signature-agent directory mechanism as open standards. This is Cloudflare helping write the rulebook for the AI internet.\nThe industry is already moving in this direction: OpenAI has adopted RFC 9421 for their Operator product, cryptographically signing all outgoing agent requests so that any site operator can independently verify their authenticity. This is the kind of broad adoption that turns a proposal into a standard.\nFor customers, the practical implication is significant: a verified identity layer makes every other AI capability more powerful. You can\u0026rsquo;t reliably charge a crawler you can\u0026rsquo;t verify. You can\u0026rsquo;t build a fair content marketplace without provable identities. 
Web Bot Auth is the cryptographic foundation the rest of the ecosystem stands on.\nWhat Cloudflare Is Building — And Why This Isn\u0026rsquo;t Just Bot Management When I explain Cloudflare\u0026rsquo;s AI platform to customers, the first reaction is often: \u0026ldquo;Oh, you mean like blocking AI crawlers?\u0026rdquo;\nNot exactly. Or rather — not only.\nYes, Cloudflare was among the first to give customers granular, one-click control over AI bots as a distinct category: block, allow, or challenge specific crawlers by purpose — training, search, or real-time user-triggered — with zero engineering effort. That is the foundation.\nWhat\u0026rsquo;s built on top of that foundation is where things get genuinely interesting.\nPay Per Crawl: Reviving HTTP 402 Most people have never encountered HTTP status code 402. It appeared in the original HTTP specification as \u0026ldquo;Payment Required\u0026rdquo; and was largely forgotten for three decades. Cloudflare brought it back to life.\nPay Per Crawl gives content owners a third path beyond \u0026ldquo;block everything\u0026rdquo; or \u0026ldquo;give your work away for free.\u0026rdquo; Content owners set a price per access. AI crawlers can either receive a 402 response and decide whether to pay, or proactively declare their maximum willingness to pay upfront. Cloudflare handles the billing relationship as merchant of record — no individual contracts, no scale requirements.\nThe Web Bot Auth cryptographic identity layer is what makes this trustworthy. Because crawlers authenticate via signed requests, Cloudflare can tie billing events to verified identities — not just IP addresses or self-declared headers that anyone can forge. The payment relationship has genuine integrity.\nThe forward-looking dimension matters too. As AI agents evolve from content scrapers into autonomous actors with budgets of their own, HTTP 402 becomes their native protocol for programmatic content negotiation. 
Pay Per Crawl is built for today\u0026rsquo;s crawlers and designed for tomorrow\u0026rsquo;s agents.\nMarkdown for Agents: Treating Agents as First-Class Citizens Here is a data point worth sitting with: a typical blog post served as HTML might cost over 16,000 tokens when fed to an AI model. The same content served as clean Markdown can be delivered in roughly 3,000 tokens — an 80% reduction. The difference comes from stripping out navigation bars, script tags, class attributes, and all the structural noise that HTML accumulates to serve human browsers.\nCloudflare\u0026rsquo;s network can now perform this conversion in real time at the edge, triggered by a content negotiation header. An AI agent that sends Accept: text/markdown in its request receives a clean, structured response — along with a header indicating the token count and signals declaring how the content may be used: for training, search, or inference.\nNo backend changes. No new pipelines. A single dashboard toggle, available today in beta on Pro, Business, and Enterprise plans. This is what it looks like to treat agents as first-class citizens of the web.\nAI Index: From Crawling to Subscribing The most architecturally forward-looking piece of what Cloudflare is building is the AI Index.\nThe premise: rather than AI platforms dispatching blind crawlers across the open web, site owners create a structured, AI-optimized index of their content that they own and control. Cloudflare manages the underlying infrastructure — embeddings, chunking, vector search, structured APIs. What the site owner gets in return is a ready-made interface for AI builders: an MCP server, an LLMs.txt file, a search API, bulk data endpoints, and pub/sub subscriptions that deliver real-time updates whenever content changes.\nThat last element is the architectural shift. Instead of re-crawling to discover what\u0026rsquo;s new, AI builders subscribe to updates. 
Fresher data, lower infrastructure costs, and a permissioned relationship between creator and consumer. An aggregated Open Index bundles participating sites for broader discovery — with content quality metadata (uniqueness, depth, relevance) that lets buyers evaluate value before paying for access.\nThis is the blueprint for a healthier content ecosystem: one where value flows in both directions, and creators retain meaningful control.\nAI-Aware Caching: Infrastructure Research at the Frontier\nIn collaboration with researchers at ETH Zurich, Cloudflare produced a peer-reviewed paper published at the 2025 ACM Symposium on Cloud Computing examining how AI crawler behavior impacts CDN cache performance. The findings confirmed what operators were experiencing in practice: AI and human traffic have fundamentally incompatible cache characteristics, and mixing them without architectural changes degrades performance for human users.\nThe research points toward dedicated cache tiers for AI versus human traffic, alternative eviction algorithms (SIEVE and S3-FIFO show early promise), and machine learning-based policies that adapt to real-time traffic composition. Cloudflare is already applying early findings to reduce bandwidth costs for customers experiencing disproportionate AI traffic — with more architectural changes ahead.\nDynamic Workers: The Execution Layer for the Agentic Era\nWhen AI agents move from reading content to taking action — writing scripts, calling APIs, processing data on your behalf — they need somewhere safe to run. The current default answer is containers. Containers work, but they\u0026rsquo;re heavy: hundreds of milliseconds to start, hundreds of megabytes per instance. At the scale where every user might run multiple simultaneous agents, that cost structure doesn\u0026rsquo;t hold.\nDynamic Workers address this with Cloudflare\u0026rsquo;s isolate-based sandbox model — the same technology powering Cloudflare Workers for nearly a decade. 
An isolate starts in milliseconds, uses a few megabytes, and scales to millions of concurrent sandboxes with no pre-warming and full per-request isolation. That\u0026rsquo;s roughly 100 times faster than a typical container.\nPaired with Code Mode — where agents write TypeScript functions against typed APIs rather than navigating hundreds of tool definitions — the entire Cloudflare API surface becomes accessible from an agent in under 1,000 tokens. Context windows stay lean. Agents perform better.\nHow These Pieces Fit Together What makes Cloudflare\u0026rsquo;s approach genuinely distinctive is not any individual capability in isolation. It is that these layers form a coherent, mutually reinforcing platform:\nWeb Bot Auth provides the cryptographic identity layer — verified, unforgeable, standardized at the IETF. AI Crawl Control uses that identity to classify and route AI traffic with precision. Pay Per Crawl converts verified AI traffic into a commercial relationship. Markdown for Agents ensures that traffic you choose to serve is delivered efficiently. AI Index replaces crawling altogether with a structured, permissioned discovery model. AI-Aware Caching protects human user performance as AI traffic scales. Dynamic Workers provides the secure execution substrate for agents that act, not just read. No other company is simultaneously building the identity layer, the protection layer, the monetization layer, the content delivery layer, and the execution layer for the AI era — while co-authoring the open standards that govern all of them, and observing more real AI internet traffic than anyone else on the planet.\nThe Bottom Line The internet is being rebuilt around a new class of actors — AI agents that don\u0026rsquo;t click, don\u0026rsquo;t generate ad revenue, and are increasingly capable of autonomous action. The transition is already underway. 
The infrastructure to support it fairly — for humans, for creators, and for AI builders alike — does not yet exist in mature form.\nCloudflare is building it. Not just as a product roadmap, but as active participation in the standards bodies that will define how all of this works for decades. From IETF Internet Drafts on bot authentication architecture, to peer-reviewed cache research at top systems conferences, to an open content signals framework — the work is happening at the foundational level.\nWe don\u0026rsquo;t just have a front-row seat to the AI internet. We\u0026rsquo;re helping build the venue.\nThis covers one side of the equation: how Cloudflare helps you manage, protect, and monetize the AI traffic that hits your content and infrastructure. But there is a second, equally important question — what about the AI your own teams, developers, and applications are consuming every day? How do you make AI accessible across your organization without losing control of costs, security, or governance? That\u0026rsquo;s the topic of my next post: AI Consumption at Scale: How Cloudflare Helps You Make AI Work for Your Entire Organization → I\u0026rsquo;m a Solutions Engineer at Cloudflare. 
This post reflects my own synthesis and perspective on publicly available Cloudflare research and announcements — it does not represent an official Cloudflare position.\nReferences: Web Bot Auth \u0026amp; RFC 9421 · AI Crawl Control \u0026amp; Bot Categories · Pay Per Crawl · Markdown for Agents · AI Index · AI-Aware Caching · Dynamic Workers · Crawl/Refer Analysis · Traffic by Purpose \u0026amp; Industry ","permalink":"https://blog.macharpe.com/posts/ai-internet-infrastructure-cloudflare/","summary":"\u003cblockquote\u003e\n\u003cp\u003e\u003cstrong\u003eDisclaimer\u003c/strong\u003e: This post reflects my own synthesis and perspective on publicly available Cloudflare research and announcements — it does not represent an official Cloudflare position.\u003c/p\u003e\u003c/blockquote\u003e\n\u003cp\u003e\u003cstrong\u003eTL;DR —\u003c/strong\u003e AI crawlers now represent a structural threat to how the web creates and distributes value. They consume content at massive scale, send little traffic back, and are quietly degrading CDN performance for real users. This post covers how Cloudflare is responding — not just with bot controls, but with a coherent platform: cryptographic bot identity (co-authored as an IETF standard), content monetization via Pay Per Crawl, token-efficient delivery for agents, a pub/sub AI Index to replace blind crawling, AI-aware cache architecture, and a secure execution layer for agentic code. Each piece reinforces the others. Together they represent Cloudflare\u0026rsquo;s answer to the question: what should the AI-era internet actually look like?\u003c/p\u003e","title":"The Internet Was Built for Humans. AI Didn't Get the Memo."},{"content":" Disclaimer: This article reflects my personal views and experiences and does not represent the official stance of Cloudflare. It is not an official Cloudflare tutorial or documentation. 
The project discussed is a personal initiative created independently.\nThe Moment I Realized Simple Wasn\u0026rsquo;t Enough \u0026ldquo;It works perfectly!\u0026rdquo; I remember telling myself three months ago, watching Claude query my Cisco Meraki network in real-time. The AI assistant could check device status, monitor client connections, and even troubleshoot network issues—all through a simple API key I\u0026rsquo;d hardcoded into my Cloudflare Worker.\nThen reality hit.\nMy colleague asked a simple question: \u0026ldquo;How do we give this to customers?\u0026rdquo;\nI stared at my code. Static API keys. No user tracking. No audit logs. No enterprise SSO. This wasn\u0026rsquo;t a product—it was a demo that worked great until someone actually wanted to use it in production.\nThat conversation sparked a three-month journey into the world of OAuth 2.1, PKCE flows, JWT verification, and edge authentication. This is the story of how I transformed a weekend project into something enterprises could actually trust.\nAct 1: The Weekend Project That Actually Worked It all started with a simple idea: what if Claude could talk directly to my network infrastructure?\nI\u0026rsquo;d been playing with the Model Context Protocol —Anthropic\u0026rsquo;s new standard for connecting AI assistants to external data sources. The concept fascinated me: instead of copy-pasting network status into chat, why not let the AI query it directly?\nOne weekend, fueled by coffee and curiosity, I built the first version. 
About 500 lines of TypeScript running on Cloudflare Workers:\n// My first attempt - laughably simple export default { async fetch(request: Request, env: Env): Promise\u0026lt;Response\u0026gt; { // Just check if they have the magic key const apiKey = request.headers.get(\u0026#39;X-API-Key\u0026#39;); if (apiKey !== env.MERAKI_API_KEY) { return new Response(\u0026#39;Unauthorized\u0026#39;, { status: 401 }); } // If they do, give them everything! const merakiAPI = new MerakiAPIService(env); const organizations = await merakiAPI.getOrganizations(); return new Response(JSON.stringify(organizations)); } } It worked beautifully. I could ask Claude \u0026ldquo;How many devices are offline?\u0026rdquo; and get instant answers. I could troubleshoot connectivity issues through conversation. It felt magical.\nBut deep down, I knew the truth. This \u0026ldquo;authentication\u0026rdquo; was security theater. One leaked API key and anyone could access everything. No way to know who did what. No way to revoke access without changing the key everywhere.\nAct 2: The Reality Check Two weeks after deploying my \u0026ldquo;working\u0026rdquo; server, I was showing it to a Solutions Architect friend over coffee.\n\u0026ldquo;This is cool,\u0026rdquo; he said, querying network stats through Claude. Then he paused. \u0026ldquo;But how would I give this to my customer?\u0026rdquo;\nThe question hung in the air.\n\u0026ldquo;Well, they\u0026rsquo;d need the API key, and then—\u0026rdquo;\n\u0026ldquo;So I\u0026rsquo;d send the API key over email? Slack?\u0026rdquo;\nSilence.\n\u0026ldquo;And if someone leaves the company, we change the key everywhere?\u0026rdquo;\nMore silence.\n\u0026ldquo;And there\u0026rsquo;s no audit log of who accessed what?\u0026rdquo;\nI felt my enthusiasm deflating like a popped balloon. He was right. This wasn\u0026rsquo;t production-ready. 
It wasn\u0026rsquo;t even demo-ready for enterprise customers.\nThat night, I opened Cloudflare\u0026rsquo;s documentation and started reading about Access for SaaS . If I was going to do this right, I needed proper OAuth 2.1, enterprise SSO, the works.\nThe journey ahead looked daunting: PKCE flows, JWT verification, JWKS endpoints, RFC compliance. I estimated two weeks of work.\nI was off by an order of magnitude.\nConfiguring Cloudflare Access for SaaS Before diving into the OAuth implementation, I needed to configure Cloudflare Access to protect my MCP server and define who could access it. This involved two critical steps: creating a SaaS application and defining access policies.\nCreating the SaaS Application In the Cloudflare Zero Trust dashboard, I created a new SaaS application specifically for the Meraki MCP server:\nThe key configuration choices I made:\nApplication name: A clear identifier for the MCP server Authentication protocol: OIDC (OAuth 2.0 with OpenID Connect) Scopes: Configured the OAuth scopes the application would request Redirect URLs: Specified where Cloudflare Access would send users after authentication Proof Key for Code Exchange (PKCE): Enabled for security (critical for browser-based OAuth) Allow PKCE without Client Secret: Enabled to support public clients like browser-based MCP clients This configuration generates the OAuth endpoints that my Worker would need:\nAuthorization endpoint Token endpoint JWKS (JSON Web Key Set) endpoint for JWT verification Defining Access Policies With the application created, I needed to define who could access it. Cloudflare Access uses a powerful Include/Require/Exclude policy model:\nThis policy-based approach is incredibly powerful. 
I can start restrictive (just my email) and gradually expand to entire teams, departments, or customer organizations—all without touching the MCP server code.\nThe Access policy is enforced before the OAuth token is issued, so by the time my Worker receives a JWT token, I know the user has passed all policy checks.\nRefresh Token Configuration One critical security feature I enabled was refresh token rotation. This is configured in the Advanced Settings of the SaaS application:\nWhat are refresh tokens?\nIn OAuth 2.1, there are two types of tokens:\nAccess Tokens - Short-lived tokens (typically 15 minutes to 1 hour) used to access protected resources. These are sent with every API request.\nRefresh Tokens - Longer-lived tokens (hours to days) used to obtain new access tokens when the old ones expire, without requiring the user to re-authenticate.\nWhy refresh tokens matter for MCP servers:\nMCP sessions can be long-lived—users might keep Claude Code or the AI Playground open for hours while working. Without refresh tokens, users would need to re-authenticate every time the access token expires (e.g., every 15 minutes), which creates a terrible user experience.\nRefresh token rotation for security:\nCloudflare Access implements refresh token rotation, which means:\nEach time a refresh token is used to get a new access token, a new refresh token is also issued The old refresh token is immediately invalidated This prevents token replay attacks—if someone steals a refresh token, it becomes useless after the legitimate user rotates it Maximum lifetime can be configured (I set mine to 24 hours for a good balance between security and UX) My configuration:\nRefresh token enabled: ✅ Yes Refresh token lifetime: 24 hours Refresh token rotation: ✅ Enabled Reuse interval: 0 seconds (immediate rotation) This setup means users authenticate once in the morning, and the MCP server automatically refreshes their access token throughout the day without interruption. 
After 24 hours, they\u0026rsquo;ll need to re-authenticate, which provides a good security boundary.\nAct 3: Down the OAuth Rabbit Hole Week One: The Session Problem I started diving into OAuth 2.1 with PKCE on a Monday morning. By Wednesday afternoon, I\u0026rsquo;d hit my first major wall.\nEvery OAuth tutorial I found assumed you had sessions. Server-side state. A database to store the OAuth state between the authorization request and callback. Redis, PostgreSQL, something.\nI had none of that. Cloudflare Workers are stateless. Each request is completely independent. There\u0026rsquo;s no \u0026ldquo;session\u0026rdquo; to remember the code_verifier when the user comes back from authenticating.\nI spent two days exploring workarounds. Encoding state in the redirect URL? Too long, broke some identity providers. Client-side JavaScript? Couldn\u0026rsquo;t trust it. Cookies? Edge workers and cross-domain cookies don\u0026rsquo;t mix well.\nThen I stumbled on the solution hiding in plain sight: Cloudflare KV.\nKV is a distributed key-value store that\u0026rsquo;s available globally in milliseconds. I could store the OAuth state there with a 10-minute expiration. Just long enough for the user to authenticate, but short enough to be secure.\n// OAuth state management with KV const state = crypto.randomUUID(); const codeVerifier = generateCodeVerifier(); const codeChallenge = await generateCodeChallenge(codeVerifier); // Store state in KV with short TTL await env.OAUTH_KV.put( `oauth_state:${state}`, JSON.stringify({ codeVerifier, redirectUri, clientId }), { expirationTtl: 600 } // 10 minutes ); Elegant. Simple. And it worked.\nWeek Three: The CORS Mystery \u0026ldquo;Why isn\u0026rsquo;t the browser version working?\u0026rdquo;\nI\u0026rsquo;d tested my OAuth flow extensively with curl and Postman. Everything worked perfectly. But when I tried using it with Cloudflare\u0026rsquo;s AI Playground—a browser-based MCP client—nothing. 
Just a cryptic error in the console:\nRequest header field \u0026#39;mcp-protocol-version\u0026#39; is not allowed by Access-Control-Allow-Headers I stared at this error for three hours before I understood what was happening.\nMCP clients send a custom header—mcp-protocol-version—to negotiate which version of the protocol they support. Standard stuff for any protocol negotiation. But CORS is strict: any non-standard header must be explicitly allowed by the server.\nI had CORS headers. I thought they were complete. But they weren\u0026rsquo;t.\nThe fix was simple once I understood it, but finding it was maddening:\nconst corsHeaders = { \u0026#39;Access-Control-Allow-Origin\u0026#39;: \u0026#39;*\u0026#39;, \u0026#39;Access-Control-Allow-Methods\u0026#39;: \u0026#39;GET, POST, OPTIONS\u0026#39;, \u0026#39;Access-Control-Allow-Headers\u0026#39;: \u0026#39;Content-Type, Authorization, Cache-Control, mcp-protocol-version\u0026#39;, // The missing piece! \u0026#39;Access-Control-Max-Age\u0026#39;: \u0026#39;86400\u0026#39;, }; // Handle preflight requests if (request.method === \u0026#39;OPTIONS\u0026#39;) { return new Response(null, { status: 204, headers: corsHeaders }); } But here\u0026rsquo;s what really frustrated me: I had to add this header in three different places in my codebase. The main MCP handler, the OAuth endpoints, and the utility functions. Miss one, and the browser client would fail with a different cryptic error.\nI made a note to myself: test with real browser clients early. Command-line tools hide CORS sins.\nWeek Five: The Performance Crisis It was working. OAuth flow complete, tokens being verified, MCP tools responding. I should have been celebrating.\nInstead, I was watching my logs with growing horror:\n[AUTH] JWT verification: 127ms [AUTH] JWT verification: 143ms [AUTH] JWT verification: 156ms Every single MCP request was taking 100-150 milliseconds just for authentication. Before the actual work even started. 
For a real-time AI conversation, that\u0026rsquo;s an eternity.\nThe problem was obvious in hindsight: I was fetching the JWKS keys from Cloudflare Access on every request.\n// My naive first attempt const jwksResponse = await fetch(env.ACCESS_JWKS_URL); const jwks = await jwksResponse.json(); const verified = await verifyJWT(token, jwks); JWKS keys don\u0026rsquo;t change often. Maybe once a year when rotating. Yet here I was, fetching them hundreds of times per minute.\nI needed caching. But not just simple caching—I needed it to be fast enough that the overhead was negligible.\nEnter the Workers Cache API.\nMost developers know about KV for caching at the edge. But Cloudflare Workers have another, less-known caching layer: the Workers Cache API. It\u0026rsquo;s the same cache that powers Cloudflare\u0026rsquo;s CDN, but accessible from your Worker code.\nSub-millisecond access times.\nI built a two-tier caching strategy:\nexport class CacheService { private kv: KVNamespace; private workersCache: Cache; async getWithWorkersCache\u0026lt;T\u0026gt;( key: string, fetchFunction: () =\u0026gt; Promise\u0026lt;T\u0026gt;, options?: CacheOptions ): Promise\u0026lt;{ data: T; cacheStatus: \u0026#39;HIT\u0026#39; | \u0026#39;MISS\u0026#39; }\u0026gt; { // Workers Cache keys must be URLs; the cache.internal hostname is a placeholder assumption const cacheKey = `https://cache.internal/${encodeURIComponent(key)}`; // Layer 1: Workers Cache API (sub-millisecond) const cachedResponse = await this.workersCache.match(cacheKey); if (cachedResponse) { return { data: await cachedResponse.json(), cacheStatus: \u0026#39;HIT\u0026#39; }; } // Layer 2: KV Cache (10-50ms) const kvCached = await this.get\u0026lt;T\u0026gt;(key, options); if (kvCached) { // Backfill Workers Cache for next request await this.workersCache.put(cacheKey, new Response(JSON.stringify(kvCached))); return { data: kvCached, cacheStatus: \u0026#39;MISS\u0026#39; }; } // Layer 3: Fetch from origin (slowest) const result = await fetchFunction(); await this.set(key, result, options); return 
{ data: result, cacheStatus: \u0026#39;MISS\u0026#39; }; } } The results were dramatic:\n[AUTH] JWT verification: 4ms (CACHE HIT) [AUTH] JWT verification: 3ms (CACHE HIT) [AUTH] JWT verification: 5ms (CACHE HIT) From 150ms to 5ms. A 30x improvement. Now we were talking.\nWeek Seven: The Discovery Challenge Just when I thought I was done, the browser-based MCP clients started failing with a new error:\nFailed to discover OAuth endpoints But my OAuth endpoints were there! I could curl them. What was going on?\nAfter diving into the MCP specification and researching OAuth standards, I discovered the issue: OAuth discovery.\nModern OAuth clients don\u0026rsquo;t hardcode endpoint URLs. They discover them dynamically using well-known endpoints defined in OAuth standards. The clients were looking for these discovery endpoints, not finding them, and giving up.\nI needed to implement three discovery mechanisms:\nOAuth Authorization Server Metadata (RFC 8414) OAuth Protected Resource Metadata (RFC 9728) JWKS Endpoint (RFC 7517) Each one with specific JSON structures and required fields. 
Miss a field, and different clients would fail in different ways.\n// RFC 9728: Protected Resource Metadata if (pathname === \u0026#39;/.well-known/oauth-protected-resource\u0026#39;) { return new Response( JSON.stringify({ resource: baseUrl, authorization_servers: [baseUrl], bearer_methods_supported: [\u0026#39;header\u0026#39;], resource_documentation: `${baseUrl}/health`, scopes_supported: [\u0026#39;meraki:read\u0026#39;], }), { headers: { \u0026#39;Content-Type\u0026#39;: \u0026#39;application/json\u0026#39;, \u0026#39;Access-Control-Allow-Origin\u0026#39;: \u0026#39;*\u0026#39;, \u0026#39;Cache-Control\u0026#39;: \u0026#39;public, max-age=3600\u0026#39;, }, } ); } I spent a weekend implementing all three discovery endpoints, carefully checking the specifications and examples to make sure every field was correct.\nWeek Nine: Debugging in the Dark By week nine, I had a working OAuth implementation. Mostly. The problem was intermittent failures that I couldn\u0026rsquo;t reproduce locally.\nOAuth debugging is uniquely frustrating because:\nThe flow spans multiple requests across different systems State is stored in various places (cookies, KV, headers) Errors are often generic: \u0026ldquo;Invalid token\u0026rdquo; could mean a dozen different things I developed a debugging strategy out of desperation:\nFirst, comprehensive logging at every step:\nconsole.log(\u0026#39;[AUTH] Step 1: Checking for Bearer token\u0026#39;); console.log(\u0026#39;[AUTH] Step 2: Extracting token from header\u0026#39;); console.log(\u0026#39;[AUTH] Step 3: Fetching JWKS keys\u0026#39;); console.log(\u0026#39;[AUTH] Step 4: Verifying token signature\u0026#39;); Second, I created a 46-step authentication flow diagram. Every HTTP request, every state transition, every decision point. 
When something broke, I could trace exactly where in the flow it failed.\nThird, real-time log monitoring:\nnpx wrangler tail --format pretty This combination helped me catch subtle issues:\nThe state parameter wasn\u0026rsquo;t being passed through the callback The code_verifier was being retrieved with the wrong key from KV JWKS keys were being cached with different kid (key ID) values Each fix was small. But each one was invisible until I had the right debugging tools in place.\nWeek Twelve: Putting It All Together By week twelve, all the pieces were in place. The OAuth flow worked. The caching was fast. The standards compliance was solid. But I wanted to step back and visualize the complete picture.\nI drew out the entire authentication dance—every HTTP request, every redirect, every token exchange:\nsequenceDiagram participant User as User participant Client as MCP Client(AI Playground) participant Worker as Meraki MCPWorker participant Access as CloudflareAccess participant SSO as EnterpriseSSO (Okta) Note over User,SSO: 📱 User Connects to MCP Server Client-\u003e\u003eWorker: 1. Connect to /mcp Worker--\u003e\u003eClient: 2. 401 Unauthorized(Need authentication) Note over User,SSO: 🔐 OAuth Flow Begins Client-\u003e\u003eWorker: 3. Start OAuth flow Worker--\u003e\u003eUser: 4. Show approval dialog User-\u003e\u003eWorker: 5. Click \"Approve\" Worker-\u003e\u003eAccess: 6. Redirect to Access login Note over User,SSO: 👤 Enterprise SSO Authentication Access-\u003e\u003eSSO: 7. SAML/OIDC login User-\u003e\u003eSSO: 8. Enter credentials SSO--\u003e\u003eAccess: 9. User authenticated Note over User,SSO: 🎫 Token Exchange Access--\u003e\u003eWorker: 10. Return auth code Worker-\u003e\u003eAccess: 11. Exchange code for JWT Access--\u003e\u003eWorker: 12. Cloudflare Access JWT Worker--\u003e\u003eClient: 13. Return access token Note over User,SSO: ✅ Authenticated Access Client-\u003e\u003eWorker: 14. MCP requests with Bearer token Worker-\u003e\u003eWorker: 15. 
Verify JWT signature Worker--\u003e\u003eClient: 16. Return Meraki data Note over User,SSO: Subsequent requests use cached token The Power of Platform Thinking One of the biggest lessons from this project: platforms beat point solutions.\nInstead of stitching together:\nAWS Lambda for compute Auth0 for authentication Redis for caching CloudFront for CDN Multiple monitoring tools Cloudflare\u0026rsquo;s unified platform provides:\nWorkers - Serverless compute at the edge Access for SaaS - Enterprise SSO and OAuth KV - Distributed key-value storage Durable Objects - Stateful edge compute Workers Cache API - Sub-millisecond caching Analytics - Built-in monitoring and logs Everything works together seamlessly, with consistent APIs and zero cross-vendor complexity.\nPerformance: The Numbers The final implementation delivers impressive performance:\nMetric Value Comparison JWT Verification (cached) ~5ms 20x faster than uncached Organization List (cached) ~10ms 40x faster than uncached Network Query (cached) ~10ms 40x faster than uncached Client List (cached) ~10ms 60x faster than uncached Cold Start \u0026lt;100ms Cloudflare Workers edge Global Availability 275+ locations Cloudflare\u0026rsquo;s network Caching Strategy:\nJWKS Keys: 1 hour TTL (rarely change) Organizations: 30 minutes TTL (moderate churn) Networks: 15 minutes TTL (moderate churn) Clients: 5 minutes TTL (high churn) Code Statistics: By the Numbers The project has grown significantly:\n2,500+ lines of TypeScript 27 MCP tools for Meraki API coverage 18 authentication test cases 3 major releases (v1.0.0 → v1.3.1) 26 pull requests merged 95%+ test coverage on auth layer Lessons From Three Months in the Trenches Looking back at the journey, a few key lessons stand out—the kind you can only learn by doing:\nStart Simple, But Know Where You\u0026rsquo;re Going My biggest mistake early on was underestimating the complexity. I thought OAuth would take two weeks. 
It took twelve.\nBut starting with the simple API key version wasn\u0026rsquo;t a mistake. It let me validate the core idea—AI assistants talking to network infrastructure—before investing months in authentication.\nThe progression worked:\nWeek 1: Proof of concept with API key Week 4: OAuth discovery endpoints (the foundation) Week 8: Full OAuth 2.1 with PKCE (the hard part) Week 12: Caching optimization (the polish) If I\u0026rsquo;d tried to build everything at once, I would have given up.\nDocumentation Saves Lives I cannot stress this enough: Cloudflare\u0026rsquo;s Access for SaaS documentation was my lifeline.\nBut here\u0026rsquo;s what I learned about using documentation effectively:\nReference the official specifications when you hit edge cases. They provide the \u0026ldquo;why\u0026rdquo; behind implementation details. Follow the examples exactly first. Modify later. When something doesn\u0026rsquo;t work, re-read the docs. I found answers on the third reading that I\u0026rsquo;d missed twice before. CORS is the Silent Killer CORS issues won\u0026rsquo;t show up in your terminal. They won\u0026rsquo;t fail your unit tests. They\u0026rsquo;ll work perfectly with curl.\nThen a user will try it in a browser and nothing will work.\nTest with real browser clients early. Like, week one early. I learned this the hard way.\nCache Everything That Doesn\u0026rsquo;t Move One of the most impactful optimizations came from understanding what doesn\u0026rsquo;t change:\nJWKS keys? Rotate maybe once a year. Cache for an hour. Organization lists? Rarely change. Cache for 30 minutes. Network data? Updates occasionally. Cache for 15 minutes. Client connections? Constantly changing. Cache for 5 minutes. 
const CacheTTL = { JWKS_KEYS: 3600, // 1 hour ORGANIZATIONS: 1800, // 30 minutes NETWORKS: 900, // 15 minutes CLIENTS: 300, // 5 minutes }; This simple strategy took my average response time from 150ms to 5ms.\nObservability is Not Optional When things went wrong—and they did, often—I needed to see what was happening. Real-time log monitoring with Wrangler became my debugging superpower:\n# Watch the logs scroll by in real-time npx wrangler tail --format pretty # Only show me the errors npx wrangler tail --format pretty | grep ERROR I could watch the OAuth flow happen step by step, catch errors as they occurred, and understand exactly where things were breaking.\nWhat\u0026rsquo;s Next The project continues to evolve, and there\u0026rsquo;s still work to do.\nThe MCP Client Compatibility Challenge While the OAuth implementation works perfectly with Cloudflare AI Playground and Claude Code (CLI), I discovered that not all MCP clients handle OAuth the same way:\nClient Status What Works What Doesn\u0026rsquo;t Cloudflare AI Playground ✅ Works perfectly Everything - Claude Code (CLI) ✅ Works perfectly Everything - Claude.ai (Web) ⚠️ Partial OAuth succeeds Handshake stalls after initialize Claude Desktop ❌ Fails - mcp-remote can\u0026rsquo;t resolve callback URI This taught me an important lesson: implementing standards correctly doesn\u0026rsquo;t guarantee compatibility. 
Each MCP client interprets the OAuth flow slightly differently, and some have bugs or limitations that only surface with remote OAuth-protected servers.\nThe fact that Claude Code works perfectly suggests the problems are solvable—it\u0026rsquo;s just a matter of understanding what each client expects.\nFuture Enhancements Beyond client compatibility, I\u0026rsquo;m exploring several exciting integrations:\nMCP Portal Integration - I\u0026rsquo;m currently facing a domain conflict scenario: my MCP portal and the SaaS Access App for OAuth both use the same domain (macharpe.com), which prevents the integration from working properly. But I\u0026rsquo;ll figure it out eventually! The MCP Portal would provide a centralized interface for managing MCP server connections.\nWeb Assets Integration - Experiment with API Shield\u0026rsquo;s schema validation to protect the various OAuth and MCP endpoints. This would add an additional security layer by validating request/response payloads against OpenAPI schemas, preventing malformed or malicious requests from reaching the Worker.\nComplete audit logging - Build a full compliance trail for enterprise security teams, tracking every authentication attempt, token issuance, and API call with detailed metadata for SOC2/ISO27001 compliance.\nTry It Yourself The complete source code is available on GitHub: https://github.com/macharpe/meraki-mcp-cloudflare The repository includes:\nComplete OAuth 2.1 implementation Authentication flow documentation (46-step detailed guide) Multi-layer caching system 27 MCP tools for Meraki network management Production deployment guide Comprehensive test suite Quick Start # Clone the repository git clone https://github.com/macharpe/meraki-mcp-cloudflare.git cd meraki-mcp-cloudflare # Install dependencies npm install # Configure environment variables cp .env.example .dev.vars # Edit .dev.vars with your credentials # Deploy to Cloudflare Workers npm run deploy The Moment It 
All Clicked Three months after that conversation with my Solutions Architect friend, I was back at the same coffee shop.\n\u0026ldquo;Remember when you asked how I\u0026rsquo;d give this to customers?\u0026rdquo; I said, pulling up the Cloudflare AI Playground on my laptop.\nHe nodded.\n\u0026ldquo;Watch this.\u0026rdquo;\nI connected to the MCP server. Instead of API keys, a browser window opened. Cloudflare Access login. He entered his Okta credentials. Two-factor authentication. Approved.\nThe AI assistant came online, authenticated as him, with full audit logging of every action.\n\u0026ldquo;Now that,\u0026rdquo; he said, \u0026ldquo;is something I can show to customers.\u0026rdquo;\nSee It in Action Want to see how the OAuth flow and MCP server work in practice? Here are two videos demonstrating the system in action:\nOAuth Authentication Flow This video shows the complete OAuth 2.1 authentication flow—from initial connection to successful authentication with Cloudflare Access and enterprise SSO:\nYour browser does not support the video tag. You can download the video instead. Watch as the MCP client initiates the OAuth flow, redirects to Cloudflare Access, authenticates through SSO, and exchanges the authorization code for a JWT token—all in real-time.\nMCP Tools in Claude Code Once authenticated, the MCP server exposes 27 tools for managing Cisco Meraki networks. This video demonstrates the tool discovery and usage through Claude Code CLI:\nYour browser does not support the video tag. You can download the video instead. You can see how Claude can query network organizations, list networks, check device status, monitor client connections, and troubleshoot issues—all through natural language conversation.\nReflections: What This Journey Taught Me Building an enterprise-grade OAuth system taught me something unexpected: complexity is often the enemy of security.\nMy first version with the hardcoded API key felt wrong because it was. 
But the solution wasn\u0026rsquo;t to add complexity for its own sake. It was to use the right primitives—OAuth 2.1, PKCE, JWT tokens, SSO—and implement them correctly.\nCloudflare\u0026rsquo;s platform made this possible. Instead of stitching together Auth0, AWS Lambda, Redis, and monitoring tools across three vendors, I had Workers, Access, KV, and Durable Objects working together seamlessly. Same APIs. Same deployment pipeline. Same observability stack.\nThe integration costs? Zero. The cross-vendor debugging? None. The mental overhead? Dramatically lower.\nThis is the power of platforms over point solutions.\nThree Months, 26 Pull Requests, One Mission The journey from that first weekend prototype to v1.3.1 took:\nThree months of evening and weekend work 26 pull requests merged (each one teaching me something) 2,500+ lines of carefully crafted TypeScript Countless hours of research, debugging, and head-scratching But the result is something I\u0026rsquo;m proud of: a production-ready system running in 275+ locations worldwide, serving real-time AI queries with enterprise-grade security.\nIf you\u0026rsquo;re building AI integrations—whether for network management, customer support, or data analysis—I hope this story inspires you to tackle authentication the right way from the start.\nThe tooling exists. The standards are mature. Platforms like Cloudflare make enterprise security accessible to anyone willing to invest the time to learn it properly.\nAnd yes, it\u0026rsquo;ll take longer than you think. But it\u0026rsquo;s worth it.\nQuestions or feedback?\nFeel free to reach out via email or connect on LinkedIn . I\u0026rsquo;m always happy to discuss authentication, MCP, or edge computing!\nStar the repo if you find it useful! 
⭐\nSpecial thanks to the Cloudflare team for their excellent documentation and platform, and to the Anthropic team for pioneering the Model Context Protocol standard.\n","permalink":"https://blog.macharpe.com/posts/oauth-journey-meraki-mcp/","summary":"\u003cblockquote\u003e\n\u003cp\u003e\u003cstrong\u003eDisclaimer\u003c/strong\u003e: This article reflects my personal views and experiences and does not represent the official stance of Cloudflare. It is not an official Cloudflare tutorial or documentation. The project discussed is a personal initiative created independently.\u003c/p\u003e\u003c/blockquote\u003e\n\u003ch2 id=\"the-moment-i-realized-simple-wasnt-enough\"\u003eThe Moment I Realized Simple Wasn\u0026rsquo;t Enough\u003c/h2\u003e\n\u003cp\u003e\u0026ldquo;It works perfectly!\u0026rdquo; I remember telling myself three months ago, watching Claude query my Cisco Meraki network in real-time. The AI assistant could check device status, monitor client connections, and even troubleshoot network issues—all through a simple API key I\u0026rsquo;d hardcoded into my Cloudflare Worker.\u003c/p\u003e","title":"The Three-Month Journey to Enterprise Authentication: Building an OAuth-Secured AI Assistant for Network Management"},{"content":"Introduction Over the past two posts (Building a Scalable Zero Trust Demo environment with Cloudflare and Terraform (Part 1) and Automating Cloudflare Zero Trust at Scale: Terraform, Multi-Cloud, and Identity (Part 2) ), we\u0026rsquo;ve explored the foundations of building a scalable Zero Trust demo environment and how to automate its deployment with Cloudflare and Terraform. In Part 1, we laid the groundwork by designing a robust, modular Zero Trust architecture. Part 2 took things further, demonstrating how to streamline and scale this setup using Infrastructure as Code principles. 
Now, in Part 3 (the final part), we will explore advanced use cases you can demonstrate with this environment.\nWhat\u0026rsquo;s new since Part 2: Site to Site connectivity with WARP to WARP Client (GCP \u0026lt;\u0026ndash;\u0026gt; Azure) Added an additional subnet on GCP to host the WARP VM and subsequent VMs. Added a Cloud NAT Gateway to GCP to avoid allocating a public IP for each VM. Added a NAT Gateway on Azure (because this on September 30, 2025 and this ) All workloads are accessible via private IP address (except the two with the WARP Connector, accessible with addresses in the 100.96.0.0/12 subnet range) Added L3 reachability between GCP (WARP Subnet) \u0026ndash;\u0026gt; AWS (ssh) and Azure (WARP Subnet) \u0026ndash;\u0026gt; AWS (ssh) New use case added to GCP: Remote Desktop Connection (RDP) through WARP. On the 18th of July, 2025, All Developer Edition Orgs will be deactivated (source ) and therefore I had to refactor the OKTA part of the code for the demo to work. Hence I have decided to remove the Okta module completely and move it to a separate project. After some refactoring and cleanup, the Terraform code is now 4400+ lines across 62 files and 14 directories (even if quantity does not mean quality!) with 147 resources Let\u0026rsquo;s dive into the updated architecture and what can be showcased with this environment.\nArchitecture Schema updated PNG and SVG are available here .\nUse Case 1: OKTA groups synchronization and user deprovisioning First things first, let us talk about authentication and authorization.\nIn order to synchronize your Okta groups and build policies upon them, as well as deprovision users dynamically, you will need to use SCIM (System for Cross-domain Identity Management). This is a standard for automating the exchange of user identity information between identity domains, or IT systems.\nFirst, you need an identity provider. 
As you can see in the diagram, I have integrated with several common Identity Providers (IdPs) such as Okta, Entra ID, Google and GitHub. However, to keep the demo \u0026ldquo;free\u0026rdquo;, I am going to showcase SCIM with Okta. The main reason is that SCIM is available for free on an Okta dev tenant, which is not the case for Entra ID.\nThe second choice that had to be made is whether to use OpenID Connect (OIDC) or SAML as the authentication protocol. Both are viable options. If you are interested in more details, you can follow the Explore Authentication and Authorization Protocols (15 min) from Okta.\nThird, because the Okta Dev tenant only allows for 5 registered applications and because, when using OIDC, you need two separate apps in Okta (source ), I have chosen SAML. Not to mention that, as of today, SAML remains very popular in large enterprises.\nThe Okta integration allows you to synchronize IdP groups and automatically deprovision users using SCIM.\nGroup membership change reauthentication: Revoke a user\u0026rsquo;s active session when their group membership changes in the IdP. This will invalidate all active Access sessions and prompt for reauthentication for any WARP session policies. Access will read the user\u0026rsquo;s updated group membership when they reauthenticate.\nBeyond basic group synchronization, this OKTA-Cloudflare integration demonstrates true zero-trust principles in action. Unlike traditional perimeter-based security where users gain broad network access, each application request is individually evaluated against current identity state, device posture, and contextual signals. The SCIM integration ensures that privilege changes are near-instantaneous—when a developer is removed from the \u0026lsquo;prod-access\u0026rsquo; group in OKTA, their production access is revoked within the SCIM sync interval, typically under 5 minutes. 
This level of granular, real-time access control would require complex custom development with traditional infrastructure.\nOn a side note, not all the contextual information is checked continuously. For details, see the exhaustive list provided by Cloudflare.\nWith this OKTA integration in place, you now have a robust foundation for dynamic access management. When an employee joins a new project team, changes roles, or leaves the organization, their group memberships in OKTA automatically propagate to Cloudflare Access within minutes. This eliminates the manual overhead of updating permissions across multiple systems and significantly reduces security risks from stale access rights. The SCIM provisioning ensures that user lifecycle management becomes seamless—no more forgotten accounts lingering with unnecessary privileges or delays in granting access to new team members.\nLet\u0026rsquo;s see it in action:\nSCIM: Provisioning and deprovisioning of a user in Okta and Cloudflare\nWith this robust OKTA integration establishing our identity foundation, we can now extend these same zero-trust principles beyond web applications to infrastructure access. While centralized identity management solves the \u0026ldquo;who\u0026rdquo; question, infrastructure components like servers and databases present unique challenges around the \u0026ldquo;how\u0026rdquo; of secure access. Traditional infrastructure access relies on VPNs that create broad network trust zones, but with our identity-aware foundation in place, we can now apply the same granular, contextual policies to SSH and other infrastructure protocols. 
Let\u0026rsquo;s see how this transforms server access from a network-level decision into an application-level one.\nUse Case 2: SSH with Access for Infrastructure Traditional SSH access creates a web of vulnerabilities: shared SSH keys that never rotate, VPN tunnels that grant excessive network access, and bastion hosts that become single points of failure and attack. Cloudflare Access transforms SSH into a web-native experience while maintaining full protocol compatibility and eliminating the need to distribute and manage SSH keys across hundreds of developers.\nHere\u0026rsquo;s what makes this approach uniquely powerful:\nZero Client Installation: Developers access servers through their existing web browser, authenticated via your existing OKTA integration. No VPN clients, no SSH key distribution, no network configuration changes required—reducing onboarding time from hours to minutes.\nProtocol-Level Zero Trust: Unlike VPNs that grant network-level access, Cloudflare evaluates every SSH connection request individually. Each server becomes its own protected resource with granular access policies. A developer might have access to staging servers but not production, or access might be time-bounded for specific maintenance windows.\nComprehensive Audit and Session Recording: Every SSH session flows through Cloudflare\u0026rsquo;s infrastructure, enabling complete audit logs, session recording, and real-time monitoring. Traditional SSH access often creates audit blind spots—especially when users connect directly or through shared bastion hosts.\nThis granular approach transforms infrastructure access from a binary network decision into nuanced, contextual authorization—as demonstrated in this policy configuration:\nThis same granular control would require complex scripting and multiple tools in traditional environments. 
The elegance lies in consistency—whether accessing a web application or SSH server, the same identity provider, the same policies, and the same audit trail apply. Infrastructure access becomes as controllable and observable as SaaS application access.\nBelow is a more in-depth diagram of how the SSH connection is established under the hood.\nsource: https://blog.cloudflare.com/intro-access-for-infrastructure-ssh/#so-how-does-cloudflares-ssh-proxy-work During setup, one key step is generating the SSH Certificate Authority used by Cloudflare (source ). You then paste its public key onto your server so that it can validate the short-lived SSH certificates issued for each session. This approach maintains the cryptographic integrity of SSH while eliminating long-lived credentials—each session uses a unique, short-lived certificate that automatically expires.\n#====================================================== # Short lived Certificate CA for Infrastructure Access #====================================================== locals { gateway_ca_certificate = jsondecode(data.http.short_lived_cloudflare_ssh_ca.response_body) } data \u0026#34;http\u0026#34; \u0026#34;short_lived_cloudflare_ssh_ca\u0026#34; { url = \u0026#34;https://api.cloudflare.com/client/v4/accounts/${var.cloudflare_account_id}/access/gateway_ca\u0026#34; request_headers = { \u0026#34;X-Auth-Email\u0026#34; = var.cloudflare_email \u0026#34;X-Auth-Key\u0026#34; = var.cloudflare_api_key \u0026#34;Content-Type\u0026#34; = \u0026#34;application/json\u0026#34; } } Let us see it in action:\nQuery available infrastructure: I retrieve the available targets through the web interface—no need to remember server 
IPs or maintain connection configs Seamless authentication: I log in as a user listed under \u0026ldquo;Username\u0026rdquo; (matthieu) through our existing OKTA integration Transparent certificate management: I don\u0026rsquo;t need any certificate or private key—notice the complete absence of key management as Cloudflare handles certificate generation transparently Example of login to SSH for Infrastructure Access App\nBeyond the smooth login experience, you also get comprehensive auditability. Every session is automatically recorded and can be easily reviewed for compliance audits.\nExample of session log download and decryption\nSession logs can be easily decrypted and reviewed using Cloudflare\u0026rsquo;s provided tooling, making compliance audits straightforward and eliminating the operational overhead of managing separate audit solutions.\nN.B.: To make it easy to decrypt the log, I have created a shell function. I have pasted the code below (source: GitHub and View SSH logs )\ndecrypt_log_ssh() { /Applications/ssh-log-cli/ssh-log-cli decrypt -i \u0026#34;$1\u0026#34; -k /Applications/ssh-log-cli/sshkey # Find the newly created decrypted file decrypted_file=$(ls | grep -E \u0026#34;$1.*-decrypted.zip\u0026#34;) if [ -n \u0026#34;$decrypted_file\u0026#34; ]; then unzip \u0026#34;$decrypted_file\u0026#34; -d \u0026#34;${decrypted_file::-11}\u0026#34; # Navigate to the unzipped directory and cat term_data.txt unzipped_dir=\u0026#34;${decrypted_file::-11}\u0026#34; if [ -d \u0026#34;$unzipped_dir\u0026#34; ]; then cd \u0026#34;$unzipped_dir\u0026#34; if [ -f \u0026#34;term_data.txt\u0026#34; ]; then less term_data.txt else echo \u0026#34;term_data.txt not found in the unzipped directory.\u0026#34; fi # Return to the original directory cd - else echo \u0026#34;Unzipped directory not found.\u0026#34; fi else echo \u0026#34;Could not find the decrypted .zip 
file.\u0026#34; fi } Cloudflare\u0026rsquo;s SSH access implementation demonstrates how zero-trust principles can secure traditional infrastructure protocols, but it still requires users to have SSH clients installed and configured. What if we could eliminate even that requirement? This leads us to perhaps the most innovative capability in Cloudflare\u0026rsquo;s zero-trust arsenal: bringing the entire terminal experience directly into the browser. While our previous use case showed how to secure SSH connections, this next evolution asks a fundamental question: why should secure infrastructure access require any client software at all?\nUse Case 3: Browser-rendered Terminal This use case showcases Cloudflare\u0026rsquo;s most innovative infrastructure access capability—rendering full terminal sessions directly in the browser with zero client-side installation.\nWhy This Changes Everything? Universal Access: Any device with a browser becomes a secure terminal. Developers can troubleshoot production issues from their personal laptop, a conference room computer, or even a tablet—all without compromising security posture.\nSandboxed Execution: The terminal session runs entirely within the browser sandbox, isolated from the local device. No SSH keys stored locally, no cached credentials, no residual access after the browser session ends.\nRich Protocol Support: Unlike simple web shells, this supports full terminal features—colors, cursor movement, file transfers, even curses-based applications like htop or vim work seamlessly.\nCloudflare\u0026rsquo;s Technical Innovation: This capability leverages Cloudflare\u0026rsquo;s global network and browser-rendering technology to stream terminal sessions as web content. 
The server never sees the client device directly—all communication flows through Cloudflare\u0026rsquo;s encrypted tunnels, with the browser handling rendering and input capture.\nReal-World Scenario: During a production incident, an on-call engineer receives an alert while at dinner. Using their phone\u0026rsquo;s browser, they authenticate via OKTA (including MFA), access the affected server through Cloudflare Access, run diagnostic commands, and implement a hotfix—all without installing apps or storing credentials on their personal device.\nSecurity Elegance: Traditional remote access creates persistent attack surfaces—VPN clients, stored keys, cached sessions. Browser-rendered terminals are ephemeral by design. When the browser tab closes, access terminates completely. No residual network connectivity, no cached credentials, no client-side artifacts.\nThis represents the evolution from \u0026lsquo;trust the network\u0026rsquo; (VPN) to \u0026lsquo;trust the identity\u0026rsquo; (Zero Trust) to \u0026lsquo;trust nothing persistently\u0026rsquo; (ephemeral access). It\u0026rsquo;s infrastructure access reimagined for the cloud-native era.\nBrowser-rendered Terminal in Action\nNotice how smooth the experience is and how responsive the browser-based terminal feels. Please note that at no point did I enter a private key or a username and password to SSH into the server. Authentication is handled entirely through Okta.\nBehind the scenes, it uses short-lived certificates, allowing certificate-based authentication without long-lived keys. Below is how to define the short-lived certificate in Terraform. 
(documentation sections 3, 4, 5 and 6)\n#=================================================== # Short lived Certificate for Browser Rendered App #=================================================== resource \u0026#34;cloudflare_zero_trust_access_short_lived_certificate\u0026#34; \u0026#34;zero_trust_access_short_lived_certificate_database_browser\u0026#34; { app_id = cloudflare_zero_trust_access_application.ssh_aws_browser_rendering.id zone_id = var.cloudflare_zone_id } You can also easily audit these sessions, which is crucial for compliance purposes. Below, I also retrieve the justification from the Cloudflare logs.\nCloudflare\u0026rsquo;s browser-rendered terminal solution elegantly handles Linux/Unix infrastructure, but modern enterprises operate in heterogeneous environments. The reality is that Windows servers—particularly Domain Controllers—represent some of the most critical and frequently accessed infrastructure components in enterprise environments. While Cloudflare\u0026rsquo;s terminal-based solutions excel for command-line operations, Windows infrastructure often requires GUI access for administration tasks that simply can\u0026rsquo;t be accomplished through a terminal. This presents an interesting challenge: how do we apply the same zero-trust principles we\u0026rsquo;ve established for SSH to protocols like RDP that were never designed with modern security paradigms in mind? Our final use case addresses this gap by demonstrating how Cloudflare extends zero-trust protection to Windows infrastructure, maintaining our consistent security posture across the entire technology stack.\nUse Case 4: RDP Connection to a Windows Domain Controller Accessing Domain Controllers via RDP is a common enterprise requirement, but it presents significant security challenges. 
Domain Controllers are high-value targets that store sensitive Active Directory data, making their security paramount.\nThe Security Challenge Traditional RDP implementations rely on username/password authentication and require opening ports directly to the internet. This approach is vulnerable to:\nBrute-force attacks against exposed RDP ports Credential stuffing attacks Lack of modern Zero Trust controls like MFA and device compliance Unlike modern web applications that integrate natively with identity providers (Okta, Azure AD) and support MFA with short-lived tokens, RDP lacks built-in Zero Trust mechanisms.\nZero Trust Approach With Cloudflare Zero Trust, we can eliminate these risks by:\nKeeping all inbound ports closed on the domain controller Requiring identity verification through SAML providers Enforcing device posture compliance Creating audit trails for all access attempts Implementation Strategy Note: While browser-based RDP or RBI (Remote Browser Isolation) would be ideal for this use case, these features are currently in Enterprise-only beta. 
For this demonstration using free tier capabilities, we\u0026rsquo;ll implement RDP over WARP.\nPrerequisites WARP client installed and enrolled on admin devices SAML integration configured with your identity provider Device posture checks defined for OS compliance Domain controller accessible via internal network Gateway Network Policies We\u0026rsquo;ll implement two Gateway Network Policies in order of precedence:\nRDP IT Admin Policy - Allows access for verified IT administrators RDP Default Deny Policy - Blocks all other access attempts Policy precedence ensures that the most specific rules are evaluated first, with the default deny acting as a safety net.\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 #======================================================== # Gateway Network Policy: Windows RDP Server #======================================================== # Allow RDP for IT Admins resource \u0026#34;cloudflare_zero_trust_gateway_policy\u0026#34; \u0026#34;rdp_admin_access_policy\u0026#34; { account_id = var.cloudflare_account_id name = \u0026#34;RDP - IT Admin Access Policy\u0026#34; description = \u0026#34;Allow RDP access for IT administrators\u0026#34; enabled = true action = \u0026#34;allow\u0026#34; precedence = \u0026#34;2\u0026#34; filters = [\u0026#34;l4\u0026#34;] traffic = \u0026#34;net.dst.ip == ${var.gcp_windows_vm_internal_ip} and net.dst.port == ${var.cf_domain_controller_rdp_port} and net.protocol == \\\u0026#34;tcp\\\u0026#34;\u0026#34; identity = \u0026#34;any(identity.saml_attributes[*] == \\\u0026#34;groups=${var.okta_itadmin_saml_group_name}\\\u0026#34;) or any(identity.saml_attributes[*] == \\\u0026#34;groups=${var.okta_infrastructureadmin_saml_group_name}\\\u0026#34;)\u0026#34; device_posture = \u0026#34;any(device_posture.checks.passed[*] == \\\u0026#34;${var.cf_latest_macOS_version_posture_id}\\\u0026#34;) or 
any(device_posture.checks.passed[*] == \\\u0026#34;${var.cf_latest_windows_version_posture_id}\\\u0026#34;) or any(device_posture.checks.passed[*] == \\\u0026#34;${var.cf_latest_linux_kernel_version_posture_id}\\\u0026#34;)\u0026#34; rule_settings = { block_page_enabled = false insecure_disable_dnssec_validation = false notification_settings = { enabled = false } } } # Block all other RDP access resource \u0026#34;cloudflare_zero_trust_gateway_policy\u0026#34; \u0026#34;rdp_default_deny_policy\u0026#34; { account_id = var.cloudflare_account_id name = \u0026#34;RDP - Default Deny Policy\u0026#34; description = \u0026#34;Deny RDP access for others\u0026#34; enabled = true action = \u0026#34;block\u0026#34; precedence = \u0026#34;999\u0026#34; filters = [\u0026#34;l4\u0026#34;] traffic = \u0026#34;net.dst.ip == ${var.gcp_windows_vm_internal_ip} and net.dst.port == ${var.cf_domain_controller_rdp_port} and net.protocol == \\\u0026#34;tcp\\\u0026#34;\u0026#34; rule_settings = { block_page_enabled = false block_reason = \u0026#34;This website is blocked by RDP - Default Deny Policy\u0026#34; insecure_disable_dnssec_validation = false notification_settings = { enabled = true msg = \u0026#34;This website is blocked by RDP - Default Deny Policy\u0026#34; } } } How This Creates Zero Trust Security? 
This configuration ensures that access to the domain controller requires:\nIdentity verification: User must be authenticated via SAML and belong to authorized groups (ITAdmin or InfrastructureAdmin) Device compliance: Device must pass posture checks for latest OS versions Network isolation: Traffic flows through Cloudflare\u0026rsquo;s secure edge, with no exposed ports on the domain controller Audit trail: All access attempts are logged for security monitoring When an unauthorized user or non-compliant device attempts RDP access, the default deny policy blocks the connection, protecting your critical infrastructure.\nThis approach transforms a traditionally vulnerable service into a zero-trust secured connection, significantly reducing your attack surface while maintaining administrative functionality.\nHaving demonstrated four distinct approaches to zero-trust infrastructure access—from web applications through Linux servers to Windows domain controllers—we\u0026rsquo;ve built a comprehensive security architecture that spans the entire enterprise technology stack. But technical capability alone doesn\u0026rsquo;t determine solution viability. In real-world deployments, the financial impact of infrastructure decisions can make or break adoption. Understanding the true cost of our multi-cloud zero-trust environment is crucial for making informed decisions about production deployments and scaling strategies.\nFinOps: Cost Analysis of the Multi-Cloud Zero Trust Demo Environment Financial Operations (FinOps) is critical when deploying multi-cloud infrastructure, even for demonstration purposes. While this setup aims to leverage free tiers where possible, understanding the true cost helps in making informed decisions about production deployments.\nCost Monitoring with Infracost I\u0026rsquo;ve integrated Infracost into the CI/CD pipeline to provide automated cost estimation. 
Infracost is an excellent tool that:\nAnalyzes Terraform plans before deployment Provides cost breakdowns by resource type Integrates seamlessly with GitLab/GitHub workflows Helps prevent cost surprises in infrastructure changes Current Cost Breakdown (June 7, 2025) Total Monthly Cost: $208.58 (~$0.29/hour)\nN.B.: the cost of the windows Server 2016 Datacenter is not included above (working on this issue)\nThe cost breakdown reveals some interesting insights.\nMajor Cost Drivers: 1. AWS NAT Gateway: $37.96/month (18% of total cost)\nRequired for private subnet internet access Fixed monthly charge regardless of usage Alternative: NAT instances for cost optimization 2. GCP Windows RDP Server: $31.51/month (15% of total cost)\nIncludes compute instance + Windows licensing Windows Server 2016 Datacenter licensing: $33.58/month additional Combined Windows costs: $65.09/month (31% of total) 3. Azure NAT Gateway: $32.85/month (16% of total cost)\nSimilar cost structure to AWS Demonstrates consistency in NAT Gateway pricing across providers Optimized Instances: Multiple small compute instances (e2-micro, t3.micro): $8.76 each Storage costs remain minimal: \u0026lt;$2/month per instance Network transfer costs are negligible due to low demo traffic Multi-Cloud Cost Comparison Key Insight: NAT Gateways represent the highest per-resource cost across all providers, highlighting the importance of network architecture decisions in multi-cloud deployments.\nConclusion These four use cases demonstrate why Cloudflare\u0026rsquo;s approach to Zero Trust represents more than just feature parity with traditional solutions—it\u0026rsquo;s a fundamental reimagining of how secure access should work.\nTraditional enterprise security stacks require separate tools for identity management, VPN access, privileged access management, session recording, and audit logging. 
Each tool adds complexity, integration challenges, and potential security gaps.\nCloudflare consolidates this entire stack into a unified platform where the same identity policies, the same audit trails, and the same security controls apply whether users are accessing SaaS applications, internal web services, SSH servers, or browser-based terminals.\nThe result isn\u0026rsquo;t just better security—it\u0026rsquo;s simpler operations, lower costs, and genuinely better user experience. Developers spend less time fighting access tools and more time building products.\nOutputs: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 AWS_EC2_INSTANCES = [ { \u0026#34;internal_ip\u0026#34; = \u0026#34;172.16.69.***\u0026#34; \u0026#34;name\u0026#34; = \u0026#34;cloudflared-replica-aws-0\u0026#34; \u0026#34;public_ip_nat\u0026#34; = \u0026#34;63.178.77.46\u0026#34; \u0026#34;ssh\u0026#34; = \u0026#34;ssh ubuntu@172.16.69.*** -i modules/keys/out/aws_ssh_cloudflared_key_pair_0\u0026#34; \u0026#34;tunnel\u0026#34; = { \u0026#34;cf_tunnel_id\u0026#34; = \u0026#34;*********\u0026#34; \u0026#34;cf_tunnel_status\u0026#34; = \u0026#34;healthy\u0026#34; \u0026#34;cf_tunnel_version\u0026#34; = \u0026#34;2025.5.0\u0026#34; } }, { \u0026#34;internal_ip\u0026#34; = \u0026#34;172.16.69.***\u0026#34; \u0026#34;name\u0026#34; = \u0026#34;cloudflare-zero-trust-demo-aws\u0026#34; \u0026#34;public_ip_nat\u0026#34; = \u0026#34;63.178.77.46\u0026#34; \u0026#34;ssh\u0026#34; = \u0026#34;ssh ubuntu@172.16.69.*** -i modules/keys/out/aws_ssh_service_key_pair\u0026#34; }, ] AZURE_VMS = { \u0026#34;cloudflare-warp-connector-azure-0\u0026#34; = { \u0026#34;internal_ip\u0026#34; = \u0026#34;192.168.71.***\u0026#34; \u0026#34;public_dns\u0026#34; = 
\u0026#34;cloudflare-warp-connector-azure-0.westeurope.cloudapp.azure.com\u0026#34; \u0026#34;public_ip\u0026#34; = \u0026#34;4.185.47.5\u0026#34; \u0026#34;ssh\u0026#34; = \u0026#34;ssh ubuntu@warp_ip -i modules/keys/out/azure_ssh_key_pair_0\u0026#34; } \u0026#34;cloudflare-zero-trust-demo-azure-1\u0026#34; = { \u0026#34;internal_ip\u0026#34; = \u0026#34;192.168.71.***\u0026#34; \u0026#34;public_dns\u0026#34; = \u0026#34;cloudflare-zero-trust-demo-azure-1.westeurope.cloudapp.azure.com\u0026#34; \u0026#34;public_ip\u0026#34; = \u0026#34;4.185.47.5\u0026#34; \u0026#34;ssh\u0026#34; = \u0026#34;ssh ubuntu@192.168.71.*** -i modules/keys/out/azure_ssh_key_pair_1\u0026#34; } } GCP_COMPUTE_INSTANCES = [ { \u0026#34;internal_ip\u0026#34; = \u0026#34;10.156.70.***\u0026#34; \u0026#34;name\u0026#34; = \u0026#34;cloudflare-infrastructure-access-gcp\u0026#34; \u0026#34;public_ip\u0026#34; = \u0026#34;34.40.74.151\u0026#34; \u0026#34;tunnel\u0026#34; = { \u0026#34;cf_tunnel_id\u0026#34; = \u0026#34;*********\u0026#34; \u0026#34;cf_tunnel_status\u0026#34; = \u0026#34;healthy\u0026#34; \u0026#34;cf_tunnel_version\u0026#34; = \u0026#34;2025.5.0\u0026#34; } }, { \u0026#34;internal_ip\u0026#34; = \u0026#34;10.156.85.***\u0026#34; \u0026#34;name\u0026#34; = \u0026#34;cloudflare-warp-connector-gcp-0\u0026#34; \u0026#34;public_ip\u0026#34; = \u0026#34;34.40.74.151\u0026#34; \u0026#34;ssh\u0026#34; = \u0026#34;ssh ubuntu@warp_ip -i modules/keys/out/gcp_vm_key_pair_0\u0026#34; }, { \u0026#34;internal_ip\u0026#34; = \u0026#34;10.156.85.***\u0026#34; \u0026#34;name\u0026#34; = \u0026#34;cloudflare-zero-trust-demo-gcp-1\u0026#34; \u0026#34;public_ip\u0026#34; = \u0026#34;34.40.74.151\u0026#34; \u0026#34;ssh\u0026#34; = \u0026#34;ssh ubuntu@10.156.85.*** -i modules/keys/out/gcp_vm_key_pair_1\u0026#34; }, { \u0026#34;gcp_windows_username\u0026#34; = \u0026#34;inod\u0026#34; \u0026#34;internal_ip\u0026#34; = \u0026#34;10.156.90.***\u0026#34; \u0026#34;name\u0026#34; = 
\u0026#34;windows-rdp-server-gcp\u0026#34; \u0026#34;public_ip\u0026#34; = \u0026#34;34.40.74.151\u0026#34; \u0026#34;tunnel\u0026#34; = { \u0026#34;cf_tunnel_id\u0026#34; = \u0026#34;*********\u0026#34; \u0026#34;cf_tunnel_status\u0026#34; = \u0026#34;healthy\u0026#34; \u0026#34;cf_tunnel_version\u0026#34; = \u0026#34;2025.5.0\u0026#34; } }, ] MY_IP = { \u0026#34;IPv4\u0026#34; = \u0026#34;*********\u0026#34; } SSH_FOR_INFRASTRUCTURE_ACCESS = { \u0026#34;bob\u0026#34; = \u0026#34;ssh bob@10.156.70.***\u0026#34; \u0026#34;jose\u0026#34; = \u0026#34;ssh jose@10.156.70.***\u0026#34; \u0026#34;matthieu\u0026#34; = \u0026#34;ssh matthieu@10.156.70.***\u0026#34; } Lessons Learned Start simple, scale complexity gradually - Beginning with basic SSH access before advancing to browser-rendered terminals and RDP proved essential for troubleshooting and understanding the platform\nNetwork architecture drives costs - NAT Gateways represented 30%+ of total infrastructure costs across all providers, making network design optimization crucial for production deployments\nIdentity provider integration is foundational - Investing time in proper SCIM configuration and group mapping early prevents significant rework later as policies become more complex\nDocumentation beats automation - While Infrastructure as Code enables reproducibility, clear documentation of policy logic and troubleshooting steps proved more valuable for team collaboration\nMulti-cloud consistency requires discipline - Each cloud provider\u0026rsquo;s unique networking patterns and service names created maintenance overhead that requires careful module design and naming conventions\nTechnical remaining Challenges I am still facing the same issue as described in part 2. I have posted it on the Cloudflare Community here . 
Some issues still arise when dealing specifically with Azure resources (resources not deleted, connection reset, etc\u0026hellip;) I still have not found a way to retrieve the WARP IP addresses to use them in my outputs.tf ssh command. Roadmap 📖 Use the Entra ID integration Use case for WARP Connector (Site-to-Site, Site-to-Internet\u0026hellip;) link to the documentation Observability use case with Datadog What\u0026rsquo;s Next? 👉 The code has been released on GitHub, check it out: https://github.com/macharpe I am sure the code can be refactored and simplified. I have published it as is, and I intend to keep maintaining it (it is a work in progress).\nThis isn\u0026rsquo;t just a demo; it\u0026rsquo;s a blueprint for modern security. Stay tuned to transform your Zero Trust strategy from concept to reality. 🔒\n","permalink":"https://blog.macharpe.com/posts/zero-trust-demo-part-3/","summary":"\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eOver the past two posts (\u003ca href=\"/posts/zero-trust-demo-part-1/\"\u003eBuilding a Scalable Zero Trust Demo environment with Cloudflare and Terraform (Part 1)\u003c/a\u003e\n and \u003ca href=\"/posts/zero-trust-demo-part-2/\"\u003eAutomating Cloudflare Zero Trust at Scale: Terraform, Multi-Cloud, and Identity (Part 2)\u003c/a\u003e\n), we\u0026rsquo;ve explored the foundations of building a scalable Zero Trust demo environment and how to automate its deployment with Cloudflare and Terraform. In Part 1, we laid the groundwork by designing a robust, modular Zero Trust architecture. Part 2 took things further, demonstrating how to streamline and scale this setup using Infrastructure as Code principles. 
Now, in part 3 (final part), we will explore advanced use cases you can demonstrate with this environment.\u003c/p\u003e","title":"Zero Trust for Real-World Scenarios: Use Cases and Extensions (Part 3 - Final)"},{"content":"Introduction In Part 1 , we demonstrated how Terraform can streamline reproducible security configurations. In this follow-up, I\u0026rsquo;ll show how to extend those principles across AWS, Azure, and GCP using Cloudflare Zero Trust. You\u0026rsquo;ll see how the project\u0026rsquo;s modular structure, automation, and dynamic routing reduce manual security tasks by up to 80%—based on my own benchmarks.\nWhat\u0026rsquo;s new since Part 1: Custom subnets and improved network segmentation Automated device profiles and dynamic WARP routing Expanded multi-cloud support with updated diagrams Terraform code is now 4100+ lines of code, 87 files and 21 directories (even if the quantity does not mean quality!) with 143 resources Let\u0026rsquo;s dive into the updated architecture and key modules powering this environment.\nArchitecture Breakdown Key Components Automation and code versioning: Github, Terraform and VSCode Cross-cloud tunnel setups: GCP, AWS and Azure Multi-OS Cloudflare Agent setup: Cloudflare WARP Identity provider configurations: Okta, Azure AD (Entra ID) and Google Cloudflare Zero-trust Platform Security Policies: centralized management including posture, group membership, etc\u0026hellip; SaaS app integrations (Meraki, Salesforce, etc\u0026hellip;) Access App: Cloudflare AppLauncher Observability: Datadog The setup integrates automation (GitHub, Terraform), cross-cloud tunnels (AWS, Azure, GCP), multi-OS Cloudflare WARP agents, multiple identity providers (Okta, Azure AD, Google), centralized security policies, SaaS app integrations, Cloudflare AppLauncher, and observability with Datadog.\nNow that we have a high-level sense of what components are part of the project, let us delve into the code.\nProject Structure Overview A robust 
automation project demands not just effective code, but a clear, maintainable structure. From the outset, this Terraform repository was designed for modularity, reusability, and collaboration across teams.\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 ├── modules/ │ ├── azure/ # Azure AD │ ├── cloudflare/ │ ├── keys/ │ ├── okta/ │ └── warp-routing/ # Dynamic subnet calculation ├── scripts/ ├── doc/ # documentation ├── variables.tf ├── main.tf ├── outputs.tf ├── provider.tf ├── vm-aws-instance.tf ├── vm-azure-instance.tf ├── vm-gcp-instance.tf └── ... Why This Structure? Separation of Concerns: Each cloud provider and major function (identity, networking, security) is isolated in its own module. This keeps codebases clean, reduces merge conflicts, and accelerates onboarding for new contributors. Reusability: Modules can be reused across environments (dev, staging, prod) or even in other projects, simply by adjusting variables and wiring. Scalability: As the environment grows-adding more SaaS integrations, tunnels, or regions-the structure supports incremental change without major refactoring. Documentation-Driven: The /doc directory contains up-to-date architecture diagrams and dependency graphs, ensuring that the \u0026ldquo;why\u0026rdquo; behind each component is as accessible as the \u0026ldquo;how.\u0026rdquo; (contains excalidraw diagram as well as mermaid graph) N.B.: I have decided to focus this blog post on two modules, namely cloudflare module and warp-routing modules and also talk a bit about the VM creation because they are at the heart of the project for the first one and the second one is a neat way to deal with routing (more on that later)\nBefore initializing your Terraform project, you need a good way to store the different API keys that we are going to be using.\nEnvironment variables All the API keys and secret are stored in environment variable. 
Moreover, I am using direnv so that the environment variables pertaining to this particular project are loaded when I change into the project folder. It is a very neat way to declutter your .profile.\nAdvantages of using environment variables Avoids Hardcoding Sensitive Data: Storing API keys directly in Terraform configuration files or version control exposes them to anyone with access to those files. Environment variables keep secrets out of source code, reducing the risk of accidental leaks. Obfuscation: Environment variables help obfuscate sensitive values, making it harder for unauthorized users to access API keys just by reading configuration files. Integration with CI/CD: Environment variables are easily managed in continuous integration and deployment pipelines, where secrets can be injected securely at runtime without being stored in code repositories. Terraform Variable Precedence: Terraform supports passing variable values through environment variables using the TF_VAR_ prefix (e.g., TF_VAR_api_key). Values set this way have lower precedence than .tfvars files and -var CLI arguments, but they are still picked up before Terraform prompts the user, making this a robust and convenient option for secret injection. Seamless Environment Switching: Environment variables allow you to easily switch between development, staging, and production environments without modifying configuration files or maintaining separate versions for each environment. 
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 # Terraform Zero Trust Project # Cloudflare export TF_VAR_cloudflare_api_key=\u0026#34;\u0026#34; export TF_VAR_cloudflare_account_id=\u0026#34;\u0026#34; export TF_VAR_cloudflare_email=\u0026#34;\u0026#34; export TF_VAR_cloudflare_zone_id=\u0026#34;\u0026#34; # AWS export AWS_ACCESS_KEY_ID=\u0026#34;\u0026#34; export AWS_SECRET_ACCESS_KEY=\u0026#34;\u0026#34; # Azure export TF_VAR_azure_tenant_id=\u0026#34;\u0026#34; export TF_VAR_azure_subscription_id=\u0026#34;\u0026#34; # Google export GOOGLE_APPLICATION_CREDENTIALS=\u0026#34;path_to_your_json_file\u0026#34; export TF_VAR_gcp_project_id=\u0026#34;\u0026#34; # Okta export TF_VAR_okta_org_name=\u0026#34;\u0026#34; export TF_VAR_okta_api_token=\u0026#34;\u0026#34; # Datadog export TF_VAR_datadog_api_key=\u0026#34;\u0026#34; Key Take-aways All sensitive API keys and secrets are stored as environment variables, managed via direnv for project-specific loading. This approach avoids hardcoding secrets, integrates well with CI/CD, supports seamless environment switching, and leverages Terraform\u0026rsquo;s variable precedence for secure secret injection. 
Now that we have our environment variables defined, let\u0026rsquo;s see how we programmatically declare our workloads (VMs).\nVM Creation: Secure provisioning across Cloud This section details how VMs are securely provisioned in AWS, Azure, and GCP using Terraform, with a focus on SSH key management, cloud-init automation, and least-privilege networking.\nSSH Key Generation \u0026amp; Injection Terraform programmatically generates and injects SSH keys using provider-specific methods while avoiding hardcoded secrets:\nAWS Example (vm-aws-instance.tf)\n1 2 3 4 5 6 7 8 9 10 # Centralized SSH key module manages key pairs module \u0026#34;ssh_keys\u0026#34; { source = \u0026#34;./modules/keys\u0026#34; } # AWS key pair resource resource \u0026#34;aws_key_pair\u0026#34; \u0026#34;aws_ec2_service_key_pair\u0026#34; { key_name = \u0026#34;aws_ssh_service\u0026#34; public_key = module.ssh_keys.aws_ssh_service_public_key # Centralized key from module } Cloud-Init Templating Cloud-init configurations are dynamically populated using Terraform variables for cross-cloud consistency:\nAWS Cloudflared Init (scripts/aws-cloudflared-init.yaml):\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 #cloud-config hostname: ${hostname} package_update: true package_upgrade: true packages: - wget - curl - traceroute - build-essential - hping3 - net-tools - unzip runcmd: - sudo mkdir -p --mode=0755 /usr/share/keyrings - curl -fsSL https://pkg.cloudflare.com/cloudflare-main.gpg | sudo tee /usr/share/keyrings/cloudflare-main.gpg \u0026gt;/dev/null - echo \u0026#39;deb [signed-by=/usr/share/keyrings/cloudflare-main.gpg] https://pkg.cloudflare.com/cloudflared any main\u0026#39; | sudo tee /etc/apt/sources.list.d/cloudflared.list - sudo apt-get update \u0026amp;\u0026amp; sudo apt-get install cloudflared - sudo cloudflared service install ${tunnel_secret_aws} - sudo timedatectl set-timezone Europe/Paris # Datadog Agent installation - \u0026#39;echo 
\u0026#34;DD_API_KEY=${datadog_api_key} DD_SITE=${datadog_region}\u0026#34; \u0026gt; /tmp/dd_env.log\u0026#39; - \u0026#39;DD_API_KEY=${datadog_api_key} DD_SITE=${datadog_region} bash -c \u0026#34;$(curl -L https://install.datadoghq.com/scripts/install_script_agent7.sh)\u0026#34; \u0026gt; /tmp/dd_install.log 2\u0026gt;\u0026amp;1\u0026#39; Terraform Variable Injection:\n1 2 3 4 5 6 user_data = templatefile(\u0026#34;${path.module}/scripts/aws-cloudflared-init.yaml\u0026#34;, { hostname = \u0026#34;${var.aws_ec2_cloudflared_name}-${count.index}\u0026#34; tunnel_secret_aws = module.cloudflare.aws_extracted_token datadog_api_key = var.datadog_api_key datadog_region = var.datadog_region }) This ensures secure secret handling while maintaining reusable templates.\nSecurity Group/Firewall Rules Zero-trust networking is enforced through provider-specific security configurations:\nCloud SSH Access ICMP Egress Unique Feature AWS Restricted to user IP + Cloudflared SG Allowed from user IP Full outbound Layered SG for tunnel/service separation Azure NSG rules limited to user IP Restricted to user IP Full outbound Warp connector VM with custom init GCP Firewall rule with target tags Restricted to user IP Block SSH egress Ephemeral instances via preemptible scheduling GCP Firewall Example (vm-gcp-instance.tf):\n1 2 3 4 5 6 7 8 resource \u0026#34;google_compute_firewall\u0026#34; \u0026#34;allow_ssh_from_my_ip\u0026#34; { allow { protocol = \u0026#34;tcp\u0026#34; ports = [\u0026#34;22\u0026#34;] } source_ranges = [\u0026#34;${data.http.my_ip.response_body}/32\u0026#34;] # Dynamic IP restriction target_tags = [\u0026#34;ssh-cf-tunnel-only\u0026#34;] # Tag-based enforcement } Critical Design Choice: All providers block default SSH access except from the user\u0026rsquo;s current IP and authorized tunnel components\nCross-Cloud Consistency The implementation achieves security parity through:\nCentralized SSH Key Modules Dynamic Cloud-Init Templating IP Restriction Patterns 
Tag-Based Firewalling This ensures identical security posture whether deploying to AWS, Azure, or GCP while respecting each provider\u0026rsquo;s native tooling.\nKey Take-aways VMs are provisioned in AWS, Azure, and GCP with the necessary SSH keys and required agents/software installed at boot. This enables secure, automated access and management across all cloud environments. Now let us dive into the Cloudflare module that powers the whole setup.\nCloudflare Module This is the cloudflare module structure with terraform files being named to be self explanatory (I hope):\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 └── cloudflare ├── cloudflare-app-policies.tf ├── cloudflare-apps.tf ├── cloudflare-gateway-policy.tf ├── cloudflare-tags.tf ├── cloudflared-tunnel-main.tf ├── device-profiles.tf ├── dns-records.tf ├── outputs.tf ├── provider.tf ├── rule-groups.tf ├── scripts │ └── latest_osx_version_posture.sh ├── short-lived-certificate.tf └── variables.tf If we have a look at cloudflared-tunnel-main.tf, the creation of the tunnel is straight forward:\n1 2 3 4 5 6 7 8 9 10 11 12 13 #===================================== # GCP Tunnel #===================================== resource \u0026#34;cloudflare_zero_trust_tunnel_cloudflared\u0026#34; \u0026#34;gcp_cloudflared_tunnel\u0026#34; { account_id = var.cloudflare_account_id name = var.cf_gcp_tunnel_name config_src = \u0026#34;cloudflare\u0026#34; } data \u0026#34;cloudflare_zero_trust_tunnel_cloudflared_token\u0026#34; \u0026#34;gcp_tunnel_cloudflared_token\u0026#34; { account_id = var.cloudflare_account_id tunnel_id = cloudflare_zero_trust_tunnel_cloudflared.gcp_cloudflared_tunnel.id } We retrieve the token for authentication so that it can be passed on to the VM to initiate the connection towards Cloudflare. 
Now that we\u0026rsquo;ve defined the tunnel, let\u0026rsquo;s see how authentication is handled.\nAs part of the SSH Access for Infrastructure use case, we need to generate a Cloudflare SSH Certificate Authority (CA), and this can (currently) only be done via the API. I have integrated this API call in the Terraform code and store its result in a \u0026ldquo;local\u0026rdquo; called gateway_ca_certificate, which is, in turn, passed to an output to be consumed elsewhere. This is very handy because we need the CA public key in the initialization script of the VM supporting the SSH Infrastructure Access use case.\n#====================================================== # Short lived Certificate CA for Infrastructure Access #====================================================== locals { gateway_ca_certificate = jsondecode(data.http.short_lived_cloudflare_ssh_ca.response_body) } data \u0026#34;http\u0026#34; \u0026#34;short_lived_cloudflare_ssh_ca\u0026#34; { url = \u0026#34;https://api.cloudflare.com/client/v4/accounts/${var.cloudflare_account_id}/access/gateway_ca\u0026#34; request_headers = { \u0026#34;X-Auth-Email\u0026#34; = var.cloudflare_email \u0026#34;X-Auth-Key\u0026#34; = var.cloudflare_api_key \u0026#34;Content-Type\u0026#34; = \u0026#34;application/json\u0026#34; } } [...] 
output \u0026#34;gateway_ca_certificate\u0026#34; { description = \u0026#34;The Cloudflare Gateway CA certificate\u0026#34; value = local.gateway_ca_certificate.result.public_key sensitive = true } Then let see what the Infrastructure App looks like in Terraform since this is the most interesting use case here.\nInfrastructure Access Application First, we define the Target and the Application (type = \u0026ldquo;infrastructure\u0026rdquo;)\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 #=============================== # Access for Infrastructure App #=============================== # Creating the Target resource \u0026#34;cloudflare_zero_trust_access_infrastructure_target\u0026#34; \u0026#34;ssh-gcp-instance\u0026#34; { account_id = var.cloudflare_account_id hostname = var.cf_target_name ip = { ipv4 = { ip_addr = var.gcp_vm_internal_ip } } } # Creating the infrastructure Application resource \u0026#34;cloudflare_zero_trust_access_application\u0026#34; \u0026#34;ssh_gcp_infrastructure\u0026#34; { account_id = var.cloudflare_account_id type = \u0026#34;infrastructure\u0026#34; name = var.cf_infra_app_name logo_url = \u0026#34;https://upload.wikimedia.org/wikipedia/commons/0/01/Google-cloud-platform.svg\u0026#34; tags = [cloudflare_zero_trust_access_tag.zero_trust_demo_tag.name] session_duration = \u0026#34;1h\u0026#34; target_criteria = [{ port = \u0026#34;22\u0026#34;, protocol = \u0026#34;SSH\u0026#34; target_attributes = { hostname = [var.cf_target_name] }, }] policies = [{ ...see below }] } We associate the target with var.gcp_vm_internal_ip which represents the private IP address of the GCP VM. 
Then we specify the port (22) and the protocol (SSH).\nOnce the Application is defined, we need to define a policy to access it.\nPolicy Definition This is how we define a policy for the Infrastructure application (\u0026ldquo;ssh-gcp-instance\u0026rdquo;).\nYou will be able to access the app if you meet one of the following criteria (include block): (1) you belong to the SAML group okta_infrastructureadmin_saml_group_name, or (2) you belong to the SAML group okta_contractors_saml_group_name, or (3) your email address belongs to the domain cf_email_domain. In addition, you must (require block): (1) run the WARP Client in Gateway mode (device_posture = {integration_uid = var.cf_gateway_posture_id}) and (2) authenticate with a method that includes \u0026ldquo;MFA\u0026rdquo;. Authentication methods based on \u0026ldquo;SMS\u0026rdquo; are rejected (exclude block). The cf_email_domain criterion is useful, especially if you have contractors who do not have a user account in your Identity Provider (in this example Okta).\nFinally, you set up a connection_rule. In the example below, I allow users to ssh into the machine with their email prefix (e.g. if my email is bob@macharpe.com , then I can log in as \u0026ldquo;bob\u0026rdquo;).\nN.B.: Cloudflare will not create new users on the target. 
UNIX users must already be present on the server (source )\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 policies = [{ name = \u0026#34;SSH GCP Infrastructure Policy\u0026#34; decision = \u0026#34;allow\u0026#34; allowed_idps = [var.cf_okta_identity_provider_id] auto_redirect_to_identity = true allow_authenticate_via_warp = false include = [ { saml = { identity_provider_id = var.cf_okta_identity_provider_id attribute_name = \u0026#34;groups\u0026#34; attribute_value = var.okta_infrastructureadmin_saml_group_name } }, { saml = { identity_provider_id = var.cf_okta_identity_provider_id attribute_name = \u0026#34;groups\u0026#34; attribute_value = var.okta_contractors_saml_group_name } }, { email_domain = { domain = var.cf_email_domain } } ] require = [ { device_posture = { integration_uid = var.cf_gateway_posture_id } }, { auth_method = { auth_method = \u0026#34;mfa\u0026#34; } } ] exclude = [ { auth_method = { auth_method = \u0026#34;sms\u0026#34; } } ] connection_rules = { ssh = { allow_email_alias = true usernames = [] # None } } }] N.B.: there is currently a technical limitation, Infrastructure Access Application do not support \u0026ldquo;reusable policy\u0026rdquo; and therefore this policy is defined within the app definition.\nTalking about a reusable policy (which can be applied to as many application as you want), here is an example of the Terraform definition of one (e.g. 
Salesforce Policy)\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 #======================= # POLICY for Salesforce #======================= resource \u0026#34;cloudflare_zero_trust_access_policy\u0026#34; \u0026#34;salesforce_policy\u0026#34; { account_id = var.cloudflare_account_id decision = \u0026#34;allow\u0026#34; name = \u0026#34;Salesforce Policy\u0026#34; session_duration = \u0026#34;0s\u0026#34; include = [ { group = { id = cloudflare_zero_trust_access_group.sales_rule_group.id } }, { group = { id = cloudflare_zero_trust_access_group.sales_engineering_rule_group.id } } ] require = [ { device_posture = { integration_uid = var.cf_gateway_posture_id } }, { group = { id = cloudflare_zero_trust_access_group.country_requirements_rule_group.id } }, { group = { id = cloudflare_zero_trust_access_group.latest_os_version_requirements_rule_group.id } }, { auth_method = { auth_method = \u0026#34;mfa\u0026#34; } } ] } In the particular policy we make use of \u0026ldquo;Rule Groups\u0026rdquo;: country_requirements_rule_group and latest_os_version_requirements_rule_group\nThe first one sets the location from which you can access the application (include) but also countries from which you may not login (exclude). 
Below is the code.\nresource \u0026#34;cloudflare_zero_trust_access_group\u0026#34; \u0026#34;country_requirements_rule_group\u0026#34; { account_id = var.cloudflare_account_id name = \u0026#34;Country Requirements\u0026#34; include = [ { geo = { country_code = \u0026#34;FR\u0026#34; } }, { geo = { country_code = \u0026#34;DE\u0026#34; } }, { geo = { country_code = \u0026#34;US\u0026#34; } }, { geo = { country_code = \u0026#34;GB\u0026#34; } } ] exclude = [ { geo = { country_code = \u0026#34;CN\u0026#34; } }, { geo = { country_code = \u0026#34;RU\u0026#34; } } ] } Key Take-aways The Cloudflare module manages Zero Trust tunnels, device profiles, DNS records, policies, and short-lived SSH CA certificates via Terraform. Infrastructure Access Applications are defined for secure SSH access, with policies enforcing group membership, device posture, and MFA. Policies can be application-specific or reusable (e.g., for Salesforce), and leverage rule groups for granular access control (such as country or OS version restrictions). One thing that needs a bit more explanation is how we programmatically define the routes in the WARP client so that connections to workloads are routed through Cloudflare, not locally. This is what the warp-routing module is designed for.\nwarp-routing module The warp-routing module is probably the smartest one of the setup. Essentially, it consists of three Python scripts:\n└── modules └── warp-routing └── scripts ├── generate_subnets_aws.py ├── generate_subnets_azure.py └── generate_subnets_gcp.py These scripts were inspired by the Cloudflare documentation itself (source ), which provides a tool to calculate which subnets to exclude.\nEach script takes a private subnet (e.g. 10.156.70.0/24) as input. It infers the corresponding RFC1918 network to which the input belongs (e.g. 
10.156.70.0/24 belongs to 10.0.0.0/8). The script then calculates the set of subnets that covers the base_network (e.g. 10.0.0.0/8) except for the input subnet (e.g. 10.156.70.0/24). This turns out to be very useful to programmatically update the routes of the WARP clients and make sure that this subnet (e.g. 10.156.70.0/24) is not routed locally but instead sent to Cloudflare so it can eventually reach the VM.\nN.B.: by default, Cloudflare excludes all RFC1918 networks from being routed through the WARP Client.\nHere is a snippet of the output file generated in JSON format:\n{ \u0026#34;metadata\u0026#34;: { \u0026#34;generated_at\u0026#34;: \u0026#34;2025-05-19T19:05:19.088335\u0026#34;, \u0026#34;script_version\u0026#34;: \u0026#34;1.1\u0026#34;, \u0026#34;input_received\u0026#34;: \u0026#34;10.156.70.0/24\u0026#34;, \u0026#34;normalized_exclusion\u0026#34;: \u0026#34;10.156.70.0/24\u0026#34;, \u0026#34;base_network\u0026#34;: \u0026#34;10.0.0.0/8\u0026#34; }, \u0026#34;exclusions\u0026#34;: [ { \u0026#34;address\u0026#34;: \u0026#34;10.0.0.0/9\u0026#34;, \u0026#34;description\u0026#34;: \u0026#34;GCP Excluded Subnet 1\u0026#34;, \u0026#34;type\u0026#34;: \u0026#34;calculated\u0026#34; }, { \u0026#34;address\u0026#34;: \u0026#34;10.192.0.0/10\u0026#34;, \u0026#34;description\u0026#34;: \u0026#34;GCP Excluded Subnet 2\u0026#34;, \u0026#34;type\u0026#34;: \u0026#34;calculated\u0026#34; }, { \u0026#34;address\u0026#34;: \u0026#34;10.160.0.0/11\u0026#34;, \u0026#34;description\u0026#34;: \u0026#34;GCP Excluded Subnet 3\u0026#34;, \u0026#34;type\u0026#34;: \u0026#34;calculated\u0026#34; }, [...] 
], \u0026#34;validation\u0026#34;: { \u0026#34;rfc1918_compliant\u0026#34;: true, \u0026#34;complete_coverage\u0026#34;: true } } warp_routing_subnets_calculation.tf In Terraform, I execute the script and then declare a \u0026ldquo;local_file\u0026rdquo; data source that I can reuse elsewhere. This is the Terraform file warp_routing_subnets_calculation.tf calling these scripts (example for GCP):\n#============================================= # SUBNET CALCULATION FOR WARP ROUTING GCP VMs #============================================= resource \u0026#34;null_resource\u0026#34; \u0026#34;python_script_gcp_infrastructure\u0026#34; { provisioner \u0026#34;local-exec\u0026#34; { command = \u0026#34;python3 ${path.root}/modules/warp-routing/scripts/generate_subnets_gcp.py ${var.gcp_ip_cidr_range}\u0026#34; } triggers = { script_hash = filesha256(\u0026#34;${path.module}/scripts/generate_subnets_gcp.py\u0026#34;) } } data \u0026#34;local_file\u0026#34; \u0026#34;gcp_subnet_output\u0026#34; { filename = \u0026#34;${path.root}/modules/warp-routing/output/warp_subnets_including_all_except_gcp_internal_subnet.json\u0026#34; depends_on = [null_resource.python_script_gcp_infrastructure] } Now that we have generated the JSON file, let\u0026rsquo;s see how we make use of it to define custom device profiles.\ndevice-profiles.tf In the device-profiles.tf located in modules/cloudflare, we need to\nRead the default_profile so that we can retrieve the default excluded routes Define where the generated JSON files are located Ensure that Terraform does not try to read them before the scripts have run (this is done via: depends_on = [var.cf_aws/gcp/azure_json_subnet_generation]) data \u0026#34;cloudflare_zero_trust_device_default_profile\u0026#34; \u0026#34;main\u0026#34; { account_id = var.cloudflare_account_id } data \u0026#34;local_file\u0026#34; 
\u0026#34;azure_config\u0026#34; { filename = \u0026#34;${path.root}/modules/warp-routing/output/warp_subnets_including_all_except_azure_internal_subnet.json\u0026#34; depends_on = [var.cf_azure_json_subnet_generation] # Reference the null_resource from the warp-routing module } data \u0026#34;local_file\u0026#34; \u0026#34;gcp_config\u0026#34; { filename = \u0026#34;${path.root}/modules/warp-routing/output/warp_subnets_including_all_except_gcp_internal_subnet.json\u0026#34; depends_on = [var.cf_gcp_json_subnet_generation] # Reference the null_resource from the warp-routing module } data \u0026#34;local_file\u0026#34; \u0026#34;aws_config\u0026#34; { filename = \u0026#34;${path.root}/modules/warp-routing/output/warp_subnets_including_all_except_aws_internal_subnet.json\u0026#34; depends_on = [var.cf_aws_json_subnet_generation] # Reference the null_resource from the warp-routing module } Once that\u0026rsquo;s done we define a local {} which is going to build the final_exclude_routes which is going to:\nExclude all default RFC1918 subnets which have been infered by the script Include all the routes in these RFC1918 subnets except the ones to which the different VMs belong to. 
This is achieved via for loops and here is the final_exclude_routes.\n1 2 3 4 5 6 7 8 # Final merged configuration final_exclude_routes = merge( local.default_routes, # Base routes local.azure_routes, # Azure exceptions local.gcp_routes, # GCP exceptions local.aws_routes, # AWS exceptions local.custom_cgnat_map # Custom CGNAT ranges ) Now that we have the final_exclude_routes, we can use it in the definition of custom device profiles as per the below example (I have purposefully remove some part of the definition for readability\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 #============================== # Customized profile for WARP #============================== resource \u0026#34;cloudflare_zero_trust_device_custom_profile\u0026#34; \u0026#34;client_custom_route_profile\u0026#34; { account_id = var.cloudflare_account_id enabled = true name = \u0026#34;Zero-Trust demo local laptop (mac)\u0026#34; description = \u0026#34;This profile is for the local laptop (running macos) for my zero-trust demo\u0026#34; precedence = random_integer.client_precedence.result match = \u0026#34;os.name == \\\u0026#34;${var.cf_device_os}\\\u0026#34;\u0026#34; [...] support_url = \u0026#34;Zero-TrustDemo-LaptopProfile\u0026#34; # Exclude routes configuration exclude = [for route in values(local.final_exclude_routes) : { address = route.address description = route.description }] [...] } In the exclude section we use a for loop to circle through the final_exclude_routes\nN.B.1: you have noted that the support_url is equal to \u0026ldquo;Zero-TrustDemo-LaptopProfile\u0026rdquo;. 
This is very handy to check which profile is applied to your device (I used the tip shared here ).\nEssentially you can issue \u0026ldquo;warp-cli settings support-url\u0026rdquo; to know which profile is being applied (example below on my local laptop)\nmacharpe@macharpe-mac:~ % warp-cli settings support-url macOSProfile N.B.2: To set the precedence, I have used a random integer between 0 and 99 to ensure that these profiles supersede any custom device profile that was already created.\nOnce you have run the Terraform code, you will see 3 new device profiles under Settings \u0026gt; WARP Client \u0026gt; Device Settings:\nAnd you can look at the Split Tunnels section \u0026gt; Exclude IPs and domains \u0026gt; Manage to see the results\nKey Take-aways The warp-routing module uses Python scripts to dynamically calculate subnet exclusions for each cloud, ensuring only desired private IP ranges are routed through Cloudflare WARP. This allows programmatic, accurate split-tunneling configuration, improving security and connectivity. Device profiles are created in Terraform to enforce these routing rules, with precedence set to ensure they override other profiles. We have covered quite a bit of ground so far; let us see what it looks like in action.\nIn Action This is a video showing the full terraform apply and some screenshots\nFinal Output This is what the final output looks like. It makes it easy to ssh into the different workloads. Plus it gives you some information about the tunnel status and version.\nApply complete! Resources: 143 added, 0 changed, 0 destroyed. 
Outputs: AWS_EC2_INSTANCES = [ { \u0026#34;internal_ip\u0026#34; = \u0026#34;172.16.69.72\u0026#34; \u0026#34;name\u0026#34; = \u0026#34;cloudflared-replica-aws-0\u0026#34; \u0026#34;public_ip\u0026#34; = \u0026#34;3.75.62.84\u0026#34; \u0026#34;ssh\u0026#34; = \u0026#34;ssh ubuntu@172.16.69.72 -i modules/keys/out/aws_ssh_cloudflared_key_pair_0\u0026#34; \u0026#34;tunnel\u0026#34; = { \u0026#34;cf_tunnel_id\u0026#34; = \u0026#34;8ebd931c-019b-4e42-bace-30a807ce3413\u0026#34; \u0026#34;cf_tunnel_status\u0026#34; = \u0026#34;healthy\u0026#34; \u0026#34;cf_tunnel_version\u0026#34; = \u0026#34;2025.5.0\u0026#34; } }, { \u0026#34;internal_ip\u0026#34; = \u0026#34;172.16.69.78\u0026#34; \u0026#34;name\u0026#34; = \u0026#34;cloudflare-zero-trust-demo-aws\u0026#34; \u0026#34;public_ip\u0026#34; = \u0026#34;3.75.62.84\u0026#34; \u0026#34;ssh\u0026#34; = \u0026#34;ssh ubuntu@172.16.69.78 -i modules/keys/out/aws_ssh_service_key_pair\u0026#34; }, ] AZURE_VMS = { \u0026#34;cloudflare-warp-connector-azure-0\u0026#34; = { \u0026#34;internal_ip\u0026#34; = \u0026#34;192.168.71.4\u0026#34; \u0026#34;public_dns\u0026#34; = \u0026#34;cloudflare-warp-connector-azure-0.westeurope.cloudapp.azure.com\u0026#34; \u0026#34;public_ip\u0026#34; = \u0026#34;72.144.30.23\u0026#34; \u0026#34;ssh\u0026#34; = \u0026#34;ssh az-admin@72.144.30.23 -i modules/keys/out/azure_ssh_key_pair_0\u0026#34; } \u0026#34;cloudflare-zero-trust-demo-azure-1\u0026#34; = { \u0026#34;internal_ip\u0026#34; = \u0026#34;192.168.71.5\u0026#34; \u0026#34;public_dns\u0026#34; = \u0026#34;cloudflare-zero-trust-demo-azure-1.westeurope.cloudapp.azure.com\u0026#34; \u0026#34;public_ip\u0026#34; = \u0026#34;135.220.17.236\u0026#34; \u0026#34;ssh\u0026#34; = \u0026#34;ssh az-admin@135.220.17.236 -i modules/keys/out/azure_ssh_key_pair_1\u0026#34; } } GCP_COMPUTE_INSTANCES = [ { \u0026#34;internal_ip\u0026#34; = \u0026#34;10.156.70.2\u0026#34; \u0026#34;name\u0026#34; = \u0026#34;cloudflare-infrastructure-access-gcp\u0026#34; 
\u0026#34;public_ip\u0026#34; = \u0026#34;34.159.92.67\u0026#34; \u0026#34;tunnel\u0026#34; = { \u0026#34;cf_tunnel_id\u0026#34; = \u0026#34;901c7e10-5352-4e0d-bed8-7c4493b04d07\u0026#34; \u0026#34;cf_tunnel_status\u0026#34; = \u0026#34;healthy\u0026#34; \u0026#34;cf_tunnel_version\u0026#34; = \u0026#34;2025.5.0\u0026#34; } }, { \u0026#34;internal_ip\u0026#34; = \u0026#34;10.156.70.4\u0026#34; \u0026#34;name\u0026#34; = \u0026#34;cloudflare-zero-trust-demo-gcp-0\u0026#34; \u0026#34;public_ip\u0026#34; = \u0026#34;35.234.78.229\u0026#34; \u0026#34;ssh\u0026#34; = \u0026#34;ssh ubuntu@35.234.78.229 -i modules/keys/out/gcp_vm_key_pair_0\u0026#34; }, { \u0026#34;internal_ip\u0026#34; = \u0026#34;10.156.70.3\u0026#34; \u0026#34;name\u0026#34; = \u0026#34;cloudflare-zero-trust-demo-gcp-1\u0026#34; \u0026#34;public_ip\u0026#34; = \u0026#34;34.40.56.2\u0026#34; \u0026#34;ssh\u0026#34; = \u0026#34;ssh ubuntu@34.40.56.2 -i modules/keys/out/gcp_vm_key_pair_1\u0026#34; }, ] MY_IP = { \u0026#34;IPv4\u0026#34; = \u0026#34;xxx.xxx.xxx.xxx\u0026#34; (obfuscated) } SSH_COMMAND = { \u0026#34;bob\u0026#34; = \u0026#34;ssh bob@10.156.70.2\u0026#34; \u0026#34;jose\u0026#34; = \u0026#34;ssh jose@10.156.70.2\u0026#34; \u0026#34;matthieu\u0026#34; = \u0026#34;ssh matthieu@10.156.70.2\u0026#34; } Conclusion This project demonstrates how to build a production-ready Zero Trust environment across multiple clouds using Terraform. Key takeaways:\nSecurity First Least privilege access + short-lived credentials + MFA enforcement Modularity Wins Provider-specific modules enable easy cross-cloud expansion Documentation Matters Clear architecture diagrams and variable descriptions accelerated onboarding Automate Everything Cloud-init scripts and Terraform modules reduced manual configuration errors The complete codebase and documentation are available in the project repository. 
For a hands-on demo, deploy the environment using the provided terraform.tfvars.example as a template.\nLessons Learned Multi-Cloud Complexity Maintaining consistent security policies across AWS/Azure/GCP required careful coordination of security group rules and IAM roles. Zero Trust Tradeoffs While Cloudflare\u0026rsquo;s device posture checks add security, they introduced initial complexity in tunnel token management. Terraform Limitations Azure WARP Connector required manual UI setup due to Terraform provider limitations, highlighting the importance of hybrid automation approaches. Testing Challenges Implementing terraform apply rollbacks for failed multi-cloud deployments required careful state file management. Technical challenges I am still facing some technical challenges with the custom device profile resource in Terraform, probably due to the Cloudflare API. Below is the error I get when I run \u0026ldquo;terraform apply\u0026rdquo; a second time. It looks close to this and this.\n[{\u0026#34;code\u0026#34;:2004,\u0026#34;message\u0026#34;:\u0026#34;bad device request\u0026#34;}]\nI have not found a way to retrieve the CGNAT (Carrier-Grade NAT) IP assigned to the WARP Client when you have enabled Override local interface IP in the Cloudflare UI. I am still facing challenges while using the Microsoft Azure API (specifically while destroying the setup: terraform destroy). There is currently no Terraform resource to create a WARP Connector tunnel, which is expected as this is in beta. 
Roadmap Use the Entra ID integration Use case for WARP Connector (Site-to-Site, Site-to-Internet\u0026hellip;) link to the documentation SaaS Application in Cloudflare Access managed by Terraform Observability use case with Datadog What\u0026rsquo;s Next Looking ahead, Part 3 will explore advanced use cases you can demonstrate with this environment.\n👉 Follow me on GitHub to get notified when I release the code: https://github.com/macharpe This isn\u0026rsquo;t just a demo; it\u0026rsquo;s a blueprint for modern security. Stay tuned to transform your Zero Trust strategy from concept to reality. 🔒\nThis is Part 2 of a 3-part series on building scalable Zero Trust demo environments.\n","permalink":"https://blog.macharpe.com/posts/zero-trust-demo-part-2/","summary":"\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eIn \u003ca href=\"https://blog.macharpe.com/posts/zero-trust-demo-part-1/\"\n   target=\"_blank\" rel=\"noopener noreferrer\"\u003ePart 1\u003c/a\u003e\n, we demonstrated how Terraform can streamline reproducible security configurations. In this follow-up, I\u0026rsquo;ll show how to extend those principles across AWS, Azure, and GCP using Cloudflare Zero Trust. You\u0026rsquo;ll see how the project\u0026rsquo;s modular structure, automation, and dynamic routing reduce manual security tasks by up to 80%, based on my own benchmarks.\u003c/p\u003e\n\u003ch3 id=\"whats-new-since-part-1\"\u003eWhat\u0026rsquo;s new since Part 1:\u003c/h3\u003e\n\u003cul\u003e\n\u003cli\u003eCustom subnets and improved network segmentation\u003c/li\u003e\n\u003cli\u003eAutomated device profiles and dynamic WARP routing\u003c/li\u003e\n\u003cli\u003eExpanded multi-cloud support with updated diagrams\u003c/li\u003e\n\u003cli\u003eTerraform code is now 4100+ lines of code, 87 files and 21 directories (even if the quantity does not mean quality!) 
with 143 resources\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eLet\u0026rsquo;s dive into the updated architecture and key modules powering this environment.\u003c/p\u003e","title":"Automating Cloudflare Zero Trust at Scale: Terraform, Multi-Cloud, and Identity (Part 2)"},{"content":" Disclaimer: This article reflects my personal views and experiences and does not represent the official stance of Cloudflare. It is not an official Cloudflare tutorial or documentation. The project discussed is a personal initiative created independently.\nIntroduction As a Solutions Engineer at Cloudflare, I frequently work with customers exploring Zero Trust security solutions. While Cloudflare offers a 50-user free tier perfect for initial testing, I identified a gap: there was no simple, scalable way to quickly demonstrate the full power of Cloudflare\u0026rsquo;s Zero Trust platform in a controlled demo environment.\nThis led me to create an automated demo infrastructure using Terraform that showcases Cloudflare\u0026rsquo;s capabilities while being easy to deploy, customize, and tear down. 
In this multi-part series, I\u0026rsquo;ll walk you through how I built this solution.\nWhy This Matters The Challenge\nSales Engineers and Solutions Architects often need to:\nQuickly spin up demo environments Showcase multiple security features simultaneously Customize demos for specific customer use cases Manage costs while maintaining realistic scenarios The Solution\nA fully automated, Infrastructure-as-Code approach that:\nDeploys in minutes, not hours Scales from 10 to 10,000 users Demonstrates real-world Zero Trust architectures Costs pennies per demo (with automatic cleanup) Genesis of the Project This project emerged from a common frustration. Traditional demo setups were:\nTime-consuming to configure Difficult to reproduce consistently Hard to customize for specific scenarios Expensive to maintain By leveraging Terraform and Cloudflare\u0026rsquo;s APIs, I created a solution that addresses all these pain points.\nArchitecture The demo environment consists of several key components working together:\nCloudflare Zero Trust Core: Gateway, Access, WARP client management Identity Integration: Automated user provisioning with various IdPs Application Layer: Sample web applications protected by Cloudflare Access Monitoring \u0026amp; Analytics: Built-in logging and analytics dashboards Infrastructure as Code: Complete Terraform configuration for reproducibility Requirements Before getting started, you\u0026rsquo;ll need:\nA Cloudflare account (the free tier works, but paid features are shown in the advanced demos) Terraform installed (v1.0+) Basic understanding of Zero Trust concepts API tokens with appropriate permissions (Optional) AWS, GCP, or Azure account for application hosting Key Components 1. Terraform Modules The project is organized into reusable Terraform modules:\nterraform-zero-trust-demo/\n├── modules/\n│   ├── zero-trust-config/\n│   ├── users/\n│   ├── applications/\n│   ├── policies/\n│   └── analytics/\n├── examples/\n└── docs/\n2. 
User Management Automated creation of:\nDemo users with varied attributes Group assignments WARP device enrollments Authentication methods 3. Application Protection Pre-configured examples for:\nSaaS application protection Self-hosted application access SSH and RDP access Browser isolation scenarios 4. Security Policies Demonstrates:\nDNS filtering HTTP/S inspection Data Loss Prevention (DLP) Browser Isolation policies Device posture checks Security Features and Benefits The demo environment showcases these key security capabilities:\nFeature Advantage Business Benefit SSH with Access for Infrastructure Secure, policy-driven SSH access to critical non-web applications; eliminates static key risks via short-lived certificates and granular access controls Reduces attack surface, improves compliance, and simplifies SSH credential management; enables secure remote work and faster incident response Enforce MFA to access Applications Enforces multi-factor authentication (MFA) for sensitive apps, supporting various authentication methods (biometrics, OTP, etc.) 
Strengthens access security, reduces risk of unauthorized access, and helps meet regulatory compliance requirements SCIM provisioning (groups and users) Automates user and group lifecycle management, syncing identities between IdP and Cloudflare Minimizes manual errors, accelerates onboarding/offboarding, and reduces operational overhead Browser-rendered terminal Enables SSH access via browser without client software or complex configuration Enhances user productivity, simplifies IT support, and supports secure remote work AppLauncher Central dashboard to access all entitled applications in one place Streamlines user experience, reduces login fatigue, and increases productivity Device Posture Continuously verifies device health and compliance before granting access (core Zero Trust principle) Reduces risk from compromised or non-compliant devices, enforces security policies dynamically Secure Access to SaaS applications Applies granular, context-aware security policies to SaaS and cloud apps Protects sensitive data, prevents lateral movement, and simplifies SaaS access management SSO Integration (Identity Broker) Seamlessly integrates with identity providers (IdP) for Single Sign-On across all resources Improves user convenience, centralizes authentication, and reduces password-related risks SSH Auditability Logs every SSH command and session for full visibility and traceability Enables compliance auditing, accelerates incident investigations, and supports regulatory needs Secure Web Gateway (Network) Applies network-layer security policies to control access to websites and non-HTTP apps Blocks malicious sites, enforces acceptable use, and reduces risk of data exfiltration Tunnel Availability and failover Ensures continuous, resilient access with automatic failover for secure tunnels Maximizes uptime, supports business continuity, and reduces risk of service disruption Terraforming the project Infrastructure-as-Code (IaC) enables automated, consistent, and 
repeatable deployment of Cloudflare Zero Trust resources and policies. Reduces manual errors, accelerates provisioning, supports version control and auditability, and simplifies scaling and rollback of security configurations. Unique Value What makes this demo environment special:\nSpeed: Deploy complete environments in \u0026lt; 5 minutes Flexibility: Easily customize for different industries and use cases Realistic: Simulates real-world scenarios with authentic traffic patterns Cost-Effective: Automatic cleanup prevents unnecessary charges Educational: Well-documented code helps teams learn Terraform and Zero Trust Statistics on the code The project has grown significantly:\n1,200+ lines of Terraform code 15+ reusable modules 50+ configurable variables Supports 3 major cloud providers (AWS, GCP, Azure) Demo deployment time: \u0026lt; 5 minutes Teardown time: \u0026lt; 2 minutes Some Technical Limitations While powerful, the demo has some constraints:\nAPI Rate Limits: Bulk operations may hit Cloudflare API limits User Limits: Free tier supports up to 50 users (paid plans for larger demos) IdP Integration: Some IdP features require manual configuration Application Hosting: Sample apps need to be hosted separately Realistic Traffic: Doesn\u0026rsquo;t simulate actual user behavior patterns What\u0026rsquo;s Next In Part 2, I\u0026rsquo;ll provide a detailed walkthrough of:\nSetting up your Cloudflare account and API tokens Configuring the Terraform backend Deploying your first demo environment Customizing for specific scenarios In Part 3, we\u0026rsquo;ll explore:\nAdvanced configurations Multi-tenant demos Custom application integration Monitoring and analytics setup Want to try it yourself?\nThe complete code is available on my GitHub: https://github.com/macharpe Questions or suggestions?\nFeel free to reach out via email or connect on LinkedIn . 
I\u0026rsquo;m always looking to improve this project and help others build better demos!\nThis is Part 1 of a 3-part series on building scalable Zero Trust demo environments.\n","permalink":"https://blog.macharpe.com/posts/zero-trust-demo-part-1/","summary":"\u003cblockquote\u003e\n\u003cp\u003e\u003cstrong\u003eDisclaimer\u003c/strong\u003e: This article reflects my personal views and experiences and does not represent the official stance of Cloudflare. It is not an official Cloudflare tutorial or documentation. The project discussed is a personal initiative created independently.\u003c/p\u003e\u003c/blockquote\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eAs a Solutions Engineer at Cloudflare, I frequently work with customers exploring Zero Trust security solutions. While Cloudflare offers a 50-user free tier perfect for initial testing, I identified a gap: there was no simple, scalable way to quickly demonstrate the full power of Cloudflare\u0026rsquo;s Zero Trust platform in a controlled demo environment.\u003c/p\u003e","title":"Building a Scalable Zero Trust Demo environment with Cloudflare and Terraform (Part 1)"},{"content":"Welcome! 👋 I\u0026rsquo;m excited to announce the launch of my new blog! After years of sharing technical content on LinkedIn and GitHub, I decided it was time to have a dedicated space for more in-depth articles, tutorials, and insights.\nWhy a New Blog? As a Senior Sales Engineer working with cutting-edge cloud and security technologies, I\u0026rsquo;ve accumulated a wealth of knowledge that I want to share with the broader community. 
While LinkedIn posts are great for quick insights, a blog allows me to:\n📝 Write more detailed technical tutorials 🔍 Share deep dives into complex topics 💡 Document solutions to challenging problems 🎯 Create a centralized knowledge base 📊 Track and organize content more effectively The Tech Stack This blog is built with modern, performant technologies that align with my professional interests:\nHugo Static Site Generator I chose Hugo for several reasons:\nBlazing Fast: Builds in milliseconds, not seconds No Dependencies: Single binary, no Node.js required Markdown-Powered: Write content in simple markdown Powerful: Built-in templating, taxonomies, and multilingual support Active Community: Large ecosystem of themes and plugins PaperMod Theme The PaperMod theme provides:\nClean, modern design Dark mode support Fast, lightweight SEO optimized Mobile responsive Built-in search Social media integration Cloudflare Workers Deploying on Cloudflare Workers offers:\nGlobal Edge Network: Sub-50ms response times worldwide Zero Cold Starts: Always-on, instant responses Free Tier: 100,000 requests/day at no cost Built-in Security: DDoS protection, SSL/TLS Simple Deployment: One command deployment What to Expect I\u0026rsquo;ll be writing about topics including:\nCloudflare Technologies: Zero Trust, Workers, Pages, R2, D1 Cloud Infrastructure: AWS, Azure, GCP, multi-cloud architectures Security: Zero Trust, SASE, network security, compliance DevOps: Infrastructure as Code, Terraform, CI/CD Networking: SD-WAN, routing, network optimization AI/ML: Model Context Protocol, LLM integration, AI infrastructure Career Development: Sales engineering, technical pre-sales, certifications Content Philosophy My approach to technical content:\nPractical First: Real-world examples and working code Production Ready: Solutions you can actually use Explain the Why: Not just what to do, but why it matters S-Tier Quality: Following modern design and engineering principles Open Source: Share code 
and configurations freely Stay Connected I\u0026rsquo;d love to hear from you! Connect with me on:\nLinkedIn - For professional networking and quick insights GitHub - For code, projects, and contributions Email - For direct communication Subscribe Want to stay updated? The blog has an RSS feed you can add to your favorite RSS reader.\nThank You Thank you for visiting! I\u0026rsquo;m looking forward to sharing knowledge, learning together, and connecting with the community.\nLet\u0026rsquo;s build something amazing! 🚀\nHave questions or suggestions for future posts? Feel free to reach out via email or connect on LinkedIn .\n","permalink":"https://blog.macharpe.com/posts/welcome/","summary":"\u003ch2 id=\"welcome-\"\u003eWelcome! 👋\u003c/h2\u003e\n\u003cp\u003eI\u0026rsquo;m excited to announce the launch of my new blog! After years of sharing technical content on LinkedIn and GitHub, I decided it was time to have a dedicated space for more in-depth articles, tutorials, and insights.\u003c/p\u003e\n\u003ch2 id=\"why-a-new-blog\"\u003eWhy a New Blog?\u003c/h2\u003e\n\u003cp\u003eAs a Senior Sales Engineer working with cutting-edge cloud and security technologies, I\u0026rsquo;ve accumulated a wealth of knowledge that I want to share with the broader community. While LinkedIn posts are great for quick insights, a blog allows me to:\u003c/p\u003e","title":"Welcome to My New Blog!"},{"content":"","permalink":"https://blog.macharpe.com/about/","summary":"","title":"About Me"}]