June 29, 2026 · 13 min read · AI Agency · ai.txt · GEO · AI Policy

ai.txt: The Policy File That Tells AI How to Use Your Content

Everything you need to know about ai.txt, the emerging standard for declaring your website's AI usage policies. Covers the specification, how it differs from robots.txt and llms.txt, implementation steps, and why AI agencies should implement it for their clients.

Shubhamraj Singh Product Manager · Program Manager · Marketing Strategist

Your Website Needs a Policy Layer for AI, and robots.txt Is Not Enough

Every website owner is now navigating a world where their content is consumed by two audiences: human readers and AI systems. The challenge is that these audiences use your content in fundamentally different ways. A human reads your blog post and moves on. An AI system might index it for search retrieval, feed it into a training dataset, summarise it in a chatbot response, or cite it in a generated answer.

Until recently, website owners had no standardised way to express preferences about how AI systems should use their content. robots.txt handles access control. llms.txt curates what content to surface to language models. But neither addresses the policy question: “You can access my content, but here is how I want you to use it.”

That is the gap ai.txt fills. And as someone who has been implementing GEO strategies and AI governance configurations for clients through our AI Agency practice, I can tell you that this file is becoming an essential part of every website’s AI readiness stack.

What Is ai.txt?

ai.txt is a proposed policy file that websites place at their root directory to declare preferences about how AI systems should interact with their content. Think of it as a machine-readable policy statement. It does not block access (that is robots.txt’s job). It does not curate content for LLMs (that is llms.txt’s job). Instead, it declares what AI systems are allowed and not allowed to do with the content they access.

The specification exists as an IETF internet-draft, developed under the AIPREF (AI Preferences) working group. It is not yet a universally adopted standard, but it is gaining traction precisely because it addresses a problem that robots.txt was never designed to solve: the distinction between crawling for indexing purposes and crawling for model training purposes.

Here is a simplified example of what an ai.txt file looks like:

# ai.txt - AI Usage Policy
# https://example.com/ai.txt

Operator: Example Corp
Contact: [email protected]

Policy: Search-Retrieval
Allow: yes

Policy: Model-Training
Allow: no

Policy: Content-Summarisation
Allow: yes
Attribution: required

The file sits at https://yourdomain.com/ai.txt, alongside your existing robots.txt. Any AI system that respects the standard reads this file and adjusts its behaviour accordingly.
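To make the format concrete, here is a minimal sketch of how a consumer might read the key-value layout shown above. It assumes the simplified syntax from this article's example (top-level fields first, then "Policy:" blocks); the function name parse_ai_txt is hypothetical and is not taken from any specification or library.

```python
from typing import Dict, Optional, Tuple


def parse_ai_txt(text: str) -> Tuple[Dict[str, str], Dict[str, Dict[str, str]]]:
    """Parse the simplified ai.txt layout used in this article.

    Returns (site_fields, policies). Fields that appear before the first
    "Policy:" line (Operator, Contact, ...) go into site_fields. Fields that
    follow a "Policy:" line are attached to that policy.
    """
    site_fields: Dict[str, str] = {}
    policies: Dict[str, Dict[str, str]] = {}
    current: Optional[str] = None  # name of the policy block being read

    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue  # not a key-value line; ignore it
        key, value = key.strip().lower(), value.strip()
        if key == "policy":
            current = value
            policies.setdefault(current, {})
        elif current is None:
            site_fields[key] = value
        else:
            policies[current][key] = value
    return site_fields, policies
```

Calling parse_ai_txt on the example file would yield one entry per policy name, each holding its "allow" value and any extra directive such as "attribution". Splitting on the first colon only keeps URL values intact.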

Why robots.txt Cannot Solve This Problem

I have worked with AI agencies and businesses that initially assumed robots.txt was sufficient for managing AI interactions. It is not, and understanding why requires understanding what robots.txt was designed to do.

robots.txt is an access control mechanism. It tells crawlers, “You can visit these pages” or “You cannot visit these pages.” It is binary, path-based, and crawler-specific. You can block GPTBot from crawling /private/ or allow PerplexityBot to access /blog/. But robots.txt cannot express conditional permissions. It cannot say, “You can access this content for search purposes, but you cannot use it for training your model.”
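The binary nature of robots.txt is easy to see with Python's standard-library parser. The sketch below uses an illustrative robots.txt (the rules and the example.com URLs are assumptions, not a recommended configuration) and shows that the only question the file can answer is whether a given agent may fetch a given path.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only: block GPTBot from /private/, and let
# PerplexityBot into /blog/ while keeping it out of everything else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: PerplexityBot
Allow: /blog/
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Each answer is a plain yes/no per agent and path. There is no argument
# for "purpose", so search use and training use cannot be told apart.
gptbot_private = parser.can_fetch("GPTBot", "https://example.com/private/report")
gptbot_blog = parser.can_fetch("GPTBot", "https://example.com/blog/post")
perplexity_blog = parser.can_fetch("PerplexityBot", "https://example.com/blog/post")
perplexity_pricing = parser.can_fetch("PerplexityBot", "https://example.com/pricing")
```

Note that can_fetch takes only a user agent and a URL. Nothing in the interface, or in the file format behind it, can carry a statement about what the crawler intends to do with the page.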

This limitation creates a difficult choice for website owners. Either you allow AI crawlers full access (and accept that your content might be used for training), or you block them entirely (and lose visibility in AI-powered search results). There is no middle ground with robots.txt alone.

ai.txt provides that middle ground. With ai.txt in place, you can declare: “Yes, AI search engines can retrieve and cite my content in answers. No, AI companies cannot use my content to train their foundation models.” This distinction is critical for businesses that want AI search visibility without surrendering their intellectual property.

The Three-File AI Governance Stack

After implementing AI governance configurations across multiple client websites through our AI Agency engagements, I have settled on a three-file approach that I recommend to every organisation. Each file serves a distinct purpose, and they work together as a complete system.

robots.txt: Access Control

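robots.txt controls which AI crawlers can access which parts of your site. It is the closest thing the stack has to an enforcement layer, the file with “teeth,” although compliance is still voluntary: if you block a crawler in robots.txt, responsible AI companies will comply, but nothing technically stops a non-compliant crawler from ignoring it. This is where you make binary allow/disallow decisions for specific crawlers and paths.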

For a detailed breakdown of how to configure robots.txt for AI crawlers in 2026, see my robots.txt configuration guide.

llms.txt: Content Curation

llms.txt tells language models which content is most relevant and how it is structured. It is a curated guide for AI systems, helping them understand your site’s information architecture, identify authoritative pages, and navigate your content hierarchy. Think of it as a concierge for LLMs visiting your site.

ai.txt: Policy Declaration

ai.txt declares your preferences for how AI systems should use the content they access. It is the policy layer that sits on top of access control and content curation. Where robots.txt says “can you enter,” and llms.txt says “what should you look at,” ai.txt says “what can you do with what you find.”

Together, these three files give website owners comprehensive control over their AI interactions. Implementing all three is what I consider the baseline for AI-ready web infrastructure in 2026, and it is a service every AI Agency should offer its clients.
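One way to picture how the layers combine is as a two-stage check: access first, then usage. The sketch below is an illustration under assumptions, not a reference implementation. The evaluate function is hypothetical, the policies argument stands for an already-parsed ai.txt in the simplified format used in this article, and llms.txt is left out because it guides what to read rather than what is permitted.

```python
from typing import Dict
from urllib.robotparser import RobotFileParser


def evaluate(agent: str, url: str, purpose: str,
             robots_txt: str, policies: Dict[str, str]) -> str:
    """Combine the access layer and the policy layer into one answer.

    policies maps a policy name (for example "Model-Training") to "yes"
    or "no", as declared in a hypothetical ai.txt.
    """
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())

    # Layer 1: robots.txt. If the crawler may not fetch the page,
    # the usage question never arises.
    if not robots.can_fetch(agent, url):
        return "blocked by robots.txt"

    # Layer 2: ai.txt. Access is granted; is this particular use permitted?
    answer = policies.get(purpose)
    if answer is None:
        return "no stated preference"
    return "use permitted" if answer == "yes" else "use not permitted"
```

With a robots.txt that only disallows /private/ and an ai.txt that allows Search-Retrieval but not Model-Training, the same crawler on the same blog URL gets "use permitted" for one purpose and "use not permitted" for the other.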

The AIPREF Working Group and Standardisation Efforts

The ai.txt specification is being developed through the AIPREF (AI Preferences) working group, which is focused on creating machine-readable standards for expressing AI usage preferences. The working group includes representatives from AI companies, publishers, and standards organisations.

The key design principles of the specification include:

Machine readability. AI systems should be able to parse ai.txt automatically without human interpretation. The file format uses simple key-value pairs with standardised policy categories.

Granular permissions. Rather than a binary allow/deny, ai.txt supports multiple policy categories: search retrieval, model training, content summarisation, content generation, and more. Website owners can set different permissions for each category.

Operator identification. The file includes fields for identifying the website operator and providing contact information for AI companies that need to negotiate specific usage terms.

Extensibility. The specification is designed to accommodate new AI use cases as they emerge. New policy categories can be added without breaking existing implementations.
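The extensibility principle can be illustrated from the consumer's side: a reader that understands only some categories should keep working when it meets one it does not know. The following is a rough sketch under that assumption. The category set, the read_policies helper, and the "Agentic-Browsing" name in the usage note are all hypothetical.

```python
from typing import Dict, List, Tuple

# Categories this hypothetical consumer was built to understand.
KNOWN_CATEGORIES = {"Search-Retrieval", "Model-Training", "Content-Summarisation"}


def read_policies(policies: Dict[str, str]) -> Tuple[Dict[str, bool], List[str]]:
    """Split declared policies into those understood and those ignored.

    Unknown categories, or values other than "yes"/"no", are set aside
    instead of raising an error, so newer files do not break older readers.
    """
    understood: Dict[str, bool] = {}
    ignored: List[str] = []
    for name, value in policies.items():
        if name in KNOWN_CATEGORIES and value in ("yes", "no"):
            understood[name] = (value == "yes")
        else:
            ignored.append(name)
    return understood, ignored
```

Given a file that also declares a made-up "Agentic-Browsing" category, this reader would still return the permissions it recognises and simply list the new category as ignored.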

It is important to note that the AIPREF specification is still in draft status. It has not been ratified as an RFC (Request for Comments), and adoption across AI companies varies. However, the trajectory is clear: the industry needs a standardised way to express AI usage preferences, and ai.txt is the leading proposal.

What ai.txt Policies Can Express

Based on the current draft specification and practical implementations I have deployed for AI Agency clients, here are the key policy categories that ai.txt supports:

Search Retrieval

This policy governs whether AI search engines (Perplexity, ChatGPT Search, Google AI Overviews) can retrieve and cite your content in their generated answers. For most websites, this should be set to “Allow: yes” because AI search visibility is becoming a significant traffic and authority driver.

Setting this to “no” means AI search engines should not surface your content in answers, even if they have crawled it. This is useful for content that is behind paywalls or that requires specific licensing for any form of republication.

Model Training

This is the policy that generates the most debate. Setting “Allow: no” for model training tells AI companies that your content should not be included in datasets used to train or fine-tune language models. This is the preference that robots.txt alone cannot express, because blocking a crawler entirely also blocks search retrieval.

For most businesses I work with, the recommended setting is “Allow: no” for model training. Your content marketing investments should benefit your brand, not train a competitor’s AI model.

Content Summarisation

This governs whether AI systems can generate summaries of your content. Most websites benefit from allowing summarisation because it increases visibility and drives referral traffic. The key nuance here is the “Attribution: required” directive, which signals that any summarisation should include proper attribution to your site.

Content Generation

This policy addresses whether AI systems can use your content as source material for generating new content. Setting this to “no” means AI systems should not use your articles, documentation, or creative works as seeds for generating derivative content.

Implementing ai.txt: A Step-by-Step Guide

Here is the implementation process I follow for every client website:

Step 1: Audit Your Current AI Governance

Before creating ai.txt, audit what you already have in place. Check your robots.txt for AI crawler directives. Check whether you have an llms.txt file. Review your site’s terms of service for any existing AI usage clauses. This audit ensures your ai.txt aligns with your existing policies rather than contradicting them.
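The first part of this audit, checking which of the three files exist, is easy to script. Below is a minimal sketch using only the Python standard library. The function names are hypothetical, and it only tests for an HTTP 200 response at the root paths; it does not inspect the files' contents or your terms of service.

```python
from typing import Callable, Dict
from urllib.error import HTTPError
from urllib.request import Request, urlopen

GOVERNANCE_FILES = ("robots.txt", "llms.txt", "ai.txt")


def http_status(url: str) -> int:
    """Return the HTTP status for url, or 0 if the request failed outright."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=10) as response:
            return response.status
    except HTTPError as err:
        return err.code  # e.g. 404 when the file does not exist
    except OSError:
        return 0  # DNS failure, timeout, refused connection, ...


def audit(base_url: str,
          status_of: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Report which governance files respond with 200 at the site root."""
    base = base_url.rstrip("/")
    return {name: status_of(f"{base}/{name}") == 200 for name in GOVERNANCE_FILES}
```

Running audit("https://yourdomain.com") returns a dictionary such as {"robots.txt": True, "llms.txt": False, "ai.txt": False}, depending on what the site serves. The status_of parameter exists so the check can be exercised offline with a stand-in function.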

Step 2: Define Your AI Usage Policies

Work with your legal team (or your AI Agency’s advisory team) to define clear positions on each policy category. The key questions to answer:

Do you want AI search engines to cite your content? (Most businesses: yes)
Do you want your content used for model training? (Most businesses: no)
Do you want AI systems to summarise your content? (Most businesses: yes, with attribution)
Do you want AI systems to use your content for generating new content? (Most businesses: no)

Step 3: Create the ai.txt File

Create a plain text file named ai.txt with your policy declarations. Here is a recommended template for most business websites:

# ai.txt - AI Usage Policy Declaration
# Generated: 2026-06-29
# Website: https://yourdomain.com
# Specification: IETF Internet-Draft (AIPREF)

Operator: Your Company Name
Contact: [email protected]
Policy-URL: https://yourdomain.com/ai-policy

# Search Retrieval - Allow AI search engines to cite our content
Policy: Search-Retrieval
Allow: yes
Attribution: required

# Model Training - Do not use our content for training AI models
Policy: Model-Training
Allow: no

# Content Summarisation - Allow with attribution
Policy: Content-Summarisation
Allow: yes
Attribution: required

# Content Generation - Do not use as seed for new content
Policy: Content-Generation
Allow: no
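If you maintain this file for several sites, generating it from the decisions made in Step 2 avoids copy-paste drift. The sketch below renders the same key-value layout as the template; the PolicyDecision class and render_ai_txt function are hypothetical names, and the output covers only the fields used in this article.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PolicyDecision:
    name: str                      # e.g. "Model-Training"
    allow: bool                    # the yes/no answer from Step 2
    attribution_required: bool = False


def render_ai_txt(operator: str, contact: str,
                  decisions: List[PolicyDecision]) -> str:
    """Render decisions into the key-value layout used in this article."""
    lines = [f"Operator: {operator}", f"Contact: {contact}"]
    for decision in decisions:
        lines.append("")  # blank line between blocks, as in the template
        lines.append(f"Policy: {decision.name}")
        lines.append(f"Allow: {'yes' if decision.allow else 'no'}")
        if decision.attribution_required:
            lines.append("Attribution: required")
    return "\n".join(lines) + "\n"
```

Keeping the decisions in a small data structure also gives you something to store in version control alongside the rendered file.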

Step 4: Deploy and Reference

Place the file at your domain root so it is accessible at https://yourdomain.com/ai.txt. Then reference it in your robots.txt file:

# Reference to AI usage policy
AI-Policy: /ai.txt

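This cross-reference helps AI crawlers discover your ai.txt file when they read your robots.txt, which is typically the first file any crawler checks. Note that AI-Policy is not one of the directives defined in the robots.txt standard (RFC 9309), so crawlers that do not recognise it will skip the line; the rest of your robots.txt is unaffected.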
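From the crawler's side, following that reference might look like the sketch below. It assumes the AI-Policy line format shown above; find_ai_policy is a hypothetical helper, not part of any crawler or library.

```python
from typing import Optional
from urllib.parse import urljoin


def find_ai_policy(robots_txt: str, site_root: str) -> Optional[str]:
    """Return the absolute URL named by an AI-Policy line, if one exists."""
    for raw in robots_txt.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue  # comment line
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "ai-policy":
            # Resolve a relative path such as /ai.txt against the site root.
            return urljoin(site_root, value.strip())
    return None
```

For the snippet above and a site root of https://yourdomain.com/, this resolves to https://yourdomain.com/ai.txt. A robots.txt without the line yields None, in which case a consumer could still try the well-known root location directly.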

Step 5: Align with Legal Documentation

Update your website’s terms of service and privacy policy to reference your ai.txt file and the policies it contains. While ai.txt itself does not carry legal enforcement power, aligning it with your legal documentation creates a stronger position if you ever need to challenge unauthorised AI usage of your content.

The Legal Landscape: No Mandate, But Growing Momentum

As of mid-2026, there is no legal mandate requiring websites to implement ai.txt, and there is no legal requirement for AI companies to honour it. However, the regulatory environment is moving in a direction that favours transparency and explicit consent for AI data usage.

India’s IT Rules emphasise transparency in how digital platforms handle user data, and this principle is extending to how AI systems interact with published content. The EU AI Act requires providers of general-purpose AI models to publish a summary of the content used to train them. The US is developing guidelines for AI data governance that may eventually reference standards like ai.txt.

The practical reality is that major AI companies are building their systems to respect machine-readable preferences because the alternative, ignoring publisher preferences, creates regulatory, legal, and reputational risks they want to avoid. Implementing ai.txt now positions your website ahead of the regulatory curve.

For AI agencies advising clients, the recommendation is clear: implement ai.txt as a best practice, not because it is legally required today, but because it establishes a documented policy position that becomes increasingly valuable as regulations mature.

How AI Agencies Should Deploy ai.txt for Clients

If you are running or working with an AI Agency, ai.txt implementation should be part of your standard GEO and AI readiness services. Here is how I approach it:

Include it in every AI audit. When evaluating a client’s AI readiness, check for robots.txt AI directives, llms.txt, and ai.txt. Most clients will have none of these properly configured, which is an immediate service opportunity.

Bundle it with GEO services. ai.txt implementation pairs naturally with Generative Engine Optimization. Clients who want AI search visibility also need policy controls over how their content is used beyond search. Offering the complete three-file governance stack (robots.txt AI configuration, llms.txt, ai.txt) as a packaged service creates clear value.

Conduct quarterly reviews. The AI ecosystem evolves rapidly. New AI systems launch, new crawlers appear, and the AIPREF specification itself is still evolving. Schedule quarterly reviews of clients’ ai.txt files to ensure policies remain current and aligned with their business objectives.

Document everything. Keep records of when ai.txt was implemented, what policies were declared, and any changes made over time. This documentation trail becomes valuable if a client ever needs to demonstrate that they had clear, published AI usage policies in place.
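A lightweight way to keep that trail is to log a timestamp and a content hash each time the file changes. The sketch below is one possible approach using the standard library; the change_record function and its field names are hypothetical, and where you store the resulting lines is up to you.

```python
import hashlib
import json
from datetime import datetime, timezone


def change_record(ai_txt: str, note: str) -> str:
    """Build one JSON log line describing the current ai.txt content."""
    record = {
        # UTC timestamp of when this version was recorded
        "recorded_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        # Hash of the exact published text, to show what was live at the time
        "sha256": hashlib.sha256(ai_txt.encode("utf-8")).hexdigest(),
        "note": note,
    }
    return json.dumps(record, sort_keys=True)
```

Appending one such line per change to a file kept in version control gives a simple, reviewable history of what policy was declared and when.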

Common Mistakes to Avoid

Do not rely on ai.txt alone for enforcement. ai.txt is a policy declaration, not an access control mechanism. If you want to block a specific AI crawler, you still need to do that in robots.txt. ai.txt expresses preferences. robots.txt enforces boundaries.

Do not set overly restrictive policies. Blocking everything, including search retrieval, means your content disappears from AI-powered search results. In a world where AI search is capturing an increasing share of user queries, blanket restrictions hurt more than they help.

Do not forget to update. AI usage policies are not set-and-forget. New AI applications emerge, your business priorities evolve, and the regulatory landscape shifts. Treat ai.txt as a living document that requires periodic review.

Do not create contradictions between files. If your robots.txt allows GPTBot to crawl your site but your ai.txt says “Model-Training: Allow: no,” that is a coherent position (allow access, restrict usage). But if your robots.txt blocks a search crawler that your ai.txt allows for search retrieval, you have created a contradiction, and in practice the restrictive side wins: a crawler that cannot fetch your pages has nothing to retrieve or cite, whatever ai.txt permits.
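This kind of cross-file contradiction can be caught automatically. The sketch below flags crawlers that robots.txt blocks even though ai.txt allows search retrieval. It is a minimal illustration: blocked_search_agents is a hypothetical helper, ai_policies stands for an already-parsed ai.txt, and deciding which user agents count as search crawlers is an assumption you would need to supply.

```python
from typing import Dict, List
from urllib.robotparser import RobotFileParser


def blocked_search_agents(robots_txt: str, ai_policies: Dict[str, str],
                          search_agents: List[str], sample_url: str) -> List[str]:
    """List agents that ai.txt welcomes for search but robots.txt blocks.

    ai_policies maps policy names to "yes"/"no". sample_url should be a
    page you expect AI search engines to be able to cite.
    """
    if ai_policies.get("Search-Retrieval") != "yes":
        return []  # search retrieval is not allowed, so a block is consistent
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    return [agent for agent in search_agents
            if not robots.can_fetch(agent, sample_url)]
```

An empty list means the two files agree for the agents and URL you tested. Any name in the result is a crawler whose access should be reviewed in robots.txt, or a sign that the ai.txt policy is broader than intended.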

The Bigger Picture: AI Governance Is Now a Web Standard

The emergence of ai.txt alongside robots.txt and llms.txt signals a broader shift. AI governance is becoming a standard part of web infrastructure, just like HTTPS, structured data, and accessibility compliance. Websites that ignore this shift will lose control over how their content is used by AI systems, just as websites that ignored SEO lost visibility in traditional search.

For businesses, the action item is straightforward: implement the three-file AI governance stack now, before it becomes an industry expectation. For AI agencies, offering AI governance implementation as a core service, alongside content strategy and AI automation, positions you as a comprehensive partner in your clients’ AI journey.

The AI governance conversation is only going to grow louder. The organisations that establish clear, machine-readable policies today will be the ones best positioned to navigate whatever regulations, standards, and AI capabilities emerge tomorrow.
