Getting cited by AI tools such as Google AI Overviews, ChatGPT Search, and Perplexity is not just a matter of having good content. It's a technical SEO problem disguised as a content problem. Websites that get cited in AI responses share characteristics that make them easier for large language models to parse, verify, and attribute.
When agency clients ask why their competitors keep showing up in AI responses while they don't, the answer is seldom "their content is better." More often, the gap is technical.
Structured Data Is the Clearest Signal You Can Send
Language models are trained on web content, but AI citation systems, especially Google's, evaluate a page in real time, at the moment they fetch it. When such a system is deciding whether to cite a page, structured data tells it explicitly what the page is about.
The schema types that matter most for AI citation are not the ones most agencies reach for first. Article and FAQ schema have their place, but the most valuable markup establishes entity identity and authority: Person schema (with sameAs pointing to LinkedIn, Wikipedia, or Wikidata), Organization schema with foundingDate and knowsAbout, and Speakable schema, an under-used type that tells AI systems which sections of a page are suitable for quoting.
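As a rough illustration, the entity markup described above can be generated as a single JSON-LD @graph. This is a minimal sketch, not a definitive template: every name, date, URL, and @id below is a hypothetical placeholder, and the helper function to_json_ld is invented for this example. The properties themselves (sameAs, foundingDate, knowsAbout, worksFor) are standard schema.org vocabulary.

```python
import json

# Hypothetical placeholder values; replace with the client's real details.
SITE = "https://www.example.com"

organization = {
    "@type": "Organization",
    "@id": f"{SITE}/#organization",
    "name": "Example Agency",
    "url": SITE,
    "foundingDate": "2012-04-01",
    # Topics the organisation claims expertise in.
    "knowsAbout": ["Technical SEO", "Structured data", "AI search optimization"],
}

person = {
    "@type": "Person",
    "@id": f"{SITE}/about/jane-doe/#person",
    "name": "Jane Doe",
    "jobTitle": "Head of Technical SEO",
    # Connects the author entity to the organisation entity by @id.
    "worksFor": {"@id": organization["@id"]},
    # External profiles confirming this is the same real-world person
    # (both URLs are hypothetical placeholders).
    "sameAs": [
        "https://www.linkedin.com/in/jane-doe-example",
        "https://www.wikidata.org/wiki/Q00000000",
    ],
}


def to_json_ld(*entities: dict) -> str:
    """Wrap entities in one @graph and return it as a JSON-LD script tag."""
    graph = {"@context": "https://schema.org", "@graph": list(entities)}
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(graph, indent=2)
        + "\n</script>"
    )


print(to_json_ld(organization, person))
```

Linking the Person to the Organization by @id, rather than repeating the organisation's details, keeps each entity defined once on the site.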
Speakable markup uses CSS selectors or XPaths to flag the parts of a page best suited to being read aloud or quoted. Google's documentation describes it as a feature for voice assistants, but the same purpose plausibly carries over to AI systems selecting quotable excerpts.
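A minimal sketch of Speakable markup follows, modelled on the WebPage plus SpeakableSpecification pattern in Google's documentation. The page name, URL, and CSS selectors are hypothetical, and speakable_webpage is an illustrative helper, not part of any library. The selectors only help if they match real elements in the page's HTML.

```python
import json


def speakable_webpage(name: str, url: str, css_selectors: list[str]) -> dict:
    """Build WebPage JSON-LD whose speakable property points at quotable sections."""
    return {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "name": name,
        "url": url,
        "speakable": {
            "@type": "SpeakableSpecification",
            # CSS selectors for the elements to read aloud or quote;
            # an "xpath" list is the alternative way to point at them.
            "cssSelector": css_selectors,
        },
    }


# Hypothetical page and selectors; they must match real elements in the HTML.
page = speakable_webpage(
    "How AI search engines choose citations",
    "https://www.example.com/ai-citation-guide",
    [".article-summary", ".key-takeaways"],
)
print(json.dumps(page, indent=2))
```

Pointing the selectors at a short summary and a key-takeaways block keeps the flagged excerpts self-contained, which is what makes them quotable.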
Page-Level Authority Signals That AI Retrieval Systems Use
AI search runs on a chain of trust. A page that an AI system cites is vetted at retrieval time, not during training, and that vetting weighs a range of technical signals of authority.
Author Identity
Pages with a clear author identity (name, credentials, and a link to an author biography page marked up with structured data) are cited in AI search more consistently than pages with no author or only an organisation byline. For clients, that means every blog post and article should carry a named human author and link to a schema-marked author biography page.
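To audit this at scale, a script can pull the JSON-LD out of each page's HTML and flag articles whose author is missing, is not a Person, or has no link to a bio page. The sketch below uses only Python's standard library; JsonLdExtractor and author_problems are hypothetical names, and it assumes author objects are inlined rather than referenced by @id.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._in_json_ld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_json_ld = False

    def handle_data(self, data):
        if self._in_json_ld:
            self.blocks[-1] += data


def author_problems(html: str) -> list[str]:
    """Return authorship issues found in a page's Article/BlogPosting JSON-LD."""
    parser = JsonLdExtractor()
    parser.feed(html)
    articles = []
    for block in parser.blocks:
        data = json.loads(block)
        # Handle a single object, an @graph wrapper, or a top-level array.
        items = data.get("@graph", [data]) if isinstance(data, dict) else data
        articles += [i for i in items if i.get("@type") in ("Article", "BlogPosting")]
    if not articles:
        return ["no Article or BlogPosting markup found"]

    problems = []
    for article in articles:
        authors = article.get("author")
        if isinstance(authors, dict):
            authors = [authors]
        if not authors:
            problems.append("article has no author")
            continue
        # Assumes inlined author objects, not {"@id": ...} references.
        for author in authors:
            if author.get("@type") != "Person":
                problems.append(f"author {author.get('name')!r} is not a Person")
            if not author.get("url"):
                problems.append(f"author {author.get('name')!r} has no bio page url")
    return problems
```

Run it over the raw HTML of each post; an empty list means the authorship markup passed this check.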
Publication and Update Dates
AI systems weight freshness heavily, especially in fast-moving fields such as SEO and AI optimization. The datePublished and dateModified properties in your Article schema should reflect when the article was actually published and last revised: if an article is updated, dateModified must be updated with it. Pages left untouched since publication are less likely to be cited for queries about the current state of a topic.
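One way to keep dateModified honest is to tie it to real content changes. This is a minimal sketch under stated assumptions: the CMS stores a fingerprint of each article body, and the function names here are hypothetical. dateModified changes only when the body hash changes, so cosmetic re-saves don't fake freshness.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def content_fingerprint(body_text: str) -> str:
    """Hash the article body, ignoring whitespace-only differences."""
    normalized = " ".join(body_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def refresh_article_dates(schema: dict, stored_hash: str, body_text: str,
                          now: Optional[datetime] = None) -> tuple[dict, str]:
    """Bump dateModified only when the article body has actually changed.

    Returns the (possibly updated) schema and the fingerprint to store
    for the next comparison. The input dict is not mutated.
    """
    new_hash = content_fingerprint(body_text)
    if new_hash != stored_hash:
        now = now or datetime.now(timezone.utc)
        schema = {**schema, "dateModified": now.isoformat(timespec="seconds")}
    return schema, new_hash


def dates_are_consistent(schema: dict) -> bool:
    """Check that dateModified is not earlier than datePublished.

    Assumes both are full ISO 8601 timestamps with a UTC offset.
    """
    published = datetime.fromisoformat(schema["datePublished"])
    modified = datetime.fromisoformat(schema.get("dateModified", schema["datePublished"]))
    return modified >= published
```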
E-E-A-T Signals for Entities
Experience, Expertise, Authoritativeness, and Trustworthiness (E-E-A-T), the framework from Google's quality rater guidelines, also acts as a citation filter for Google's AI Overviews. The technical side of E-E-A-T includes: author schema linking to external profiles, Organization schema with sameAs pointing to the business's Google Business Profile (formerly Google My Business), a consistent name, address, and phone number (NAP) across the website, and internal links that concentrate PageRank on the pages you want cited.
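NAP consistency is easy to check mechanically once the name, address, and phone number have been scraped from each page. The sketch below assumes that scraping has already happened (the input dictionary is hypothetical) and compares normalised values against the most common version on the site. The normalisation is deliberately strict and ASCII-only, so "St." and "Street" or different phone prefixes are flagged for a human to review.

```python
import re
from collections import Counter

Nap = tuple[str, str, str]  # (name, address, phone)


def normalize_nap(name: str, address: str, phone: str) -> Nap:
    """Remove case, punctuation and spacing differences before comparing."""
    norm_name = " ".join(name.lower().split())
    # Replace punctuation with spaces, then collapse runs of whitespace.
    norm_address = " ".join(re.sub(r"[^a-z0-9 ]", " ", address.lower()).split())
    digits_only = re.sub(r"\D", "", phone)
    return norm_name, norm_address, digits_only


def inconsistent_pages(nap_by_page: dict[str, Nap]) -> list[str]:
    """Return pages whose NAP differs from the most common version on the site."""
    normalized = {page: normalize_nap(*nap) for page, nap in nap_by_page.items()}
    canonical, _count = Counter(normalized.values()).most_common(1)[0]
    return sorted(page for page, nap in normalized.items() if nap != canonical)


# Hypothetical scraped values for three pages.
pages = {
    "/": ("Example Agency", "12 High Street, Leeds", "0113 496 0000"),
    "/contact": ("Example Agency", "12 High Street Leeds", "(0113) 496-0000"),
    "/about": ("Example Agency Ltd", "12 High Street, Leeds", "0113 496 0000"),
}
print(inconsistent_pages(pages))
```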
Site Architecture Decisions That Help or Hurt AI Citation
Site structure determines how complete a picture an AI system can build of what the website covers. Topic clusters, in which a pillar page links to its supporting pages and each supporting page links back, signal that the site covers a subject comprehensively rather than in a single isolated article.
When content is isolated rather than clustered, citations tend to go to individual pages only, because the retrieval system cannot infer topical authority from the link graph. With topic clusters, several pages on the same topic can be cited, because the link structure signals that authority.
For agency clients, this means every piece of content should belong to a defined topic cluster, with links running in both directions between the pillar page and its supporting pages, and a high-level summary of the cluster's topic on the pillar page.
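How any AI retrieval system weighs internal links is not public. To illustrate the general idea of authority flowing through a link graph, the sketch below runs classic PageRank (the textbook algorithm, not Google's or any AI provider's production system) over a small hypothetical site: a clustered pillar page accumulates far more rank than an isolated post linked only from the homepage.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Classic PageRank by power iteration over an internal link graph.

    `links` maps each page to the pages it links to. Pages with no
    outgoing links spread their rank evenly across all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical site: a topic cluster under /seo/ plus one isolated post.
site = {
    "/": ["/seo/", "/orphan-post"],
    "/seo/": ["/seo/schema", "/seo/crawling", "/seo/llms-txt"],
    "/seo/schema": ["/seo/", "/seo/crawling"],
    "/seo/crawling": ["/seo/", "/seo/schema"],
    "/seo/llms-txt": ["/seo/"],
    "/orphan-post": [],
}

for page, score in sorted(pagerank(site).items(), key=lambda kv: -kv[1]):
    print(f"{score:.3f}  {page}")
```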
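The bidirectional-link requirement is simple to verify once a crawl has produced each page's outgoing internal links. This sketch assumes that link map already exists; missing_cluster_links and the example URLs are hypothetical.

```python
def missing_cluster_links(pillar: str, supporting: list[str],
                          links: dict[str, set[str]]) -> list[tuple[str, str]]:
    """List the (source, target) links a topic cluster still needs.

    Checks both directions: pillar -> each supporting page, and
    each supporting page -> pillar.
    """
    missing = []
    for page in supporting:
        if page not in links.get(pillar, set()):
            missing.append((pillar, page))
        if pillar not in links.get(page, set()):
            missing.append((page, pillar))
    return missing


# Hypothetical crawl output: outgoing internal links per page.
crawl = {
    "/seo/": {"/seo/schema", "/seo/crawling"},
    "/seo/schema": {"/seo/"},
    "/seo/crawling": set(),
    "/seo/llms-txt": {"/seo/"},
}
print(missing_cluster_links("/seo/", ["/seo/schema", "/seo/crawling", "/seo/llms-txt"], crawl))
```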
Crawlability and Citation: The Most Overlooked Connection
An AI bot can only cite a page it can access. That seems self-evident, yet the technical details are often neglected. A page blocked by robots.txt can't be fetched at all, strict rate limits can cut a crawl short, and content that only appears after JavaScript runs may never be seen, since many AI crawlers are reported not to execute JavaScript. Only the content a crawler actually receives has a chance at citation.
Pages targeted for AI citation must pass tests for:
- Full content available in the raw HTML, without JavaScript rendering
- No robots.txt rules or other robots directives blocking AI user-agents
- Fast loading (under 2.5 seconds), which correlates with more frequent crawling
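The first test can be approximated by fetching a page's raw HTML without a browser (for example with curl or urllib.request) and checking that key passages appear in its text. The sketch below works on an HTML string you have already fetched; VisibleTextParser and missing_from_raw_html are hypothetical names, and the key phrases are examples you would choose per page.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text from raw HTML, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def missing_from_raw_html(raw_html: str, key_phrases: list[str]) -> list[str]:
    """Return key phrases that do not appear in the un-rendered HTML text."""
    parser = VisibleTextParser()
    parser.feed(raw_html)
    # Normalise whitespace and case so tag boundaries don't break matches.
    text = " ".join(" ".join(parser.parts).split()).lower()
    return [p for p in key_phrases if " ".join(p.lower().split()) not in text]


# A server-rendered page versus a client-rendered shell.
ssr = "<html><body><h1>Speakable schema</h1><p>Mark up quotable sections.</p></body></html>"
csr = '<html><body><div id="root"></div><script>render("Speakable schema")</script></body></html>'
print(missing_from_raw_html(csr, ["Speakable schema", "quotable sections"]))
```

An empty result for the server-rendered page and a full list for the client-rendered shell is exactly the gap a non-JavaScript crawler would experience.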
AI crawler user-agents to account for now include PerplexityBot (Perplexity), GPTBot and OAI-SearchBot (OpenAI), and Google-Extended. Note that Google-Extended is a robots.txt token rather than a separate crawler: it controls whether Google may use your content for its Gemini models, while AI Overviews draw on pages crawled by the regular Googlebot.
Many websites block AI bots by accident, through blanket rules under "User-agent: *". Checking AI bot permissions in robots.txt is now a standard step in any technical SEO audit.
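Python's standard library includes a robots.txt parser, so this check can be scripted. In the sketch below, the agent list (GPTBot, OAI-SearchBot, PerplexityBot, Google-Extended) and the example robots.txt are assumptions to adapt per client, and blocked_ai_agents is a hypothetical helper. urllib.robotparser implements the classic robots.txt rules and may not match every crawler's interpretation of edge cases such as path wildcards.

```python
from urllib.robotparser import RobotFileParser

# AI crawler user-agent tokens to check; extend the list as needed.
AI_USER_AGENTS = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "Google-Extended"]


def blocked_ai_agents(robots_txt: str, url: str) -> list[str]:
    """Return the AI user-agents that this robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_USER_AGENTS if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt: the blanket rule meant for scrapers also
# blocks every AI crawler that has no group of its own.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

print(blocked_ai_agents(ROBOTS_TXT, "https://www.example.com/blog/ai-citation"))
```

In a live audit, the robots.txt text would come from the client's /robots.txt; RobotFileParser also offers set_url() and read() to fetch it directly.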
The llms.txt Opportunity
A newer convention, similar to robots.txt but designed specifically for AI systems, is the /llms.txt file. It is still a proposal rather than an adopted standard. The file is a plain-text (Markdown) document at the root of the website that tells AI systems which pages are most important, which subjects the site covers with authority, and how its content should be represented.
For agency clients in competitive niche markets, being among the first to implement llms.txt can provide a structural advantage: any AI system that reads the file gets direct access to the site's key authority claims. Creating the file is cheap, typically under an hour of work.
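The llms.txt proposal describes a Markdown layout: an H1 with the site name, a blockquote summary, then H2 sections listing key pages as links with optional notes. The generator below is a minimal sketch of that layout; build_llms_txt and all example content are hypothetical, and the output would be saved as llms.txt and served from the site root.

```python
def build_llms_txt(site_name: str, summary: str,
                   sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Render an llms.txt file: H1 title, blockquote summary, then H2
    sections listing key pages as Markdown links with optional notes."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in pages:
            entry = f"- [{title}]({url})"
            lines.append(f"{entry}: {note}" if note else entry)
        lines.append("")
    return "\n".join(lines)


# Hypothetical content for an example agency site.
llms_txt = build_llms_txt(
    "Example Agency",
    "Technical SEO and structured data services for agencies.",
    {
        "Key pages": [
            ("Services", "https://www.example.com/services", "What we offer"),
            ("AI citation guide", "https://www.example.com/ai-citation-guide", ""),
        ],
    },
)
print(llms_txt)
```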
Pam Harper
Founder of Harper Media Group. 20+ years of web development, 12+ years of technical SEO. Specializing in technical SEO, structured data, and AI optimization — delivered white-label for agencies.