Documentation Health Score: A Practical Audit for AI-Readiness
By ResolvCmd
If you are evaluating whether your documentation is ready to support an AI-grounded system (a RAG-based search tool, an AI helpdesk add-on, an internal knowledge assistant), you need a way to grade it. “Looks good” is not a grade. “Mostly up to date” is not a grade. The team needs a number that tells them where their knowledge base is on the AI-readiness curve and which documents are dragging it down.
This post lays out a simple manual audit you can run on a 25-document sample in about three hours. It is platform-agnostic and works on any knowledge base.
TL;DR
Sample 25 documents. Score each on six pass/partial/fail dimensions. Sum the scores. Use the total to identify which documents are dragging your AI accuracy down. Below are the rubric, the criteria, and the math.
The six dimensions
Each dimension scores 0 to 2: 0 is a fail, 1 is partial credit, 2 is a pass.
Dimension 1: Step-shape
Is this document formatted as numbered steps, when steps are what’s needed?
| Score | Criteria |
|---|---|
| 0 | Document is pure prose. No explicit steps. Procedural information embedded in paragraphs. |
| 1 | Document has some structure (headings, bullets) but key procedures are still narrative. |
| 2 | Document is clearly step-shaped where appropriate: numbered steps, explicit verbs, verifiable outcomes. Policies are still allowed to be prose, but operational documents are step-shaped. |
This dimension matters because AI tools retrieve short chunks of a document (rarely the whole document) when generating an answer. A chunk pulled from a numbered procedure preserves the procedure. A chunk pulled from mid-paragraph often does not.
Dimension 2: Currency
Is the document still true today?
| Score | Criteria |
|---|---|
| 0 | Last updated more than 18 months ago on a fast-moving topic. References to deprecated software. End-of-life (EOL) technology mentioned without an “as of” qualifier. |
| 1 | Last updated 12 to 18 months ago. Some currency signals present but not actively maintained. |
| 2 | Last updated within 12 months on operational topics, or within 24 months on stable policy topics. No deprecated references. |
A stale document is worse than a missing document. The AI cannot tell that the product behind an “Office 365 license assignment” runbook is now called Microsoft 365.
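The date thresholds in this rubric are mechanical enough to script as a first-pass triage. A minimal sketch in Python; the function shape is my own, and edge cases the table leaves open (such as a stable policy older than 24 months) are scored conservatively here:

```python
from datetime import date

def currency_score(last_updated: date, stable_policy: bool,
                   has_deprecated_refs: bool,
                   today: date | None = None) -> int:
    """Dimension 2 per the table above. Edges the table leaves
    unspecified (e.g., a stable policy older than 24 months) score 0."""
    today = today or date.today()
    if has_deprecated_refs:
        return 0  # deprecated or EOL references force a fail at any age
    months = ((today.year - last_updated.year) * 12
              + today.month - last_updated.month)
    if months <= (24 if stable_policy else 12):
        return 2
    if months <= 18 and not stable_policy:
        return 1
    return 0
```

A script like this only triages by date. Spotting the “Office 365 that is now Microsoft 365” class of staleness still needs a human reviewer.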
Dimension 3: Consistency
Does this document contradict another document covering the same topic?
| Score | Criteria |
|---|---|
| 0 | This document covers the same procedure as another document with materially different steps. |
| 1 | Some redundancy with other documents but the divergence is in scope or detail, not in instructions. |
| 2 | Document is the canonical source for its topic. No other document covers the same procedure. |
Two documents disagreeing on the same procedure produce inconsistent answers from any AI grounded on them. The user gets one answer this time and a different answer next time.
Dimension 4: Coverage
Is the topic actually documented at the granularity the team needs?
| Score | Criteria |
|---|---|
| 0 | Topic exists at high level but key sub-procedures are missing. (e.g., “User onboarding” doc exists, but “MFA enrollment for new users” is not covered.) |
| 1 | Topic is covered but with notable gaps (assumes context, skips edge cases, missing common scenarios). |
| 2 | Topic is covered comprehensively, including common edge cases and the most-searched related sub-procedures. |
Dimension 5: Source attribution
Can a reader trace every claim in this document to a source?
| Score | Criteria |
|---|---|
| 0 | Document contains unsourced claims, broken internal links to docs that no longer exist, or unattributed paraphrases of vendor procedures. |
| 1 | Most claims are sourced but some internal references are stale or some paraphrased external content lacks links. |
| 2 | Every claim links somewhere. Internal references resolve. External claims link to vendor or authoritative sources. |
Dimension 6: Type clarity
Is this document clearly typed as a policy or a runbook?
| Score | Criteria |
|---|---|
| 0 | Document mixes policy content (what should happen) and runbook content (how to make it happen) without clear separation. |
| 1 | Document is mostly one type but has some content that belongs in the other. |
| 2 | Document declares its type explicitly and stays in that lane. |
An AI tool that doesn’t distinguish policy from runbook will surface a policy when the user asked for a runbook (or vice versa).
Calculating the score
For each document, sum the six dimensions. A single document scores between 0 and 12.
Sample 25 documents at random. Stratify across categories if your knowledge base is split (e.g., 10 from runbooks, 10 from policies, 5 from procedures). Sum the per-document scores. The 25-document corpus maximum is 300 points (25 documents × 12 points each).
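In spreadsheet or script form, the math is trivial. A sketch with illustrative document titles and scores (the dimension keys mirror the rubric above):

```python
# Illustrative scoring sketch; titles and scores are made up.
scores = {
    "mfa-enrollment-runbook": {"step_shape": 2, "currency": 1, "consistency": 2,
                               "coverage": 1, "source_attribution": 2, "type_clarity": 2},
    "acceptable-use-policy":  {"step_shape": 2, "currency": 2, "consistency": 1,
                               "coverage": 2, "source_attribution": 1, "type_clarity": 2},
    # ... 23 more sampled documents
}

per_doc = {doc: sum(dims.values()) for doc, dims in scores.items()}
corpus_total = sum(per_doc.values())  # max 25 docs x 12 points = 300

for doc, total in sorted(per_doc.items(), key=lambda kv: kv[1]):
    print(f"{total:>2}/12  {doc}")  # lowest scorers first
print(f"Corpus total: {corpus_total}/300")
```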
Health score interpretation
| Score range (25-doc sample) | Interpretation |
|---|---|
| 250-300 | Excellent. Your knowledge base is genuinely AI-ready. Most retrievals will produce accurate, sourced answers. |
| 200-249 | Healthy. Real RAG accuracy will be good with focused improvement on the documents that scored lowest. |
| 150-199 | Typical. Most enterprise knowledge bases score here. AI projects will work for high-traffic happy paths but struggle on the long tail. |
| 100-149 | Poor. AI grounded on this corpus will hallucinate often. Major rewrite or rebuild work needed. |
| Below 100 | Critical. AI projects will fail at scale. Address foundational documentation problems before deploying AI broadly. |
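If you automate the tally, the bands reduce to a threshold lookup. A one-function rendering of the table above (the compressed labels are mine):

```python
def interpret(corpus_total: int) -> str:
    """Map a 25-document corpus total (0-300) to a health band."""
    if corpus_total >= 250:
        return "Excellent: genuinely AI-ready"
    if corpus_total >= 200:
        return "Healthy: improve the lowest-scoring documents"
    if corpus_total >= 150:
        return "Typical: happy paths work, the long tail struggles"
    if corpus_total >= 100:
        return "Poor: major rewrite or rebuild needed"
    return "Critical: fix foundations before deploying AI broadly"
```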
How to run the audit
A practical 3-hour audit:
1. Sample 25 documents at random from your most-used knowledge sources (a sampling sketch follows this list).
2. For each document, score the six dimensions using the rubric above. Note specific issues that drove each 0 or 1 score.
3. Sum the scores to produce both per-document scores and the corpus-level total.
4. Identify patterns. Which dimension lost the most points? Which documents scored lowest? Which categories of document underperform?
5. Build a remediation list. The lowest-scoring documents are your highest-leverage rebuild candidates.
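Step 1 is the easiest part to script. A hypothetical sketch of a stratified sample; the category names follow the example above, and the document IDs are generated placeholders you would replace with an export from your knowledge base:

```python
import random

kb = {  # category -> document IDs (placeholders)
    "runbooks":   [f"rb-{i:03}" for i in range(120)],
    "policies":   [f"pol-{i:03}" for i in range(60)],
    "procedures": [f"proc-{i:03}" for i in range(45)],
}
quota = {"runbooks": 10, "policies": 10, "procedures": 5}

sample = [doc for cat, n in quota.items()
          for doc in random.sample(kb[cat], k=n)]
assert len(sample) == 25
```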
A common pattern: step-shape and currency account for the majority of point loss in most knowledge bases.
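To test whether that pattern holds in your sample, total the points lost per dimension (2 minus the awarded score, summed across documents). A sketch reusing the illustrative `scores` dict from the scoring example above:

```python
from collections import Counter

# Expects the `scores` dict from the scoring sketch earlier:
# {"doc-id": {"step_shape": 2, "currency": 1, ...}, ...}
lost = Counter()
for dims in scores.values():
    for dim, score in dims.items():
        lost[dim] += 2 - score

for dim, points in lost.most_common():  # biggest losses first
    print(f"{dim}: -{points} points")
```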
What to do next
The manual audit gives you a snapshot. For ongoing visibility, you want continuous tracking across your whole corpus, not a sample. That’s where a Knowledge Intelligence platform comes in.
ResolvCmd’s Knowledge Studio profiles every document you connect (IT Glue, Hudu, Confluence, Google Drive, and more), surfaces the documents that need attention, and helps your team improve them. The audit becomes continuous rather than periodic.
If you want a starting view of how your knowledge base looks, start a free trial and connect a knowledge source. Studio surfaces your highest-priority improvement candidates within hours.
Frequently asked questions
What does a “good” documentation health score look like in practice?
A 220-260 range on the 300-point scale is typical of teams that are getting real value from RAG-grounded AI. Below 150, AI projects struggle. Above 280 is rare and usually means the team has invested significantly in documentation operations.
Should I score every document or just sample?
For an initial diagnostic, a 25-document random sample is enough. The sample tells you the general state and the patterns of weakness. For ongoing health tracking, you want continuous visibility across the whole corpus.
How often should I re-score?
If you are improving documentation actively, monthly is reasonable. If you are not actively improving, quarterly is enough to track drift.
Does this score apply to documentation outside IT operations?
Yes. The dimensions are universal to operational documentation: customer support runbooks, legal procedures, DevOps incident response, HR policies. The scoring rubric does not change; only the examples vary by vertical.
What is the relationship between this score and actual AI accuracy?
The dimensions in this rubric were chosen because each one shows up repeatedly in the literature on RAG failure modes (see Gartner, McKinsey, and Squirro sources). The expected pattern is that documents scoring poorly on step-shape and currency produce the largest share of failed retrievals when grounded by an AI system. Improving the bottom-scoring documents tends to produce the largest accuracy gain per unit of work.
Sources
- Gartner Innovation Insight: Use RAG as a Service to Boost Your AI-Ready Data
- Top Knowledge Management Trends 2026
- Documentation 2026: From Human-Centric to AI-First
- What Is AI-Ready Documentation? A 2026 Definition
- Why Most RAG Projects Fail (And How to Diagnose Yours)