2026-05-15

Rethinking a Documentation Ecosystem: From Static Documentation to an Intelligent Publishing Pipeline

For a long time, technical documentation was treated as a final destination. You write pages. You publish them. They live somewhere inside a more or less elegant documentation portal. End of story.

Over the past few weeks, I have been working on something very different: transforming an existing MkDocs documentation system into a true multi-channel publishing pipeline capable of feeding:

a rich documentation website,
an embedded support chatbot knowledge base,
and eventually AI/RAG systems able to reason over the documentation itself.

And honestly, seeing the first version (V1) actually work “for real” was a genuinely great technical moment.

The Real Problem: MkDocs Was Not the Final Destination

At first, the need sounded almost trivial:

Automatically publish part of a MkDocs documentation system into a customer support platform integrated with a chatbot.

But very quickly, several constraints appeared.

The MkDocs rendering was already extremely rich:

Markdown snippets,
macros,
admonitions,
Mermaid diagrams,
tabs,
anchors,
API blocks,
complex internal cross-links,
Material for MkDocs transformations.

And most importantly: all of this already existed inside the HTML generated by MkDocs itself.

That realization completely changed the architecture of the project.

The Decision That Unlocked Everything

The classic reflex would have been:

Markdown → HTML → target platform

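But that would have required rebuilding the entire rendering logic: every snippet, macro, and extension that MkDocs already handles would have needed a second implementation.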

Instead, we chose another direction:

mkdocs build
    ↓
HTML already rendered by MkDocs
    ↓
Extraction + cleanup
    ↓
Publication

This became the core of the entire pipeline.

MkDocs remained the canonical rendering engine.

The publication system was no longer performing “Markdown conversion”, but intelligent post-processing of the final rendered output.

And that changes everything.

Because from that point onward:

snippets are already resolved,
macros are already executed,
admonitions already exist,
anchors already exist,
links are already computed.

The documentation is no longer raw text: it is already structured content.
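To make the “extraction” half of the pipeline concrete, here is a minimal Python sketch using only the standard library. It assumes the Material theme's layout, where each built page's main content sits in the first <article> element, and MkDocs' default directory-style URLs (site/concepts/item/index.html). extract_article and extract_site are illustrative names, not the project's actual code.

```python
from __future__ import annotations

from html.parser import HTMLParser
from pathlib import Path


class ArticleExtractor(HTMLParser):
    """Records where the content of the first <article> element starts and ends."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.depth = 0     # current nesting depth inside the first <article>
        self.start = None  # (line, col) right after the opening <article ...> tag
        self.end = None    # (line, col) of the matching </article> tag

    def handle_starttag(self, tag, attrs):
        if tag != "article" or self.end is not None:
            return
        if self.depth == 0:
            line, col = self.getpos()  # position of the "<" of <article>
            self.start = (line, col + len(self.get_starttag_text()))
        self.depth += 1

    def handle_endtag(self, tag):
        if tag != "article" or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            self.end = self.getpos()  # position of the "<" of </article>


def _offset(text: str, pos: tuple[int, int]) -> int:
    """Convert HTMLParser's (1-based line, 0-based column) into a string index."""
    line, col = pos
    index = 0
    for _ in range(line - 1):
        index = text.index("\n", index) + 1
    return index + col


def extract_article(html: str) -> str | None:
    """Return the already-rendered inner HTML of the page's main <article>."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    if parser.start is None or parser.end is None:
        return None
    return html[_offset(html, parser.start):_offset(html, parser.end)].strip()


def extract_site(site_dir: str = "site") -> dict[str, str]:
    """Map each built page key (e.g. 'concepts/item/') to its article HTML."""
    root = Path(site_dir)
    pages = {}
    for path in sorted(root.rglob("index.html")):
        body = extract_article(path.read_text(encoding="utf-8"))
        if body is not None:
            key = path.parent.relative_to(root).as_posix()
            pages["" if key == "." else key + "/"] = body
    return pages
```

Slicing the original source between the two tag positions, rather than re-serializing a parsed tree, keeps the HTML byte-for-byte as MkDocs produced it, which is the whole point of treating MkDocs as the canonical renderer.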

The Moment It Started Becoming Really Interesting

Very quickly, the project stopped being “just” a documentation export.

It started touching much deeper concerns:

multi-channel publishing,
HTML normalization,
content transformation,
article synchronization,
asset management,
link rewriting,
identifier mapping,
chatbot compatibility,
future AI-readable knowledge structures.

In other words:

the documentation was gradually becoming an abstraction layer, a single source that several publication targets draw from.

The Fascinating Problem of Internal Links

One of the most interesting technical moments came from understanding how article URLs were generated on the target platform.

Final article URLs were not known in advance.

They were dynamically generated when articles were created.

This meant that an internal link such as:

../../../concepts/item/

could not be rewritten right away: the article it points to might not exist yet, so its final URL was still unknown.

And this is where the architecture shifted toward something much cleaner:

a two-pass synchronization model.
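Whichever pass performs the rewrite, each relative link first has to be resolved to the page it targets. A possible sketch, assuming directory-style URLs and page keys such as concepts/item/ (resolve_internal_link is a hypothetical helper, not part of MkDocs):

```python
from __future__ import annotations

import posixpath
from urllib.parse import urlsplit


def resolve_internal_link(page_key: str, href: str) -> tuple[str, str] | None:
    """Resolve an href found on page_key to (target page key, anchor).

    Page keys are directory-style paths relative to the site root, such as
    "concepts/item/" ("" for the home page). Returns None when the link is not
    an internal page link (external URL, asset, or path escaping the site).
    """
    parts = urlsplit(href)
    if parts.scheme or parts.netloc:
        return None  # https://..., mailto:...: nothing to rewrite
    if not parts.path:
        return page_key, parts.fragment  # same-page anchor such as "#fields"
    if not (parts.path.endswith("/") or parts.path.endswith("index.html")):
        return None  # images and other assets are handled separately
    # Relative links are resolved against the page's own directory URL;
    # a leading "/" is treated as relative to the site root.
    path = posixpath.normpath(posixpath.join(page_key, parts.path)).lstrip("/")
    if path == ".." or path.startswith("../"):
        return None  # climbs above the site root
    if posixpath.basename(path) == "index.html":
        path = posixpath.dirname(path)
    key = "" if path in (".", "") else path + "/"
    return key, parts.fragment
```

The result is a stable page key plus an anchor; only the platform-specific URL for that key is unknown until the articles exist.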

Final V1 Architecture

The logic became:

First pass

create articles,
retrieve runtime metadata (the generated article IDs and URLs),
build a deployment mapping.

Second pass

rewrite all internal links,
inject article relationships,
finalize content.

From a distance, this may sound like a small detail.

But in reality, this is exactly the kind of issue that transforms a project from “a quick export script” into a real editorial pipeline.
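A minimal sketch of the two-pass model, in which FakePlatform stands in for the support platform's API. Its create_article and update_article methods, the ID sequence and the URL format are all hypothetical. The first pass only creates articles and records what the platform returns; the second pass rewrites internal links from that mapping.

```python
import posixpath
import re
from dataclasses import dataclass, field
from itertools import count


@dataclass
class FakePlatform:
    """Hypothetical platform stand-in: IDs and URLs exist only after creation."""
    next_id: count = field(default_factory=lambda: count(1001))
    articles: dict = field(default_factory=dict)

    def create_article(self, title: str, body: str) -> dict:
        article_id = next(self.next_id)
        url = f"https://support.example.com/articles/{article_id}"
        self.articles[article_id] = {"title": title, "body": body, "url": url}
        return {"id": article_id, "url": url}

    def update_article(self, article_id: int, body: str) -> None:
        self.articles[article_id]["body"] = body


HREF = re.compile(r'href="([^"#]*)(#[^"]*)?"')


def rewrite_links(page_key: str, html: str, mapping: dict) -> str:
    """Point internal links at the URLs the platform generated in pass 1."""
    def repl(m: re.Match) -> str:
        path, anchor = m.group(1), m.group(2) or ""
        if not path or "://" in path:
            return m.group(0)  # same-page anchor or external link: keep as-is
        target = posixpath.normpath(posixpath.join(page_key, path))
        target = "" if target == "." else target + "/"
        if target not in mapping:
            return m.group(0)  # page not published to this target: keep as-is
        return f'href="{mapping[target]["url"]}{anchor}"'
    return HREF.sub(repl, html)


def sync(pages: dict, platform: FakePlatform) -> dict:
    # Pass 1: create every article with a draft body and record runtime metadata
    # (the page key doubles as a placeholder title here).
    mapping = {key: platform.create_article(title=key, body=html)
               for key, html in pages.items()}
    # Pass 2: every URL is now known, so links can be rewritten and content finalized.
    for key, html in pages.items():
        platform.update_article(mapping[key]["id"], rewrite_links(key, html, mapping))
    return mapping
```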

The Embedded Chatbot: The Reality of Constraints

Another fascinating aspect was working on rendering inside the embedded support chatbot.

And there, the reality of the modern web hit us immediately:

what works inside a full website does not necessarily work inside an embedded component.

Some things worked perfectly:

simple tables,
code blocks,
callouts,
minimal HTML.

Others much less:

complex CSS,
tabs,
buttons,
JavaScript-driven components,
certain visual classes.

So we started defining a true “safe HTML grammar” for chatbot environments: an explicit set of elements known to render reliably inside the widget.

Oddly enough, this almost brings us back to a form of web sobriety:

robust HTML,
clear structure,
strong semantics,
minimal JavaScript dependency.

And paradoxically, this often makes the content more readable.
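As an illustration, such a grammar can be enforced with an allowlist filter built on Python's standard html.parser. The allowed elements and attributes below are assumptions to adapt to what the widget actually renders. Unknown wrappers such as div, span or tab labels are unwrapped (their text is kept), while scripts, styles and buttons are dropped together with their content.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical "safe HTML grammar": allowed elements -> allowed attributes.
ALLOWED = {
    "p": set(), "h1": set(), "h2": set(), "h3": set(), "br": set(),
    "ul": set(), "ol": set(), "li": set(), "blockquote": set(),
    "strong": set(), "em": set(), "pre": set(), "code": set(),
    "table": set(), "thead": set(), "tbody": set(), "tr": set(), "th": set(), "td": set(),
    "a": {"href"}, "img": {"src", "alt"},
}
DROP_WITH_CONTENT = {"script", "style", "button"}  # removed entirely, text included
VOID = {"br", "img"}  # elements without a closing tag


class SafeHtml(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.skip = 0  # > 0 while inside an element dropped with its content

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip += 1
        elif not self.skip and tag in ALLOWED:
            kept = "".join(
                f' {name}="{escape(value or "")}"'
                for name, value in attrs
                if name in ALLOWED[tag]
                and not (value or "").strip().lower().startswith("javascript:")
            )
            self.out.append(f"<{tag}{kept}>")
        # Any other tag (div, span, label, ...) is unwrapped: its text is kept.

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip = max(0, self.skip - 1)
        elif not self.skip and tag in ALLOWED and tag not in VOID:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip:
            self.out.append(escape(data, quote=False))


def sanitize(html: str) -> str:
    parser = SafeHtml()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Class attributes disappear along the way, which is exactly the “certain visual classes” problem: the widget never sees styling it cannot honor.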

Mermaid, or How to Keep Diagrams “Readable” by AI

One of the most interesting challenges involved Mermaid diagrams.

The MkDocs site rendered them perfectly. The embedded chatbot, however, could not interpret them correctly.

The obvious solution would have been simple:

Mermaid
    ↓
SVG
    ↓
Final image

But this introduced a much subtler problem.

An image is perfect for humans. Much less so for an AI agent.

And the goal was not only to obtain a visually correct rendering. It was also necessary to preserve the logical structure of the diagram in order to feed the search and reasoning capabilities of the conversational system itself.

The final solution ended up being much more elegant.

We implemented a pipeline that:

automatically detects Mermaid blocks,
automatically generates SVGs,
automatically uploads assets,
reinjects compatible images into the final HTML,
while still preserving the original Mermaid source, “hidden” inside the HTML.

In other words:

humans see a perfectly rendered diagram,
the chatbot still sees the Mermaid source structure,
the AI agent can therefore continue exploiting:
- nodes,
- relationships,
- labels,
- flow logic.
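The replacement step could look like the sketch below. It assumes Mermaid blocks reach the built HTML as <pre class="mermaid"><code> ... </code></pre> (a common Material for MkDocs setup, to be checked against your own build). publish_svg is a hypothetical callback that renders the source to SVG (for example with mermaid-cli) and uploads it. Hiding the source in a pre element with the hidden attribute is also an assumption: whether the chatbot still indexes hidden content has to be verified on the target platform.

```python
import html
import re
from typing import Callable

# Assumed shape of a Mermaid block in the built HTML (check your own build).
MERMAID_BLOCK = re.compile(r'<pre class="mermaid"><code>(.*?)</code></pre>', re.DOTALL)


def replace_mermaid(page_html: str, publish_svg: Callable[[str], str]) -> str:
    """Swap each Mermaid block for an image, keeping the source hidden next to it.

    publish_svg is a hypothetical callback: Mermaid source in, public SVG URL out.
    """
    def repl(match: re.Match) -> str:
        source = html.unescape(match.group(1)).strip()  # the raw Mermaid text
        url = publish_svg(source)
        return (
            f'<p><img src="{html.escape(url)}" alt="Diagram"></p>\n'
            # Humans see the image; the chatbot/agent can still read nodes and edges.
            f'<pre hidden data-source="mermaid"><code>{html.escape(source, quote=False)}</code></pre>'
        )
    return MERMAID_BLOCK.sub(repl, page_html)
```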

And that was not all.

Because diagrams can become numerous over time, we also introduced a hashing system that automatically detects whether a diagram has actually changed.

Meaning:

Mermaid source
    ↓
Hash
    ↓
Diagram already known?
       ├── yes → reuse existing asset
       └── no  → regenerate + upload

Result:

no unnecessary uploads,
no duplicated assets,
much faster synchronization,
implicit diagram versioning.
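The hash check above can be sketched with hashlib. Two assumptions here: trailing whitespace is normalized away before hashing, and the known hash-to-URL map would be persisted between runs (for example in a JSON file). render_and_upload is a hypothetical callback.

```python
import hashlib
from typing import Callable


def diagram_hash(source: str) -> str:
    """SHA-256 of the normalized Mermaid source (trailing whitespace ignored)."""
    normalized = "\n".join(line.rstrip() for line in source.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class DiagramCache:
    def __init__(self, render_and_upload: Callable[[str], str], known=None):
        self.render_and_upload = render_and_upload
        self.known = dict(known or {})  # hash -> asset URL, e.g. loaded from a previous run
        self.uploads = 0

    def publish(self, source: str) -> str:
        key = diagram_hash(source)
        if key not in self.known:  # new or changed diagram: regenerate + upload
            self.known[key] = self.render_and_upload(source)
            self.uploads += 1
        return self.known[key]  # already known: reuse the existing asset
```

Because the hash depends only on the source, it can double as a version identifier, for instance in the uploaded asset's file name.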

This was probably one of the moments where the project truly stopped being “just displayed content”.

The diagram simultaneously became:

a visual object,
a documentation object,
a logical object,
and an AI-readable object.

And honestly, that was a very satisfying architectural moment.

What the Project Is Becoming

This is probably the most interesting part.

Initially:

“Export documentation.”

Today, much less so.

The system is gradually starting to resemble:

a documentation transformation engine,
a publishing pipeline,
a multi-target synchronization system,
a knowledge structuring layer,
and potentially a foundation usable by AI systems.

Without ever intentionally building a CMS.

In fact, almost the opposite:

we deliberately keep:

Markdown,
Git,
MkDocs,
simple files,
a readable architecture.

But we progressively add an intelligence layer on top.

One of the Most Interesting Lessons

I think one of the major lessons of this project is this:

documentation is not merely content.

It is structure.

A graph.

A representation of how a system actually works.

And once that structure becomes clean:

it can be published,
transformed,
connected,
queried,
visualized,
and tomorrow, probably reasoned over by AI agents.

The boundary between “documentation”, “knowledge graph”, and “intelligent assistance system” starts becoming surprisingly blurry.

And this is exactly the kind of technical grey zone I love exploring.