Turning Confluence Specs into Machine-Readable Intelligence
When I started working on AI-driven requirements analysis, I hit an early roadblock:
our specifications were written for humans — not for machines.
They looked beautiful on Confluence: colorful tables, nested bullet points, “helpful” formatting.
But when you strip that down to text, it turns into chaos — currency values glued to platform tags, headers lost in whitespace, and sections that look like poetry to humans but nonsense to a model.
I realized that before AI could reason about specs, I needed to teach it how to read them.
Stage 1: Cleaning the Noise
The first step was fetching Confluence pages via API and feeding them into BeautifulSoup.
At first, I tried a simple .get_text() extraction, and promptly lost the things that mattered: tables were flattened and context was gone.
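For context, that first naive pass looked something like this. It's a minimal sketch: the base URL, page ID, and credentials are placeholders, and the endpoint shown is the standard Confluence Cloud content API.

```python
import requests
from bs4 import BeautifulSoup

# Placeholder values -- swap in your own Confluence site, page ID, and credentials.
BASE_URL = "https://your-company.atlassian.net/wiki"
PAGE_ID = "123456"
AUTH = ("user@example.com", "api-token")

# Fetch the page body in Confluence "storage" format (XHTML).
resp = requests.get(
    f"{BASE_URL}/rest/api/content/{PAGE_ID}",
    params={"expand": "body.storage"},
    auth=AUTH,
)
resp.raise_for_status()
html = resp.json()["body"]["storage"]["value"]

# The naive approach: strip all markup in one go.
# Table cells, headers, and list structure collapse into a single run of text.
flat_text = BeautifulSoup(html, "html.parser").get_text(separator=" ")
print(flat_text[:500])
```

The flat string that comes out of this is exactly where lines like the ones below come from.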
One line said:
“Price WEBNZD XX and APPUSD XX”
Another:
“Price WEBUSD XX”
The model thought these were the same thing.
And technically, it wasn't wrong to be confused: the flattened text gave it no way to tell why they were different.
So I added a structural pre-processor (sketched below):
- Extract tables separately with their headers preserved.
- Normalize platform names (WEB, APP, DESKTOP) as entities.
- Clean markup but preserve hierarchy using indentation and separators.
Now the output wasn’t just text — it was semantically structured text.
Stage 2: Breaking Down Human Thought
Once I had clean input, I moved to semantic chunking.
I used Gemini to split the document into atomic requirements — “one intent per chunk.”
Each requirement became a JSON object with ID, status, and metadata.
I had to rewrite my prompt multiple times before it started producing stable structure.
The trick was to make the model understand that it’s not summarizing text — it’s mapping responsibilities.
That shift turned vague paragraphs into something a test engineer could actually verify.
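In code, the chunking step looked roughly like this. The prompt is a shortened paraphrase, and the model name and JSON field names are illustrative rather than exact.

```python
import json
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder
model = genai.GenerativeModel("gemini-1.5-flash")  # model choice is illustrative

CHUNKING_PROMPT = """You are splitting a product specification into atomic requirements.
Rules:
- One intent per chunk: a single behavior someone could verify.
- Do not summarize; map responsibilities (who or what must do which thing, under which condition).
- Return a JSON array. Each item: {"id": "...", "text": "...",
  "platforms": [...], "status": "explicit|implied", "source_section": "..."}.

Specification text:
"""

def chunk_spec(structured_text: str) -> list[dict]:
    # Ask for JSON directly so the output is parseable, not prose.
    response = model.generate_content(
        CHUNKING_PROMPT + structured_text,
        generation_config={"response_mime_type": "application/json"},
    )
    return json.loads(response.text)
```

The response_mime_type hint is one way to nudge Gemini toward parseable output; the real stability came from the prompt iterations described above.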
Stage 3: Making It Searchable
Once chunks were ready, I embedded them with Gemini embeddings and stored them in FAISS.
This opened the door to semantic retrieval — I could now query not by keyword, but by meaning.
For example:
“Show me all requirements that define experiment pricing logic for USD.”
It doesn’t care if the document says “promo,” “variant,” or “A/B test.”
If the intent is similar, FAISS brings it up.
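Wiring that up is mostly glue code. Here's a sketch of the embed-and-search loop, picking up the chunks produced by chunk_spec above; the embedding model name and the choice of a flat index are illustrative, not requirements.

```python
import numpy as np
import faiss
import google.generativeai as genai

# `chunks` is the list of requirement dicts produced by the chunking step above.

def embed(texts: list[str]) -> np.ndarray:
    # Gemini embedding call; the embedding model name is illustrative.
    result = genai.embed_content(model="models/text-embedding-004", content=texts)
    return np.array(result["embedding"], dtype="float32")

vectors = embed([c["text"] for c in chunks])
faiss.normalize_L2(vectors)                  # cosine similarity via inner product
index = faiss.IndexFlatIP(vectors.shape[1])  # exact search; fine at spec scale
index.add(vectors)

# Query by meaning, not by keyword.
query = "requirements that define experiment pricing logic for USD"
q = embed([query])
faiss.normalize_L2(q)
scores, ids = index.search(q, 5)
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.2f}  {chunks[i]['id']}  {chunks[i]['text'][:80]}")
```

With normalized vectors and an inner-product index, the scores behave like cosine similarity, which makes thresholds easier to reason about later.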
This is when the tool became a memory system for product logic.
Specs stopped being static documents and started behaving like a knowledge graph.
Stage 4: Connecting the Dots
The most powerful insight came when I compared new specs to older ones.
Suddenly I could spot patterns like:
- The same logic written twice in different words.
- A “temporary experiment” that quietly became production logic.
- A new feature missing the fallback condition defined elsewhere.
In traditional QA, you might catch that weeks later — during regression or, worse, from a user.
Here, AI flagged it before the ticket even hit sprint planning.
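Mechanically, that comparison is just another retrieval pass over the same index: embed each chunk of the new spec, search the index built from older specs, and surface anything above a similarity threshold. A rough sketch, reusing the embed helper and index from the earlier snippet; the threshold and the note text are placeholders.

```python
import faiss

SIMILARITY_THRESHOLD = 0.85  # tuned by eye; treat as a placeholder

def find_overlaps(new_chunks: list[dict], index, old_chunks: list[dict]) -> list[dict]:
    """Flag new requirements that closely match something already indexed."""
    vectors = embed([c["text"] for c in new_chunks])  # embed() from the sketch above
    faiss.normalize_L2(vectors)
    scores, ids = index.search(vectors, 3)  # top-3 nearest old requirements

    findings = []
    for chunk, row_scores, row_ids in zip(new_chunks, scores, ids):
        for score, old_id in zip(row_scores, row_ids):
            if old_id >= 0 and score >= SIMILARITY_THRESHOLD:
                findings.append({
                    "new": chunk["id"],
                    "existing": old_chunks[old_id]["id"],
                    "similarity": float(score),
                })
    return findings
```

A human still has to decide whether a near-duplicate is redundancy, a silent promotion to production logic, or a missing fallback; the index just makes sure the question gets asked early.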
What I Learned Along the Way
- BeautifulSoup saves you, but only if you treat specs as data, not documents.
- LLMs are not parsers. They’re reasoning engines — but only after you teach them to respect boundaries.
- FAISS indexing changes everything. Once you store meaning, not words, QA becomes proactive, not reactive.
- Clean data is the most valuable thing you have. The less garbage you feed the model, the smarter it looks.
This project taught me that real AI engineering isn't glamorous.
It’s wrestling with messy inputs, broken assumptions, and half-finished specs until something intelligent emerges.