Turning Confluence Specs into Machine-Readable Intelligence

When I started working on AI-driven requirements analysis, I hit an early roadblock:
our specifications were written for humans — not for machines.

They looked beautiful on Confluence: colorful tables, nested bullet points, “helpful” formatting.
But when you strip that down to text, it turns into chaos — currency values glued to platform tags, headers lost in whitespace, and sections that look like poetry to humans but nonsense to a model.

I realized that before AI could reason about specs, I needed to teach it how to read them.


Stage 1: Cleaning the Noise

The first step was fetching Confluence pages via API and feeding them into BeautifulSoup.
At first, I tried a simple .get_text() extraction — and… lost the things that mattered.
Tables were flattened, context was gone.
One line said:

“Price WEBNZD XX and APPUSD XX”

Another:

“Price WEBUSD XX”

The model thought these were the same thing.
And technically, it wasn’t wrong — it just couldn’t tell why they were different.
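
For reference, the naive first pass looked roughly like this. This is only a sketch: the Confluence base URL, credentials, and page ID are placeholders, not the real setup.

```python
# A minimal sketch of the naive first pass: fetch a page over the Confluence
# Cloud REST API and flatten it with .get_text(). URL, credentials, and page
# IDs below are placeholders.
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://example.atlassian.net/wiki"  # placeholder instance
AUTH = ("user@example.com", "api-token")         # placeholder credentials

def fetch_page_html(page_id: str) -> str:
    """Fetch the storage-format HTML body of a Confluence page."""
    resp = requests.get(
        f"{BASE_URL}/rest/api/content/{page_id}",
        params={"expand": "body.storage"},
        auth=AUTH,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["body"]["storage"]["value"]

def naive_extract(html: str) -> str:
    """The approach that lost the structure: everything collapses into one text stream."""
    return BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)
```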

So I added a structural pre-processor:
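
The sketch below shows the idea rather than the exact implementation: headings and table rows are emitted with explicit markers instead of being flattened, so a price stays attached to its platform and currency. The marker names and element choices are illustrative.

```python
# Sketch of a structural pre-processor over Confluence storage-format HTML:
# keep headings and table rows labeled instead of flattening everything.
from bs4 import BeautifulSoup

def structured_extract(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    for el in soup.find_all(["h1", "h2", "h3", "table", "p", "li"]):
        if el.name in ("h1", "h2", "h3"):
            lines.append(f"\n[SECTION] {el.get_text(strip=True)}")
        elif el.name == "table":
            headers = [th.get_text(strip=True) for th in el.find_all("th")]
            for row in el.find_all("tr"):
                cells = [td.get_text(strip=True) for td in row.find_all("td")]
                if cells:
                    pairs = zip(headers, cells) if headers else enumerate(cells)
                    lines.append("[ROW] " + " | ".join(f"{h}: {c}" for h, c in pairs))
        else:
            if el.find_parent("table"):
                continue  # already captured via the table rows above
            text = el.get_text(strip=True)
            if text:
                lines.append(text)
    return "\n".join(lines)
```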

Now the output wasn’t just text — it was semantically structured text.


Stage 2: Breaking Down Human Thought

Once I had clean input, I moved to semantic chunking.
I used Gemini to split the document into atomic requirements — “one intent per chunk.”
Each requirement became a JSON object with ID, status, and metadata.

I had to rewrite my prompt multiple times before it started producing stable structure.
The trick was to make the model understand that it’s not summarizing text — it’s mapping responsibilities.
That shift turned vague paragraphs into something a test engineer could actually verify.
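
A rough sketch of that chunking step, using the google-generativeai SDK. The model name, the prompt wording, and the output schema are illustrative, not the exact prompt that finally produced stable structure.

```python
# Sketch of semantic chunking with Gemini: one intent per chunk, returned as JSON.
import json
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder
model = genai.GenerativeModel("gemini-1.5-flash")  # illustrative model choice

CHUNKING_PROMPT = """You are mapping responsibilities, not summarizing.
Split the specification below into atomic requirements, one intent per chunk.
Return a JSON array where each item has: id, title, requirement, status,
and metadata (platform, currency, feature area).

Specification:
{spec_text}
"""

def chunk_spec(spec_text: str) -> list[dict]:
    response = model.generate_content(
        CHUNKING_PROMPT.format(spec_text=spec_text),
        generation_config={"response_mime_type": "application/json"},
    )
    return json.loads(response.text)
```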


Stage 3: Making It Searchable

Once chunks were ready, I embedded them with Gemini embeddings and stored them in FAISS.
This opened the door to semantic retrieval — I could now query not by keyword, but by meaning.

For example:

“Show me all requirements that define experiment pricing logic for USD.”

It doesn’t care if the document says “promo,” “variant,” or “A/B test.”
If the intent is similar, FAISS brings it up.
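
A sketch of the embedding and retrieval step. The embedding model, the chunk schema, and the index choice (a flat inner-product index over normalized vectors, i.e. cosine similarity) are assumptions for illustration.

```python
# Sketch: embed chunks with Gemini embeddings and search them in FAISS.
import numpy as np
import faiss
import google.generativeai as genai

def embed(texts: list[str], task_type: str) -> np.ndarray:
    vectors = [
        genai.embed_content(
            model="models/text-embedding-004",  # illustrative embedding model
            content=t,
            task_type=task_type,
        )["embedding"]
        for t in texts
    ]
    arr = np.array(vectors, dtype="float32")
    faiss.normalize_L2(arr)  # cosine similarity via inner product
    return arr

def build_index(chunks: list[dict]) -> faiss.Index:
    vecs = embed([c["requirement"] for c in chunks], "retrieval_document")
    index = faiss.IndexFlatIP(vecs.shape[1])
    index.add(vecs)
    return index

def search(index: faiss.Index, chunks: list[dict], query: str, k: int = 5) -> list[dict]:
    qv = embed([query], "retrieval_query")
    _, ids = index.search(qv, k)
    return [chunks[i] for i in ids[0] if i != -1]

# e.g. search(index, chunks, "requirements that define experiment pricing logic for USD")
```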

This is when the tool became a memory system for product logic.
Specs stopped being static documents and started behaving like a knowledge graph.


Stage 4: Connecting the Dots

The most powerful insight came when I compared new specs to older ones.
Suddenly I could spot overlaps: places where a new requirement quietly contradicted or duplicated something an older spec had already defined.

In traditional QA, you might catch that weeks later — during regression or, worse, from a user.
Here, AI flagged it before the ticket even hit sprint planning.
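
One way to sketch that comparison, reusing the embed helper from the Stage 3 sketch: each chunk from a new spec is searched against the index of existing requirements, and anything above a similarity threshold gets flagged for review. The threshold of 0.85 is an illustrative value, not a tuned one.

```python
# Sketch: flag new requirements that closely match existing ones.
def flag_overlaps(index, existing_chunks, new_chunks, threshold=0.85):
    flags = []
    vecs = embed([c["requirement"] for c in new_chunks], "retrieval_query")
    scores, ids = index.search(vecs, 3)  # top-3 nearest existing requirements
    for chunk, row_scores, row_ids in zip(new_chunks, scores, ids):
        for score, idx in zip(row_scores, row_ids):
            if idx != -1 and score >= threshold:
                flags.append({
                    "new": chunk["id"],
                    "existing": existing_chunks[idx]["id"],
                    "similarity": float(score),
                })
    return flags
```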


What I Learned Along the Way


This project taught me that real AI engineering isn’t glamorous.
It’s wrestling with messy inputs, broken assumptions, and half-finished specs until something intelligent emerges.