Why I Built an AI-Powered Requirements Analyzer
Most QA engineers start where the code begins.
But in practice, by that point it’s too late. The problems are already there, in the implementation, in the code.
Everyone loses time: developers implementing the wrong solution, QA testing it.
Over time, I realized that many of the “bugs” we find in testing are actually born long before the first commit.
They start as ambiguous lines in a Confluence spec, or a missing assumption no one noticed.
And that’s when I decided to build an AI-driven analyzer - not for software, but for requirements.
From Idea to Architecture (and a Few Dead Ends)
At first, I thought it would be simple: just feed the Confluence page into a large model and ask it to find gaps.
That worked great for one page — and completely broke for the next.
The problem wasn’t AI, it was structure.
Confluence specs are semi-organized chaos: nested sections, HTML fragments, tables mixed with plain text.
I had to completely rethink how to turn this human-readable mess into machine-readable data.
So I built a modular pipeline (a rough sketch in code follows this list):
- Fetch the Confluence page and strip all visual formatting.
- Parse and clean the HTML into a consistent textual layout.
- Chunk it semantically: each atomic requirement becomes a JSON block (REQ-001, REQ-002, …).
- Embed everything using Gemini embeddings and index with FAISS.
- Analyze each requirement in context, not isolation.
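Here is a minimal sketch of that pipeline in Python. The Confluence URL, page ID, credentials, and the paragraph-per-requirement chunking are placeholders for illustration; the real chunking is semantic and table-aware.

```python
import requests
import numpy as np
import faiss
from bs4 import BeautifulSoup
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder
CONFLUENCE_URL = "https://example.atlassian.net/wiki"  # placeholder instance

def fetch_page(page_id: str, auth: tuple) -> str:
    """Fetch a Confluence page body in storage (XHTML) format."""
    resp = requests.get(
        f"{CONFLUENCE_URL}/rest/api/content/{page_id}",
        params={"expand": "body.storage"},
        auth=auth,
    )
    resp.raise_for_status()
    return resp.json()["body"]["storage"]["value"]

def clean_html(raw_html: str) -> str:
    """Strip visual formatting and keep a consistent textual layout."""
    soup = BeautifulSoup(raw_html, "html.parser")
    return soup.get_text(separator="\n", strip=True)

def chunk_requirements(text: str) -> list[dict]:
    """Naive chunking: one paragraph per requirement (REQ-001, REQ-002, ...)."""
    paragraphs = [p for p in text.split("\n") if p.strip()]
    return [{"id": f"REQ-{i:03d}", "text": p} for i, p in enumerate(paragraphs, 1)]

def embed(texts: list[str]) -> np.ndarray:
    """Embed requirement texts with Gemini embeddings."""
    vectors = [
        genai.embed_content(model="models/text-embedding-004", content=t)["embedding"]
        for t in texts
    ]
    return np.array(vectors, dtype="float32")

# Build the FAISS index over all requirement chunks.
chunks = chunk_requirements(clean_html(fetch_page("123456", ("user", "token"))))
vectors = embed([c["text"] for c in chunks])
index = faiss.IndexFlatL2(vectors.shape[1])
index.add(vectors)
```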
Each stage broke at least twice before it worked.
Parsing tables alone took several iterations — Gemini had a habit of merging currency and platform names into one token (WEBUSD, APPEUR), which completely ruined consistency.
Fixing that meant rewriting prompts, adding examples, and learning a painful truth: prompt engineering is 20% creativity and 80% debugging human ambiguity.
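I’m not reproducing the production prompts here, but the fix was in this spirit: explicit few-shot rows that show the model platform and currency are separate fields (the JSON field names are illustrative).

```python
# Illustrative prompt fragment, not the exact production prompt.
TABLE_PROMPT = """
Extract each table row as a JSON object with separate fields.
Never merge adjacent cells into one token.

Example input row:  | WEB | USD | 10.00 |
Example output:     {"platform": "WEB", "currency": "USD", "limit": "10.00"}

Example input row:  | APP | EUR | 25.00 |
Example output:     {"platform": "APP", "currency": "EUR", "limit": "25.00"}

Now extract the rows below:
{table_text}
"""
```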
Teaching the System to Ask, Not Tell
I didn’t want a model that just “finds issues.”
I wanted something that thinks like a QA engineer: it asks, “what’s missing?”, “what’s inconsistent?”, “what happens if…?”
To achieve that, I built two complementary passes:
- General analysis — the model reviews the entire spec for structural gaps and risks.
- Detailed analysis — it compares each requirement against semantically similar ones from previous specs in the index.
When a new requirement is too close to an old one — but not identical — that’s usually where contradictions hide.
This is where embeddings shine: they don’t care about wording, they care about meaning.
And meaning is where most product issues live.
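Continuing the sketch above (reusing the embed() helper, the chunks list, and the FAISS index), the detailed pass might look roughly like this; the distance thresholds are illustrative, not tuned values from the project.

```python
def find_similar(req_text: str, k: int = 3) -> list[dict]:
    """Return the k nearest indexed requirements with their distances."""
    query = embed([req_text])                 # reuse the embed() helper above
    distances, ids = index.search(query, k)   # FAISS nearest-neighbour search
    return [
        {"req": chunks[i], "distance": float(d)}
        for d, i in zip(distances[0], ids[0])
        if i != -1
    ]

def flag_suspects(req_text: str) -> list[dict]:
    """Near-but-not-identical neighbours are where contradictions tend to hide."""
    suspects = []
    for hit in find_similar(req_text):
        if 0.05 < hit["distance"] < 0.60:     # close in meaning, not a duplicate
            suspects.append(hit)
    return suspects
```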
When AI Meets QA Reality
Once I had a working version, I discovered something interesting:
The analyzer wasn’t replacing QA — it was extending it.
Engineers started using it not as a verdict generator, but as a thinking aid.
Instead of “show me what’s wrong,” they’d say, “show me what I missed.”
The output wasn’t a list of errors — it was a structured set of questions.
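The exact output schema isn’t part of this post, but a single finding is shaped roughly like this (a made-up example):

```python
# Hypothetical finding, framed as questions rather than verdicts.
example_finding = {
    "requirement": "REQ-014",
    "questions": [
        "What should happen if the payment currency is not supported on APP?",
        "This limit looks similar to an earlier rule but uses a different threshold; which one applies?",
    ],
    "related": ["REQ-007"],
    "severity": "needs-clarification",
}
```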
And that, to me, is the real shift AI brings to quality engineering.
Lessons Learned (So Far)
- AI is great at pattern recognition, terrible at domain reasoning — you have to teach it the vocabulary of your product.
- Good prompts are like good test cases: clear scope, expected output, minimal ambiguity.
- The hardest bugs aren’t in code, they’re in assumptions — and LLMs are surprisingly good at surfacing them.
- Building AI tools for QA isn’t about replacing humans. It’s about scaling how we think.
This project started as an experiment to see if AI could read specs like a tester.
Now it’s evolving into something much bigger — a system that helps teams write better requirements, catch contradictions early, and shift quality left where it truly belongs.
And yes, the first few versions were messy, slow, and occasionally hallucinated half a table.
But honestly — so did our specs.