Smarter test runs: How we use AI and Qase to turn failures into fast fixes

If you’ve ever stared at a wall of logs after a test fails, you know the feeling: it’s not the failure itself that slows you down, it’s the messy detective work that follows.
You jump between tabs, scroll through endless stack traces, and try to piece together what actually happened, or you step through the Playwright trace hoping it will tell you.
It gets even worse when a test fails while you're away and a developer has to work out on their own what exactly happened. It’s exhausting, especially when the problem turns out to be something simple, and on many occasions it is.
We wanted to cut out the noise. We wanted a simple solution, with no massive overhaul.
We added a small AI layer on top of our existing Playwright + Qase setup. Nothing dramatic. But it changed the day-to-day experience more than we expected.
Before: All the data, none of the clarity
A typical failing test gave us:
- An error message that might or might not be helpful
- A long stack trace
- Console logs buried under more logs
- A Playwright trace to dig through: screenshots, network events, and so on
All the information was technically there, but it felt like solving a puzzle every single time.
After: A bit of AI that immediately points you in the right direction
Now, when a test fails:
- Our custom reporter gathers the important pieces (title, steps, error, stack).
- Instead of just writing that out to a file, it sends a short, structured request to an AI model asking for a breakdown: “What went wrong, what’s the evidence, and what should a developer look at first?”
- The model returns a simple explanation.
- We save the summary locally and attach it directly inside Qase.
So when you open a failed test run in Qase, you see an actual narrative instead of a pile of raw data: a diagnostic explanation of the failure and a clear note on what the developer should check and do next.
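To make that flow concrete, here is a rough sketch of what such a custom reporter can look like. Everything Playwright-specific (the `Reporter` interface, `onTestEnd`, `onEnd`, `TestResult.steps`, `TestResult.error`) is real API; the model endpoint `AI_SUMMARY_ENDPOINT`, its request/response shape, the `askModelForBreakdown` helper, and the `ai-summaries` output folder are illustrative placeholders, not part of our actual implementation or of any library.

```ts
// ai-failure-reporter.ts - a minimal sketch of the flow described above.
// The AI endpoint, its payload shape, and the output folder are assumptions.
import * as fs from 'fs';
import * as path from 'path';
import type { Reporter, TestCase, TestResult } from '@playwright/test/reporter';

// Hypothetical helper: send a structured prompt to whatever model you use.
async function askModelForBreakdown(prompt: string): Promise<string> {
  // Uses the global fetch available in Node 18+.
  const res = await fetch(process.env.AI_SUMMARY_ENDPOINT!, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
  });
  const data = await res.json();
  return data.summary; // assumed response shape
}

export default class AiFailureReporter implements Reporter {
  private failures: Array<{ test: TestCase; result: TestResult }> = [];

  onTestEnd(test: TestCase, result: TestResult) {
    // Only collect the failures; the async work happens later.
    if (result.status === 'failed' || result.status === 'timedOut') {
      this.failures.push({ test, result });
    }
  }

  // Playwright awaits onEnd, so the model calls are safe to do here.
  async onEnd() {
    fs.mkdirSync('ai-summaries', { recursive: true });
    for (const { test, result } of this.failures) {
      // Gather the important pieces: title, steps, error, stack.
      const prompt = [
        `Test: ${test.title}`,
        `Steps:\n${result.steps.map((s) => s.title).join('\n')}`,
        `Error: ${result.error?.message ?? 'unknown'}`,
        `Stack:\n${result.error?.stack ?? ''}`,
        'Explain what went wrong, quote the evidence, and say what a developer should check first.',
      ].join('\n\n');

      const summary = await askModelForBreakdown(prompt);

      // Save the summary locally; attaching it to Qase is a separate step.
      fs.writeFileSync(path.join('ai-summaries', `${test.id}.md`), summary);
    }
  }
}
```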
Why this combo works
Qase already gives us a neat place to look at results. AI adds the connective tissue: the part that explains what the results mean.
No new dashboards. No “AI-only mode.” Just better explanations in the same place everyone is already looking: Qase.
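For the "same place everyone is already looking" part, one possible way to get the summary into Qase is to include it as the comment on the reported result. The sketch below is a hedged illustration based on the Qase API v1 result endpoint; the exact endpoint, field names, and environment variables are assumptions, and in practice your existing Qase reporter integration may already handle result reporting for you.

```ts
// Hypothetical helper: push the AI summary into Qase as the result comment.
// Endpoint, fields, and env vars are assumptions; check the Qase API docs.
async function publishSummaryToQase(runId: number, caseId: number, summary: string) {
  await fetch(`https://api.qase.io/v1/result/${process.env.QASE_PROJECT}/${runId}`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Token: process.env.QASE_API_TOKEN ?? '',
    },
    body: JSON.stringify({
      case_id: caseId,
      status: 'failed',
      comment: summary, // the AI explanation sits right next to the result
    }),
  });
}
```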
What AI actually does
No magic auto-fixing. No rewriting selectors for us. What it does:
- Points out flaky patterns (timeouts, missing elements, navigation jumps)
- Flags questionable selectors
- Suggests practical fixes
- Highlights the exact log lines related to the failure
It’s basically someone saying, “Hey, check this part right here.”
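To make those bullets concrete, this is roughly the shape of breakdown we ask for. The field names and the instruction text below are illustrative, not a standard or the exact prompt we run in production.

```ts
// Illustrative response shape we ask the model to fill in.
interface FailureBreakdown {
  rootCause: string;            // one-line "what went wrong"
  evidence: string[];           // the exact log/stack lines that support it
  suspectedFlakiness: 'timeout' | 'missing-element' | 'navigation' | null;
  selectorConcerns: string[];   // selectors that look brittle
  suggestedFix: string;         // a practical next step for the developer
}

// Illustrative instruction block sent along with the failure details.
const instructions = `
You are helping triage a failed Playwright test.
Return JSON with: rootCause, evidence (quote the relevant lines),
suspectedFlakiness (timeout / missing-element / navigation or null),
selectorConcerns, and suggestedFix. Be brief and concrete.
`;
```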
What we gained
| Pain Before | What We Get Now |
| --- | --- |
| Time wasted figuring out where to start | A short summary you can read in seconds |
| Repeatedly diagnosing the same types of issues | Pattern awareness across runs |
| Junior engineers stuck in logs | Clear pointers on what to investigate |
| Long handoffs between QA and developers | Everyone sees the same explanation |
What we didn’t do
- We didn’t automate everything.
- We didn’t build a huge ML system.
- We never send sensitive data or full internal logs, only the fields we explicitly choose to pass in.
- And if the AI service is down, tests still behave exactly the same.
This is an add-on, not a dependency.
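Keeping it an add-on mostly comes down to one guard around the model call: if the AI side is slow or unavailable, we skip the summary and the run carries on untouched. A minimal sketch of that guard, again assuming a placeholder `AI_SUMMARY_ENDPOINT` and response shape:

```ts
// Non-fatal wrapper around the model call: any failure means "no summary",
// never a broken test report. Endpoint and response shape are assumptions.
async function safeSummarize(prompt: string, timeoutMs = 10_000): Promise<string | null> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(process.env.AI_SUMMARY_ENDPOINT!, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ prompt }),
      signal: controller.signal,
    });
    if (!res.ok) return null;
    return (await res.json()).summary ?? null;
  } catch {
    // AI down, timed out, or misconfigured: the tests behave exactly as before.
    return null;
  } finally {
    clearTimeout(timer);
  }
}
```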
Developer experience today
Run the tests. If something fails, the failure still shows up, but now with:
- A root-cause line
- A snippet of evidence
- A suggestion for how to fix it
You get the signal, not the noise.
Why this matters even outside our team
This approach has a bunch of second-order benefits:
- New teammates onboard faster
- More consistency in how failures are analyzed
- Better test hygiene (poor selectors get called out immediately)
- You build a historical trail of why things broke, not just that they broke
It turns automated tests into something closer to a feedback system.
Final thought
Automated tests already tell you when something breaks. With a tiny bit of AI, they can start telling you why. You don’t need anything fancy to replicate this:
- Collect the failure details in a structured way.
- Pass them into a simple prompt.
- Put the result somewhere the team already checks.
- Iterate.
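The wiring can stay this small. In a Playwright setup, it is essentially one line in the config; `playwright-qase-reporter` is Qase's reporter package (configured per its own docs and environment variables), and `./ai-failure-reporter.ts` stands in for a custom reporter like the one sketched earlier.

```ts
// playwright.config.ts - a minimal sketch of how the pieces sit together.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  reporter: [
    ['list'],                      // the usual console output
    ['playwright-qase-reporter'],  // results land in Qase as before
    ['./ai-failure-reporter.ts'],  // adds the AI summary on failures
  ],
});
```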
Small change, surprisingly big payoff.






