Smarter Prompts, Stronger Tests: My Journey to AI-Powered QA
Pre-trained LLMs can generate test cases from requirements, but they often underperform. In our benchmark, GPT-5 detected only 67% of seeded defects. We developed a specialized prompting method that analyzes requirements and applies advanced test design techniques. Claude Sonnet 4, guided with domain, action-state, complementary, and extreme value testing together with the single fault assumption, achieved 100% defect detection on the same benchmark and over 98% across nine programs.
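As a rough illustration of one of the techniques named above (not the talk's actual method), extreme value testing under the single fault assumption varies one input at a time around its domain boundaries while all other inputs stay at nominal values. The function name and the example requirement below are hypothetical:

```python
# Hedged sketch: extreme value (boundary) test generation under the
# single fault assumption. Illustrative only; the talk combines several
# techniques in a more sophisticated way.

def extreme_value_cases(domains, nominal):
    """domains: {name: (lo, hi)}; nominal: {name: typical value}.
    Returns one test case per single-parameter extreme value."""
    cases = []
    for name, (lo, hi) in domains.items():
        for extreme in (lo, lo + 1, hi - 1, hi):  # on-boundary and just inside
            case = dict(nominal)                  # all other inputs stay nominal
            case[name] = extreme                  # single fault: one input varies
            cases.append(case)
    return cases

# Hypothetical requirement: age in [18, 65], items per order in [1, 10]
cases = extreme_value_cases(
    domains={"age": (18, 65), "items": (1, 10)},
    nominal={"age": 30, "items": 3},
)
print(len(cases))  # 8 cases: four extremes per parameter, one varied at a time
```

Applying several such techniques in parallel by hand is exactly the labor-intensive expert work the talk proposes to delegate to a well-prompted LLM.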
Traditional prompting failed due to biases; instead, we used specialized prompting built with Langfuse, Claude, and our Harmony tool. This talk shows how to build, debug, and refine prompts, train LLMs to select rigorous tests, and overcome key challenges in AI-driven software testing.
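A minimal sketch of what a specialized test-design prompt might look like, assuming a plain string template rather than the actual Langfuse/Harmony prompts used in the talk (the template text and function below are hypothetical):

```python
# Hedged sketch: a prompt that names the test design techniques explicitly
# instead of relying on few-shot examples or generic chain-of-thought.
# The real prompts behind the reported results are not shown here.

TEST_DESIGN_PROMPT = """You are an expert test designer.
Requirement:
{requirement}

Apply ALL of the following techniques and justify each test case:
1. Domain testing (partition the inputs, cover each partition boundary)
2. Action-state testing (cover state transitions triggered by actions)
3. Complementary testing
4. Extreme value testing (on-boundary and just-inside values)
Assume a single fault: vary only one input per test case.
Return the test cases as a table: inputs | expected output | technique.
"""

def build_prompt(requirement: str) -> str:
    return TEST_DESIGN_PROMPT.format(requirement=requirement)

prompt = build_prompt("Accept orders of 1-10 items for customers aged 18-65.")
```

In practice, each prompt version would be stored, traced, and compared against the seeded-defect benchmark before the next refinement round.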
Value for the audience:
Audience will learn how to overcome common prompting biases and why traditional methods like few-shot or chain-of-thought prompting are not enough for effective test design.
Audience will learn how advanced LLM guidance and test design techniques can achieve over 98% defect detection, surpassing both human testers and baseline models.
Audience will learn practical strategies to build, debug, and refine prompts, ensuring successful AI-driven software testing projects.
Problems addressed:
Test design is time-consuming, and good test design requires specialized knowledge, especially when multiple test design techniques must be applied in parallel.
Many defects are discovered after release, resulting in more than $1 trillion in bug-fixing costs worldwide each year.
Talk language: English
Level: Advanced
Target group: manual testers, test automation engineers, QA leads, business analysts
Company:
4Test-Plus Kft.
Dr. Istvan Forgacs