The holdout test

In machine learning, you hold back some data that the model never sees during training. This lets you check if the model actually learned the pattern or just memorized the examples.

We might need the same thing for AI-generated code.

Right now, most of our tests are “public”. The AI can see them, learn from them, and optimize for them. This works for basic functionality. But it creates a risk.

The AI might generate code that passes all your tests but doesn’t actually solve the problem. Like writing an if statement for every number between 1 and 2000 instead of using a proper algorithm.

The code technically works. It passes your tests. But it’s brittle and will break the moment you need to handle 2001.

So we need two types of tests. Public tests that guide the AI toward the right solution. And private tests that the AI never sees.

The private tests are your real validation. They test the business logic that actually matters. They try malformed inputs, edge cases, performance under load. They check whether the solution actually works, not just whether it responds correctly to known inputs.

This creates a new discipline. Someone needs to write these holdout tests. Someone who understands the domain deeply enough to know how the system might fail in the real world.

The AI helps with implementation. Humans focus on validation.

September 4, 2025