
6 · Testing your model & what "accuracy" means

You trained a model. But is it any good? Don't guess — test it. And here's the golden rule of testing, used by every real ML maker:

Test on examples the model has never seen before. If you only check it on the exact photos you trained it with, of course it looks perfect — it basically memorized those. The real question is how it does on new stuff.
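If you ever work with your examples in code, one common way to follow this rule is to set a few examples aside before training and never show them to the model until test time. Here is a minimal sketch of that idea. The function name `split_examples`, the file names, and the 20% test share are all made up for illustration; they are not part of any tool used in this course.

```python
import random

def split_examples(examples, test_fraction=0.2, seed=0):
    """Shuffle the examples, then set a share of them aside for testing only."""
    shuffled = list(examples)               # copy, so the original list is untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed: same split every run
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)

# Hypothetical file names standing in for 10 captured photos.
photos = [f"photo_{i}.jpg" for i in range(10)]
train, test = split_examples(photos)

print(len(train), "photos for training,", len(test), "held back for testing")
```

Photos held back this way still come from the same session, so fresh examples from a new spot or a different hand remain the tougher test.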

So make some fresh test examples. Give a thumbs up you didn't capture during training (maybe with your other hand, or standing somewhere new) and watch the confidence bars. Accuracy is just the score for how often the model gets the right answer: the number of right answers divided by the number of tests. If you test it 10 times and it's right 9 times, that's 90% accuracy. Higher is better, but no real model is right every single time.
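The 9-out-of-10 arithmetic can be written as a few lines of code. This is only a sketch: the `accuracy` function and the test results below are invented for illustration, not output from a real model.

```python
def accuracy(predictions, correct_answers):
    """Return the fraction of predictions that match the correct answers."""
    if len(predictions) != len(correct_answers):
        raise ValueError("Need one correct answer per prediction")
    if not predictions:
        raise ValueError("Need at least one test example")
    right = sum(p == c for p, c in zip(predictions, correct_answers))
    return right / len(predictions)

# Made-up results: the model was shown 10 fresh thumbs-up gestures
# and got 9 of them right.
correct_answers = ["thumbs up"] * 10
predictions = ["thumbs up"] * 9 + ["thumbs down"]

score = accuracy(predictions, correct_answers)
print(f"{score:.0%} accuracy")  # prints "90% accuracy"
```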

A couple of words you'll see while testing:

  • Confidence — how sure the model is about one guess, shown as a percent. High confidence is not the same as being correct! A model can be 99% confident and still wrong. (Sound familiar? That's the same trap as an AI sounding sure but being wrong.)
  • Prediction — the model's actual answer (the label with the highest confidence).
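To see how the two words fit together, here is a small sketch that picks the prediction from a set of confidence bars and then checks it against the true answer. The confidence numbers are made up, and `predict` is a hypothetical helper, not a function from any real tool.

```python
def predict(confidences):
    """Return (label, confidence) for the label the model is most sure about."""
    label = max(confidences, key=confidences.get)
    return label, confidences[label]

# Made-up confidence bars for one test photo that really shows a thumbs up.
confidences = {"thumbs up": 0.01, "thumbs down": 0.99}
true_label = "thumbs up"

label, confidence = predict(confidences)
is_correct = label == true_label

# The model is 99% confident here, and still wrong.
print(f"Prediction: {label} ({confidence:.0%} confident), correct: {is_correct}")
```

Confidence comes from the model; correctness only shows up when you compare the prediction with the real answer.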

When your model gets something wrong, you've found gold — because now you can make it better. This is the real maker loop:

Step                         | What you do
1. Test on new examples      | Find where it gets confused
2. Diagnose                  | Why? Bad lighting? A label it rarely saw? A sneaky background clue?
3. Add better training data  | More variety, more examples of the confusing case
4. Re-train and test again   | See if accuracy went up
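One way to handle steps 1 and 2 in code is to score each label separately, so the confusing case stands out. The sketch below assumes you wrote down the true label and the model's prediction for every test. The results are invented, and `accuracy_by_label` is a hypothetical helper name.

```python
from collections import defaultdict

def accuracy_by_label(results):
    """results: list of (true_label, predicted_label) pairs.
    Returns {true_label: fraction of its tests the model got right}."""
    right = defaultdict(int)
    total = defaultdict(int)
    for true_label, predicted in results:
        total[true_label] += 1
        if predicted == true_label:
            right[true_label] += 1
    return {label: right[label] / total[label] for label in total}

# Made-up test log: 5 thumbs-up tests (all right),
# 5 thumbs-down tests (only 2 right).
results = (
    [("thumbs up", "thumbs up")] * 5
    + [("thumbs down", "thumbs down")] * 2
    + [("thumbs down", "thumbs up")] * 3
)

scores = accuracy_by_label(results)
worst = min(scores, key=scores.get)   # the label to collect more examples for

for label, score in scores.items():
    print(f"{label}: {score:.0%}")
print("Needs more training examples:", worst)
```

In this made-up log the overall accuracy is 70%, but the per-label view shows the misses are all on thumbs down, which tells you where to add training data in step 3.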

Notice that fixing the model almost always means fixing the training data, not clicking a magic "be smarter" button. Back to Lesson 3: garbage in, garbage out — and better in, better out.

Think about it. Your "thumbs up vs. thumbs down" model nails it at your desk but fails by the window. What's probably different there (hint: a feature you didn't mean to teach), and what new examples would you add to fix it?
