6 · Testing your model & what "accuracy" means
You trained a model. But is it any good? Don't guess: test it. Here's the golden rule of testing, one that people who build real ML systems rely on:
Test on examples the model has never seen before. If you only check it on the exact photos you trained it with, of course it looks perfect — it basically memorized those. The real question is how it does on new stuff.
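If you ever work with a folder of examples in code instead of a point-and-click tool, the same rule can be sketched as "set some examples aside before training and never train on them." This is only a minimal illustration: the function name `split_examples`, the file names, and the 20% test share are assumptions made up for this example, not part of any particular tool.

```python
import random

def split_examples(examples, test_fraction=0.2, seed=0):
    """Shuffle the examples, then hold back a share of them for testing only."""
    shuffled = list(examples)              # copy so the original list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_test = max(1, int(len(shuffled) * test_fraction))
    test_set = shuffled[:n_test]           # the model never trains on these
    train_set = shuffled[n_test:]
    return train_set, test_set

# Hypothetical file names, just for illustration
photos = [f"photo_{i}.jpg" for i in range(10)]
train, test = split_examples(photos)
print(len(train), "for training,", len(test), "held back for testing")
```

A held-back set like this is the bare minimum. Test examples captured in a new spot or with your other hand are a tougher and more honest check.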
So make some fresh test examples. Give a thumbs up you didn't capture during training (maybe with your other hand, or standing somewhere new) and watch the confidence bars. Accuracy is just the score for how often the model gets the right answer: the number of correct answers divided by the number of tests. If you test it 10 times and it's right 9 times, that's 9 ÷ 10 = 90% accuracy. Higher is better, but real models are almost never perfect.
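The accuracy arithmetic above can be written as a few lines of Python. This is a small sketch, assuming you jot down each test as a pair of (what the model predicted, what you actually showed it). The test log here is made-up example data.

```python
def accuracy(results):
    """Fraction of tests where the model's prediction matched the true label."""
    if not results:
        raise ValueError("need at least one test result")
    correct = sum(1 for predicted, actual in results if predicted == actual)
    return correct / len(results)

# Made-up test log: 10 tests, each written as (predicted, actual)
test_results = (
    [("thumbs up", "thumbs up")] * 5
    + [("thumbs down", "thumbs down")] * 4
    + [("thumbs down", "thumbs up")]   # the one miss
)

score = accuracy(test_results)
print(f"Accuracy: {score:.0%}")
```

With 9 matches out of 10 tests, the score works out to 90%.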
A couple of words you'll see while testing:
- Confidence — how sure the model is about one guess, shown as a percent. High confidence is not the same as being correct! A model can be 99% confident and still wrong. (Sound familiar? That's the same trap as an AI sounding sure but being wrong.)
- Prediction — the model's actual answer (the label with the highest confidence).
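To see how these two words fit together, here is a short sketch that treats the confidence bars as a dictionary of label to confidence and picks the prediction from it. The numbers and the name `predict` are invented for illustration. Your tool works this out for you behind the scenes.

```python
def predict(confidences):
    """Return (label, confidence) for the label with the highest confidence."""
    label = max(confidences, key=confidences.get)
    return label, confidences[label]

# Made-up confidence bars for one test
bars = {"thumbs up": 0.99, "thumbs down": 0.01}
label, confidence = predict(bars)

true_label = "thumbs down"   # what was really shown to the camera
print(f"Prediction: {label} ({confidence:.0%} confident)")
print("Correct?", label == true_label)
```

In this made-up case the model is 99% confident in "thumbs up" even though the true answer was "thumbs down": very sure, and still wrong.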
When your model gets something wrong, you've found gold — because now you can make it better. This is the real maker loop:
| Step | What you do |
|---|---|
| 1. Test on new examples | Find where it gets confused |
| 2. Diagnose | Why? Bad lighting? A label it rarely saw? A sneaky background clue? |
| 3. Add better training data | More variety, more examples of the confusing case |
| 4. Re-train and test again | See if accuracy went up |
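For the "Diagnose" step, one approach that can help is to score each testing situation separately, so a weak spot stands out instead of hiding inside one overall number. The sketch below assumes you note the situation for each test. The group names and results are invented example data, and `accuracy_by_group` is a helper written just for this illustration.

```python
from collections import defaultdict

def accuracy_by_group(results):
    """results: list of (group, predicted, actual). Returns {group: accuracy}."""
    right = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in results:
        total[group] += 1
        if predicted == actual:
            right[group] += 1
    return {group: right[group] / total[group] for group in total}

# Made-up test log: (situation, predicted, actual)
log = [
    ("bright light", "thumbs up", "thumbs up"),
    ("bright light", "thumbs down", "thumbs down"),
    ("bright light", "thumbs up", "thumbs up"),
    ("bright light", "thumbs down", "thumbs down"),
    ("dim light", "thumbs up", "thumbs up"),
    ("dim light", "thumbs up", "thumbs down"),
    ("dim light", "thumbs down", "thumbs up"),
    ("dim light", "thumbs up", "thumbs down"),
]

for group, score in accuracy_by_group(log).items():
    print(f"{group}: {score:.0%}")
```

A gap like this between groups points to where step 3 should focus: more training examples from the situation the model struggles with.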
Notice that fixing the model almost always means fixing the training data, not clicking a magic "be smarter" button. Back to Lesson 3: garbage in, garbage out — and better in, better out.
Think about it. Your "thumbs up vs. thumbs down" model nails it at your desk but fails by the window. What's probably different there (hint: a feature you didn't mean to teach), and what new examples would you add to fix it?
Sources
- MIT RAISE. Day of AI — hands-on activities on training and testing models. https://raise.mit.edu/
- Code.org. How AI Works (how models are evaluated and improved). https://code.org/curriculum/how-ai-works