The Problem with One Split
One random split can be “lucky” or “unlucky.” You might get:
- 95% accuracy on one split
- 92% accuracy on another split
- 97% accuracy on yet another split
Which one is right? You don’t know. That’s the problem.
What we want: A more stable estimate that doesn’t depend on one random split.
Solution: Cross-validation - test multiple splits and average the results.
How K-Fold Cross-Validation Works
Here’s the idea:
- Split data into k folds (usually k=5 or k=10)
- Train on k-1 folds, test on the remaining fold
- Repeat k times, each time using a different fold as test
- You get k scores
- Average them for a more stable estimate
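Before reaching for a helper function, it can help to see that loop written out by hand. Here's a minimal sketch using scikit-learn's KFold; the breast-cancer dataset and logistic regression model are just placeholders for illustration:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_breast_cancer(return_X_y=True)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = []

for train_idx, test_idx in kf.split(X):
    # Train on k-1 folds, evaluate on the held-out fold
    model = LogisticRegression(max_iter=5000)
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))

print("Fold scores:", np.round(scores, 3))
print("Mean:", round(np.mean(scores), 3))
```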
Using cross_val_score
Scikit-learn makes this easy with cross_val_score:
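A minimal example, again using a placeholder dataset and model (swap in your own X, y, and estimator):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# cv=5 runs 5-fold cross-validation and returns one score per fold
# (for a classifier, the default score is accuracy)
scores = cross_val_score(model, X, y, cv=5)

print("Individual scores:", scores)
print(f"Mean: {scores.mean():.3f}")
print(f"Std:  {scores.std():.3f}")
```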
What These Numbers Mean
Individual scores: Performance on each fold
- Shows variation across different splits
- If scores vary a lot (high std), the model is unstable
- If scores are similar (low std), the model is stable
Mean score: Average performance
- More reliable than a single split
- Better estimate of true performance
Standard deviation: How much scores vary
- Low std = stable model
- High std = model performance depends on which data it sees
Try Different K Values
Try it yourself: Change cv=5 to cv=10 and see if scores become more stable:
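Continuing with the same placeholder dataset and model, one way to compare is to run both in a loop:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# Compare 5-fold and 10-fold cross-validation on the same data
for k in (5, 10):
    scores = cross_val_score(model, X, y, cv=k)
    print(f"k={k:>2}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```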
What to notice: More folds (higher k) usually means:
- More stable estimates (lower std)
- But more computation time
In practice, k=5 and k=10 are the most common choices.
Why Cross-Validation Is Better
Single split:
- One number: 95% accuracy
- No idea if that’s typical or unusual
- Can’t see variance
Cross-validation:
- Multiple scores: [0.94, 0.96, 0.95, 0.93, 0.95]
- Mean: 0.946 (more reliable)
- Std: 0.011 (shows stability)
- Rough confidence interval (mean ± 2 std): [0.924, 0.968]
Much more informative.
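If you have the array of fold scores, those summary numbers take one line each to compute. The ±2 std interval is only a rough rule of thumb, and the last digit can differ slightly depending on whether you use the sample or population standard deviation:

```python
import numpy as np

scores = np.array([0.94, 0.96, 0.95, 0.93, 0.95])

mean = scores.mean()
std = scores.std(ddof=1)  # sample standard deviation

print(f"Mean: {mean:.3f}")
print(f"Std:  {std:.3f}")
print(f"Rough interval: [{mean - 2*std:.3f}, {mean + 2*std:.3f}]")
```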
Stratified K-Fold
For classification, use stratified k-fold to keep class proportions:
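Here's a sketch with an explicit StratifiedKFold splitter passed to cross_val_score (the dataset and model are placeholders, as above):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# Each fold keeps (approximately) the class proportions of y
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=skf)

print("Stratified scores:", scores)
print(f"Mean: {scores.mean():.3f}, Std: {scores.std():.3f}")
```

Worth knowing: when you pass an integer cv to cross_val_score with a classifier, scikit-learn already uses stratified folds under the hood; passing a StratifiedKFold object makes that explicit and lets you control shuffling and the random seed.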
Why stratified? It ensures each fold has approximately the same class distribution as the full dataset, which is especially important for imbalanced datasets.
Key Takeaways
Before moving forward:
- One split is unreliable - Can be lucky or unlucky
- Cross-validation tests multiple splits - More stable estimate
- Mean and std matter - Both tell you something
- Stratified for classification - Keeps class proportions
What’s Next?
On the next page, we'll use cross_validate to get multiple metrics at once. This lets you see accuracy, precision, recall, F1, and ROC AUC all together.