
The Problem with One Split

One random split can be “lucky” or “unlucky.” You might get:

  • 95% accuracy on one split
  • 92% accuracy on another split
  • 97% accuracy on yet another split

Which one is right? You don’t know. That’s the problem.

What we want: A more stable estimate that doesn’t depend on one random split.

Solution: Cross-validation - test multiple splits and average the results.

How K-Fold Cross-Validation Works

Here’s the idea:

  1. Split data into k folds (usually k=5 or k=10)
  2. Train on k-1 folds, test on the remaining fold
  3. Repeat k times, each time using a different fold as test
  4. You get k scores
  5. Average them for a more stable estimate

[Diagram: the full dataset is split into 5 folds; the model trains on folds 2-5 and is tested on fold 1 to produce score 1; this repeats 5 times, once per fold, and the scores are averaged.]
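
Here is roughly what that loop looks like written out by hand with scikit-learn's KFold. The dataset and model below (the built-in breast cancer data and a scaled logistic regression) are stand-ins; any estimator and dataset would work the same way:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)

scores = []
for fold, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    # Train on the other 4 folds, evaluate on the held-out fold
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X[train_idx], y[train_idx])
    score = model.score(X[test_idx], y[test_idx])
    scores.append(score)
    print(f"Fold {fold}: {score:.3f}")

print(f"Mean: {np.mean(scores):.3f}  Std: {np.std(scores):.3f}")
```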

Using cross_val_score

Scikit-learn makes this easy with cross_val_score:

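A minimal sketch of what that call looks like (the dataset and model here are placeholders; the lesson's own example may use different ones):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# cv=5 runs 5-fold cross-validation and returns one score per fold
scores = cross_val_score(model, X, y, cv=5)

print("Individual scores:", scores.round(3))
print(f"Mean: {scores.mean():.3f}")
print(f"Std:  {scores.std():.3f}")
```

cross_val_score handles the splitting, fitting, and scoring internally, so the hand-written loop above collapses into a single call.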

What These Numbers Mean

Individual scores: Performance on each fold

  • Shows variation across different splits
  • If scores vary a lot (high std), model is unstable
  • If scores are similar (low std), model is stable

Mean score: Average performance

  • More reliable than a single split
  • Better estimate of true performance

Standard deviation: How much scores vary

  • Low std = stable model
  • High std = model performance depends on which data it sees
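
As a toy illustration (the numbers below are made up, not real model output), two sets of fold scores can share the same mean but tell very different stability stories:

```python
import numpy as np

stable   = np.array([0.94, 0.95, 0.95, 0.94, 0.95])  # low std: consistent across folds
unstable = np.array([0.99, 0.88, 0.97, 0.90, 0.99])  # high std: depends heavily on the split

for name, s in [("stable", stable), ("unstable", unstable)]:
    print(f"{name:>8}: mean={s.mean():.3f}, std={s.std():.3f}")
```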

Try Different K Values

Try it yourself: Change cv=5 to cv=10 and see if scores become more stable:

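One way to run that comparison is to loop over several k values, reusing the same placeholder model and data as above:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Compare cross-validation results for different numbers of folds
for k in (3, 5, 10):
    scores = cross_val_score(model, X, y, cv=k)
    print(f"k={k:>2}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```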

What to notice: More folds (higher k) usually means:

  • A more reliable averaged estimate (each model trains on a larger share of the data)
  • But more computation time
  • Common choices: k=5 or k=10

Why Cross-Validation Is Better

Single split:

  • One number: 95% accuracy
  • No idea if that’s typical or unusual
  • Can’t see variance

Cross-validation:

  • Multiple scores: [0.94, 0.96, 0.95, 0.93, 0.95]
  • Mean: 0.946 (more reliable)
  • Std: 0.011 (shows stability)
  • Confidence interval: [0.924, 0.968]

Much more informative.
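
Those summary numbers follow directly from the five scores; a rough mean ± 2·std interval (a rule of thumb rather than a formal confidence interval) reproduces the range above:

```python
import numpy as np

scores = np.array([0.94, 0.96, 0.95, 0.93, 0.95])

mean = scores.mean()
std = scores.std(ddof=1)            # sample standard deviation
low, high = mean - 2 * std, mean + 2 * std

print(f"Mean: {mean:.3f}, Std: {std:.3f}")            # 0.946, 0.011
print(f"Approx. interval: [{low:.3f}, {high:.3f}]")   # close to [0.924, 0.968]; tiny differences are rounding
```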

Stratified K-Fold

For classification, use stratified k-fold to keep class proportions:

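A sketch of what that looks like, passing a StratifiedKFold splitter explicitly (same placeholder data and model as before; note that for classifiers, cross_val_score with an integer cv already uses stratified folds by default, so the explicit splitter mainly makes the choice visible and lets you control shuffling):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Each fold keeps roughly the same class proportions as the full dataset
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=skf)

print("Scores:", scores.round(3))
print(f"Mean: {scores.mean():.3f}, Std: {scores.std():.3f}")
```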

Why stratified? It ensures each fold has roughly the same class distribution as the full dataset, which is especially important for imbalanced datasets.

Key Takeaways

Before moving forward:

  1. One split is unreliable - Can be lucky or unlucky
  2. Cross-validation tests multiple splits - More stable estimate
  3. Mean and std matter - Both tell you something
  4. Stratified for classification - Keeps class proportions

What’s Next?

On the next page, we'll use cross_validate to get multiple metrics at once. This lets you see accuracy, precision, recall, F1, and ROC AUC all together.