
The Problem with One Split

One random split can be “lucky” or “unlucky.” You might get:

  • 95% accuracy on one split
  • 92% accuracy on another split
  • 97% accuracy on yet another split

Which one is right? You don’t know. That’s the problem.

What we want: A more stable estimate that doesn’t depend on one random split.

Solution: Cross-validation - test multiple splits and average the results.

How K-Fold Cross-Validation Works

Here’s the idea:

  1. Split data into k folds (usually k=5 or k=10)
  2. Train on k-1 folds, test on the remaining fold
  3. Repeat k times, each time using a different fold as test
  4. You get k scores
  5. Average them for a more stable estimate

[Diagram: the full dataset is split into 5 folds; the model trains on folds 2-5 and is tested on fold 1 to produce score 1; this repeats 5 times, once per fold, and the scores are averaged.]
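
Here is roughly what that loop looks like written out by hand with scikit-learn's KFold. The dataset and model below (the built-in breast cancer data and a scaled logistic regression) are stand-ins; any estimator and dataset would work the same way:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)

scores = []
for fold, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    # Train on the other 4 folds, evaluate on the held-out fold
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X[train_idx], y[train_idx])
    score = model.score(X[test_idx], y[test_idx])
    scores.append(score)
    print(f"Fold {fold}: {score:.3f}")

print(f"Mean: {np.mean(scores):.3f}  Std: {np.std(scores):.3f}")
```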

Using cross_val_score

Scikit-learn makes this easy with cross_val_score:

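A minimal sketch of what that call looks like (the dataset and model here are placeholders; the lesson's own example may use different ones):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# cv=5 runs 5-fold cross-validation and returns one score per fold
scores = cross_val_score(model, X, y, cv=5)

print("Individual scores:", scores.round(3))
print(f"Mean: {scores.mean():.3f}")
print(f"Std:  {scores.std():.3f}")
```

cross_val_score handles the splitting, fitting, and scoring internally, so the hand-written loop above collapses into a single call.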

What These Numbers Mean

Individual scores: Performance on each fold

  • Shows variation across different splits
  • If scores vary a lot (high std), model is unstable
  • If scores are similar (low std), model is stable

Mean score: Average performance

  • More reliable than a single split
  • Better estimate of true performance

Standard deviation: How much scores vary

  • Low std = stable model
  • High std = model performance depends on which data it sees
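
As a toy illustration (the numbers below are made up, not real model output), two sets of fold scores can share the same mean but tell very different stability stories:

```python
import numpy as np

stable   = np.array([0.94, 0.95, 0.95, 0.94, 0.95])  # low std: consistent across folds
unstable = np.array([0.99, 0.88, 0.97, 0.90, 0.99])  # high std: depends heavily on the split

for name, s in [("stable", stable), ("unstable", unstable)]:
    print(f"{name:>8}: mean={s.mean():.3f}, std={s.std():.3f}")
```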

Try Different K Values

Try it yourself: Change cv=5 to cv=10 and see if scores become more stable:

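One way to run that comparison is to loop over several k values, reusing the same placeholder model and data as above:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Compare cross-validation results for different numbers of folds
for k in (3, 5, 10):
    scores = cross_val_score(model, X, y, cv=k)
    print(f"k={k:>2}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```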

What to notice: More folds (higher k) usually means:

  • A more reliable averaged estimate (each model trains on a larger share of the data)
  • But more computation time
  • Common choices: k=5 or k=10

Why Cross-Validation Is Better

Single split:

  • One number: 95% accuracy
  • No idea if that’s typical or unusual
  • Can’t see variance

Cross-validation:

  • Multiple scores: [0.94, 0.96, 0.95, 0.93, 0.95]
  • Mean: 0.946 (more reliable)
  • Std: 0.011 (shows stability)
  • Confidence interval: [0.924, 0.968]

Much more informative.
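
Those summary numbers follow directly from the five scores; a rough mean ± 2·std interval (a rule of thumb rather than a formal confidence interval) reproduces the range above:

```python
import numpy as np

scores = np.array([0.94, 0.96, 0.95, 0.93, 0.95])

mean = scores.mean()
std = scores.std(ddof=1)            # sample standard deviation
low, high = mean - 2 * std, mean + 2 * std

print(f"Mean: {mean:.3f}, Std: {std:.3f}")            # 0.946, 0.011
print(f"Approx. interval: [{low:.3f}, {high:.3f}]")   # close to [0.924, 0.968]; tiny differences are rounding
```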

Stratified K-Fold

For classification, use stratified k-fold to keep class proportions:

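A sketch of what that looks like, passing a StratifiedKFold splitter explicitly (same placeholder data and model as before; note that for classifiers, cross_val_score with an integer cv already uses stratified folds by default, so the explicit splitter mainly makes the choice visible and lets you control shuffling):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Each fold keeps roughly the same class proportions as the full dataset
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=skf)

print("Scores:", scores.round(3))
print(f"Mean: {scores.mean():.3f}, Std: {scores.std():.3f}")
```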

Why stratified? It ensures each fold has roughly the same class distribution as the full dataset, which is especially important for imbalanced datasets.

Key Takeaways

Before moving forward:

  1. One split is unreliable - Can be lucky or unlucky
  2. Cross-validation tests multiple splits - More stable estimate
  3. Mean and std matter - Both tell you something
  4. Stratified for classification - Keeps class proportions

What’s Next?

On the next page, we'll use cross_validate to get multiple metrics at once. This lets you see accuracy, precision, recall, F1, and ROC AUC all together.