Comparing Two Models
Let’s compare Logistic Regression with Random Forest. The key: use the same cross-validation splits for both models.
Why Same CV Splits Matter
Using the same splits ensures:
- Fair comparison - Both models see the same data splits
- Reduced variance - Differences are due to models, not data splits
- Statistical validity - Scores are paired fold-by-fold, so the models' means can be compared meaningfully
Without same splits: One model might get “easier” test folds, making comparison unfair.
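The tutorial's comparison code isn't reproduced here, but a minimal sketch of the idea, assuming scikit-learn and using the built-in breast cancer dataset as a stand-in (dataset, pipeline, and hyperparameters are illustrative, not the tutorial's exact setup):

```python
# Sketch: score two different models on the exact same cross-validation folds.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# One CV object with a fixed seed, reused for both models,
# so each model is trained and scored on identical folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f} (std {scores.std():.3f})")
```

Because `cv` is a single `StratifiedKFold` object with a fixed seed, any difference in the printed scores comes from the models, not from which rows landed in which fold.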
Try Different Hyperparameters
Try it yourself: Change Random Forest hyperparameters and see how metrics change:
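A hedged starting point for this experiment, reusing the same illustrative dataset and CV setup as above (the hyperparameter values below are arbitrary examples to tweak):

```python
# Sketch: vary a couple of Random Forest hyperparameters and watch the
# cross-validated F1 score move. Values are arbitrary starting points.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

for n_estimators in (50, 200):
    for max_depth in (3, None):
        model = RandomForestClassifier(
            n_estimators=n_estimators, max_depth=max_depth, random_state=42
        )
        scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
        print(f"n_estimators={n_estimators}, max_depth={max_depth}: "
              f"mean F1 = {scores.mean():.3f}")
```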
Common Pitfalls and Best Practices
❌ Don’t Do This
1. Look at the test set many times
- Every time you check the test set, you’re “using” it
- Eventually, you’re overfitting to the test set
- Solution: Only look at the test set once, at the very end
2. Tune hyperparameters on test set
- Test set is for final evaluation only
- Tuning on test set = data leakage
- Solution: Use cross-validation for tuning and reserve the test set for the final check (see the sketch after this list)
3. Ignore class imbalance
- Accuracy can be misleading with imbalanced classes
- Solution: Use stratified splits and multiple metrics
4. Report only accuracy
- Accuracy doesn’t tell the full story
- Solution: Report precision, recall, F1, and confusion matrix
5. Forget random seeds
- Results won’t be reproducible
- Solution: Always set `random_state` for reproducibility
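To make pitfalls 1 and 2 concrete, here is a minimal sketch of the intended workflow: tune hyperparameters with cross-validation on the training split, then score the held-out test set exactly once at the end (dataset, grid values, and scoring choice are illustrative assumptions):

```python
# Sketch: cross-validated tuning on the training data only; the test set
# is scored a single time, after all tuning decisions are made.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

param_grid = {"n_estimators": [100, 200], "max_depth": [3, None]}  # illustrative grid
search = GridSearchCV(
    RandomForestClassifier(random_state=42), param_grid, cv=5, scoring="f1"
)
search.fit(X_train, y_train)  # tuning sees only the training data

print("Best params (chosen via CV):", search.best_params_)
print("Final test F1:", search.score(X_test, y_test))  # touched exactly once
```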
✅ Do This Instead
1. Use cross-validation for model selection
- More reliable than single split
- Use same CV splits for fair comparison
2. Keep test set untouched until the end
- Test set is for final evaluation only
- Use validation set or CV for tuning
3. Use stratified splits for classification
- Keeps class proportions
- Important for imbalanced datasets
4. Report multiple metrics
- At minimum: accuracy, precision, recall, F1 (see the sketch after this list)
- Add ROC AUC for binary classification
- Include confusion matrix
5. Set random seeds
- Makes results reproducible
- Use `random_state=42` (or any number)
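Putting several of these points together, here is a hedged sketch of stratified cross-validation that reports multiple metrics, builds a confusion matrix from out-of-fold predictions, and fixes the random seed (dataset and model are illustrative):

```python
# Sketch: stratified folds, several metrics reported at once, a confusion
# matrix from out-of-fold predictions, and fixed seeds throughout.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_validate

X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
model = RandomForestClassifier(random_state=42)

results = cross_validate(
    model, X, y, cv=cv,
    scoring=["accuracy", "precision", "recall", "f1", "roc_auc"],
)
for metric in ("accuracy", "precision", "recall", "f1", "roc_auc"):
    print(f"{metric:>9}: {results['test_' + metric].mean():.3f}")

# Confusion matrix built from out-of-fold predictions
y_pred = cross_val_predict(model, X, y, cv=cv)
print(confusion_matrix(y, y_pred))
```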
Evaluation Checklist
Before deploying a model, check:
- Used cross-validation (not just one split)
- Used stratified splits (for classification)
- Reported multiple metrics (not just accuracy)
- Compared models with same CV splits
- Didn’t tune on test set
- Set random seeds for reproducibility
- Checked confusion matrix
- Considered problem-specific metrics (e.g., recall for medical screening, precision for spam filtering)
Summary
You’ve learned:
- Train/test split - Basic evaluation, but can be unstable
- Confusion matrix - Shows what types of errors your model makes
- Cross-validation - More stable than single split, tests multiple times
- Multiple metrics - Accuracy, precision, recall, F1, ROC AUC all tell different stories
- Model comparison - Use same CV splits for fair comparison
- Best practices - Avoid common pitfalls, use proper evaluation methods
Next Steps
Now that you know proper evaluation:
- Hyperparameter tuning - Use GridSearchCV with cross-validation
- Imbalanced datasets - Learn about class weights and sampling
- Preprocessing pipelines - Combine preprocessing with evaluation
- Production evaluation - Monitor models in production
Test Your Knowledge
Let’s see what you’ve learned:
Congratulations! 🎉
You’ve completed the tutorial on Model Evaluation and Cross-Validation. You now know how to:
- Split data correctly
- Use cross-validation for stable estimates
- Choose and interpret multiple metrics
- Compare models fairly
- Avoid common evaluation pitfalls
Keep practicing with different datasets and problems. The more you evaluate models, the better you’ll get at understanding their performance.