Intermediate · 25 min

Why Multiple Metrics?

Different metrics tell you different things:

  • Accuracy: Overall correctness
  • Precision: When you say “positive”, how often are you right?
  • Recall: How many positives did you catch?
  • F1: Balance between precision and recall
  • ROC AUC: How well the model separates classes

Depending on the problem, recall may matter more than accuracy, and sometimes precision is the number to watch. Inspecting several metrics at once gives a far more complete picture than any single score.

Using cross_validate

cross_validate lets you get multiple metrics in one call:

🐍 Python Multiple Metrics with cross_validate
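The lesson’s original snippet isn’t reproduced here, so below is a minimal sketch of the idea, assuming the scikit-learn breast cancer dataset used later in this lesson and a logistic regression pipeline of our own choosing:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Binary classification data: label 0 = malignant, label 1 = benign
X, y = load_breast_cancer(return_X_y=True)

# Scaling helps logistic regression converge; the model choice is ours
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# One call, five metrics: every scorer is evaluated on every fold
scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]
results = cross_validate(model, X, y, cv=5, scoring=scoring)

# Note: precision and recall here use the default positive label (1 = benign)
for metric in scoring:
    scores = results[f"test_{metric}"]
    print(f"{metric:>9}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Passing a list of scorer names to `scoring` is what makes this a single call: `cross_validate` fits the model once per fold and evaluates every metric on that fitted model, rather than refitting once per metric.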

Understanding ROC AUC

ROC AUC (Area Under the Receiver Operating Characteristic curve) behaves a bit differently from the other metrics:

  • Range: 0 to 1
  • 0.5: Random guessing (no better than chance)
  • 1.0: Perfect separation
  • > 0.7: Generally considered good
  • > 0.8: Very good
  • > 0.9: Excellent

What it measures: how well the model can distinguish between the classes. Concretely, ROC AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, so higher means better separation.

🐍 Python Interpret ROC AUC
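Since the lesson’s code isn’t shown here, this is a sketch of how you might read a cross-validated AUC against the bands above, using the same hypothetical model as before (the cut-offs are the rough guidelines listed, not hard rules):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Cross-validated ROC AUC, averaged over 5 folds
auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

# Map the score onto the rough interpretation bands above
if auc > 0.9:
    verdict = "excellent separation"
elif auc > 0.8:
    verdict = "very good separation"
elif auc > 0.7:
    verdict = "good separation"
elif auc > 0.5:
    verdict = "better than chance, but weak"
else:
    verdict = "no better than random guessing"

print(f"Mean ROC AUC: {auc:.3f} -> {verdict}")
```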

Which Metric Matters Most?

Different problems need different metrics:

**Priority: Recall > Precision.** In medical diagnosis, missing a disease (false negative) is worse than a false alarm (false positive):

  • High recall = catch most diseases
  • Some false alarms are acceptable
  • Example: cancer screening. Better to test someone who doesn’t have it than to miss someone who does

Quick Question: Medical Example

For our breast cancer dataset, which metric matters most?

Think about it:

  • False negative = missing cancer = very bad
  • False positive = false alarm = bad but not catastrophic

Answer: Recall matters most. We want to catch all cancers, even if we get some false alarms.

🐍 Python Metric Priority for Medical Problem
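Here is one way to score what actually matters, again with our hypothetical logistic regression pipeline. One wrinkle worth knowing: in scikit-learn’s breast cancer dataset, label 0 is malignant and label 1 is benign, so we use `make_scorer` with `pos_label=0` so that recall measures the fraction of cancers caught:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, precision_score, recall_score
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Label 0 is malignant, so tell the scorers that class 0 is the
# "positive" class we care about catching
recall_malignant = make_scorer(recall_score, pos_label=0)
precision_malignant = make_scorer(precision_score, pos_label=0)

recall = cross_val_score(model, X, y, cv=5, scoring=recall_malignant)
precision = cross_val_score(model, X, y, cv=5, scoring=precision_malignant)

# For screening we accept lower precision (some false alarms)
# in exchange for higher recall (fewer missed cancers)
print(f"Recall on malignant:    {recall.mean():.3f}")
print(f"Precision on malignant: {precision.mean():.3f}")
```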

Comparing All Metrics

Let’s see all metrics side by side:

🐍 Python Metric Summary Table
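One way to build such a table is to pour the `cross_validate` results into a pandas DataFrame; this is a sketch, since the lesson’s original table code isn’t shown:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]
results = cross_validate(model, X, y, cv=5, scoring=scoring)

# One row per fold, one column per metric
table = pd.DataFrame({m: results[f"test_{m}"] for m in scoring})

# Summarize: mean and standard deviation across folds
print(table.agg(["mean", "std"]).round(3))
```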

What to Look For

When evaluating multiple metrics:

  1. Consistency: Do all metrics tell the same story?

    • If accuracy is high but recall is low, the model may be leaning on the majority class and missing positives
  2. Stability: Low standard deviation = consistent performance across folds

    • High std = performance depends heavily on which samples land in each fold (see the sketch after this list)
  3. Problem-specific: Choose metrics that match your problem

    • Medical: prioritize recall
    • Spam: prioritize precision
    • Balanced: use F1 or accuracy
  4. ROC AUC: Good for overall model quality

    • Works even with imbalanced classes
    • Shows separation ability
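
As a quick illustration of the stability check from point 2, here is a minimal sketch; the 0.05 cut-off is our own rule of thumb, not a standard value:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"Per-fold accuracy: {scores.round(3)}")
print(f"Mean: {scores.mean():.3f}, std: {scores.std():.3f}")

# 0.05 is our own rule of thumb, not a standard threshold
if scores.std() > 0.05:
    print("Performance varies a lot across folds -- investigate before trusting the mean.")
else:
    print("Performance is stable across folds.")
```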

Key Takeaways

Before moving forward:

  1. Multiple metrics give the full picture - Don’t rely on accuracy alone
  2. Choose metrics for your problem - Medical needs recall, spam needs precision
  3. ROC AUC shows separation - Good overall quality metric
  4. cross_validate is efficient - Get all metrics in one call

What’s Next?

In the final page, we’ll compare two models fairly using cross-validation. You’ll see how to evaluate multiple models and choose the best one.