Intermediate · 25 min

Why Multiple Metrics?

Different metrics tell you different things:

  • Accuracy: Overall correctness
  • Precision: When you say “positive”, how often are you right?
  • Recall: How many positives did you catch?
  • F1: Balance between precision and recall
  • ROC AUC: How well the model separates classes

Depending on the problem, recall may matter more than accuracy, and sometimes precision is the number to watch. Inspecting several metrics at once gives a far more complete picture than any single score.

Using cross_validate

cross_validate lets you get multiple metrics in one call:

🐍 Python Multiple Metrics with cross_validate
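The lesson’s original snippet isn’t reproduced here, so below is a minimal sketch of the idea, assuming the scikit-learn breast cancer dataset used later in this lesson and a logistic regression pipeline of our own choosing:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Binary classification data: label 0 = malignant, label 1 = benign
X, y = load_breast_cancer(return_X_y=True)

# Scaling helps logistic regression converge; the model choice is ours
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# One call, five metrics: every scorer is evaluated on every fold
scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]
results = cross_validate(model, X, y, cv=5, scoring=scoring)

# Note: precision and recall here use the default positive label (1 = benign)
for metric in scoring:
    scores = results[f"test_{metric}"]
    print(f"{metric:>9}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Passing a list of scorer names to `scoring` is what makes this a single call: `cross_validate` fits the model once per fold and evaluates every metric on that fitted model, rather than refitting once per metric.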

Understanding ROC AUC

ROC AUC (Area Under the Receiver Operating Characteristic curve) behaves a bit differently from the other metrics:

  • Range: 0 to 1
  • 0.5: Random guessing (no better than chance)
  • 1.0: Perfect separation
  • > 0.7: Generally considered good
  • > 0.8: Very good
  • > 0.9: Excellent

What it measures: how well the model can distinguish between the classes. Concretely, ROC AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, so higher means better separation.

🐍 Python Interpret ROC AUC
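Since the lesson’s code isn’t shown here, this is a sketch of how you might read a cross-validated AUC against the bands above, using the same hypothetical model as before (the cut-offs are the rough guidelines listed, not hard rules):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Cross-validated ROC AUC, averaged over 5 folds
auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

# Map the score onto the rough interpretation bands above
if auc > 0.9:
    verdict = "excellent separation"
elif auc > 0.8:
    verdict = "very good separation"
elif auc > 0.7:
    verdict = "good separation"
elif auc > 0.5:
    verdict = "better than chance, but weak"
else:
    verdict = "no better than random guessing"

print(f"Mean ROC AUC: {auc:.3f} -> {verdict}")
```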

Which Metric Matters Most?

Different problems need different metrics:

**Priority: Recall > Precision.** In medical diagnosis, missing a disease (false negative) is worse than a false alarm (false positive):

  • High recall = catch most diseases
  • Some false alarms are acceptable
  • Example: cancer screening. Better to test someone who doesn’t have it than to miss someone who does

Quick Question: Medical Example

For our breast cancer dataset, which metric matters most?

Think about it:

  • False negative = missing cancer = very bad
  • False positive = false alarm = bad but not catastrophic

Answer: Recall matters most. We want to catch all cancers, even if we get some false alarms.

🐍 Python Metric Priority for Medical Problem
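Here is one way to score what actually matters, again with our hypothetical logistic regression pipeline. One wrinkle worth knowing: in scikit-learn’s breast cancer dataset, label 0 is malignant and label 1 is benign, so we use `make_scorer` with `pos_label=0` so that recall measures the fraction of cancers caught:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, precision_score, recall_score
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Label 0 is malignant, so tell the scorers that class 0 is the
# "positive" class we care about catching
recall_malignant = make_scorer(recall_score, pos_label=0)
precision_malignant = make_scorer(precision_score, pos_label=0)

recall = cross_val_score(model, X, y, cv=5, scoring=recall_malignant)
precision = cross_val_score(model, X, y, cv=5, scoring=precision_malignant)

# For screening we accept lower precision (some false alarms)
# in exchange for higher recall (fewer missed cancers)
print(f"Recall on malignant:    {recall.mean():.3f}")
print(f"Precision on malignant: {precision.mean():.3f}")
```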

Comparing All Metrics

Let’s see all metrics side by side:

🐍 Python Metric Summary Table
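One way to build such a table is to pour the `cross_validate` results into a pandas DataFrame; this is a sketch, since the lesson’s original table code isn’t shown:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]
results = cross_validate(model, X, y, cv=5, scoring=scoring)

# One row per fold, one column per metric
table = pd.DataFrame({m: results[f"test_{m}"] for m in scoring})

# Summarize: mean and standard deviation across folds
print(table.agg(["mean", "std"]).round(3))
```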

What to Look For

When evaluating multiple metrics:

  1. Consistency: Do all metrics tell the same story?

    • If accuracy is high but recall is low, the model may be leaning on the majority class and missing positives
  2. Stability: Low standard deviation = consistent performance across folds

    • High std = performance depends heavily on which samples land in each fold (see the sketch after this list)
  3. Problem-specific: Choose metrics that match your problem

    • Medical: prioritize recall
    • Spam: prioritize precision
    • Balanced: use F1 or accuracy
  4. ROC AUC: Good for overall model quality

    • Works even with imbalanced classes
    • Shows separation ability
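
As a quick illustration of the stability check from point 2, here is a minimal sketch; the 0.05 cut-off is our own rule of thumb, not a standard value:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"Per-fold accuracy: {scores.round(3)}")
print(f"Mean: {scores.mean():.3f}, std: {scores.std():.3f}")

# 0.05 is our own rule of thumb, not a standard threshold
if scores.std() > 0.05:
    print("Performance varies a lot across folds -- investigate before trusting the mean.")
else:
    print("Performance is stable across folds.")
```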

Key Takeaways

Before moving forward:

  1. Multiple metrics give the full picture - Don’t rely on accuracy alone
  2. Choose metrics for your problem - Medical needs recall, spam needs precision
  3. ROC AUC shows separation - Good overall quality metric
  4. cross_validate is efficient - Get all metrics in one call

What’s Next?

In the final page, we’ll compare two models fairly using cross-validation. You’ll see how to evaluate multiple models and choose the best one.