Measure the mistake
Loss answers: how wrong was that prediction?
For one example:
```python
actual = 1        # the student passed
prediction = 0.7  # model's predicted probability of passing
loss = (actual - prediction) ** 2
print(round(loss, 2))  # 0.09
```
| Situation | Loss feel |
|---|---|
| prediction ≈ actual | small loss |
| prediction far from actual | large loss |
| training goal | push loss down over time |
We use squared error here because it is simple to compute and punishes big misses harder: a miss of 0.6 costs four times as much as a miss of 0.3. Larger models often use other losses (cross-entropy is common for classification), but the story is the same: quantify wrongness.
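To make "punishes big misses harder" concrete, here is a small sketch that scores the same label against several predictions. The helper names `squared_error` and `binary_cross_entropy` are illustrative, not part of the lesson's code, and the cross-entropy version assumes the prediction lies strictly between 0 and 1.

```python
import math

def squared_error(actual, prediction):
    # The loss used in this lesson
    return (actual - prediction) ** 2

def binary_cross_entropy(actual, prediction):
    # A common classification loss; assumes 0 < prediction < 1
    return -(actual * math.log(prediction)
             + (1 - actual) * math.log(1 - prediction))

actual = 1  # the student passed
for prediction in (0.9, 0.7, 0.3, 0.1):
    se = squared_error(actual, prediction)
    ce = binary_cross_entropy(actual, prediction)
    print(f"prediction={prediction}: squared={se:.3f}, cross-entropy={ce:.3f}")
```

Both losses grow as the prediction moves away from the label; they differ only in how steeply.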
Training in plain language
- Make a prediction (forward pass)
- Compare to the real label (loss)
- Nudge weight and bias a little so next time is closer
- Repeat for every row, many epochs
No calculus lecture required. The model checks error and adjusts.
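The four steps above can be sketched as one small function. This is a minimal illustration, not the lesson's full loop: `train_step` is a hypothetical helper name, and it assumes the model is a sigmoid applied to `study_hours * weight + bias`, with a learning rate of 0.1.

```python
import math

def sigmoid(x):
    # Squash any real number into the range (0, 1)
    return 1 / (1 + math.exp(-x))

def train_step(weight, bias, study_hours, actual, learning_rate=0.1):
    """One prediction, one comparison, one nudge (hypothetical helper)."""
    prediction = sigmoid(study_hours * weight + bias)  # forward pass
    error = prediction - actual                        # compare to the label
    loss = error ** 2
    weight -= learning_rate * error * study_hours      # nudge weight
    bias -= learning_rate * error                      # nudge bias
    return weight, bias, loss

# Repeat the step on a single row: 3 study hours, student passed
weight, bias = 0.1, 0.0
for step in range(3):
    weight, bias, loss = train_step(weight, bias, study_hours=3, actual=1)
    print(f"step {step}: loss={loss:.4f}, weight={weight:.4f}, bias={bias:.4f}")
```

Each pass over the same row should report a smaller loss than the one before, which is the "closer next time" behavior the list describes.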
One training iteration
Frames:
- Forward pass: hours in, probability out.
- Loss = (actual - prediction)² for each row.
- Subtract a small step from weight and bias based on error.
The training loop (single input)
```python
import math

def sigmoid(x):
    # Squash any real number into the range (0, 1)
    return 1 / (1 + math.exp(-x))

# (study_hours, actual) pairs: 1 = passed, 0 = did not pass
data = [
    (1, 0),
    (2, 0),
    (3, 1),
    (4, 1),
]

weight = 0.1
bias = 0.0
learning_rate = 0.1

for epoch in range(1000):
    total_loss = 0
    for study_hours, actual in data:
        # Forward pass: hours in, probability out
        raw_output = study_hours * weight + bias
        prediction = sigmoid(raw_output)

        # Loss for this row
        error = prediction - actual
        loss = error ** 2
        total_loss += loss

        # Nudge weight and bias against the error
        weight -= learning_rate * error * study_hours
        bias -= learning_rate * error

    # Report progress every 100 epochs
    if epoch % 100 == 0:
        print(f"Epoch {epoch}, Loss: {total_loss:.4f}")

print("Final weight:", weight)
print("Final bias:", bias)
```
Reading the update lines:
- error = prediction - actual: positive means we overshot the label
- weight -= learning_rate * error * study_hours: study hours scale how much this row should push weight
- bias -= learning_rate * error: bias always gets nudged by the error directly
This is a minimal gradient-style update. Full backpropagation generalizes the same idea to deep nets.
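For readers who want to see what "gradient-style" leaves out, the sketch below compares the update direction used above with the exact chain-rule gradient of the squared error, and checks the latter against a finite-difference estimate. The function names are illustrative. The simplified form drops the factor `2 * prediction * (1 - prediction)`, which is always positive, so both versions point the same way; the simplified form is also what the exact gradient of cross-entropy loss works out to for a sigmoid output.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def simplified_direction(weight, bias, x, actual):
    # Direction used by the lesson's update lines (before learning_rate)
    error = sigmoid(x * weight + bias) - actual
    return error * x, error

def squared_error_gradient(weight, bias, x, actual):
    # Exact gradient of (prediction - actual) ** 2 via the chain rule
    prediction = sigmoid(x * weight + bias)
    error = prediction - actual
    slope = prediction * (1 - prediction)  # derivative of sigmoid
    return 2 * error * slope * x, 2 * error * slope

def numeric_gradient(weight, bias, x, actual, eps=1e-6):
    # Central finite differences, used only as a sanity check
    def loss(w, b):
        return (sigmoid(x * w + b) - actual) ** 2
    d_weight = (loss(weight + eps, bias) - loss(weight - eps, bias)) / (2 * eps)
    d_bias = (loss(weight, bias + eps) - loss(weight, bias - eps)) / (2 * eps)
    return d_weight, d_bias

args = (0.1, 0.0, 3, 1)  # weight, bias, study_hours, actual
print("simplified:", simplified_direction(*args))
print("exact:     ", squared_error_gradient(*args))
print("numeric:   ", numeric_gradient(*args))
```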
Exercise 3 — learning rate
Change learning_rate from 0.1 to 0.01 (fewer epochs is fine for a quick test).
Question: Does loss drop faster or slower? What happens if learning rate is huge (try 1.0 once)?
Answer
Smaller rate → slower but steadier steps. A huge rate can overshoot: loss may bounce or blow up because weights jump past a good spot.
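One way to try this without editing the loop by hand is to wrap it in a function and run it at several rates. This is a sketch under a few assumptions: `train` is a hypothetical wrapper around the lesson's loop, it runs 200 epochs instead of 1000, and the sigmoid is written in a numerically stable form so that a large rate cannot trigger an overflow in `math.exp`.

```python
import math

DATA = [(1, 0), (2, 0), (3, 1), (4, 1)]

def sigmoid(x):
    # Stable form: never calls math.exp on a large positive number
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)
    return z / (1 + z)

def train(learning_rate, epochs=200, weight=0.1, bias=0.0):
    """Return the total loss recorded in each epoch (hypothetical wrapper)."""
    history = []
    for _ in range(epochs):
        total_loss = 0.0
        for study_hours, actual in DATA:
            error = sigmoid(study_hours * weight + bias) - actual
            total_loss += error ** 2
            weight -= learning_rate * error * study_hours
            bias -= learning_rate * error
        history.append(total_loss)
    return history

for rate in (0.01, 0.1, 1.0):
    history = train(rate)
    print(f"lr={rate}: first={history[0]:.4f}, "
          f"last={history[-1]:.4f}, worst={max(history):.4f}")
```

Comparing `last` across rates shows how far each one got in the same number of epochs; a `worst` value well above `first` is a sign that the loss bounced.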
Exercise 4 — starting weight
Start with weight = 1.0 instead of 0.1.
Question: Does the model still converge? Does it need more or fewer epochs?
Answer
Usually it still converges—the updates move weight toward a good value. A bad start might need more epochs or look noisier early on. That is why people care about initialization in big models.
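To compare the two starting points side by side, the sketch below records the loss curve for each and prints a few checkpoints. `loss_curve` is a hypothetical helper that reuses the lesson's data, learning rate, and update rule; only the starting weight changes.

```python
import math

DATA = [(1, 0), (2, 0), (3, 1), (4, 1)]

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def loss_curve(start_weight, epochs=1000, learning_rate=0.1):
    """Train from start_weight and return the total loss per epoch."""
    weight, bias = start_weight, 0.0
    curve = []
    for _ in range(epochs):
        total_loss = 0.0
        for study_hours, actual in DATA:
            error = sigmoid(study_hours * weight + bias) - actual
            total_loss += error ** 2
            weight -= learning_rate * error * study_hours
            bias -= learning_rate * error
        curve.append(total_loss)
    return curve

for start in (0.1, 1.0):
    curve = loss_curve(start)
    checkpoints = ", ".join(
        f"epoch {e}: {curve[e]:.4f}" for e in (0, 100, 500, 999)
    )
    print(f"start weight {start} -> {checkpoints}")
```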
Key takeaways
- Loss = squared gap between label and prediction.
- Training loops over the data for many epochs, updating weight and bias after each row.
- Learning rate controls step size—too big is unstable, too small is slow.
Next: use the trained weights to predict new study hours and sanity-check behavior.