Intermediate 25 min

Measure the mistake

Loss answers: how wrong was that prediction?

For one example:

actual = 1          # passed
prediction = 0.7    # model probability

loss = (actual - prediction) ** 2
print(round(loss, 2))  # 0.09 (rounded to hide floating-point noise)

Situation                     Loss
prediction ≈ actual           small loss
prediction far from actual    large loss
training goal                 push loss down over time

We use squared error because it is simple to compute and penalizes big misses disproportionately: doubling the gap quadruples the loss. Bigger models often use other losses (cross-entropy is common for classification), but the story is the same: quantify wrongness.
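To make the "punishes big misses harder" point concrete, here is a small sketch that scores a few predictions against a label of 1 with both squared error and cross-entropy. The helper names `squared_error` and `cross_entropy` are illustrative choices for this example, not part of any library.

```python
import math

def squared_error(actual, prediction):
    """Squared gap between the label and the predicted probability."""
    return (actual - prediction) ** 2

def cross_entropy(actual, prediction):
    """Binary cross-entropy for one example.

    Assumes prediction is strictly between 0 and 1, so the logs are defined.
    """
    return -(actual * math.log(prediction)
             + (1 - actual) * math.log(1 - prediction))

actual = 1  # passed
for prediction in (0.9, 0.7, 0.4, 0.1):
    se = squared_error(actual, prediction)
    ce = cross_entropy(actual, prediction)
    print(f"prediction={prediction:.1f}  squared={se:.4f}  cross_entropy={ce:.4f}")
```

Both losses grow as the prediction moves away from the label. With squared error, a gap of 0.9 costs 81 times as much as a gap of 0.1, because the gap is squared.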

Training in plain language

  1. Make a prediction (forward pass)
  2. Compare to the real label (loss)
  3. Nudge weight and bias a little so next time is closer
  4. Repeat for every row, many epochs

No calculus lecture required. The model checks error and adjusts.
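The first three steps can be sketched for a single row as follows. This is a minimal illustration, assuming a sigmoid forward pass and the same starting values and update rule used in this lesson's training loop; the single row (3 hours studied, passed) is picked only for the example.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Assumed starting values, matching the lesson's training loop
weight, bias, learning_rate = 0.1, 0.0, 0.1
study_hours, actual = 3, 1  # one row: studied 3 hours, passed

# 1. Forward pass: hours in, probability out
prediction = sigmoid(study_hours * weight + bias)

# 2. Loss: squared gap between prediction and label
error = prediction - actual
loss_before = error ** 2

# 3. Nudge weight and bias a little, against the error
weight -= learning_rate * error * study_hours
bias -= learning_rate * error

# Re-check the same row with the nudged values
new_prediction = sigmoid(study_hours * weight + bias)
loss_after = (new_prediction - actual) ** 2

print(f"loss before: {loss_before:.4f}")
print(f"loss after:  {loss_after:.4f}")
```

The model under-predicted a row labeled 1, so the error is negative and both weight and bias move up. Step 4 simply repeats this for every row, many times.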

One training iteration

Frames:

  1. Forward pass: hours in, probability out.
  2. Loss = (actual - prediction)² for each row.
  3. Subtract a small step from weight and bias based on error.

The training loop (single input)

import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

data = [
    (1, 0),
    (2, 0),
    (3, 1),
    (4, 1),
]

weight = 0.1
bias = 0.0
learning_rate = 0.1

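for epoch in range(1000):
    total_loss = 0.0  # summed squared error for this epoch

    for study_hours, actual in data:
        # Forward pass: weighted input squashed to a probability
        raw_output = study_hours * weight + bias
        prediction = sigmoid(raw_output)

        # Loss: squared gap between prediction and label
        error = prediction - actual
        loss = error ** 2
        total_loss += loss

        # Update: step against the error after every row
        weight -= learning_rate * error * study_hours
        bias -= learning_rate * error

    # Report progress every 100 epochs
    if epoch % 100 == 0:
        print(f"Epoch {epoch}, Loss: {total_loss:.4f}")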

print("Final weight:", weight)
print("Final bias:", bias)

Reading the update lines:

  • error = prediction - actual — positive means we overshot the label
  • weight -= learning_rate * error * study_hours — study hours scale how much this row should push weight
  • bias -= learning_rate * error — bias always gets nudged by the error directly

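This is a minimal gradient-style update. Strictly speaking, the exact gradient of squared error through a sigmoid carries an extra factor of 2 * prediction * (1 - prediction); the simpler rule used here matches the gradient of cross-entropy loss for a sigmoid output. Either way, the step points in the direction that reduces the error. Full backpropagation generalizes the same idea to deep nets.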
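For readers curious about what "gradient-style" means here, the sketch below compares the lesson's weight step with numerically estimated slopes of two losses for one row. It is an illustration under assumptions: the helpers `predict`, `squared_error`, `cross_entropy`, and `numeric_slope` are names made up for this example, and the slope is estimated with a simple finite difference rather than calculus.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def predict(weight, bias, study_hours):
    return sigmoid(study_hours * weight + bias)

def squared_error(weight, bias, study_hours, actual):
    return (predict(weight, bias, study_hours) - actual) ** 2

def cross_entropy(weight, bias, study_hours, actual):
    p = predict(weight, bias, study_hours)
    return -(actual * math.log(p) + (1 - actual) * math.log(1 - p))

def numeric_slope(loss_fn, weight, bias, study_hours, actual, eps=1e-6):
    """Central finite-difference estimate of how loss changes with weight."""
    up = loss_fn(weight + eps, bias, study_hours, actual)
    down = loss_fn(weight - eps, bias, study_hours, actual)
    return (up - down) / (2 * eps)

weight, bias = 0.1, 0.0
study_hours, actual = 3, 1

prediction = predict(weight, bias, study_hours)
error = prediction - actual

# The quantity the lesson multiplies by learning_rate
lesson_step = error * study_hours
# Exact squared-error slope: same direction, extra sigmoid factor
exact_squared = 2 * error * prediction * (1 - prediction) * study_hours

ce_slope = numeric_slope(cross_entropy, weight, bias, study_hours, actual)
se_slope = numeric_slope(squared_error, weight, bias, study_hours, actual)

print(f"lesson step:         {lesson_step:.6f}")
print(f"cross-entropy slope: {ce_slope:.6f}")
print(f"squared-error slope: {se_slope:.6f}")
print(f"exact squared slope: {exact_squared:.6f}")
```

The lesson's step agrees with the cross-entropy slope, and it has the same sign as the squared-error slope, so both rules push the weight in the same direction.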

Exercise 3 — learning rate

Change learning_rate from 0.1 to 0.01 (fewer epochs is fine for a quick test).

Question: Does loss drop faster or slower? What happens if the learning rate is huge (try 1.0 once)?

Answer

Smaller rate → slower but steadier steps. A huge rate can overshoot: loss may bounce or blow up because weights jump past a good spot.
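One way to try this without editing the main script is to wrap the loop in a function and run it at several rates. The sketch below assumes the lesson's data, starting values, and update rule; the `train` helper and the choice of 200 epochs are illustrative.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

DATA = [(1, 0), (2, 0), (3, 1), (4, 1)]

def train(learning_rate, epochs=200, weight=0.1, bias=0.0):
    """Run the lesson's per-row update and return each epoch's total loss."""
    history = []
    for _ in range(epochs):
        total_loss = 0.0
        for study_hours, actual in DATA:
            prediction = sigmoid(study_hours * weight + bias)
            error = prediction - actual
            total_loss += error ** 2
            weight -= learning_rate * error * study_hours
            bias -= learning_rate * error
        history.append(total_loss)
    return history

for rate in (0.01, 0.1, 1.0):
    history = train(rate)
    print(f"learning_rate={rate}: first epoch={history[0]:.4f}, "
          f"last epoch={history[-1]:.4f}")
```

To see bouncing rather than just the endpoints, print a few consecutive entries of `history` for the largest rate and check whether the loss goes up between epochs.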

Exercise 4 — starting weight

Start with weight = 1.0 instead of 0.1.

Question: Does the model still converge? Does it need more or fewer epochs?

Answer

Usually it still converges, because the updates move the weight toward a good value from either side. A bad start might need more epochs or look noisier early on. That is why people care about initialization in big models.
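A quick way to compare starting points is to count how many epochs each one needs to get below some loss target. This is a sketch under assumptions: the `epochs_until` helper, the target of 0.5, and the cap of 5000 epochs are arbitrary choices for illustration; the data and update rule are the lesson's.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

DATA = [(1, 0), (2, 0), (3, 1), (4, 1)]

def epochs_until(target_loss, start_weight, learning_rate=0.1, max_epochs=5000):
    """Return the first epoch whose total loss is below target_loss, else None."""
    weight, bias = start_weight, 0.0
    for epoch in range(max_epochs):
        total_loss = 0.0
        for study_hours, actual in DATA:
            prediction = sigmoid(study_hours * weight + bias)
            error = prediction - actual
            total_loss += error ** 2
            weight -= learning_rate * error * study_hours
            bias -= learning_rate * error
        if total_loss < target_loss:
            return epoch
    return None  # never reached the target within max_epochs

for start_weight in (0.1, 1.0):
    reached = epochs_until(0.5, start_weight)
    print(f"start weight {start_weight}: reached target at epoch {reached}")
```

Try a few more starting weights, including negative ones, and see how the epoch count changes.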

Key takeaways

  1. Loss = squared gap between label and prediction.
  2. Training loops over data and epochs, updating weight and bias after each row.
  3. Learning rate controls step size: too big is unstable, too small is slow.

Next: use the trained weights to predict new study hours and sanity-check behavior.