May 22, 2026

Intermediate 25 min

Measure the mistake

Loss answers: how wrong was that prediction?

For one example:

actual = 1          # passed
prediction = 0.7    # model probability

loss = (actual - prediction) ** 2
print(loss)  # 0.09

Situation	Loss feel
prediction ≈ actual	small loss
prediction far from actual	large loss
training goal	push loss down over time

We use squared error because it is simple to compute and punishes big misses harder: a miss of 0.6 costs 0.36, four times the 0.09 cost of a miss of 0.3. Bigger models use other losses (cross-entropy is common for classification), but the story is the same: quantify wrongness.
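To make "punishes big misses harder" concrete, here is a small sketch that prints the squared error for several predictions against the same label. The helper name `squared_error` and the sample predictions are illustrative choices, not part of the lesson's code.

```python
def squared_error(actual, prediction):
    """Squared gap between the label and the predicted probability."""
    return (actual - prediction) ** 2

actual = 1  # passed

# Hypothetical predictions, from nearly right to badly wrong
for prediction in (0.9, 0.7, 0.5, 0.1):
    miss = actual - prediction
    loss = squared_error(actual, prediction)
    print(f"prediction={prediction:.1f}  miss={miss:.1f}  loss={loss:.2f}")
```

The miss grows ninefold between the first and last rows (0.1 to 0.9), but the loss grows 81-fold (0.01 to 0.81), which is the sense in which squaring weights large errors more heavily.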

Training in plain language

1. Make a prediction (forward pass)
2. Compare to the real label (loss)
3. Nudge weight and bias a little so next time is closer
4. Repeat for every row, many epochs

No calculus lecture required. The model checks error and adjusts.

One training iteration

Frames:

Forward pass: hours in, probability out.
Loss = (actual - prediction)² for each row.
Subtract a small step from weight and bias based on error.
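The three frames can be sketched as a single update on one row. This is a minimal illustration that assumes one hypothetical example (3 study hours, passed) and the same starting values the lesson's loop uses; the helper `squared_loss` is introduced here only for the comparison.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def squared_loss(hours, actual, weight, bias):
    """Forward pass followed by the squared-error loss for one row."""
    prediction = sigmoid(hours * weight + bias)
    return (actual - prediction) ** 2

# One hypothetical row: 3 study hours, passed
hours, actual = 3, 1
weight, bias = 0.1, 0.0
learning_rate = 0.1

# Frame 1: forward pass, hours in, probability out
prediction = sigmoid(hours * weight + bias)

# Frame 2: loss for this row before any update
loss_before = squared_loss(hours, actual, weight, bias)

# Frame 3: subtract a small step from weight and bias based on the error
error = prediction - actual
weight -= learning_rate * error * hours
bias -= learning_rate * error

# Re-run the forward pass with the nudged parameters
loss_after = squared_loss(hours, actual, weight, bias)
print(f"loss before: {loss_before:.4f}, loss after: {loss_after:.4f}")
```

On this row the prediction starts below the label, so the error is negative and both weight and bias move up, which should leave the second loss smaller than the first.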

Order the training steps

Match each card to the step where it belongs:

Step 1: Data → Load study hours and label from dataset
Step 2: Forward → Compute sigmoid(hours × w + b)
Step 3: Loss → Compute (actual − prediction)²
Step 4: Update → Adjust weight and bias using error

The training loop (single input)

import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

data = [
    (1, 0),
    (2, 0),
    (3, 1),
    (4, 1),
]

weight = 0.1
bias = 0.0
learning_rate = 0.1

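for epoch in range(1000):
    total_loss = 0  # summed squared error across all rows this epoch

    for study_hours, actual in data:
        # Forward pass: weighted input squashed to a probability
        raw_output = study_hours * weight + bias
        prediction = sigmoid(raw_output)

        # Loss: squared gap between prediction and label
        error = prediction - actual
        loss = error ** 2
        total_loss += loss

        # Update: step against the error, scaled by the learning rate
        weight -= learning_rate * error * study_hours
        bias -= learning_rate * error

    # Report progress every 100 epochs
    if epoch % 100 == 0:
        print(f"Epoch {epoch}, Loss: {total_loss:.4f}")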

print("Final weight:", weight)
print("Final bias:", bias)

Reading the update lines:

error = prediction - actual: positive means the prediction was above the label, negative means it was below
weight -= learning_rate * error * study_hours: the input scales the step, so rows with more study hours push weight harder
bias -= learning_rate * error: bias has no input attached, so it is always nudged by the error directly

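This is a minimal gradient-style update. Strictly, error × input is the exact gradient for cross-entropy loss through a sigmoid; the exact squared-error gradient carries an extra factor of 2 × prediction × (1 − prediction), which is always positive and so changes the step size but not the direction. Full backpropagation generalizes the same idea to deep nets.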
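One way to see why the update counts as "gradient-style" is to compare the term it uses with the slope of the squared-error loss measured numerically. The sketch below assumes one hypothetical row and the lesson's starting parameters; `squared_loss`, `simple_term`, `exact_slope`, and `numeric_slope` are names introduced here for illustration.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def squared_loss(weight, bias, hours, actual):
    """Squared-error loss for one row at the given parameters."""
    return (sigmoid(hours * weight + bias) - actual) ** 2

# Hypothetical row and parameters, chosen only for illustration
hours, actual = 2, 0
weight, bias = 0.1, 0.0

prediction = sigmoid(hours * weight + bias)
error = prediction - actual

# The term the lesson's update subtracts from weight (before the learning rate)
simple_term = error * hours

# Exact slope of squared error w.r.t. weight, from the chain rule
exact_slope = 2 * error * prediction * (1 - prediction) * hours

# Numerical slope: central finite difference around the current weight
eps = 1e-6
numeric_slope = (
    squared_loss(weight + eps, bias, hours, actual)
    - squared_loss(weight - eps, bias, hours, actual)
) / (2 * eps)

print(f"simple term:   {simple_term:.4f}")
print(f"exact slope:   {exact_slope:.4f}")
print(f"numeric slope: {numeric_slope:.4f}")
```

The exact and numeric slopes should agree closely, and the simple term should share their sign. Subtracting it therefore moves weight downhill on the loss, just with a different step size than the exact gradient would give.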

Exercise 3 — learning rate

Change learning_rate from 0.1 to 0.01 (fewer epochs is fine for a quick test).

Question: Does loss drop faster or slower? What happens if learning rate is huge (try 1.0 once)?

Answer

Smaller rate → slower but steadier steps. A huge rate can overshoot: loss may bounce or blow up because weights jump past a good spot.
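To try the comparison without editing the loop by hand, the training code can be wrapped in a function. This is a sketch under assumptions: the function name `train`, the 200-epoch default, and the per-epoch `history` list are additions for this exercise, while the data and update rule are copied from the loop above.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

DATA = [(1, 0), (2, 0), (3, 1), (4, 1)]

def train(learning_rate, epochs=200, weight=0.1, bias=0.0):
    """Run the lesson's loop and return the total loss of each epoch."""
    history = []
    for _ in range(epochs):
        total_loss = 0.0
        for study_hours, actual in DATA:
            prediction = sigmoid(study_hours * weight + bias)
            error = prediction - actual
            total_loss += error ** 2
            weight -= learning_rate * error * study_hours
            bias -= learning_rate * error
        history.append(total_loss)
    return history

for rate in (0.01, 0.1, 1.0):
    history = train(rate)
    print(
        f"rate={rate}: first={history[0]:.4f}, "
        f"last={history[-1]:.4f}, worst={max(history):.4f}"
    )
```

Comparing `last` across rates shows how far each run got in the same number of epochs, and `worst` hints at instability: if it sits well above `first`, the loss climbed at some point instead of falling.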

Exercise 4 — starting weight

Start with weight = 1.0 instead of 0.1.

Question: Does the model still converge? Does it need more or fewer epochs?

Answer

Usually it still converges: the updates move weight toward a good value regardless of where it starts. A bad start might need more epochs or look noisier early on. That is why people care about initialization in big models.
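A side-by-side run makes the comparison easier to read. The sketch below reuses the lesson's data and update rule; the function names `train` and `epochs_to_reach`, and the loss threshold of 0.5, are hypothetical choices made only for this comparison.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

DATA = [(1, 0), (2, 0), (3, 1), (4, 1)]

def train(start_weight, epochs=1000, learning_rate=0.1):
    """Run the lesson's loop from a given starting weight; return per-epoch loss."""
    weight, bias = start_weight, 0.0
    history = []
    for _ in range(epochs):
        total_loss = 0.0
        for study_hours, actual in DATA:
            prediction = sigmoid(study_hours * weight + bias)
            error = prediction - actual
            total_loss += error ** 2
            weight -= learning_rate * error * study_hours
            bias -= learning_rate * error
        history.append(total_loss)
    return history

def epochs_to_reach(history, target):
    """Index of the first epoch whose loss is at or below target, else None."""
    for epoch, loss in enumerate(history):
        if loss <= target:
            return epoch
    return None

target = 0.5  # hypothetical threshold, chosen only for this comparison
for start in (0.1, 1.0):
    history = train(start)
    print(
        f"start weight={start}: first loss={history[0]:.4f}, "
        f"final loss={history[-1]:.4f}, "
        f"epochs to reach {target}: {epochs_to_reach(history, target)}"
    )
```

If a run never reaches the threshold, `epochs_to_reach` reports `None` rather than guessing, so a different target may be needed to compare the two starts.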

Key takeaways

Loss = squared gap between label and prediction.
Training loops over data and epochs, updating weight and bias each row.
Learning rate controls step size—too big is unstable, too small is slow.

Next: use the trained weights to predict new study hours and sanity-check behavior.
