Jul 8, 2026

Intermediate 25 min

Backpropagation without the scare factor

Backpropagation means: send the error backward through the network so each weight learns how much it contributed to the mistake.

In our tiny model there is only one layer, so “backward” looks like:

error = prediction - actual
weight1 -= learning_rate × error × study_hours
weight2 -= learning_rate × error × sleep_hours
bias    -= learning_rate × error
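
The same four lines can be written as a minimal Python sketch for a single row. The function name `update_step`, the zero starting weights, the learning rate, and the example row are made-up values for illustration; `sigmoid` is assumed to match the squashing function used earlier in the tutorial.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def update_step(weight1, weight2, bias, study_hours, sleep_hours, actual, learning_rate):
    """Apply one error-times-input update for a single training row."""
    prediction = sigmoid(weight1 * study_hours + weight2 * sleep_hours + bias)
    error = prediction - actual
    # Each parameter moves against the error, scaled by the input it multiplied.
    weight1 -= learning_rate * error * study_hours
    weight2 -= learning_rate * error * sleep_hours
    bias -= learning_rate * error  # the bias behaves like a weight on a constant input of 1
    return weight1, weight2, bias, error

# Hypothetical row: 3 study hours, 7 sleep hours, the student passed (actual = 1).
w1, w2, b, err = update_step(0.0, 0.0, 0.0, 3.0, 7.0, 1.0, 0.1)
print(w1, w2, b, err)
```

With all parameters at zero the prediction is 0.5, which is below the label of 1, so the error is negative and all three parameters move upward.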

That error × input pattern is the seed of full backprop. Deep nets chain the same idea with calculus (chain rule) so every layer gets a fair share of blame.
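
To make the chain-rule idea concrete, here is a hedged sketch of a network with one hidden neuron feeding one output neuron. This two-layer shape, the parameter names (`w_h`, `b_h`, `w_o`, `b_o`), and the starting numbers are assumptions for illustration and are not part of the tutorial's model. The sketch uses the loss 0.5 × (prediction − actual)² and keeps the sigmoid slope factor `p * (1 - p)` in each step, which the simpler single-layer update above leaves out.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Run input x through the hidden neuron, then the output neuron."""
    w_h, b_h, w_o, b_o = params
    h = sigmoid(w_h * x + b_h)   # hidden activation
    p = sigmoid(w_o * h + b_o)   # final prediction
    return h, p

def loss(params, x, y):
    _, p = forward(params, x)
    return 0.5 * (p - y) ** 2

def backward(params, x, y):
    """Return gradients in the order [w_h, b_h, w_o, b_o] using the chain rule."""
    w_h, b_h, w_o, b_o = params
    h, p = forward(params, x)
    # Output layer: error times the slope of the output sigmoid.
    delta_o = (p - y) * p * (1.0 - p)
    grad_w_o = delta_o * h       # "error x input", where the input is h
    grad_b_o = delta_o
    # Hidden layer: pass the blame back through w_o and the hidden sigmoid.
    delta_h = delta_o * w_o * h * (1.0 - h)
    grad_w_h = delta_h * x       # "error x input" again, one layer earlier
    grad_b_h = delta_h
    return [grad_w_h, grad_b_h, grad_w_o, grad_b_o]

params = [0.5, -0.2, 0.8, 0.1]   # hypothetical starting values
grads = backward(params, x=2.0, y=1.0)
print(grads)
```

Each layer repeats the same pattern: a local error signal multiplied by whatever input that layer received. The only new step is how `delta_h` is derived from `delta_o`.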

Backprop is not magic. It is bookkeeping: work out how much each weight contributed to the error, then nudge that weight in the opposite direction.

What this tutorial does not cover

Being upfront builds trust. Real-world training setups usually add:

More layers and neurons — depth creates feature hierarchies
Better losses — cross-entropy for classification, MSE for regression
Optimizers — Adam, momentum, weight decay
Regularization — dropout, early stopping, data augmentation
Scale — batching, GPUs, distributed training
Frameworks — PyTorch, TensorFlow, JAX

None of that changes the core loop you practiced:

Predict → measure error → adjust → repeat.
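
As a rough sketch, that loop might look like this in plain Python. The four rows, the learning rate, and the epoch count are made-up values chosen for illustration, not the tutorial's dataset, and the comments mark where each stage of the loop happens.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical rows: (study_hours, sleep_hours, actual), with 1 = pass and 0 = fail.
ROWS = [(1.0, 5.0, 0.0), (2.0, 6.0, 0.0), (4.0, 7.0, 1.0), (5.0, 8.0, 1.0)]

def train(rows, learning_rate=0.05, epochs=2000):
    weight1, weight2, bias = 0.0, 0.0, 0.0
    history = []  # total squared error seen during each epoch
    for _ in range(epochs):                      # repeat
        epoch_loss = 0.0
        for study_hours, sleep_hours, actual in rows:
            # Predict
            prediction = sigmoid(weight1 * study_hours + weight2 * sleep_hours + bias)
            # Measure error
            error = prediction - actual
            epoch_loss += error ** 2
            # Adjust
            weight1 -= learning_rate * error * study_hours
            weight2 -= learning_rate * error * sleep_hours
            bias -= learning_rate * error
        history.append(epoch_loss)
    return (weight1, weight2, bias), history

params, history = train(ROWS)
print("first epoch loss:", history[0])
print("last epoch loss:", history[-1])
```

Comparing the first and last entries of `history` is a quick way to confirm that the loop is learning rather than bouncing.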

Final recap

You built a tiny neural network by hand.

Idea	You did it
Neuron	`input × weight + bias`
Probability	sigmoid
Decision	threshold 0.5
Loss	squared error
Training	epoch loop updating weights
Multi-input	`w1`, `w2`
Backprop preview	`error × input` updates

Runnable code: githubRepo/2026/05/20/tiny-neural-network-by-hand/.

Final quiz — test your knowledge

Question 1: In our tutorial, what is the main purpose of the bias term?

A. To store the dataset labels
B. To shift the raw output up or down independent of input size (Correct)
C. To replace sigmoid during training
D. To count how many epochs have run

Explanation: Bias adds a constant offset, so the decision boundary does not have to pass through zero input the way it would with weights alone.

Question 2: A student studies 2.5 hours. After training, sigmoid gives 0.52. What label do we assign with threshold 0.5?

A. Fail
B. Pass (Correct)
C. Retry training
D. Unknown

Explanation: 0.52 ≥ 0.5, so we label Pass—even though the probability shows the example is near the boundary.

Question 3: Why does training try to reduce loss?

A. So the dataset gets smaller each epoch
B. So predictions move closer to actual labels on average (Correct)
C. So we can remove sigmoid
D. So learning rate increases automatically

Explanation: Lower loss means smaller squared errors—predictions align better with truth.

Question 4: When we added sleep hours, how was weight2 updated for each row?

A. weight2 += learning_rate × actual
B. weight2 -= learning_rate × error × sleep_hours (Correct)
C. weight2 = sigmoid(sleep_hours)
D. weight2 is frozen; only weight1 trains

Explanation: Each weight is nudged by error scaled by the input that used that weight—same pattern as weight1 with study_hours.

Question 5: Which statement is most accurate about our pass/fail model?

A. It reasons about student motivation
B. It applies fixed math with learned numbers (Correct)
C. It only works with exactly four students in the world
D. It cannot run in Python without PyTorch

Explanation: The model applies learned weights and bias—no reasoning. Four rows were just our teaching dataset; you can add more rows and retrain.

Question 6: What is backpropagation, in one line?

A. A way to download datasets from the cloud
B. Sending error backward to decide how each weight should change (Correct)
C. Replacing loss with accuracy
D. A type of GPU driver

Explanation: Backprop distributes the error across the weights (layer by layer, from the output back toward the input, in big nets) so each update points toward lower loss.

Question 7: You set learning_rate = 1.0 and loss spikes wildly. What likely happened?

A. Sigmoid broke because Python is slow
B. Updates were too large and overshot good weights (Correct)
C. Bias became unnecessary
D. The model ran out of memory on four rows

Explanation: A huge learning rate takes big steps; weights can overshoot and bounce instead of settling.
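
The overshoot effect is easy to see on a toy problem. The sketch below deliberately swaps in a simpler loss, loss(w) = w², instead of the tutorial's pass/fail model, because its gradient (2 × w) makes the arithmetic easy to follow; the function name `descend` and the chosen learning rates are illustrative assumptions.

```python
def descend(learning_rate, steps=5, start=1.0):
    """Run gradient descent on loss(w) = w**2, whose gradient is 2 * w."""
    w = start
    path = [w]
    for _ in range(steps):
        w -= learning_rate * 2 * w
        path.append(w)
    return path

print(descend(0.25))  # [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125]  settles toward 0
print(descend(1.0))   # [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]         bounces forever
print(descend(1.5))   # [1.0, -2.0, 4.0, -8.0, 16.0, -32.0]       overshoots and grows
```

The best weight here is 0. A small learning rate walks toward it, while the larger ones jump past it on every step.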

Where to go next

Add more rows and features in the repo’s examples/ folder
Read about logistic regression—you basically built one
Try a framework tutorial next, but trace one batch by hand so you can see what loss.backward() computes

When you finish the quiz, head to the completion page for a summary and project ideas.
