Privacy-by-Design AIoT: Practical Patterns for Tiny Federated Learning on Devices
You have a wearable that tracks activity. It collects accelerometer data every second. That’s 86,400 readings per day. Multiply by thousands of users. That’s a lot of data.
You want to improve the activity recognition model. But you can’t send all that raw data to the cloud. It’s expensive. It’s slow. And it’s a privacy nightmare.
Federated learning solves this. Devices train locally. They only send model updates, not raw data. The cloud aggregates updates. Everyone benefits from better models. No one’s personal data leaves their device.
This article shows how to make federated learning work on real IoT devices. How to handle unreliable networks. How to keep updates small. How to deal with devices that see different data. How to add privacy layers on top.
Why Raw Sensor Data Should Not Always Leave the Device
Let’s start with concrete examples. Why does this matter?
Wearable Health Sensors
A fitness tracker measures heart rate, steps, sleep patterns. This data is personal. It reveals health conditions. It shows where someone goes. It shows when they’re home.
If you send this to the cloud, you’re creating a privacy risk. Even if you anonymize it, patterns can be re-identified. Location data plus activity patterns can identify individuals.
But you still want to improve the model. You want better sleep detection. You want better activity classification. Federated learning lets you do both.
Smart Home Microphones
Voice assistants listen for wake words. They process audio locally. But sometimes they send audio to the cloud for better recognition.
That audio can contain private conversations. It can reveal family dynamics. It can reveal health issues. It can reveal financial information.
If you can train models without sending raw audio, you reduce risk. Federated learning lets devices improve wake word detection without uploading conversations.
Industrial Logs That Reveal Production Secrets
A factory has sensors on production lines. The data shows production rates. It shows quality metrics. It shows when equipment fails.
This data is valuable. Competitors would pay for it. If you send it to the cloud, you’re creating a security risk. Even encrypted, metadata can leak information.
Federated learning lets you improve predictive maintenance models without exposing production secrets.
Practical Concerns
Compliance and regulation: GDPR, HIPAA, and other regulations require data minimization. If you can achieve your goal without collecting raw data, you should. Federated learning helps.
Customer trust: Users are more likely to adopt devices that keep data local. Privacy is a selling point. “Your data never leaves your device” is a strong message.
Bandwidth and cost: Sending raw sensor data is expensive. A single device might generate 10MB per day. Multiply by ten thousand devices and that's roughly 3TB per month. Cellular data plans cost money. Federated learning sends only model updates, which are much smaller.
Latency: Sending data to the cloud and waiting for results adds delay. For real-time applications, local inference is faster. Federated learning improves local models without requiring cloud inference.
The Simple Rule
Move models, not data, when you can.
If you can train on device and only send updates, do that. If you can aggregate updates without seeing raw data, do that. If you can improve models without collecting personal information, do that.
This isn’t always possible. Some models are too large. Some devices are too weak. Some problems require centralized data. But for many AIoT applications, federated learning works.
Basics of TinyML + Federated Learning for IoT
Let’s define terms. Then we’ll see how they fit together.
What is TinyML?
TinyML means small models that run on microcontrollers and edge devices. These devices have limited resources:
- Memory: 32KB to 512KB RAM
- Storage: 128KB to 2MB flash
- CPU: 80MHz to 240MHz ARM Cortex-M
- Power: Battery-powered, needs to last months
TinyML models are optimized for these constraints. They use quantization (8-bit or 4-bit weights). They use pruning (remove unnecessary connections). They use efficient architectures (MobileNet, EfficientNet).
A typical TinyML model might be 50KB. It can run inference in milliseconds. It uses microjoules of energy.
What is Federated Learning?
Federated learning is a training method where many devices train their copy of a model locally and only share updates, not raw data.
Here’s how it works:
- Server sends global model: The coordinator sends the current global model to selected devices.
- Devices train locally: Each device trains on its local data for a few epochs.
- Devices send weight updates: Devices compute gradients or weight deltas and send them to the server.
- Server aggregates: The server combines updates from multiple devices using FedAvg or similar.
- Server updates global model: The new global model is sent back to devices.
- Repeat: This process repeats for multiple rounds.
The key insight: devices never send raw data. They only send model updates. The server never sees individual data points. It only sees aggregated updates.
The Simple FL Loop
Here’s a concrete example:
# Pseudocode for one federated learning round

# 1. Server selects devices
selected_devices = select_devices(num_clients=10)

# 2. Server sends global model
global_model_weights = get_global_model()
for device in selected_devices:
    send_model(device, global_model_weights)

# 3. Devices train locally
device_updates = []
for device in selected_devices:
    local_data = device.get_local_data()
    local_model = load_model(global_model_weights)
    # Train for a few epochs
    for epoch in range(local_epochs):
        train_step(local_model, local_data)
    # Compute update (difference from global model)
    update = compute_update(local_model, global_model_weights)
    device_updates.append(update)

# 4. Server aggregates
aggregated_update = aggregate_updates(device_updates)  # FedAvg

# 5. Server updates global model
new_global_model = global_model_weights + aggregated_update
save_global_model(new_global_model)
This is one round. You repeat this many times. Each round, the global model improves based on data from all participating devices.
Constraints Specific to AIoT
Federated learning on IoT devices faces unique challenges:
Unreliable connectivity: Devices go offline. Cellular networks drop. Wi-Fi disconnects. You can’t assume devices are always online. You need to handle partial participation.
Very small RAM/flash: Devices have limited memory. You can’t load large models. You can’t store large datasets. You need to train incrementally, one mini-batch at a time.
Battery and CPU limits: Training uses energy. If you train too much, the battery dies. You need to limit training frequency and duration.
Heterogeneous devices: Not all devices are the same. Some have more memory. Some have faster CPUs. Some have better connectivity. You need to handle this heterogeneity.
Non-IID data: Each device sees a different slice of the world. A wearable in one person’s pocket sees different patterns than another. This makes aggregation harder.
We’ll address each of these throughout the article.
When to Use What: Central Training vs Federated vs Pure On-Device
Not every problem needs federated learning. Here’s when to use each approach.
Decision Tree
If data is low volume and not sensitive: Central training is fine. Collect data. Train in the cloud. Deploy models. This is simpler and faster.
If data is sensitive but volume is manageable: Use federated learning or on-device training. Keep data local. Only send updates.
If connectivity is rare and data is very sensitive: Use mostly on-device training with periodic weight sync when possible. Train locally. Sync occasionally when online.
If devices are too weak to train: Use central training with heavy anonymization. Or use a gateway that trains on behalf of devices.
Scenario 1: Smart Meters
Smart meters measure electricity usage. The data shows when people are home. It shows daily routines. It’s sensitive.
But smart meters have good connectivity. They’re always online. They have enough compute to train small models.
Solution: Federated learning. Meters train locally on usage patterns. They send updates for load forecasting. The cloud never sees individual usage data.
Scenario 2: Wearables
Wearables track activity, heart rate, sleep. Very sensitive data. Limited battery. Intermittent connectivity.
Solution: Hybrid approach. Train on-device when possible. Participate in federated rounds when connected and battery is sufficient. Otherwise, use local-only models.
Scenario 3: Industrial IIoT
Factory sensors monitor equipment. Data reveals production secrets. But sensors are always online. They have enough compute.
Solution: Federated learning with secure aggregation. Sensors train locally. Updates are encrypted. Server sees only aggregated results.
Reference Architecture: AIoT FL in the Real World
Here’s a concrete architecture that works in production.
Coordination Service (Cloud / On-Prem)
The coordinator manages the federated learning process.
Model registry:
- Stores model versions
- Tracks which devices have which versions
- Manages model metadata (architecture, hyperparameters, performance)
FL orchestration:
- Selects devices for each round (client sampling)
- Manages round timing
- Handles device communication
- Aggregates updates
Example structure:
class FLCoordinator:
    def __init__(self):
        self.model_registry = ModelRegistry()
        self.device_registry = DeviceRegistry()
        self.aggregator = FedAvgAggregator()

    def start_round(self, round_num):
        # Select devices
        selected = self.select_devices(num_clients=10)

        # Send global model
        global_model = self.model_registry.get_latest()
        for device in selected:
            self.send_model(device, global_model)

        # Wait for updates
        updates = self.collect_updates(selected, timeout=3600)

        # Aggregate
        aggregated = self.aggregator.aggregate(updates)

        # Update global model
        new_model = self.update_model(global_model, aggregated)
        self.model_registry.save(new_model)
Edge Clients (Devices or Local Gateways)
Devices participate in federated learning.
TinyML runtime:
- Loads models (TensorFlow Lite Micro, CMSIS-NN)
- Runs inference
- Runs training (if capable)
Local training loop:
- Loads local data
- Trains for a few epochs
- Computes gradients or weight deltas
Communication worker:
- Sends updates via MQTT/HTTP
- Receives new global models
- Handles retries and backoff
Example structure:
# Device-side pseudocode
class FLClient:
    def __init__(self, device_id):
        self.device_id = device_id
        self.model = None
        self.local_data = []

    def participate_in_round(self, global_model_weights):
        # Load global model
        self.model = load_model(global_model_weights)

        # Train locally
        for epoch in range(3):  # 3 local epochs
            for batch in self.local_data:
                train_step(self.model, batch)

        # Compute update
        update = compute_weight_delta(self.model, global_model_weights)

        # Send update
        send_update(self.device_id, update)
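The communication worker is where flaky links bite. Here is a minimal retry-with-exponential-backoff sketch. It is transport-agnostic: transport.send and the TransportError placeholder stand in for whatever your MQTT or HTTP client raises.

import random
import time

class TransportError(Exception):
    """Placeholder for whatever your MQTT/HTTP client raises on failure."""

def send_with_backoff(transport, payload, max_attempts=5, base_delay_s=2.0):
    # Retry with exponential backoff plus jitter so a whole fleet doesn't
    # reconnect in lockstep after an outage.
    for attempt in range(max_attempts):
        try:
            transport.send(payload)  # e.g. MQTT publish or HTTP POST
            return True
        except TransportError:
            delay = base_delay_s * (2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay)
    return False  # give up for now; queue the update for the next round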
Optional Gateway Tier
For very tiny devices, training happens on a nearby gateway, not on the sensor node itself.
Why use a gateway:
- Sensor nodes are too weak (8-bit microcontrollers)
- Gateway has more compute (Raspberry Pi, edge server)
- Gateway aggregates data from multiple sensors
- Gateway participates in federated learning on behalf of sensors
Example: A factory has 100 vibration sensors. Each sensor is an 8-bit MCU. They connect to a gateway via LoRaWAN. The gateway collects data from all sensors. The gateway trains a model. The gateway participates in federated learning.
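A rough sketch of that gateway role, assuming hypothetical helpers (receive_sensor_windows for the LoRaWAN ingest, train_on for the training step): the gateway pools labeled windows from its sensors and acts as one FL client.

class GatewayFLClient:
    """One FL client that trains on behalf of many weak sensor nodes."""

    def __init__(self, gateway_id, sensor_ids):
        self.gateway_id = gateway_id
        self.sensor_ids = sensor_ids
        self.buffer = []  # pooled (window, label) pairs from all sensors

    def ingest(self):
        # Collect recent windows from attached sensors (e.g. over LoRaWAN)
        for sensor_id in self.sensor_ids:
            self.buffer.extend(receive_sensor_windows(sensor_id))

    def participate_in_round(self, global_model):
        # Train locally on the pooled buffer, then send one update upstream
        local_model = train_on(global_model, self.buffer, epochs=3)
        update = compute_weight_delta(local_model, global_model)
        send_update(self.gateway_id, update)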
Client Sampling
Not every device needs to participate in every round. This reduces communication and improves scalability.
Sampling strategies:
- Random sampling: Pick N random devices
- Stratified sampling: Pick devices from different groups (geographic, device type)
- Availability-based: Only pick devices that are online
- Performance-based: Prefer devices with good connectivity and compute
Example: You have 10,000 devices. Each round, select 100 devices. This means each device participates roughly once per 100 rounds. That’s manageable.
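A minimal sampler sketch that combines availability filtering with random selection. The device attributes (is_online, battery_level) are assumed fields from your device registry.

import random

def select_devices(devices, num_clients=100, min_battery=0.5):
    # Availability-based filter: only consider devices that are online
    # and have enough battery to train without hurting the user.
    eligible = [d for d in devices
                if d.is_online and d.battery_level >= min_battery]

    # Random sampling among eligible devices keeps selection unbiased
    if len(eligible) <= num_clients:
        return eligible
    return random.sample(eligible, num_clients)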
Making FL Communication Affordable on Constrained Networks
Model updates can be large. A simple neural network might have 100,000 parameters. At 32-bit floats, that’s 400KB. Over cellular, that’s expensive.
You need to compress updates.
Weight Quantization
Reduce precision of weights.
8-bit quantization: Convert 32-bit floats to 8-bit integers. This reduces size by 4x. Accuracy loss is usually small (1-2%).
4-bit quantization: Even more aggressive. Reduces size by 8x. Accuracy loss can be larger (3-5%).
Example:
import numpy as np

def quantize_weights(weights, bits=8):
    # Find the range of the weights
    w_min = weights.min()
    w_max = weights.max()
    # Map floats onto integers in [0, 2^bits - 1]
    scale = (w_max - w_min) / (2**bits - 1)
    quantized = ((weights - w_min) / scale).round().astype(np.uint8)
    return quantized, w_min, scale

# Original: 100KB (~25,000 params × 4 bytes)
# Quantized at 8 bits: 25KB (~25,000 params × 1 byte)
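The server needs the float weights back before it can aggregate. A matching dequantizer for the sketch above:

import numpy as np

def dequantize_weights(quantized, w_min, scale):
    # Invert the mapping: integer codes back to approximate float weights
    return quantized.astype(np.float32) * scale + w_min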
Sparse Updates
Send only changed weights.
Threshold-based: Only send weights that changed by more than a threshold. If a weight changed by 0.001, ignore it. If it changed by 0.1, send it.
Top-K: Send only the K largest changes. This ensures you send the most important updates.
Example:
def compute_sparse_update(old_weights, new_weights, threshold=0.01):
    # Assumes a flattened 1-D weight vector
    delta = new_weights - old_weights
    # Keep only weights that changed by more than the threshold
    mask = np.abs(delta) > threshold
    # Send only non-zero values with their indices
    indices = np.where(mask)[0]
    values = delta[mask]
    return indices, values

# Original update: 100KB
# Sparse update: ~10KB (90% of weights didn't change significantly)
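Top-K is similar, but instead of a fixed threshold it keeps the K largest changes, which gives you a predictable update size. A minimal sketch, again assuming a flattened weight vector:

import numpy as np

def compute_topk_update(old_weights, new_weights, k=1000):
    delta = new_weights - old_weights
    # Indices of the K changes with the largest magnitude
    top_indices = np.argsort(np.abs(delta))[-k:]
    return top_indices, delta[top_indices]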
Client Sampling
Only some devices participate in each round. This reduces total communication.
Example: 10,000 devices, select 100 per round. Instead of 10,000 × 400KB = 4GB per round, you have 100 × 400KB = 40MB per round. That’s 100x reduction.
Keeping Messages Under Target Size
Set a target. For example, keep updates under 10KB.
Techniques:
- Quantize to 8-bit or 4-bit
- Use sparse updates
- Compress with gzip
- Send only top layers (fine-tune only last few layers)
Example:
def prepare_update(weights, target_size_kb=10):
    # Try 8-bit quantization first
    quantized, min_val, scale = quantize_weights(weights, bits=8)
    size_kb = quantized.nbytes / 1024
    if size_kb <= target_size_kb:
        return quantized, min_val, scale

    # Still too big: try a sparse update
    sparse_indices, sparse_values = compute_sparse_update(...)
    size_kb = (sparse_indices.nbytes + sparse_values.nbytes) / 1024
    if size_kb <= target_size_kb:
        return sparse_indices, sparse_values

    # Last resort: 4-bit quantization
    # (the caller needs to know which encoding was chosen)
    quantized, min_val, scale = quantize_weights(weights, bits=4)
    return quantized, min_val, scale
Round Frequency
Define how often rounds happen.
Once per day at night: Devices train during off-peak hours. Updates are sent when networks are less congested. Battery is recharged.
Once per week: For devices with very limited connectivity or battery.
Event-triggered: When enough devices have new data, trigger a round.
Example config:
round_schedule:
  frequency: "daily"
  time: "02:00 UTC"  # 2 AM
  min_participants: 50
  max_participants: 200
Common IoT Links
Different networks have different characteristics.
LTE-M: Good bandwidth (1 Mbps), low latency, but costs money per MB. Use for important updates.
NB-IoT: Very low bandwidth (250 Kbps), very low power, very cheap. Use for small, infrequent updates.
LoRaWAN: Very long range, very low bandwidth (50 Kbps), very low power, free (if using public network). Use for non-time-sensitive updates.
Wi-Fi: High bandwidth, low latency, free (if available). Use when available.
Choose the right network for your use case. For federated learning, you might use Wi-Fi when available, fall back to LTE-M for important rounds, and use NB-IoT for status updates.
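One way to encode that policy is a small transport picker. A sketch with illustrative thresholds based on the sizes discussed above; the availability flags are assumed inputs from your connectivity layer.

def choose_link(update_size_kb, wifi_available, lte_m_available):
    # Prefer free, high-bandwidth links; fall back to paid or narrow ones
    if wifi_available:
        return "wifi"
    if lte_m_available and update_size_kb <= 400:
        return "lte-m"   # paid per MB, so reserve it for model updates
    if update_size_kb <= 1:
        return "nb-iot"  # tiny status messages only
    return "defer"       # wait for a better link or the next round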
Handling Non-IID Data and Device Heterogeneity
Non-IID (not independent and identically distributed) means each device sees a different slice of the world. This is the norm in federated learning, not the exception.
What Non-IID Means
A wearable on a runner’s wrist sees different patterns than one on an office worker’s wrist. A sensor in a factory in Texas sees different patterns than one in a factory in Germany.
This causes problems:
- Devices learn different things
- Aggregation can be unstable
- Global model might not work well for any device
Practical Patterns
Personalization layers:
Split the model into a shared base and a local head. The base learns general patterns. The head learns device-specific patterns.
# Shared base (federated)
base_model = create_base_model()  # Trained via FL

# Local head (device-specific)
head_model = create_head_model()  # Trained locally only

# Combined forward pass
def forward(x):
    features = base_model(x)
    output = head_model(features)
    return output
The base model is updated via federated learning. The head model is trained only on local data and never shared.
Fine-tuning per cluster:
Group devices into clusters. Train a model per cluster.
# Cluster devices by similarity
clusters = cluster_devices(devices, features=['location', 'device_type'])

# Train one model per cluster
for cluster in clusters:
    cluster_model = train_federated(cluster.devices)
    deploy_to_cluster(cluster, cluster_model)
Devices in the same cluster see similar data. The cluster model works better for them.
Per-device calibration:
Keep the global model but add a calibration step on device.
# On device
global_model = load_global_model()
local_calibration = calibrate_on_local_data(global_model, local_data)

# Use calibrated model for inference
predictions = local_calibration(sensor_window)
Calibration adjusts thresholds or adds a small bias based on local data. It doesn’t change the model structure, just the decision boundaries.
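A minimal calibration sketch along those lines: estimate the local class prior from labeled windows and re-weight the global model's probabilities by local_prior / train_prior (simple prior correction). Here model.predict_proba is an assumed interface and train_prior is the class distribution of the seed training data, assumed known.

import numpy as np

def calibrate_on_local_data(model, local_data, train_prior,
                            num_classes=4, smoothing=1.0):
    # Estimate the local class prior from labeled windows
    counts = np.full(num_classes, smoothing)
    for _, label in local_data:
        counts[label] += 1
    local_prior = counts / counts.sum()

    def calibrated(window):
        # Shift decision boundaries toward locally frequent classes
        probs = model.predict_proba(window)
        return int(np.argmax(probs * local_prior / train_prior))

    return calibrated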
Measuring Per-Segment Performance
Don’t just measure global accuracy. Measure accuracy per device, per cluster, per region.
# Global metrics
global_accuracy = compute_accuracy(global_model, all_test_data)

# Per-cluster metrics
for cluster in clusters:
    cluster_accuracy = compute_accuracy(cluster_model, cluster.test_data)
    print(f"Cluster {cluster.id}: {cluster_accuracy}")

# Per-device metrics (if you have test data)
for device in devices:
    device_accuracy = compute_accuracy(device.model, device.test_data)
    track_metric(device.id, device_accuracy)
If global accuracy is 90% but one cluster is 60%, you have a problem. The global model doesn’t work for that cluster. You need cluster-specific models.
Security and Privacy Layers on Top of FL
Federated learning is not automatically private. Weight updates can still leak information.
How FL Can Leak Information
Gradient inversion attacks: An attacker can reconstruct training data from gradients. This is especially true for small batches.
Membership inference: An attacker can determine if a specific data point was used in training.
Model inversion: An attacker can extract information about training data from the model itself.
Update analysis: By analyzing updates over time, an attacker can infer patterns in local data.
Secure Aggregation
The server sees only aggregated updates, not individual updates.
Cryptographic aggregation: Use secure multi-party computation. Devices encrypt updates. Server aggregates encrypted updates. Server decrypts only the aggregate.
Additive masking ("dummy" noise): Each device adds random noise to its individual update, constructed so that the noise cancels out when the server sums across devices. The server then sees only the aggregate.
Example:
# Each device adds noise
noisy_update = update + generate_noise()
# Server aggregates
aggregated = mean(noisy_updates)
# Noise cancels out (if noise sums to zero)
# Aggregated ≈ mean(updates)
This is simpler than full cryptographic aggregation but provides some protection.
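To make the cancelling-noise idea concrete, here is a toy pairwise-masking sketch: each pair of devices shares a random mask that one adds and the other subtracts, so the masks vanish in the sum and the server only learns the aggregate. Real secure aggregation protocols also handle dropouts and key exchange; this sketch ignores both.

import numpy as np

def mask_updates(updates, seed=0):
    # Each unordered pair (i, j) shares a random mask; device i adds it,
    # device j subtracts it. Summed over all devices, the masks cancel.
    rng = np.random.default_rng(seed)
    n = len(updates)
    masked = [u.astype(np.float64) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = rng.normal(size=updates[i].shape)
            masked[i] += mask
            masked[j] -= mask
    return masked

# Server-side check: the sum of masked updates equals the sum of raw updates
updates = [np.random.randn(5) for _ in range(4)]
assert np.allclose(sum(mask_updates(updates)), sum(updates))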
Differential Privacy
Add noise to updates before sending.
Gaussian noise: Add Gaussian noise to gradients or weights.
import numpy as np

def add_differential_privacy(update, epsilon=1.0, delta=1e-5, max_norm=1.0):
    # Clip the update so a single data point has bounded influence
    clipped = clip_gradients(update, max_norm=max_norm)
    # Gaussian mechanism: noise scale depends on sensitivity, epsilon, and delta
    sensitivity = max_norm  # maximum L2 change from one data point after clipping
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    # Add calibrated Gaussian noise
    noise = np.random.normal(0, sigma, size=clipped.shape)
    return clipped + noise
Privacy budget: Track how much privacy you've spent across rounds. Once the budget is exhausted, stop participating, or plan ahead by spending less per round (more noise per update, at some cost in accuracy).
Example:
class PrivacyBudget:
    def __init__(self, total_epsilon=10.0):
        self.total_epsilon = total_epsilon
        self.used_epsilon = 0.0

    def can_participate(self, round_epsilon=1.0):
        return self.used_epsilon + round_epsilon <= self.total_epsilon

    def use(self, epsilon):
        self.used_epsilon += epsilon
Device Attestation
Verify that updates come from legitimate devices.
Signing updates: Devices sign updates with private keys. Server verifies signatures.
# On device
update = compute_update(...)
signature = sign(update, device_private_key)
send_update(update, signature)

# On server
if verify_signature(update, signature, device_public_key):
    accept_update(update)
else:
    reject_update("Invalid signature")
Device certificates: Use PKI. Each device has a certificate. Server verifies certificates.
This prevents malicious devices from poisoning the global model.
Simple Threat Model
Honest-but-curious server: Server follows protocol but tries to learn from updates. Solution: secure aggregation, differential privacy.
Compromised device in the fleet: One device is malicious. It sends bad updates. Solution: robust aggregation (median instead of mean), device attestation, outlier detection.
Network attacker: Attacker intercepts updates in transit. Solution: TLS encryption, update signing.
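Robust aggregation and outlier detection can be sketched in a few lines of numpy: drop updates whose norm is far from the median, then take a coordinate-wise median instead of a mean. A minimal sketch, assuming updates arrive as flattened arrays of equal shape.

import numpy as np

def robust_aggregate(updates, norm_z_threshold=3.0):
    # Norm-based outlier rejection: discard updates whose L2 norm is far
    # from the median norm (a crude but useful poisoning filter).
    norms = np.array([np.linalg.norm(u) for u in updates])
    median_norm = np.median(norms)
    mad = np.median(np.abs(norms - median_norm)) + 1e-8
    keep = [u for u, n in zip(updates, norms)
            if abs(n - median_norm) / mad <= norm_z_threshold]
    if not keep:
        keep = updates  # don't stall the round if the filter is too aggressive

    # Coordinate-wise median is less sensitive to a few bad clients than a mean
    return np.median(np.stack(keep), axis=0)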
What to Implement First
Start simple:
- TLS encryption: Encrypt all communication. This is standard and easy.
- Update signing: Sign updates with device keys. Verify on server. This prevents basic attacks.
- Outlier detection: Reject updates that are too different from others. This prevents poisoning.
- Gradient clipping: Clip gradients to a maximum norm. This reduces information leakage and improves stability.
Add more sophisticated privacy later:
- Differential privacy (add noise)
- Secure aggregation (cryptographic)
- Privacy budget tracking
Don’t try to implement everything at once. Start with basics. Add privacy layers as needed.
End-to-End Example: Privacy-Aware Activity Recognition from Wearables
Let’s walk through a complete example. Step by step.
Scenario
Devices: Wearables that classify basic activities (walk, sit, run, stand).
Constraint: Raw accelerometer data should never leave the device.
Goal: Improve the model as people use it.
Model: Small CNN that takes accelerometer windows and outputs activity class.
Initial Global Model
Train on a public dataset or seed dataset.
import tensorflow as tf

# Train initial model on public dataset
public_data = load_public_activity_dataset()
model = create_cnn_model()
train(model, public_data, epochs=50)

# Convert to TensorFlow Lite for deployment
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Deploy to devices
deploy_to_devices(tflite_model)
This is the baseline. It works but isn’t personalized.
Devices Collect Local Data
Devices collect accelerometer data and labels.
Labels come from:
- User feedback (user corrects misclassifications)
- Implicit labels (if user is stationary for 10 minutes, probably “sit”)
- Context (if it’s 2 AM, probably “sleep” not “run”)
# On device
def collect_local_data():
    accelerometer_data = read_accelerometer(window_size=128)
    label = get_label()  # From user feedback or context

    # Store locally
    local_dataset.append((accelerometer_data, label))

    # Keep only recent data (limited storage); trim in place so we don't
    # rebind the module-level list inside the function
    if len(local_dataset) > 1000:
        local_dataset[:] = local_dataset[-1000:]
On-Device Training
Devices train locally when conditions are met.
Conditions:
- Battery is above 50%
- Device is charging or idle
- Enough local data collected (at least 100 samples)
- Not too soon since last training (at least 24 hours)
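A minimal should_train() helper that encodes these conditions. The platform hooks (battery_level, is_charging_or_idle, last_training_time) are assumed; local_dataset is the buffer from the collection step above.

import time

MIN_BATTERY = 0.5
MIN_SAMPLES = 100
MIN_INTERVAL_S = 24 * 3600  # at least 24 hours between training runs

def should_train():
    if battery_level() < MIN_BATTERY:
        return False
    if not is_charging_or_idle():
        return False
    if len(local_dataset) < MIN_SAMPLES:
        return False
    if time.time() - last_training_time() < MIN_INTERVAL_S:
        return False
    return True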
# On device
def train_locally():
    if not should_train():
        return None

    # Load global model
    global_model = load_global_model()
    local_model = copy_model(global_model)

    # Train for a few epochs
    batch_size = 32
    for epoch in range(3):  # 3 local epochs
        # Shuffle data
        shuffled = shuffle(local_dataset)
        # Train one mini-batch at a time (memory constraint)
        for i in range(0, len(shuffled), batch_size):
            batch = shuffled[i:i + batch_size]
            train_step(local_model, batch)

    # Compute update and hand it to the communication worker
    update = compute_weight_delta(local_model, global_model)
    return update
Memory constraints: Devices can’t load all data at once. Train one mini-batch at a time. This is slower but works within memory limits.
Federated Round
When enough devices are ready, start a round.
Device side:
# Device checks for a round
if check_for_round():
    # Download latest global model
    global_model = download_global_model()

    # Train locally
    update = train_locally(global_model)

    # Quantize update (reduce size)
    quantized_update = quantize_weights(update, bits=8)

    # Sign and send update
    signature = sign(quantized_update, device_private_key)
    send_update(device_id, quantized_update, signature)
Server side:
# Coordinator starts round
def start_federated_round():
    # Select devices
    selected = select_devices(num_clients=50, min_data=100)

    # Send global model
    global_model = get_global_model()
    for device in selected:
        send_model(device, global_model)

    # Wait for updates (with timeout)
    updates = []
    for device in selected:
        update = wait_for_update(device, timeout=3600)
        if update and verify_signature(update):
            updates.append(update)

    # Aggregate (FedAvg)
    aggregated = aggregate_fedavg(updates)

    # Update global model
    new_model = global_model + aggregated
    save_global_model(new_model)
Aggregation
Use FedAvg (Federated Averaging).
def aggregate_fedavg(updates):
    # Simple average (all devices weighted equally)
    aggregated = mean(updates)

    # Or weighted by number of local samples:
    # weights = [len(u.local_data) for u in updates]
    # aggregated = weighted_mean(updates, weights)

    return aggregated
Optional: Add noise for differential privacy:
def aggregate_with_dp(updates, epsilon=1.0):
    # Clip updates so each client has bounded influence
    clipped = [clip_gradients(u, max_norm=1.0) for u in updates]

    # Aggregate
    aggregated = mean(clipped)

    # Add noise (simplified scale; in practice calibrate it to the clipping
    # norm, epsilon, delta, and the number of clients)
    noise_scale = 1.0 / epsilon
    noise = np.random.normal(0, noise_scale, size=aggregated.shape)
    return aggregated + noise
New Global Model
After aggregation, the new global model is sent back to devices.
OTA update:
# Devices check for a new model
if check_for_model_update():
    new_model = download_model()

    # Verify signature
    if verify_signature(new_model):
        # Replace old model
        save_model(new_model)
        # Clear local training state
        reset_training_state()
Gradual rollout:
- Deploy to 10% of devices first (canary)
- Monitor metrics (accuracy, battery usage)
- If good, deploy to 50%, then 100%
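That rollout policy can live in the same kind of config as the round schedule. A sketch with illustrative field names, not any specific tool's schema:

rollout:
  strategy: "gradual"
  stages:
    - name: "canary"
      percent: 10
      hold_hours: 48   # watch accuracy and battery metrics before widening
    - name: "broad"
      percent: 50
      hold_hours: 48
    - name: "full"
      percent: 100
  rollback_on:
    accuracy_drop_pct: 2          # vs. previous model
    battery_drain_increase_pct: 10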
Complete Flow Summary
- Initial deployment: Global model trained on public data, deployed to all devices.
- Local data collection: Devices collect data and labels over weeks/months.
- Local training: Devices train when conditions are met (battery, data, time).
- Federated round: Coordinator selects devices, sends model, collects updates.
- Aggregation: Server aggregates updates (FedAvg, optionally with DP).
- Model update: New global model deployed to devices via OTA.
- Repeat: Process repeats periodically (weekly or monthly).
Each round, the model improves. It learns from real usage patterns. But raw data never leaves devices.
Limits and Trade-offs of FL on AIoT
Federated learning isn’t a silver bullet. Here’s where it doesn’t fit.
Devices Too Weak to Train
Some devices can’t train anything. 8-bit microcontrollers with 32KB RAM can’t run training loops.
Alternatives:
- Use a gateway that trains on behalf of devices
- Use central training with heavy anonymization
- Use simpler models that can be trained on slightly more capable devices
Very Unstable Networks
If devices are offline 90% of the time, federated learning is hard. Rounds take too long. Updates are stale.
Alternatives:
- Use mostly on-device training, sync occasionally
- Use a hybrid: train locally, sync when online (not true FL, but works)
- Accept that some devices never participate, use central training for them
Regulatory Requirements That Demand Central Logs
Some regulations require you to log everything. You can’t use federated learning if you must have central logs.
Alternatives:
- Use federated learning for model improvement, but keep logs separately (defeats some privacy benefits)
- Use heavy anonymization on central logs
- Use synthetic data generation instead of real data
When FL Doesn’t Make Sense
Small number of devices: If you have 10 devices, federated learning adds complexity without much benefit. Just collect data and train centrally.
Homogeneous data: If all devices see the same data distribution, federated learning doesn’t help. Central training is simpler.
Very large models: If models are too large to train on devices, federated learning doesn’t work. You need central training or model compression first.
Real-time requirements: If you need model updates in hours, not weeks, federated learning is too slow. Use central training with streaming updates.
Alternative Patterns
Local-only models: Train only on device. Never sync. Each device has its own model. This maximizes privacy but doesn’t benefit from other devices’ data.
Cloud-only with heavy anonymization: Collect data, anonymize heavily, train in cloud. This is simpler but less private.
Hybrid: Use federated learning for most devices, central training for devices that can’t participate. This is a practical middle ground.
Practical Checklist
Use this checklist to get started.
What is Sensitive, and Why?
- Identify sensitive data (health, location, behavior patterns)
- Understand why it’s sensitive (privacy, security, compliance)
- Document what must stay on device vs what can be shared
Can the Device Realistically Train Anything?
- Check device specs (RAM, flash, CPU)
- Estimate model size and training memory requirements
- Test training on device or similar hardware
- If too weak, consider gateway-based training
How Big Can an Update Be?
- Measure model size (number of parameters)
- Calculate update size (with and without quantization)
- Set target size (e.g., 10KB)
- Test compression techniques (quantization, sparse updates)
How Often Can You Afford a Round?
- Consider network costs (cellular data plans)
- Consider battery impact (training uses energy)
- Consider device availability (when are devices online?)
- Set round frequency (daily, weekly, monthly)
What’s Your Story for a Device That Never Participates?
- Define “never participates” (offline for 30 days? 90 days?)
- Decide: use last known model, use global model, or use local-only model
- Plan for devices that join late (how do they get initial model?)
- Plan for devices that leave (how do you handle their data?)
Additional Checklist Items
- Choose aggregation method (FedAvg, weighted, robust)
- Decide on privacy layers (differential privacy, secure aggregation)
- Set up device attestation (signing, certificates)
- Plan rollout strategy (canary, rings, gradual)
- Set up monitoring (participation rate, update quality, model performance)
- Plan for failures (what if aggregation fails? what if update is corrupted?)
Conclusion
Federated learning on AIoT devices is possible. It’s not easy, but it’s practical.
The key is to start simple. Use basic FedAvg. Add quantization to reduce update size. Handle device heterogeneity. Add privacy layers as needed.
The benefits are real: better models without exposing raw data. Lower bandwidth costs. Better user trust. Regulatory compliance.
The challenges are real too: unreliable networks, limited resources, non-IID data. But these can be addressed with the right patterns.
Move models, not data, when you can. That’s the principle. Federated learning is one way to achieve it.
Start with a pilot. Deploy to a few devices. Run a few rounds. Learn what works for your use case. Then scale.
The alternative is collecting everything in the cloud. That works, but it’s expensive, slow, and risky. Federated learning offers a better path for many AIoT applications.