Designing systems that keep working when things fail
This philosophy grew out of one simple observation: homes should not require perfect networks, perfect clouds, or perfect controllers to function. My home has none of those. Internet comes from either a WISP (via a 50 ft tower) or a satellite system, and nothing is guaranteed way out here, not even power.
Automation should add efficiency and intelligence, not fragility. When parts of the system fail, and they will, the house should degrade gracefully, not confuse the humans living in it.
Below are the core principles I use when designing and operating my Home Assistant–based system.
1) Accept that some devices are cloud-bound and plan for failure
Many commercial devices are cloud-based, and their Home Assistant integrations are cloud-based as well. There’s often nothing we can do about that.
When faced with this reality:
- install the official or approved (HACS) integration
- control the device using normal HA automations
- explicitly accept that loss of internet may break the integration or the device itself
What we can do is build in failure awareness.
If Home Assistant is running locally but internet egress is down, automations should:
- detect the loss of connectivity or integration failure
- notify users clearly that the system is degraded
No one should be standing in a room mashing a button, wondering why nothing is happening. A visible or audible “this is currently broken” signal matters more than silent failure.
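Here is a minimal sketch of that detect-and-notify idea, written in plain Python rather than as a Home Assistant automation. Everything named here is an assumption for illustration: the probe target (1.1.1.1 on port 53), the `ConnectivityWatcher` class, and the `notify` callback, which stands in for whatever light, chime, or push message you would actually use.

```python
import socket
from typing import Callable


def internet_reachable(host: str = "1.1.1.1", port: int = 53, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to a public host succeeds (assumed probe target)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class ConnectivityWatcher:
    """Tracks up/down state and notifies only when the state changes."""

    def __init__(self, probe: Callable[[], bool], notify: Callable[[str], None]) -> None:
        self.probe = probe
        self.notify = notify
        self.online = True  # assume healthy until a probe says otherwise

    def check(self) -> bool:
        now_online = self.probe()
        if now_online != self.online:
            if now_online:
                self.notify("Internet restored: cloud devices should respond again.")
            else:
                self.notify("Internet is down: cloud-bound devices will not respond.")
            self.online = now_online
        return self.online
```

Calling `ConnectivityWatcher(internet_reachable, print).check()` on a timer would announce each outage once, and each recovery once, instead of repeating the message on every poll.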
2) When local control exists, prefer it, even if cloud features remain
If a device exposes a local API, local protocol, or local control path, that should be the primary integration.
Even when certain features only exist in the cloud, there is rarely a reason to:
- depend entirely on the cloud, unless the device is cloud only with no internal logic
- ignore local control just because cloud integration is easier
Use the local interface for core functionality. Layer cloud features on top only where necessary.
Local-first doesn’t mean cloud-never; it means cloud-optional.
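As a rough sketch of that layering in Python: core control goes through a local path only, and a cloud-only extra quietly degrades when the cloud is missing. The `HybridDevice` class, its callbacks, and the "usage history" feature are all hypothetical stand-ins, not any real device's API.

```python
from typing import Callable, Optional


class HybridDevice:
    """Core control over a local path; cloud-only extras are optional."""

    def __init__(
        self,
        local_set_power: Callable[[bool], None],
        cloud_fetch_history: Optional[Callable[[], list]] = None,
    ) -> None:
        self._local_set_power = local_set_power
        self._cloud_fetch_history = cloud_fetch_history

    def set_power(self, on: bool) -> None:
        # Core function: never touches the cloud.
        self._local_set_power(on)

    def usage_history(self) -> Optional[list]:
        # Cloud-only extra: return None instead of raising when the cloud is absent or down.
        if self._cloud_fetch_history is None:
            return None
        try:
            return self._cloud_fetch_history()
        except OSError:
            return None
```

With this shape, losing the internet costs you the history view but never the on/off switch.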
3) Devices must be able to do their job without Home Assistant
Wherever possible, devices should contain their own minimal, built-in logic.
Many commercial devices already do this well.
Example: my Litter Robot continues operating if the internet goes down. Otherwise I’d have very unhappy cats.
This principle matters most for:
- ESPHome devices
- DIY controllers
- SwitchBot-style retrofits
- anything you build yourself
A device doesn’t need to be smart on its own, but it must be functional.
If everything fails:
- Home Assistant down
- network down
- integrations broken
…the device should still perform its basic task, even if it is inefficient or a little janky.
Working badly is better than not working at all.
Example: hot water recirculator
My hot water recirculator will happily pump in a loop all day long if left alone. That behavior is built into the device itself.
Home Assistant doesn’t run the pump; it optimizes it:
- disable when no one is home
- disable during low-use hours (10am–5pm)
- enable only when it makes sense
If HA goes down, the pump reverts to its basic behavior and continues working. Efficiency is lost but functionality is not.
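The optimization side of this can be sketched as a single decision function. This is Python for illustration, not my actual HA automation. The function name and the boundary handling (5pm itself counts as outside the low-use window) are assumptions.

```python
from datetime import time

LOW_USE_START = time(10, 0)  # 10am, start of the low-use window
LOW_USE_END = time(17, 0)    # 5pm, end of the low-use window


def pump_should_run(anyone_home: bool, now: time) -> bool:
    """Controller-side optimization: decide whether the recirculator may run.

    The pump itself needs none of this; left alone it simply keeps looping.
    """
    if not anyone_home:
        return False
    if LOW_USE_START <= now < LOW_USE_END:
        return False
    return True
```

The function only ever removes run time. If nothing is around to call it, the pump's built-in behavior is still there.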
4) Put efficiency and intelligence in HA, not basic function
This follows directly from the previous rule:
- Devices provide function
- Home Assistant provides optimization
HA should be responsible for:
- energy savings
- scheduling
- presence-based behavior
- cross-device coordination
Devices should not require HA in order to function at all.
This separation keeps failures survivable and systems understandable.
5) Devices should look to Home Assistant rather than the other way around
Whenever possible, devices should consume state from HA, rather than HA constantly polling or chasing device state.
This allows:
- simpler automations
- cleaner logic
- easier fallback behavior when HA is unavailable
Example: “Home Stance” and freeze protection
I maintain a set of Home Stances in Home Assistant. One of them is Freeze Warning.
The Freeze Warning binary turns on when:
- a forecast predicts freezing temperatures within the next 8 hours
- or any temperature sensor on the property reports near-freezing conditions
This acts as a central “bat signal.”
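The two conditions combine with a simple OR. Here is a Python sketch of that logic under some loud assumptions: temperatures are in Fahrenheit, "near-freezing" means within 3 degrees of 32 F, the forecast arrives as one value per hour, and a failed sensor reports `None`. None of those specifics come from my real configuration.

```python
from typing import Iterable, Optional

FREEZING_F = 32.0
NEAR_FREEZE_MARGIN_F = 3.0  # assumption: "near-freezing" = within 3 degrees F of freezing
LOOKAHEAD_HOURS = 8         # forecast window in hours


def freeze_warning(
    hourly_forecast_f: Iterable[float],
    sensor_temps_f: Iterable[Optional[float]],
) -> bool:
    """True if the forecast or any working sensor indicates freeze risk.

    hourly_forecast_f: forecast temperatures, one per hour, starting with the next hour.
    sensor_temps_f: current readings; None marks a failed or unavailable sensor.
    """
    forecast_window = list(hourly_forecast_f)[:LOOKAHEAD_HOURS]
    forecast_freeze = any(t <= FREEZING_F for t in forecast_window)

    # Failed sensors are skipped rather than treated as either warm or cold.
    sensor_freeze = any(
        t is not None and t <= FREEZING_F + NEAR_FREEZE_MARGIN_F
        for t in sensor_temps_f
    )
    return forecast_freeze or sensor_freeze
```

Because either source alone can raise the flag, a dead sensor or a missing forecast narrows the inputs without silencing the warning.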
ESPHome devices capable of heating:
- monitor the Freeze Warning binary
- preemptively turn on heating when it’s active
- do this regardless of whether they have their own temperature sensor
If HA is up, devices coordinate intelligently.
If HA is down, devices fall back to their own local logic.
The result is centralized awareness with decentralized action.
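On the device side, the decision might look something like the sketch below. It is written in Python for readability, although a real ESPHome device would express this in its own YAML or lambdas. The function name, the 35 F local threshold, and the `heat_when_blind` fail-safe are assumptions for illustration.

```python
from typing import Optional


def heater_should_run(
    ha_freeze_warning: Optional[bool],
    local_temp_f: Optional[float],
    local_threshold_f: float = 35.0,
    heat_when_blind: bool = True,
) -> bool:
    """Device-side decision for a heater.

    ha_freeze_warning: Freeze Warning state from HA, or None if HA is unreachable.
    local_temp_f: the device's own reading, or None if it has no sensor or the sensor failed.
    heat_when_blind: assumed fail-safe for the case with no HA and no sensor.
    """
    # Centralized awareness: an active warning from HA wins, even with no local sensor.
    if ha_freeze_warning:
        return True

    # Decentralized action: the device's own reading applies with or without HA.
    if local_temp_f is not None:
        return local_temp_f <= local_threshold_f

    # No local reading. If HA is also unreachable, fall back to the fail-safe choice.
    return ha_freeze_warning is None and heat_when_blind
```

The ordering is the point: HA can only add a reason to heat, and losing HA never removes the device's own reasons.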
6) Design for graceful degradation, not perfect uptime
Automation systems should be designed with the assumption that:
- the internet will go down
- integrations will fail
- controllers will reboot
- something will misbehave at the worst possible time
A good system:
- fails loudly, not silently
- informs humans what is broken
- continues performing essential functions
- resumes optimization automatically when systems recover
Summary: the operating rules
If I had to condense this philosophy into a few rules:
- Local-first whenever possible
- Cloud-optional, never cloud-required
- Devices must function without HA
- HA optimizes; devices operate
- Central awareness, distributed action
- Graceful degradation beats cleverness
Automation should make a home calmer, not more fragile.
A real-world case: when the temperature sensor in my pump house failed, the heater still turned on because the Home Stance told it to.

