Turn It Off and On Again
John L. Gordon August 2021

Does this document have a title, or are the words at the top the instructions for fixing all problems with computer-based equipment?
I am prompted to write this now because it seems to me that the instructions at the top are being overused and over-relied on by equipment manufacturers and suppliers. I did spend a little of my life writing software for microprocessor and computer-based things, so I do have a little experience. I have a computer-controlled home system which operates 24 hours a day and never switches off (well, it’s not supposed to). In the early days of development I used a controller board I designed and built myself using a 68000 microprocessor. Well, there were even systems before that (going back to the 1980s). Later the system migrated through commercial controller systems and is now based around a Raspberry Pi, with an Arduino Mega to do the interfacing.
When I switched to the Raspberry Pi, I think in 2013, the system would only run for about 7 or 8 days before stopping, or rather before it stopped working properly (microprocessors don’t usually stop, they just do things you don’t want them to do). I quickly realised that turning it off and then on again was going to be a real pain, so I made a system which did it automatically. The original system was, roughly speaking, a program like this.

Start of the things you want the program to do.
Do the next thing.
Do the next thing.
Wait for something which will never happen.
Do the next thing.
Go back to the start and do everything again.

It might be quite easy to see that the problem with this program is in line 4. Of course, I didn’t write programs like this deliberately, but lines like line 4 can creep in where you don’t expect them. Then there is the little problem of loading in some number which turns out to be zero and then dividing something else by it. Computers and microprocessors don’t like dividing numbers by zero; even your calculator won’t do it.
So, computers and controllers stop working properly in situations like this, and I actually found it fun to find the problems and fix them. So much so that my home system now works for, well, I don’t know how long, but a long time without stopping. But before I found the problems, I introduced an automated ‘switch it off and then on again’ system. This meant making a program something like this.
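As a rough illustration of those two traps, here is a minimal Python sketch. It is not the code from the home system; the names wait_for_event and safe_ratio are made up for this example. It shows one possible way to stop a ‘wait for something which will never happen’ line from hanging forever (give up after a timeout) and one way to avoid dividing by a number that turns out to be zero.

```python
import threading


def wait_for_event(event, timeout_s):
    """Wait for an event, but give up after timeout_s seconds.

    Returns True if the event happened, False if we gave up waiting.
    """
    return event.wait(timeout=timeout_s)


def safe_ratio(total, count):
    """Divide total by count, returning None instead of crashing on zero."""
    if count == 0:
        return None
    return total / count


# Nothing in this program ever sets the event, so a plain wait()
# with no timeout would block forever, just like line 4 above.
never_happens = threading.Event()
happened = wait_for_event(never_happens, timeout_s=0.1)
print("event happened:", happened)

# A number that was loaded from somewhere and turned out to be zero.
print("ratio:", safe_ratio(10, 0))
```

Guards like these only help with the problems you have thought of in advance, which is why a catch-all safety net is still attractive.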

Start of the things you want the program to do.
Do the next thing.
Do the next thing.
Do the next thing.
Pat the dog.
Go back to the start and do everything again.

Whilst the program is running, you might notice that it keeps patting the dog, in line 5. Lots of systems have a watchdog built into them, probably even a TV box. If you set the watchdog loose at the start, it wants to be patted regularly. If you don’t pat the dog for any reason, the dog gets angry, crashes the computer and makes it start up all over again. So, I never had to turn it off and then on again; the dog did that for me if I made any mistakes in the program.
But there are some unlucky people who can’t really rely on the trusty dog, although they might leave it there watching, just in case. Imagine NASA launching a rocket with astronauts on board where the people who write the programs to control the spaceship aren’t that careful. Of course, if the dog isn’t patted it will crash the program and restart it, but that isn’t a great idea when several people are rocketing into space under the control of that program. It’s really best if the programmers don’t make these mistakes, and also check for them carefully even though they don’t make them. It is really important to keep patting the dog in this case.
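The idea can be sketched in a few lines of Python. This is only a toy model under stated assumptions: a real watchdog is normally a hardware timer that resets the processor by itself, whereas here the classes Watchdog and FakeClock are invented for the example, and the ‘bite’ is just a flag that can be checked. The fake clock stands in for real time so the example runs instantly.

```python
class FakeClock:
    """A pretend clock that only moves when we tell it to."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


class Watchdog:
    """A toy watchdog: it bites if it is not patted within timeout_s seconds."""

    def __init__(self, timeout_s, clock):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_pat = clock()  # setting the dog loose counts as a pat

    def pat(self):
        """Record that the main loop is still alive."""
        self.last_pat = self.clock()

    def has_bitten(self):
        """True if too long has passed since the last pat."""
        return self.clock() - self.last_pat > self.timeout_s


clock = FakeClock()
dog = Watchdog(timeout_s=5.0, clock=clock)

# Healthy program: each pass of the loop takes 1 second, then pats the dog.
for _ in range(10):
    clock.advance(1.0)  # do the next thing, do the next thing...
    dog.pat()           # pat the dog
healthy = dog.has_bitten()

# Stuck program: 10 seconds go by waiting for something that never happens.
clock.advance(10.0)
stuck = dog.has_bitten()

print("bitten while healthy:", healthy)
print("bitten while stuck:", stuck)
```

In a real system the bite would be a forced restart rather than a flag, so the program never gets the chance to ignore it.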

So, now on to my TV box (or my network router, or Alexa, or many other gadgets I might have). Something stops working properly, so I phone the help desk and after a day or two get to speak to a person who tells me to turn it off and then on again. In my case, and probably in your case these days, we have already done that. But why did we have to? When I call about my periodically faulty TV box, there are even automated systems to tell me to turn it off and on again. The companies even have remote checking systems and can, so they say, tell if your gadget isn’t working properly. If this is really true, then why do they wait for us to phone up before fixing it? Even more, why did they not simply set the dog loose to do it automatically? But even more importantly, why did they not ask the nice software engineers to write the programs properly and the hardware engineers to build robust gadgets?

So, what is it to be: design and build good quality hardware and software, or simply tell everyone to turn it off and then back on again?