One of the benefits of using Google Cloud Platform or AWS for computational tasks is that you can get resources on demand and avoid paying for round-the-clock usage. But even with intermittent use, you may be surprised by the bill at the end of the month. It feels worse when you remember that you left an instance running for a couple of days after your simulation had finished. The cores were idling, but you still have to pay for them.
One approach is to launch your simulation from a script that shuts down the machine after your program finishes. You can add a delay in between to let your results sync through Dropbox (which is pretty fast anyway <3). The default cloud images set up passwordless sudo for the default user, so you don’t need to run the script as root, and calling “sudo poweroff” from a script doesn’t require any special tricks.
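Such a wrapper might look like the following bash sketch. The script name `run-then-poweroff.sh`, the `SYNC_DELAY` variable with its 300-second default, and the `shutdown_machine` helper are hypothetical choices for illustration, and the script assumes passwordless `sudo` as on the default cloud images.

```shell
#!/usr/bin/env bash
# run-then-poweroff.sh (hypothetical name): run a command, give file sync
# some time, then power the machine off.
# Usage: nohup ./run-then-poweroff.sh <command> [args...] &
set -u

SYNC_DELAY=${SYNC_DELAY:-300}  # seconds to wait for Dropbox sync (assumed value)

# Kept in its own function so it is easy to stub out while trying the script.
shutdown_machine() {
    sudo poweroff
}

run_then_shutdown() {
    "$@"                  # run the simulation with all of its arguments
    local status=$?       # exit status of the command above
    echo "$(date): '$*' exited with status $status; powering off in ${SYNC_DELAY}s"
    sleep "$SYNC_DELAY"   # let results finish syncing
    shutdown_machine
    return "$status"
}

# Act only when a command is given, so sourcing this file has no side effects.
if (( $# > 0 )); then
    run_then_shutdown "$@"
fi
```

The machine is powered off whether the command succeeds or fails, since a crashed run leaves the cores just as idle as a finished one.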
But this is not the most convenient solution. What if you could launch your program as usual and have another script watch the CPU load in the background? If you are sure your program keeps the CPU loaded to a certain level, which it should, you can set a threshold and, once the load stays below it for a sustained period, shut down the machine.
Normally, this would mean computing a moving average of CPU usage over an interval of time, which is a lot of math for a simple shell script. Fortunately, the Unix ‘uptime’ command does the averaging for you: it reports the load average (roughly, the average number of processes running or waiting for a CPU) over the last 1, 5, and 15 minutes. A 15-minute load average close to zero is about as clear a sign as you can get that your program has finished executing. To be sure, watch this number while your program runs and confirm that your program keeps the CPU consistently busy.
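On Linux, for example, the kernel recomputes each of these averages every 5 seconds as an exponentially damped moving average rather than a plain mean over a fixed window; other Unix systems may implement it differently. A sketch of the update rule, where $n$ is the current number of runnable tasks (Linux also counts tasks in uninterruptible sleep) and $m$ is the window length in minutes:

$$
L_{\text{new}} = L_{\text{old}}\, e^{-5/(60m)} + n \left(1 - e^{-5/(60m)}\right), \qquad m \in \{1, 5, 15\}
$$

After your program exits and $n$ falls to roughly zero, the 15-minute value decays from its last busy level $L_0$ approximately as $L(t) \approx L_0\, e^{-t/15}$, with $t$ in minutes, so it drops below a threshold $\theta$ after about

$$
t_{\theta} \approx 15 \ln\frac{L_0}{\theta}\ \text{minutes}.
$$

As an illustration, $L_0 = 0.9$ and $\theta = 0.4$ give $t_{\theta} \approx 15 \ln 2.25 \approx 12$ minutes, and background processes that keep $n$ above zero slow the decay further. Any watcher built on this number will therefore react some minutes after the program actually ends, a small price compared with days of idling.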
Here is the output of the uptime command when some of the cores have been busy:
11:35:03 up 1 day, 1:20, 7 users, load average: 3.08, 3.87, 3.80
And here it is when the machine has been relatively idle:
11:36:23 up 11:55, 4 users, load average: 0.43, 0.26, 0.28
The 15-minute load average is the last number in the output. The idle load varies with the background processes running on the machine, but there is a clear margin between the idle and busy values (0.28 versus 3.80 here).
Here is a short script that compares the 15-minute load average to a set threshold every minute and shuts the machine down once the load has stayed below the threshold for 10 minutes.
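A minimal sketch of such a watcher in bash follows. It assumes an `uptime` whose last field is the 15-minute load average, as in the outputs above. The script name `idle-shutdown.sh`, the `watch` argument, and the `THRESHOLD`, `INTERVAL`, and `IDLE_CHECKS` variables are hypothetical choices for illustration; the defaults encode "every minute, for 10 minutes, below 0.4".

```shell
#!/usr/bin/env bash
# idle-shutdown.sh (hypothetical name): power the machine off once the
# 15-minute load average has stayed below THRESHOLD for IDLE_CHECKS
# consecutive checks taken INTERVAL seconds apart.
set -u

THRESHOLD=${THRESHOLD:-0.4}     # load below which the machine counts as idle
INTERVAL=${INTERVAL:-60}        # seconds between checks
IDLE_CHECKS=${IDLE_CHECKS:-10}  # consecutive idle checks before shutdown

# Print the last field of an uptime line read on stdin: the 15-minute
# load average. A trailing comma, if present, is dropped.
parse_load15() {
    awk '{ v = $NF; sub(/,$/, "", v); print v }'
}

# Current 15-minute load average. LC_ALL=C makes uptime use "." as the
# decimal separator regardless of the user's locale.
current_load15() {
    LC_ALL=C uptime | parse_load15
}

# Succeed only if $1 looks like a number and is below $2. Bash arithmetic
# is integer-only, so awk does the floating-point comparison. Anything
# unparsable counts as busy, so a parsing problem never powers off the box.
is_idle_reading() {
    local num_re='^[0-9]+([.][0-9]+)?$'
    [[ $1 =~ $num_re ]] || return 1
    awk -v a="$1" -v b="$2" 'BEGIN { exit !((a + 0) < (b + 0)) }'
}

watch_and_shutdown() {
    local idle=0 load
    while true; do
        load=$(current_load15)
        if is_idle_reading "$load" "$THRESHOLD"; then
            idle=$((idle + 1))
        else
            idle=0  # any busy reading restarts the count
        fi
        if (( idle >= IDLE_CHECKS )); then
            echo "$(date): 15-min load $load < $THRESHOLD for $idle checks; powering off"
            sudo poweroff
            return 0
        fi
        sleep "$INTERVAL"
    done
}

# Start watching only when invoked as "idle-shutdown.sh watch", so the
# functions above can be sourced and tested without side effects.
if [[ ${1:-} == watch ]]; then
    watch_and_shutdown
fi
```

You would start it right after launching your program with something like `nohup ./idle-shutdown.sh watch >/dev/null 2>&1 &`. Counting consecutive low readings, rather than acting on a single one, means a brief lull cannot trigger a shutdown.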
Several things about this script could be improved, but as it stands you can just take it and use it without customizing it or changing your workflow. Just don’t set the threshold so low that the idle state becomes hard to detect, or so high that the machine looks idle even while your program is running. 0.4 works well for me when I know my program will push at least one core to its limit during execution (a load of around 0.8 to 0.9 at minimum).
Other approaches that come to mind are looking for your program in the list of processes, or watching for a file your program writes when it completes. But measuring the CPU load is the most generic and reliable method I know of, since it does not depend on your program’s name or outputs.