All posts by rohitrawat

Fixing Keras Hangups in Jupyter Notebooks

Keras is a wonderful Python library for high-level implementation of deep learning networks. It provides a neat, customizable interface for designing intricate sequential and recurrent neural networks, and fine-grained control over the training algorithm. For its backend, it can transparently use either Theano or TensorFlow, which seamlessly abstract the CPU and GPU implementations of the complicated algorithms and data flows. Modern compiler design and expression templates have really come a long way!

I love programming in Jupyter notebooks because it leads to a reproducible record of my work, and it provides a very convenient interface for running code on a headless work machine or in the AWS/Google cloud. Jupyter notebooks are also used heavily in machine learning courses taught online and in classrooms, because they let the instructor abstract away the ugly details of setting up the environment into VMs or cloud images.

One frequently encountered problem with training Keras models in Jupyter notebooks is the ‘WebSocket ping timeout’ error. My understanding of the issue is that the training progress bar updates overwhelm the Jupyter client-server connection and communication freezes. This is an often referenced issue, and some of the solutions involve redirecting stdout to a file to relieve the stress of progress bar updates, but those prevent you from seeing the training progress and important messages right in the notebook. One elegant solution I like is disabling the default text progress bar in Keras and using the keras_tqdm progress bar instead. tqdm is a neat, modern-looking progress bar with Jupyter notebook support, so its updates don’t time out the connection. The author has put together a really convenient Keras callback class that draws and updates the progress bars in the notebook. I have successfully used it to fix my timeout issues when training Keras models.

One little improvement that I contributed is a bug fix to support dynamic batch size training. Keras provides two ‘fit’ functions for training: a fit() function for fixed-size batches where all data is loaded into a large numpy array, and a fit_generator() function where batches are generated on the fly by a generator. Batch generators have several advantages, the most important one being that datasets too large to fit into memory can be processed by reading them from disk in chunks. Another significant advantage is being able to ‘generate’ data – for example, by applying transforms and crops to images, by adding noise, or by raw synthesis. The generators do not need to abide by a fixed batch size; they can yield batches with a different number of items every time (although many generators have a fixed yield size). This breaks the progress counting mechanism in the keras_tqdm code. I have detailed my process in the bug report and the pull request. Until the pull request is merged, you can use my fork. Follow these install instructions:

git clone https://github.com/rohitrawat/keras-tqdm.git
cd keras-tqdm
python setup.py install
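
Once installed, the fix is essentially a one-liner in your training call: silence the built-in progress bar with verbose=0 and pass keras_tqdm’s notebook callback (TQDMNotebookCallback) instead. Below is a minimal sketch with a toy model and a hypothetical generator that yields variable-sized batches; argument names follow Keras 2 (older versions use samples_per_epoch instead of steps_per_epoch).

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras_tqdm import TQDMNotebookCallback

# Toy generator yielding batches of varying size (placeholder data)
def batch_generator():
    while True:
        n = np.random.randint(16, 64)              # dynamic batch size
        x = np.random.rand(n, 10)
        y = np.random.randint(0, 2, size=(n, 1))
        yield x, y

model = Sequential([Dense(1, activation='sigmoid', input_dim=10)])
model.compile(optimizer='adam', loss='binary_crossentropy')

# verbose=0 disables the default text progress bar that floods the
# notebook's WebSocket; TQDMNotebookCallback draws tqdm bars instead.
model.fit_generator(batch_generator(),
                    steps_per_epoch=100,
                    epochs=5,
                    verbose=0,
                    callbacks=[TQDMNotebookCallback()])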

I hope this post helps people who are running into the WebSocket ping error, or who are in general unable to run keras_tqdm with fit_generator.

Automatically shutting down Google cloud platform instance

One of the benefits of using Google Cloud Platform or AWS for computational tasks is that you can get resources on demand and not pay for round-the-clock usage. But even with intermittent use, you may be surprised by the bill at the end of the month. It feels worse when you remember that you left an instance running for a couple of days after your simulation had stopped. The cores were idling, but you still have to pay for them.

One approach is to launch your simulation from a script which shuts down the machine after your program finishes execution. You can throw in a delay in between to allow your results to sync through Dropbox (which is pretty fast anyway <3 ). The default cloud images do not ask for sudo passwords, so you don’t need to run the script as root, nor does executing “sudo poweroff” from a script require any special tricks.
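
Here is a rough sketch of that wrapper idea in Python; run_simulation.py and the ten-minute grace period are placeholders for whatever your job and sync time actually are.

import subprocess
import time

# Run the simulation, give Dropbox some time to upload the results,
# then power the instance off.
subprocess.run(["python", "run_simulation.py"])   # placeholder entry point
time.sleep(600)                                   # grace period for syncing
subprocess.run(["sudo", "poweroff"])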

But this is not the most convenient solution. What if you could launch your program normally, and have another script watch the CPU utilization in the background? If you are sure your program keeps the CPU loaded above a certain level (which it should), you can set a threshold, and if the usage stays below that level for a sustained period, shut down the machine.

Normally, doing this would require computing a moving average of CPU usage over an interval of time – a lot of math if you are aiming for a simple shell script. Fortunately, the Unix ‘uptime‘ command does the averaging for you and reports the 1 minute, 5 minute, and 15 minute load averages. There can be no better indicator that your program has finished executing than the 15 minute load average being close to zero. To be sure, you can watch this number while your program executes and make sure your program is properly and consistently utilizing the CPU.

Here is the output of the uptime command when some of the cores have been busy:

$ uptime
11:35:03 up 1 day, 1:20, 7 users, load average: 3.08, 3.87, 3.80

And this when it has been relatively idle:

$ uptime
11:36:23 up 11:55,  4 users,  load average: 0.43, 0.26, 0.28

The 15 minute load average is the last number in the output. The idle load varies with the kind of background processes running on the computer, but there is a clear margin between the idle and busy values.
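
If you prefer to read these numbers programmatically instead of parsing uptime’s output, the same averages are exposed through /proc/loadavg and Python’s os.getloadavg(); a quick sketch:

import os

# os.getloadavg() returns the 1, 5, and 15 minute load averages
# (the same numbers uptime prints, read from /proc/loadavg on Linux).
one_min, five_min, fifteen_min = os.getloadavg()
print("15 minute load average:", fifteen_min)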

Here is a short script that will compare the 15 minute load average to a set threshold every minute, and shut the machine down if it stays low like that for 10 minutes.

#!/bin/bash

# Shut the machine down once the 15 minute load average has stayed below
# the threshold for 10 consecutive checks (one check per minute).
threshold=0.4

count=0
while true
do

  # The 15 minute load average is the third field after "load average:"
  load=$(uptime | sed -e 's/.*load average: //g' | awk '{ print $3 }')
  res=$(echo $load'<'$threshold | bc -l)
  if (( $res ))
  then
    echo "Idling.."
    ((count+=1))
  else
    # Reset the streak as soon as the load picks back up
    count=0
  fi
  echo "Idle minutes count = $count"

  if (( count>10 ))
  then
    echo Shutting down
    # wait a little bit more before actually pulling the plug
    sleep 300
    sudo poweroff
  fi

  sleep 60

done

Several things can be improved about this script. But for now, this is something you can just take and use without worrying about customizing it or changing your workflow. Just don’t set the threshold so low that the idle state becomes hard to detect, or so high that the machine appears idle even while your program is running. 0.4 works great for me when I know my program will push at least one core to its limits during execution (around 0.8 – 0.9 minimum load).

Other approaches that come to mind would be looking for your program in the list of processes, or watching for a file produced by your program at completion. But measuring the CPU utilization is the most generic and reliable method there is.

Using Google Cloud Platform for parallelizing simulations

This post is not about parallel computing. It’s about splitting long jobs across several machines and using Google Cloud Platform and Dropbox to do it cheaply and effectively. We researchers often run simulations that go on for several hours. Many times it’s about running the same job several times but with different parameters for every run. Assuming these are independent jobs, it makes sense to run them in parallel on several machines to save time. The problem here for most people is that they don’t have dozens of powerful machines at their disposal. Google Cloud Platform to the rescue! (Nothing against AWS, but I used up their free credit many years ago before I needed the compute capabilities.)

Cost

Currently, Google Cloud Platform has a promotion that gives you $300 credit to use within 60 days of signing up. This is a substantial amount to use up in two months – I roughly estimate that it could run five dual-core instances with 8G of RAM and 50G hard drives non-stop for two months and still have about $35 of unused credit at the end. Say what? Now if you didn’t need the instances running 24×7, you could have many more worker machines running in parallel with 4/8 cores and dollops of RAM and SSD storage and still stay within the free credit all this time. (Note 1: Google gives discounts on sustained use, just like AWS does, so running a machine for half the time does not exactly halve the bill. Note 2: The free trial puts a limit of 8 cores per zone, so your machines are limited to a maximum of 8 cores. Note 3: There are five zones in total, so you cannot exceed 40 cores running at a time, which is still 10 quad-core machines if you will.) With a paid account, you could go even higher.

In my case I needed to run a simulation with five different sets of parameters, and each run took about two hours to finish. I could either run them all on my machine and wait ten hours for the results, or spin up four cloud instances on the side and have everything ready in two. You see the advantage. If there were no free credit, would I still do it? Absolutely! Getting results quickly and moving on to the next thing is priceless!

You do need a credit card to sign up for the free trial, which is a bummer if you are a student. I hope they remove this requirement for students. I have also not compared prices to see whether Google offers cheaper compute resources than Amazon. There are probably cheaper options available for non-compute uses like hosting small websites.

Dropbox for syncing code and results

I have some Linux experience, so setting up multiple instances was not difficult for me. It might not be as easy for others, so I’ll talk about how I did things and the tricks I learned. My goal here is to save you time. If your simulations only run on Windows, I’m not sure how helpful this post will be to you – sorry! Where does Dropbox fit in? You guessed it: to synchronize the code and results. Most people have done enough referrals to bump up their quota from the standard 2G that Dropbox offers. For others, there is Google Drive with 15G of free space (BUT NO OFFICIAL LINUX CLIENT! ARE YOU KIDDING ME!?), so if you can get one of the third-party Linux clients working satisfactorily, good for you, but I will stick with Dropbox for this post. I have a paid Dropbox account with 1TB of storage, so it wasn’t an issue for me. I can’t thank Dropbox enough for the excellent work they have done.

Performance

cat /proc/cpuinfo tells me that the instance I’m looking at has 4 cores of a Xeon CPU @ 2.30GHz. These are supposed to be dedicated to my instance. Even though my simulations only push about two cores to 100% utilization, they run noticeably slower on the cloud instance than on my local workstation with an 8-core i7-4770 CPU @ 3.40GHz. The workstation can even run a second simulation inside a VM with no major slowdowns. But certainly none of this matters unless I have benchmarks to go with my claims. I will try to post some here later.

Getting started

If you don’t have a Google account (Gmail), you will need to create one. You can then sign up for the free trial or a paid account. Google Cloud Platform is an umbrella under which Google provides many services – hosting apps, running databases, networking, and machine learning, to name a few. We are looking at the Compute Engine component. You can open your Google Cloud Platform console by going to https://console.cloud.google.com/. You must create a new Project, which groups together all the compute instances, app engines, databases, etc. related to one piece of work. Our project will only have Compute Engine instances, but you must create one to get started.

This is what your console looks like (with a project already open):

Dashboard

You will be automatically prompted to create a new project when you first sign in. Pick a name for your project:

Create project

Wait a few seconds for things to be ready. You will see a spinning circle until then, which turns into a notification once the project is created.

Wait

Once created, your new project’s dashboard will look like this:

Empty project dashboard

Creating your first instance

Click on the expand menu button and click on Compute Engine:

Create instance 1

You can then click on the Create instance button under VM instances.

Create instance 2

Here you pick a name for the instance (which is also its hostname), a zone (there are limits on how many cores you can run at the same time in a zone), the number of cores and amount of memory, and a boot disk. The default machine types can be customized to change the number of cores and amount of RAM to suit your needs.

Create instance - provide name, choose hardware specs

For the boot disk, you can choose from the available Linux distros, choose between SSD and HDD, and set a size for the disk. You can also choose a snapshot of a disk you saved earlier.

Select distribution for boot disk

I’ll go with Ubuntu 16.04 LTS and a 20GB standard boot disk. The documentation says that larger disks will have better performance, so if you have I/O intensive tasks, you may want to go for a larger SSD type disk.

As soon as you click on the create button, the machine performs its first boot (to auto-configure the hostname, networking, and other management features) and you start getting billed for it. You can always shut it down when not using it but there is a minimum 10 minute limit for billing.

Running instance with public IP

Once the machine is running, you will see its public IP address. You can SSH into it directly from your web browser by clicking the SSH button, but many people, myself included, don’t like the feel of a JavaScript console and prefer the real thing. Also, when using the web console, you are automatically signed in to an account named after your Google username, which may not be what you want. To use an SSH client to connect via the public IP, you will need to set up your SSH keys.

Setting up SSH keys

You can set up per-instance SSH keys or project-wide SSH keys. I prefer the latter, as once set up, they are automatically installed on all instances in the project. All user accounts corresponding to the installed keys are also created automatically. To set up your SSH key, click on the Metadata section and select the SSH tab. Copy the contents of your public key and paste it into the box (if you don’t have one, you can search the web for how to create an SSH key pair). Your username will be automatically picked up from the key – it is usually the username you have set up on your local machine. Click the Save button and you are all set. Click on the VM instances button to go back.

Project-wide SSH keys

You can now access your machine by simply running:

ssh public_ip

Here is a connection session:

~$ ssh 104.137.152.171

The authenticity of host '104.137.152.171 (104.137.152.171)' can't be established.
ECDSA key fingerprint is SHA256:s5d4f54sd5f4s5
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '104.137.152.171' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-38-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  Get cloud support with Ubuntu Advantage Cloud Guest:
    http://www.ubuntu.com/business/services/cloud

0 packages can be updated.
0 updates are security updates.



The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

rohit@machine1:~$

You can see the username and hostname on the machine are correctly set up.

Install your software

You should now install all the software you will be running on the machines. Remember that this is your template machine which will be duplicated to create your other instances, so try to keep things clean and complete. If you are running free or open source software, it won’t have complicated licensing mechanisms and the clones will work perfectly fine without needing to reinstall again and again. Even proprietary licensed software installations, like MATLAB, can be duplicated this way, but may need re-activation on each instance depending on the license type.

For my personal setup, I installed the C++ compiler, Octave, and R.

sudo apt-get install build-essential 
sudo apt-get install octave octave-image octave-signal 
sudo apt-get install r-base

You can also launch R and install any R packages you need.

Although I have only used tools that can be installed and run without an X server, you can install one and use it headless/remotely over VNC.

Install Dropbox

Follow “Dropbox Headless Install via command line (64-bit)” instructions at https://www.dropbox.com/install-linux. Once you have run the commands on that page, you will be asked to authenticate by visiting a URL. The Dropbox daemon will start running and syncing your files.

You can kill the sync process for now by pressing Ctrl-C. We will first download the Dropbox command line tool to monitor Dropbox and also set up the daemon to autostart. We will also set up exclude folders to prevent personal or unnecessary files from syncing to the nodes.

Fetch the Dropbox command line tool

$ mkdir ~/bin
$ cd ~/bin
/bin$ wget --content-disposition https://www.dropbox.com/download?dl=packages/dropbox.py
--2016-10-01 03:17:34--  https://www.dropbox.com/download?dl=packages/dropbox.py
Resolving www.dropbox.com (www.dropbox.com)... 162.125.4.1
Connecting to www.dropbox.com (www.dropbox.com)|162.125.4.1|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://linux.dropbox.com/packages/dropbox.py [following]
--2016-10-01 03:17:35--  https://linux.dropbox.com/packages/dropbox.py
Resolving linux.dropbox.com (linux.dropbox.com)... 52.84.63.11, 52.84.63.249, 52.84.63.76, ...
Connecting to linux.dropbox.com (linux.dropbox.com)|52.84.63.11|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 116583 (114K) [application/octet-stream]
Saving to: ‘dropbox.py’

dropbox.py          100%[===================>] 113.85K  --.-KB/s    in 0.02s   

2016-10-01 03:17:35 (5.24 MB/s) - ‘dropbox.py’ saved [116583/116583]

/bin$ chmod +x dropbox.py 
/bin$ cd
$ ~/bin/dropbox.py 
Dropbox command-line interface

commands:

Note: use dropbox help <command> to view usage for a specific command.

 status       get current status of the dropboxd
 throttle     set bandwidth limits for Dropbox
 help         provide help
 puburl       get public url of a file in your dropbox's public folder
 stop         stop dropboxd
 running      return whether dropbox is running
 start        start dropboxd
 filestatus   get current sync status of one or more files
 ls           list directory contents with current sync status
 autostart    automatically start dropbox at login
 exclude      ignores/excludes a directory from syncing
 lansync      enables or disables LAN sync
 sharelink    get a shared link for a file in your dropbox
 proxy        set proxy settings for Dropbox

$ ~/bin/dropbox.py status
Dropbox isn't running!

Dropbox autostart

Currently, the Dropbox daemon is not running, but we will make it run on boot in the background. Unlike other services, the Dropbox daemon should not (and need not) be run as root. Thus it is best set up to launch as a cron job under your own user.

$ crontab -e
no crontab for rohit - using an empty one

Select an editor.  To change later, run 'select-editor'.
  1. /bin/ed
  2. /bin/nano        <---- easiest
  3. /usr/bin/vim.basic
  4. /usr/bin/vim.tiny

Choose 1-4 [2]: 3

Add the following line to the end of the file:

@reboot $HOME/.dropbox-dist/dropboxd

Save and exit the editor. Now reboot the machine with:

$ sudo reboot

Wait about a minute to allow the machine to come back up, then ssh into it again. You can now check the Dropbox status:

~$ ~/bin/dropbox.py status
Starting...

And once the sync completes, you should see

~$ ~/bin/dropbox.py status
Up to date.

Exclude personal/unnecessary files

You can exclude folders that contain personal files, or ones which are too big to fit on your instances, by running:

~$ cd Dropbox
~/Dropbox$ ~/bin/dropbox.py exclude add "excluded folder 1" "excluded folder 2" "excluded file 1"

You can add as many folders as you want, but be aware that it is a slow process. Also, you will need to re-do the authentication and exclude steps on each instance.

Snapshotting and duplicating your machine

Now we get to the real time-saving part – creating other instances by duplicating this one. On the VM instances page, you may notice a “Clone” button. But it doesn’t clone the way we expect it to. The clone is only similar in configuration – CPU cores, memory, disk size, and distro – but your data is not duplicated! To do an effective clone, we will have to:

  1. “Unconfigure” the machine and turn it off.
  2. Take a snapshot of the disk.
  3. Create new instances using the snapshot.
  4. Re-authenticate Dropbox if needed.

Prepping for the snapshot

Snapshots that have to be deployed to multiple machines usually have things like hostnames, SSH keys, and network config erased so that they can be given unique values on each machine. The beauty of Google Cloud Platform is that it handles all these things transparently. The only “unconfiguration” you need to do is of Dropbox. If you blindly clone a Dropbox installation, Dropbox will think that all the clones are the same machine and you will get erratic syncing (the behavior at the time of this writing). Thus each machine should be independently authenticated – which is fast and painless. Data already present in the Dropbox folder is not re-downloaded; Dropbox is smart enough not to do that.

First stop the Dropbox daemon and then delete the “~/.dropbox” hidden folder:

$ ~/bin/dropbox.py stop
Dropbox daemon stopped.
~$ rm -rf ~/.dropbox

Shut down the instance with sudo poweroff and we are ready to take the snapshot.

Taking the snapshot

Click on Snapshots in the sidebar, then click on Create Snapshot.

Create snaphot

Give a name for the snapshot, a description, and for the source disk, select the machine you just created. Click create.

Create snaphot - details and source disk

Note that keeping snapshots is not free. You are charged per GB of snapshot storage, but the prices are insanely low.

Create a new instance using the snapshot

Go back to the VM instances page. Click the Create Instance button.

New instance from snapshot

New instance from snapshot - configure

For the boot disk, click on Change, and this time visit the Snapshots tab. Select the snapshot you made and finish creating the instance.

Use snapshot as boot disk

Wait for the instance to boot up, and you should now have a clone with your data, installed programs, and correctly configured hostname and SSH keys. You can now simply SSH to the new instance.

Re-authenticate Dropbox

Once logged in, we want to register this new machine with Dropbox and enable sync. First, kill the Dropbox daemon running in the background, then launch it manually to get the authentication URL.

~$ pkill dropbox
~$ ~/.dropbox-dist/dropboxd
This computer isn't linked to any Dropbox account...
Please visit https://www.dropbox.com/cli_link_nonce?nonce=lkj8u5h5jee8t to link this device.

Once complete, you can kill the daemon with Ctrl-C and reboot. You can then go ahead and exclude the folders you wish to from this instance.

You can repeat this process to create as many clones as you may need. You can then start/stop those instances, wait for the Dropboxes to sync up, and start your simulations. Your simulation results that are written to Dropbox are also accessible from anywhere, even when the compute nodes are powered down.

Misc. Problems

SSH Warnings

Since Google Cloud Platform recycles its public IP addresses, it is very likely that your new instance will get the same IP address that an earlier instance of yours had. Usually, a changed host key is a sign of the server being hijacked, but not in this case. The resolution is given in the error message itself.

~$ ssh 104.197.12.171

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
SHA256:hjdshf8745j4h53k4jh5I.
Please contact your system administrator.
Add correct host key in /home/rohit/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in /home/rohit/.ssh/known_hosts:70
  remove with:
  ssh-keygen -f "/home/rohit/.ssh/known_hosts" -R 104.197.12.171
ECDSA host key for 104.197.12.171 has changed and you have requested strict checking.
Host key verification failed.
~$ ssh-keygen -f "/home/rohit/.ssh/known_hosts" -R 104.197.12.171
# Host 104.197.12.171 found: line 70
/home/rohit/.ssh/known_hosts updated.
Original contents retained as /home/rohit/.ssh/known_hosts.old
~$ ssh 104.197.12.171
The authenticity of host '104.197.12.171 (104.197.12.171)' can't be established.
ECDSA key fingerprint is SHA256:hj3h45h35jh4hkI.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '104.197.12.171' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-38-generic x86_64)

Broken SSH connections

You will often find your SSH connections getting broken or timed out, especially when you are on WiFi, or if your machine running the SSH client goes to sleep. To keep the sessions active and to keep your simulations running even in the midst of disconnections, use a program like screen. I have also heard good things about tmux, but have not used it yet.

The first thing to do after launching an instance and logging on to it is to run screen.

~$ screen

Screen version 4.03.01 (GNU) 28-Jun-15
...
                  [Press Space for next page; Return to end.]

You will be taken to a regular shell where you can do your tasks. Now, if for some reason your SSH connection is broken, create a new SSH session to the instance and resume the screen session.

~$ screen -r

Using screen takes away the scrolling features of the console, but you can press Ctrl-A, Esc in screen to enter a scrollable mode.

Checking your quotas

During the free trial, you can only run 8 cores in a particular zone (of which there are five: east, west, central, Europe, and Asia). You can check your usage by clicking the Quotas link in the sidebar.

Quotas page

Security

If you or the school/employer you work for has security policies for the storage, transmission, or use of data, code, and anything else, please be sure to follow them. It is not okay to store data on unencrypted disks or in cloud services if your workplace has rules against doing that.

Remember to apply all important security updates, especially ones related to SSL and SSH. Do not open any ports in the cloud platform firewall unless you know what you are doing. Disable password authentication for SSH and install an intrusion prevention tool like fail2ban.

Conclusion

I hope this guide helps you get a head start with using Google Cloud Platform to advance your research. With Dropbox, your most up-to-date code will be available across all your machines, and your results will be too. By cleverly making and using snapshots, you can scale up to dozens of worker machines without breaking a sweat. Here you can see multiple instances running, but not really doing anything CPU intensive.

Cloud console with multiple instances

Remember to power off instances when you won’t be using them for long periods. Happy Cloud Computing!

If you liked this post, please “like” it on LinkedIn: www.linkedin.com/hp/update/6187867846433964032

Thank you!

(Very) Basic Linux Server Security

This post is meant for Linux newbies.

So you finally decide to set up a Linux machine to serve the glorious content you have created. Your Linux machine may seem invincible behind the safety of your home or corporate router’s firewall, but once it becomes a publicly accessible server, it becomes highly vulnerable to hackers, DoS bots, and spammers. If you are managing a VPS, or if you have your own machine or VM exposed to the Internet through port forwarding, it becomes your responsibility to keep the machine secure and updated. A compromised machine is not only bad for your own business, but it can also become a device for hackers to launch other attacks.

Typical server uses include

  • publishing a website/app
  • running an FTP server
  • running an SSH server

Common ways to compromise a server are

  • exploiting a 0-day or other unpatched security vulnerability
  • crashing it with a denial-of-service attack
  • brute-force password cracking

I’ll just dump the basic practices I follow in hopes that it may some day help another uninformed soul. Since I’m no security expert, please take everything with a grain of salt.

Keep Everything Updated

This advice often falls on deaf ears. Outdated software is not only deficient in terms of features, it is also full of security holes that can be exploited by hackers.

On most Linux distros, updating software is as simple as

Debian style:

sudo apt-get update && sudo apt-get upgrade

or Red Hat style:

sudo yum update

Sometimes packages need you to do a ‘dist-upgrade’ instead of ‘upgrade’, which should be fine most of the time.

Software updates always carry the risk of breaking things. I have run into servers running five year old versions of Apache, PHP, Perl, etc just to avoid the discomfort of dealing with broken dependencies. Most times these machines are located on private isolated networks so the threat is somewhat mitigated, but this is an absolute no-no on a publicly accessible server.

If you are extremely worried about updates breaking your server, you can choose to do the updates manually on a regular basis so that you can carefully examine what is being updated and what effects each update may have. But this is generally going overboard, as most software developers are careful about what they release and would never want to break millions of machines on the Internet. For everyone else, I recommend some automated way of patching your machines. The simplest way I know of is to create a cron job to do this regularly:

Edit the crontab file for the root user:

sudo crontab -e

(Choose your desired editor if prompted and) Add the following statement to update your system daily at midnight:

0 0 * * * apt-get update && apt-get -y dist-upgrade

(assumes a Debian type system).

Even with the most up-to-date software, expert hackers can still find holes that are not publicly known yet and get inside a system. But if those kinds of guys are after you, then you shouldn’t be managing your server security yourself.

TO BE CONTINUED..

Detecting Power Outage From WiFi Networks and Stats

Many people use a Raspberry Pi as a server at home. Even though most uses are recreational and non-critical, power outages can occasionally lead to loss of data and corruption of the SD card. To avoid that, cautious users often connect their Pis to a backup uninterruptible power supply (UPS). There are many dedicated UPS solutions for the Raspberry Pi. Some are sold as ready-made products, others are DIY projects. If there is other IT equipment connected to a UPS, the Raspberry Pi can simply be plugged into one of those outlets instead of having its own separate UPS.

UPS systems have limited backup, and depending on the load and battery capacity, they can last anywhere from a few minutes to several hours. Many UPSs have a PC interface that alerts the system when a power outage is in effect and reports the state of the batteries, so that the system can gracefully shut itself down before the UPS dies. Dedicated Raspberry Pi UPSs also often come with circuitry that can send a message to the Pi via a GPIO pin. But there are also cases where there is no such facility, for example, when sharing a UPS meant for other devices. People have come up with many DIY circuits that detect when there is no power in an outlet not backed by the UPS, but these involve spending money on components and building circuits.

Here, we will try to come up with a software only solution to detect the power outage.

During a power outage, any WiFi routers not connected to a UPS will go down. Keeping an eye on how many traditionally stable WiFi networks are down at the same time can tell you whether there is a power outage in the area. This assumes the Raspberry Pi has WiFi capability (like an RPi3 or a WiFi USB dongle). It does not matter whether the Pi uses WiFi or Ethernet for Internet connectivity; WiFi is merely used to scan for available SSIDs.

Here is a simple script that counts the number of active SSIDs using the iwlist command, and if the number is less than a set threshold, it is flagged as an outage.

#!/bin/bash

# THIS SCRIPT MUST BE RUN WITH SUDO

MIN_NUM_SSID=5

NUM_SSID=$(iwlist wlan0 scan | grep -c ESSID)
echo $NUM_SSID

if [ $NUM_SSID -gt $MIN_NUM_SSID ];
then
  echo "Power is good."
else
  echo "Power outage suspected."
fi

In this example, the Raspberry Pi sees about 10 SSIDs with good strength at any given time. During a power cut, I expect at least half of them to be out of service, since almost no one around me plugs their WiFi router into a UPS, save for a few mobile hotspots that spring up, so a threshold of 5 seems reasonable.

This script can be run as a root cron job, executing at, say, five minute intervals. When the script detects an outage, it can do several things; I am listing only two here:

  • Shut down the Raspberry Pi
    shutdown -h now
    Although this is the safest method, this is not recommended as the Raspberry Pi will not turn itself on unless it is actually power cycled – manually or by the UPS draining out. You could be stuck with a shut off Raspberry Pi.
  • Shut down all important services and sync the filesystem
    service vsftpd stop
    service nginx stop
    service mysql stop
    service php5-fpm stop
    sync
    To get a list of active services, run sudo service --status-all. The sync command flushes all buffers to the SD card. This is not a foolproof measure against data corruption, but it just might work. When all the networks come back up, you can restart all the services or simply reboot the Pi with the reboot command.

Better detection of the outage?

The script above only counts the number of SSIDs at the instant the cron job executes it. For a more reliable count, the script should average the number of SSIDs visible over a few minutes to prevent false triggering.

The script is also flawed because it assumes the number of active SSIDs in the area won’t change over time. New networks could come up and old ones could be taken down permanently. The script can keep track of online SSIDs over several days and use the moving average as the total. The threshold used to detect a power outage could be a fraction of this value.
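
Here is a rough sketch of the averaging idea in Python, assuming the same wlan0 interface and root privileges as the script above; the sample count and threshold are placeholders to tune for your neighborhood.

import subprocess
import time
from collections import deque

SAMPLES = 5           # number of scans to average over
INTERVAL = 60         # seconds between scans
MIN_AVG_SSID = 5      # placeholder threshold, tune for your area

def count_ssids():
    # Count visible networks in one iwlist scan (requires root)
    out = subprocess.check_output(["iwlist", "wlan0", "scan"]).decode(errors="ignore")
    return out.count("ESSID")

recent = deque(maxlen=SAMPLES)
while True:
    recent.append(count_ssids())
    average = sum(recent) / len(recent)
    if len(recent) == SAMPLES and average < MIN_AVG_SSID:
        print("Power outage suspected (average %.1f SSIDs)." % average)
        # stop services, sync the filesystem, or shut down here
    else:
        print("Power looks good (average %.1f SSIDs)." % average)
    time.sleep(INTERVAL)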

There are many, many ways to implement this more gracefully, and I will leave it to the curious minds out there to do it in their own way.

Saving bandwidth when upgrading multiple machines to Windows 10

Microsoft will stop offering the free upgrade to Windows 10 that it is currently handing out to Windows 7, 8, and 8.1 users on July 29, 2016. If you want the free upgrade to Windows 10, you must do it now or forever hold your peace.

Microsoft has made it fairly easy for everyone (but watch out for the new privacy and automatic update settings and customize them), and it is a compact 3GB download that happens through the Windows Update mechanism. But there are people around the world on metered connections who would exhaust their bandwidth trying to upgrade multiple machines. Those people can download and save a USB or ISO installer using the Windows 10 Media Creation Tool and upgrade multiple PCs (eligible for the upgrade). Note that Windows 10 is set up by default to share downloaded updates with other computers, but apparently this does not work for the Windows 10 upgrade files, as I did not notice any speed improvement even though there was a recently upgraded Windows 10 machine on the LAN with this sharing enabled.

If you are on a metered connection and have already upgraded your machine but now want to upgrade other machines without downloading the files all over again, there are a couple of workarounds. The Windows 10 upgrade files are located in the $WINDOWS.~BT folder on your system drive and may still be there. The installer downloads an “install.esd” file which can be converted to an ISO file by following tutorials on the web. The ISO can be written to a DVD, or its contents copied via a USB drive to another computer, to perform the upgrade.

I’m currently traveling and putting up with a metered connection. With the free upgrade deadline near, I realized I had a few laptops lying around that I had not upgraded yet. Instead of converting the ESD file to an ISO, I simply copied the full $WINDOWS.~BT folder over to another machine waiting for the upgrade using a USB drive. You may need to run Explorer as an administrator to successfully copy the files if copying from within Windows. I tried running the setup.exe file, but it complained about missing files and failed. I then ran across this Reddit thread and decided to give it a try. Even though the OP was trying to resume setup on the same machine the files were downloaded on, I decided to try it on the second machine and launched the Sources/setupprep.exe file, and lo!, the setup ran and completed successfully. Since my primary OS is Ubuntu, which sits on a separate SSD with its own EFI bootloader, I was not worried if anything went wrong, but nothing did. If your folder is missing the Sources/install.esd file, which means you cannot build the ISO, but the rest of the install files are there, you can still try this method to do the upgrade on other machines. I hope this helps people with bandwidth quotas who did not begin by downloading the ISO file and need to upgrade more machines.

UTA VPN from Linux

UTA provides a VPN service for UTA students and faculty to facilitate work from off campus. The connection uses Cisco AnyConnect, which has a Java installer and client interface and installs platform-specific binaries on your system. As usual, Linux support breaks with time and is not fixed, and the Cisco AnyConnect client for Linux cannot be installed on many machines. Time to look for an open alternative.

OpenConnect provides all the functionality of Cisco’s AnyConnect and works nicely on all platforms. OpenConnect can be used from the command line, or through the network-manager applet interface.

It can be installed on Ubuntu/Debian systems as:
sudo apt-get install openconnect
To use the network-manager applet, you should also install the network-manager-openconnect and network-manager-openconnect-gnome packages. This post only covers the command line method.

To connect to UTA’s VPN:
sudo openconnect vpn.uta.edu
and follow the prompts on the console.

  • UTA apparently has not updated its certificates in a while, so one has to accept unverified certificates twice by typing “yes”.
  • When prompted for GROUP, students should use the first one, .Default--Students, typing it in exactly.
  • Use your NetID and password as is. It is not required to prefix “uta\” to the username.

Once everything is done, you see the Connect Banner with UTA’s VPN policy and you are connected to the student VPN.

Pressing Ctrl-C in the console should gracefully shut down the VPN connection. If things are not right afterwards or you cannot reconnect, see below.

If you suspend your machine while it is connected to VPN, networking may not behave correctly on resume or you may not be able to re-establish the VPN connection. Killing the old VPN process by pressing Ctrl-C in the console, and then killing the openconnect daemon seems to fix things for me.

sudo pkill openconnect

If you know the Expect language, you can write a script to automate this and get connected without responding to the prompts manually. You can also use autoexpect to generate the Expect script.
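
If you would rather stay in Python, the pexpect module can drive the same prompts. Below is a rough sketch only: the expect() patterns, group name, and credentials are assumptions based on the interactive session described above and will likely need adjusting (and sudo may first ask for your local password).

import pexpect

# Drive 'sudo openconnect vpn.uta.edu' and answer its prompts.
# All expect() patterns below are guesses and may need tweaking.
child = pexpect.spawn("sudo openconnect vpn.uta.edu", encoding="utf-8", timeout=60)

for _ in range(2):                      # accept the unverified certificate twice
    child.expect("to accept")
    child.sendline("yes")

child.expect("GROUP")
child.sendline(".Default--Students")
child.expect("Username")
child.sendline("your_netid")            # placeholder NetID
child.expect("Password")
child.sendline("your_password")         # placeholder password

child.interact()                        # hand the connected session back to you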