GeoCUP: supporting a flexible student computing environment

Over the past year, we’ve been supporting our first cohort of Geocomputation & Spatial Analysis (GSA) students as they learn to code and work with geo-data in an open computing context (predominantly FOSS). This post reflects on some of the problems – and solutions – that emerged as a result.

GeoCUP.v1

The first incarnation of GeoCUP (short for GeoComputation on a USB Platform) was a system-on-a-key described in a previous post. With the support of the Department and Faculty, USB keys were supplied to students at the start of term as follows:

  • 64GB USB 3.0 keys
  • Ubuntu Linux 14 LTS release (32-bit)
  • Pre-installed software:
    • R
    • QGIS
    • Canopy
    • Assortment of specified Python libs
    • Mozilla Firefox
    • Dropbox

The idea was that students could launch GeoCUP at boot time on a cluster machine from the USB key and would thus be running a full Linux distribution over which they had complete control. In an institutional computing context this was as close as we could get to giving them their own computer to play with, break, and manage.

We had also expected, based on what we’d seen with Linux ‘Live’ distributions that it would be feasible to have a key that would work with multiple types of firmware (including Apple’s EFI) and that students could therefore also run GeoCUP at home.

A final advantage would be the ease of replacing a lost key: since all their code was in Dropbox all they needed to do was reconnect Dropbox on a replacement key and they’d be up and running again in no time.

Ubuntu Screen Grab

Unexpected Issues

No well-laid plan survives much contact with the real world, and several issues emerged in the run-up to launch day:

  1. It is not (yet?) possible to have a full Linux distribution (as opposed to an essentially static ‘live’ distribution) that will start up at boot time on both Macs and PCs. Indeed, there are also issues with different vendors’ PC hardware being different enough from the machine on which GeoCUP.v1 was developed for this facility to be patchy, at best, on generic PCs as well. So portability proved to be rather more limited than we’d expected and hoped.
  2. Formatting the keys took much longer than expected. Since the keys needed to be bootable, the only way to write them was using the ‘disk duplication’ utility; however, dd is not able to distinguish between largely empty space and used space since it’s blindly copying the entire disk. So even though only about 20GB of the 64GB was in actual use, each key took about 5 hours to write. We were able to write up to 7 keys at once by combining dd with tee: as follows:
    dd 
    if=/Volumes/GeoCUP/geocup-20150917.bak/backup bs=524288 
    | sudo tee 
    /dev/disk3 ... /dev/disk9 > /dev/null
    

    We’d also note that using dd meant that we could only use 64GB USB keys, so if students lost a key and needed to replace it, they had to source exactly the same-sized key.

These start-up issues were then supplemented by performance issues after roll-out:

  1. Hardware buffering was much worse than expected. We had, naively, assumed that USB3 would provide sufficient bandwidth for our purposes and that read/writes would be fairly modest. We were wrong: the system frequently blocked completely for up to 10-12 seconds while data was written to/read from the USB key, and the entire Linux UI became unresponsive… which was rather frustrating for the students.
  2. As well, the pace of I/O usage of a full Linux distribution had a propensity to expose any physical weaknesses in the flash devices, so we had to re-flash probably 10–20% of the students’ keys over the course of the year.
  3. These performance issues then led some students to begin using their own laptops running OSX or various flavours of Windows instead, producing a proliferation in the number of students using the wrong Python libraries as platform support on some geodata and spatial analysis libraries is limited.
  4. All of this was compounded by the fact that some students were remembering to run
    sudo apt-get update

    on a regular basis, while others didn’t. So we even ended up with different versions of libraries on GeoCUP itself, and that led to code that would fail to run on one system but have no issues on another.

  5. A final ‘nail in the coffin’ of GeoCUP.v1 was the fact that one of our Ubuntu repositories was accidentally pointing at a development repository, not the stable one, and so one of the updates knocked out most of QGIS’ modelling functionality!

These were all serious issues, but in spite of them there were a number of students who reported that using GeoCUP had nonetheless helped the module as it gave them full control of their system, exposed them to power-user features such as the bash shell, and opened their eyes to some of the practical problems entailed in managing a system and a codebase. They also got to watch us doing some fairly frenetic on-the-fly debugging.

So with that in mind…

GeoCUP.v2

Virtualbox_logo

Part way through the year we began to experiment with Oracle’s VirtualBox platform as a way to enable students to run GeoCUP on their own computers (as that had signally not happened with GeoCUP.v1). Although there are higher-performance virtualisation platforms out there, VirtualBox is free, open source software so there were no licensing or cost implications to rolling this out on cluster systems or in suggesting that students download it to their personal computer.

GeoCUP.v2 is built as follows:

  • Ubuntu Linux 16 LTS (64-bit)
  • Anaconda Python
  • Rodeo & Atom IDEs
  • Dropbox
  • Google Chrome
  • QGIS

We’ve adapted installation scripts posted by Dani, up at Liverpool University for use with our own GeoCUP distribution since this speeds up the configuration and updating of the system as new Ubuntu distributions are released. You find them on GitHub: github.com/jreades/GeoCUP-Vagrant.

The main advantages of this shift are:

  1. The VDI (Virtual Disk Image) file is decoupled from the physical storage media, so as long as the image fits on the device then students can bring in whatever hardware they like (hard drive, flash drive, personal computer…) and run GeoCUP from that hardware.
  2. The VDI file is smaller and copying to new hardware uses the normal file copying mechanisms so ‘installation’ is also radically faster (we also only copy 20GB of data, instead of 64GB).
  3. By ditching Canopy for Anaconda we can also ‘fix’ the Python libraries using a configuration file so as to avoid last-minute problems caused by the release of new versions. We can then update those libraries to new, stable versions by distributing an upgrade script to the students rather than relying on manually-typed commands.

Alongside this, however, we retain the flexibility to give students administrator rights over their (virtual) machine, to install new software on the fly, and to take advantage of software updates without having to embed them in a centralised IT upgrade cycle. We also think that the virtualisation approach has significant advantages for IT services because they don’t have to monkey about with the BIOS of the cluster machines since the entire process is now software-based.

GeoCUP.v3 & Beyond

In the long run we’d like to automate even more of the distribution process so that we are no longer even responsible for ‘burning’ new USB keys or given students a drive from which to copy the latest version of GeoCUP.

Tools that enable just this sort of approach are beginning to surface: Vagrant and Docker are the two leading contenders at the moment, though they do slightly different things. I’ve been impressed by the way that Dani’s Vagrant-based distribution allows you to download a 2GB file containing a full Linux server distribution, have it automatically configured when it first runs, and then interact with the system via Jupyter Notebooks: it’s a fairly lightweight, but fully-functional Python-based geodata analytics ‘server’.

There are several problems with using this approach in our context:

  1. I’ve had a lot of problems getting Vagrant to also run in a ‘headed’ context, and since we want students to use the latest versions of QGIS as well as unsupported (by IT Services) IDEs such as Rodeo or Atom, we can’t drop the Linux desktop entirely and just run the notebook server.
  2. We can’t have students downloading even a 2GB file on to the cluster machines since a) they have nowhere to keep it in their allocated 200MB of online storage, and b) multiplying that 2GB overhead by 30 students is suddenly quite a big ‘hit’ to the network at the start of every class.
  3. We also can’t run Jupyter on a server somewhere on campus since every users runs with the same permissions as Jupyter and there’s no separation of user spaces as I understand it.

I suspect that these issues will be remedied in the not-too-distant future, and James and I will be exploring some of the possibilities with colleagues at ASU and UNSW over the coming year.

Finally, a /ht to Ryan Barnes, one of our own Geography grads who did the heavy lifting on version 1 of GeoCUP.

Aspect-Slope Maps in QGIS

While working with Naru to design our new 2nd year GIS methods training course (with parallel QGIS and ArcGIS streams!), I came across a rather striking map on the ESRI blog that managed to combine both slope (steepness) and aspect (direction) in a single representation. This post explains both a problem with the way that the colour scheme was specified and how to replicate this type of map in QGIS (with style sheet).

The Inspiration

Here’s Aileen Buckley’s Aspect-Slope map in all its glory – this is a the area around Crater Lake, Oregon, and you can see that it neatly captures both the direction of slopes (aspect) and their steepness (degree). So features like the crater stand out really clearly, as do what I assume is evidence of lava flows and such, while lesser features gradually fade towards grey, which means flat.

aspect-slope_map

So these maps combine two properties:

  • The direction of the slope is expressed in the hue – different directions are different colours.
  • The steepness of the slope is expressed by its saturation – steeper slopes are brighter colours.

Rather than just jump into providing you with a style sheet, I think it’s useful to trace this back to its constituent parts as it turns out that ESRI has made a mistake in setting up their colour maps.

Aspect Mapping

Aspect maps give the viewer a sense of the direction in which various slopes derived from  a Digital Terrain Model (DTM) lie – typically, we do this by dividing the angle of the slope into eight quadrants: North, Northwest, West, Southwest, South… well, you get the idea.

Here’s an example of what the standard aspect map out of ArcMap looks like as posted by the Rural Management and Development Department of Sikkim:

10sikkim-village-aspect

This, helpfully, gives us the ranges that we’ll need for our aspect-slope map. Note, however, that we don’t really have any idea how steep any of these obvious hills are.

Slope Mapping

Slopes maps are, obviously, intended to fill in the gap in terms of how steep an area is. Typically, we can measure this as either a degree value from one raster cell to the next of the DTM or as a percent/ratio (1-in-10 gradient = 10%). Here’s a nice example looking at the link between coffee bean growing areas and slope in Costa Rica:

costarica_bean_atlas_slope-rb-new

Unlike the aspect map, the divisions used in the slope map seem to be largely arbitrary with no real consensus on the mapping between measured steepness and terminology. The clearest guidance that I could find came from The Barcelona Field Studies Centre and looked like this:

Slope (%) Approx. Degrees Terminology
0.0–0.5 0.0 Level
0.5–2.0 0.3–1.1 Nearly level
2.0–5.0 1.1–3.0 Very gentle slope
5.0–9.0 3.0–5.0 Gentle slope
9.0–15.0 5.0–8.5 Moderate slope
15.0–30.0 8.5–16.5 Strong slope
30.0–45.0 16.5–24.0 Very strong slope
45.0–70.0 24.0–35.0 Extreme slope
70.0–100.0 35.0–45.0 Steep slope
> 100.0 > 45.0 Very steep slope

A Better Aspect-Slope Map Scheme

In order to create an aspect-slope map, we need to combine the two data ranges into a single number that we can use as a classification, and  this is where the ESRI blog approach goes a bit off the rails. In their approach, the ‘tens column’ (i.e. 10, 20, 30, …) represents the steepness – so 0–5 percent slope=10; 5–20 percent slope=20; and 20–40 percent slope=30 – and the ‘units columns’ (i.e. 0–8) represents aspect – so 0–22.5 degrees=1; 22.5–67.5 degrees=2; etc.

The problem with this approach is that you have a lot of problems if you want to add or remove a steepness category: in their example the highest value is 48, which means ‘highest value’ and an aspect of Northwest. But what if decide to insert a class break at a 30 percent slope to distinguish more easily between ‘Extreme’ and ‘Steep’? Well, then I need to redo the entire classification above 30… which is really tedious.

If we switch this around such that aspect is in the tens column (10–80) and steepness in the units column (0–9) then this becomes trivial: I just add or remove breaks within each group of 10 (10–19, 20–29, etc.). No matter how many breaks I have within each aspect class, the overall range remains exactly the same (10–89 if you use the full scale) regardless of the steepness classification that I’m using. It’s not just easier to modify, it’s easier to read as well.

Implementation in QGIS

For all of this to work in QGIS, you need to generate and then reclassify a slope and an aspect analysis from the same DTM. You can do this using outputs from the raster Terrain Analysis plugin (that’s the point-and-click way), or you can build a model in the Processing Toolbox (that’s the visual programming way). I personally prefer the model approach now that I’ve finally had a moment to understand how they work (that’s a topic for another post), but one way or the other you need to get to this point.

Regardless of the approach you take (manual or toolbox), once you’ve got your two output rasters you then need to reclassify them and then combine them. Here’s the mapping that I used to reclassify the two rasters as part of a model. You would copy these lines into text files and then use the GRASS GIS reclassify geoalgorithim while specifying the appropriate reclassification file.

Aspect-Reclassify.txt

0.0 thru 22.499 = 10
22.5 thru 67.499 = 20
67.5 thru 112.499 = 30 
112.5 thru 157.499 = 40
157.5 thru 202.499 = 50
202.5 thru 247.499 = 60
247.5 thru 292.499 = 70
292.5 thru 337.499 = 80
337.5 thru 360.5 = 10

Slope-Reclassify.txt (for percentage change)

0.0 thru 4.999 = 0
5.0 thru 14.999 = 2
15.0 thru 29.999 = 4
30.0 thru 44.999 = 6
45.0 thru 100.0 = 8

So that’s a 5-class steepness classification, but you could easily set up more (or fewer) if you needed them.

Once you’ve reclassified the two rasters it’s a relatively simple matter of raster layer addition: add the reclassified slope raster to the reclassified aspect raster and you should get numbers in the range 10–88.

Here’s the model that I set up (as I said above, more on models in another post):

Aspect-Slope-Model

Specifying a Colour Map

Taking the ‘Aspect Slope Map’ output, all we need to do now is specify a colour map. I took the colours posted by ESRI in the colour wheel (as opposed to the ones specified in the text) and converted them to hexadecimal since that was the easiest way to copy-paste colours. I think, however, that I’ve ended up with a slightly ‘muddier’ set of colours than are in the original Crater Lake set as you’ll see with my ‘Sussex Aspect-Slope Map’ below:

Sussex Aspect Slope Map

And, finally, the QGIS style sheet file is here (sorry about the zip format but .QML is not a recognised style type):

Aspect Slope Style – Close to Original.qml

Wrap-Up

I’m sure that this style sheet could be further improved (and may even try to do so myself, though I’d also welcome submissions from anyone with some time on their hands), but at least this gives users and easy way to combine representations of slope and aspect in a single map using a reclassification scheme that is simple to extend/truncate according to analytical or representational need. Enjoy!

MSc Student Profile: Olivia Pang

King’s Water is fundamentally a research hub, connecting and encouraging interdisciplinary investigation into the biophysical, political, socio-economic, developmental and institutional aspects of water resources and their management. This focus on the multiplicity of institutional and physical systems surrounding and impacting water … Continue reading

Come and Join Us!

We’re looking for someone with a passion for teaching and research that uses quantitative and computational methods to understand geographical systems. If that sounds like you, submit your application for the position of Lecturer in Spatial Analysis at King’s College London.

King’s Geocomputation really began life when Jon Reades joined the Department of Geography as Lecturer in Quantitative Human Geography and James Millington switched from his Leverhulme Fellowship to become Lecturer in Physical and Quantitative Geography. Together they kick-started the Geocomputation and Spatial Analysis (GSA) pathway through the undergraduate Geography degree, and soon after Naru Shiode joined as Reader in Geocomputation and Spatial Analysis. Now we’re looking to expand even further with the appointment of a Lecturer in Spatial Analysis.

The person we’re looking for will have expertise in spatial analysis and computational methods for understanding geographical systems. They will contribute to the delivery of the GSA pathway which emphasizes quantitative methods, spatial statistics, programming, simulation modelling, and behavioural ‘big data’. The pathway makes use of free and open source software where possible, and the successful candidate will also likely have expertise in tools like R and Python.

Alongside teaching, of course we’re also looking for someone who will contribute to the capacity of the Department of Geography to undertake world leading research, education and public engagement activities. The particular substantive area of research interest is open but should be broadly aligned with the research interests of existing members of the Department. Engagement with other departments and programmes, such as Informatics and Health, to deliver world-leading and boundary-pushing knowledge would also be welcomed.

You can find full details of the position and how to apply online. For an informal discussion of the post please contact Professor Nick Clifford or feel free to contact existing members of King’s Geocomputation.

And this isn’t the only opportunity to join King’s – there are currently multiple positions open across the Department of Geography that could potentially contribute to King’s Geocomputation activities. These positions are at Professor and Teaching Fellow levels.

All deadlines for applications are 30th March 2016, so get cracking!