MoDS: Mapping Knowledge with Data Science

I’m really excited to announce the latest addition to our growing stable of computational geography research: a fully-funded ESRC studentship applying cutting-edge techniques (text-mining, topic modelling, graph analysis) to a large, rich data set of 450,000 PhD theses in order to understand the evolving geography of academic knowledge production: how are groundbreaking ideas produced, circulated, and ultimately superseded, and how do issues such as researcher mobility and institutional capacity shape this process?

We’re looking for a stellar candidate (either undergraduate or Masters-level) with a demonstrable interest in interdisciplinary research – you will be working at the intersection between disciplines and this will present unique challenges (and opportunities!) that call for resourcefulness, curiosity, and intellectual excellence.

Project Overview

The British Library manages EThOS, the national database of UK doctoral theses, which enables users to discover and access theses for use in their own research. But the almost complete aggregation of metadata on more than 450,000 dissertations also enables us to begin asking very interesting questions about the nature and production of knowledge in an institutional and geographic context across nearly the entire U.K. This anchors the project in quintessentially social science questions about the impact of individuals, work, and mobility on organisations and cultures.

However, textual data at this scale can only be interpreted and navigated through ‘distant reading’ approaches; so although it remains rooted in the interests and episteme of the social sciences, the research involves genuinely interdisciplinary work at the interfaces with both the natural sciences and the (digital) humanities! At its heart, this project is therefore an exciting example of ‘computational social science’ (Lazer et al. 2009) in that it involves the application of cutting-edge computational techniques to large, rich data sets of human behaviour.

Ultimately, this project seeks to understand changes in the U.K. geography of academic knowledge production over time and across two or more disciplines. All applicants are therefore expected to demonstrate an interest in the underlying social science research questions and (at a minimum) basic competence in programming. Additionally, the successful applicant for the 1+3 route would be expected to successfully complete King’s MSc Data Science programme, while the successful +3 applicant would be expected to demonstrate a degree of existing facility with core analytical approaches.

For more information on the project, please see here.

Studentship type

1+3 (1 year Masters + 3 year PhD) or +3 (PhD only), subject to the candidate’s existing academic/professional background. For applicants with a social science background we suggest King’s MSc Data Science programme. For applicants with a natural science background we will need to discuss how best to achieve a grounding in the social sciences.

Application deadline

31 January 2018


Feature Talk: Prof. Sergio Rey talks about PySAL, Open Source & Academia

We’re really pleased to announce that on Wednesday, 22 February, Professor Sergio Rey of the School of Geographical Sciences and Urban Planning at Arizona State University will be discussing the Python Spatial Analysis Library (PySAL). His talk will provide an overview of PySAL and illustrate key components of the library, drawing on examples from regional inequality dynamics and urban analysis.

Future plans for PySAL and related projects will also be outlined. Lessons learned in directing a distributed, open source project will be shared with a particular emphasis on the challenges and opportunities found at the intersection of open source and the academy.

The talk will be followed by drinks and a chance to speak informally with Prof. Rey, or just to mingle and chat with other researchers.

When:

Wednesday 22 February at 5:30pm

Where:

Room S-3.20, Strand Building, WC2R 2LS

https://goo.gl/maps/7zmjc6xGmuA2

Download the flyer: serge-rey-talk-22-february.


GeoCUP: supporting a flexible student computing environment

Over the past year, we’ve been supporting our first cohort of Geocomputation & Spatial Analysis (GSA) students as they learn to code and work with geo-data in an open computing context (predominantly FOSS). This post reflects on some of the problems – and solutions – that emerged as a result.

GeoCUP.v1

The first incarnation of GeoCUP (short for GeoComputation on a USB Platform) was a system-on-a-key described in a previous post. With the support of the Department and Faculty, USB keys were supplied to students at the start of term as follows:

  • 64GB USB 3.0 keys
  • Ubuntu Linux 14 LTS release (32-bit)
  • Pre-installed software:
    • R
    • QGIS
    • Canopy
    • Assortment of specified Python libs
    • Mozilla Firefox
    • Dropbox

The idea was that students could launch GeoCUP at boot time on a cluster machine from the USB key and would thus be running a full Linux distribution over which they had complete control. In an institutional computing context this was as close as we could get to giving them their own computer to play with, break, and manage.

We had also expected, based on what we’d seen with Linux ‘Live’ distributions, that it would be feasible to have a key that would work with multiple types of firmware (including Apple’s EFI) and that students could therefore also run GeoCUP at home.

A final advantage would be the ease of replacing a lost key: since all their code was in Dropbox all they needed to do was reconnect Dropbox on a replacement key and they’d be up and running again in no time.

Ubuntu Screen Grab

Unexpected Issues

No well-laid plan survives much contact with the real world, and several issues emerged in the run-up to launch day:

  1. It is not (yet?) possible to have a full Linux distribution (as opposed to an essentially static ‘live’ distribution) that will start up at boot time on both Macs and PCs. Indeed, different vendors’ PC hardware also proved different enough from the machine on which GeoCUP.v1 was developed that booting was patchy, at best, on generic PCs as well. So portability proved to be rather more limited than we’d expected and hoped.
  2. Formatting the keys took much longer than expected. Since the keys needed to be bootable, the only way to write them was using the ‘disk duplication’ utility, dd; however, dd cannot distinguish between largely empty space and used space since it blindly copies the entire disk. So even though only about 20GB of the 64GB was in actual use, each key took about 5 hours to write. We were able to write up to 7 keys at once by combining dd with tee, as follows:

    # dd streams the master image once; tee fans it out to each attached key
    dd if=/Volumes/GeoCUP/geocup-20150917.bak/backup bs=524288 | sudo tee /dev/disk3 ... /dev/disk9 > /dev/null

    We’d also note that using dd meant that we could only use 64GB USB keys, so if students lost a key and needed to replace it, they had to source exactly the same-sized key.

These start-up issues were then supplemented by performance issues after roll-out:

  1. Hardware buffering was much worse than expected. We had, naively, assumed that USB3 would provide sufficient bandwidth for our purposes and that read/writes would be fairly modest. We were wrong: the system frequently blocked completely for up to 10–12 seconds while data was written to/read from the USB key, and the entire Linux UI became unresponsive… which was rather frustrating for the students.
  2. As well, the I/O load of a full Linux distribution had a propensity to expose any physical weaknesses in the flash devices, so we had to re-flash probably 10–20% of the students’ keys over the course of the year.
  3. These performance issues then led some students to begin using their own laptops running OSX or various flavours of Windows instead, producing a proliferation of students using the wrong Python libraries, since platform support for some geodata and spatial analysis libraries is limited.
  4. All of this was compounded by the fact that some students remembered to update their systems regularly by running

    sudo apt-get update && sudo apt-get upgrade

    while others didn’t. So we even ended up with different versions of libraries on GeoCUP itself, and that led to code that would fail to run on one system but have no issues on another.

  5. A final ‘nail in the coffin’ of GeoCUP.v1 was the fact that one of our Ubuntu repositories was accidentally pointing at a development repository, not the stable one, and so one of the updates knocked out most of QGIS’ modelling functionality!

These were all serious issues, but in spite of them a number of students reported that using GeoCUP had nonetheless enhanced the module for them: it gave them full control of their system, exposed them to power-user features such as the bash shell, and opened their eyes to some of the practical problems entailed in managing a system and a codebase. They also got to watch us doing some fairly frenetic on-the-fly debugging.

So with that in mind…

GeoCUP.v2

Virtualbox_logo

Part way through the year we began to experiment with Oracle’s VirtualBox platform as a way to enable students to run GeoCUP on their own computers (as that had signally not happened with GeoCUP.v1). Although there are higher-performance virtualisation platforms out there, VirtualBox is free, open source software so there were no licensing or cost implications to rolling this out on cluster systems or in suggesting that students download it to their personal computer.

GeoCUP.v2 is built as follows:

  • Ubuntu Linux 16 LTS (64-bit)
  • Anaconda Python
  • Rodeo & Atom IDEs
  • Dropbox
  • Google Chrome
  • QGIS

We’ve adapted installation scripts posted by Dani up at Liverpool University for use with our own GeoCUP distribution, since this speeds up the configuration and updating of the system as new Ubuntu distributions are released. You can find them on GitHub: github.com/jreades/GeoCUP-Vagrant.
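
If you want to experiment with them, the rough workflow would look something like this – a sketch only, assuming the repository’s Vagrantfile handles the provisioning and that you already have VirtualBox and Vagrant installed:

    # Fetch the provisioning scripts (repository URL as above)
    git clone https://github.com/jreades/GeoCUP-Vagrant.git
    cd GeoCUP-Vagrant
    # Build and boot the VM; the first run downloads the base box
    vagrant up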

The main advantages of this shift are:

  1. The VDI (Virtual Disk Image) file is decoupled from the physical storage media, so as long as the image fits on the device then students can bring in whatever hardware they like (hard drive, flash drive, personal computer…) and run GeoCUP from that hardware.
  2. The VDI file is smaller and copying to new hardware uses the normal file copying mechanisms so ‘installation’ is also radically faster (we also only copy 20GB of data, instead of 64GB).
  3. By ditching Canopy for Anaconda we can also ‘fix’ the Python libraries using a configuration file so as to avoid last-minute problems caused by the release of new versions. We can then update those libraries to new, stable versions by distributing an upgrade script to the students (see the sketch below) rather than relying on manually-typed commands.
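
As a rough illustration of what such a script involves (the versions file name here is hypothetical, not the actual GeoCUP script), it needs to do little more than refresh the OS packages and then bring the Python stack in line with a pinned list of versions:

    #!/usr/bin/env bash
    # Refresh Ubuntu packages, then update the Python libraries
    # against a centrally-distributed, versioned requirements file.
    sudo apt-get update && sudo apt-get -y upgrade
    conda install --yes --file geocup-versions.txt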

Alongside this, however, we retain the flexibility to give students administrator rights over their (virtual) machine, to install new software on the fly, and to take advantage of software updates without having to embed them in a centralised IT upgrade cycle. We also think that the virtualisation approach has significant advantages for IT services because they don’t have to monkey about with the BIOS of the cluster machines since the entire process is now software-based.

GeoCUP.v3 & Beyond

In the long run we’d like to automate even more of the distribution process so that we are no longer even responsible for ‘burning’ new USB keys or giving students a drive from which to copy the latest version of GeoCUP.

Tools that enable just this sort of approach are beginning to surface: Vagrant and Docker are the two leading contenders at the moment, though they do slightly different things. I’ve been impressed by the way that Dani’s Vagrant-based distribution allows you to download a 2GB file containing a full Linux server distribution, have it automatically configured when it first runs, and then interact with the system via Jupyter Notebooks: it’s a fairly lightweight, but fully-functional Python-based geodata analytics ‘server’.

There are several problems with using this approach in our context:

  1. I’ve had a lot of problems getting Vagrant to also run in a ‘headed’ context, and since we want students to use the latest versions of QGIS as well as unsupported (by IT Services) IDEs such as Rodeo or Atom, we can’t drop the Linux desktop entirely and just run the notebook server.
  2. We can’t have students downloading even a 2GB file on to the cluster machines since a) they have nowhere to keep it in their allocated 200MB of online storage, and b) multiplying that 2GB overhead by 30 students (some 60GB in total) is suddenly quite a big ‘hit’ to the network at the start of every class.
  3. We also can’t run Jupyter on a server somewhere on campus since, as I understand it, every user runs with the same permissions as the Jupyter process and there’s no separation of user spaces.

I suspect that these issues will be remedied in the not-too-distant future, and James and I will be exploring some of the possibilities with colleagues at ASU and UNSW over the coming year.

Finally, a /ht to Ryan Barnes, one of our own Geography grads who did the heavy lifting on version 1 of GeoCUP.


Talk: Urban hierarchies and scaling laws

This afternoon’s seminar by CASA’s Dr. Elsa Arcaute will be of interest to a wide range of students and staff at King’s – with a background in theoretical physics and complexity, Elsa now studies how urban and regional systems scale and divide, and how these aspects are expressed in infrastructure and the built environment. To put it another way: where does London end? 4:30pm today in the Pyramid Room (K4U.04), followed by wine and soft drinks.

Abstract

In this talk we look at the different ways to obtain definitions of cities and their relevance to urban scaling laws. We also look at the hierarchical structure of Britain through a percolation process on the road network. We observe how at a large scale the divisions relate to well-known fractures of Britain, such as the North-South divide, while at small scales cities can be observed at a transition where the fractal dimension of the clusters has a maximum. The clusters defined at this distance threshold are in excellent correspondence with the boundaries of cities recovered from satellite images and the previous method.

About Elsa

Elsa Arcaute is a Lecturer in Spatial Modelling and Complexity at the Centre for Advanced Spatial Analysis (CASA) at University College London. She is a physicist with a Masters and a PhD in Theoretical Physics from the University of Cambridge. She then moved into the field of Complexity Sciences, joining the Complexity and Networks group at Imperial College London, where she developed models of self-regulation for social systems, extracting fundamental behaviours from experiments on ant colonies to test on robots and to implement in an intervention in an Irish eco-village. In 2011, Elsa moved to CASA, joining a project funded by the European Research Council and led by Prof. Michael Batty on morphology, energy and climate change in the city. Since then Elsa has been working on applying complexity sciences to urban systems.


Aspect-Slope Maps in QGIS

While working with Naru to design our new 2nd year GIS methods training course (with parallel QGIS and ArcGIS streams!), I came across a rather striking map on the ESRI blog that managed to combine both slope (steepness) and aspect (direction) in a single representation. This post explains both a problem with the way that the colour scheme was specified and how to replicate this type of map in QGIS (with style sheet).

The Inspiration

Here’s Aileen Buckley’s Aspect-Slope map in all its glory – this is the area around Crater Lake, Oregon, and you can see that it neatly captures both the direction of slopes (aspect) and their steepness (degree). So features like the crater stand out really clearly, as does what I assume is evidence of lava flows and the like, while lesser features gradually fade towards grey, which indicates flat terrain.

aspect-slope_map

So these maps combine two properties:

  • The direction of the slope is expressed in the hue – different directions are different colours.
  • The steepness of the slope is expressed by its saturation – steeper slopes are brighter colours.

Rather than just jump into providing you with a style sheet, I think it’s useful to trace this back to its constituent parts as it turns out that ESRI has made a mistake in setting up their colour maps.

Aspect Mapping

Aspect maps give the viewer a sense of the direction in which various slopes derived from a Digital Terrain Model (DTM) lie – typically, we do this by dividing the compass bearing of the slope into eight sectors: North, Northwest, West, Southwest, South… well, you get the idea.

Here’s an example of what the standard aspect map out of ArcMap looks like as posted by the Rural Management and Development Department of Sikkim:

10sikkim-village-aspect

This, helpfully, gives us the ranges that we’ll need for our aspect-slope map. Note, however, that we don’t really have any idea how steep any of these obvious hills are.

Slope Mapping

Slope maps are, obviously, intended to fill in the gap in terms of how steep an area is. Typically, we can measure this either as a degree value from one raster cell to the next of the DTM or as a percent/ratio (a 1-in-10 gradient = 10%). Here’s a nice example looking at the link between coffee bean growing areas and slope in Costa Rica:

costarica_bean_atlas_slope-rb-new

Unlike the aspect map, the divisions used in the slope map seem to be largely arbitrary with no real consensus on the mapping between measured steepness and terminology. The clearest guidance that I could find came from The Barcelona Field Studies Centre and looked like this:

Slope (%)      Approx. Degrees    Terminology
0.0–0.5        0.0                Level
0.5–2.0        0.3–1.1            Nearly level
2.0–5.0        1.1–3.0            Very gentle slope
5.0–9.0        3.0–5.0            Gentle slope
9.0–15.0       5.0–8.5            Moderate slope
15.0–30.0      8.5–16.5           Strong slope
30.0–45.0      16.5–24.0          Very strong slope
45.0–70.0      24.0–35.0          Extreme slope
70.0–100.0     35.0–45.0          Steep slope
> 100.0        > 45.0             Very steep slope

A Better Aspect-Slope Map Scheme

In order to create an aspect-slope map, we need to combine the two data ranges into a single number that we can use as a classification, and this is where the ESRI blog approach goes a bit off the rails. In their approach, the ‘tens column’ (i.e. 10, 20, 30, …) represents the steepness – so 0–5 percent slope=10; 5–20 percent slope=20; and 20–40 percent slope=30 – and the ‘units column’ (i.e. 0–8) represents aspect – so 0–22.5 degrees=1; 22.5–67.5 degrees=2; etc.

The problem with this approach is that adding or removing a steepness category becomes painful: in their example the highest value is 48, which encodes the steepest class and an aspect of Northwest. But what if I decide to insert a class break at a 30 percent slope to distinguish more easily between ‘Extreme’ and ‘Steep’? Well, then I need to redo the entire classification above 30… which is really tedious.

If we switch this around such that aspect is in the tens column (10–80) and steepness in the units column (0–9) then this becomes trivial: I just add or remove breaks within each group of 10 (10–19, 20–29, etc.). No matter how many breaks I have within each aspect class, the overall range remains exactly the same (10–89 if you use the full scale) regardless of the steepness classification that I’m using. It’s not just easier to modify, it’s easier to read as well.
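
To make the encoding concrete, here’s a trivial illustration (pure arithmetic, nothing QGIS-specific) of how a combined cell value decodes under the revised scheme:

    # A cell value of 54 decodes as aspect class 5 (South, i.e. 50)
    # plus steepness class 4 (15–30% under the classification given below)
    value=54
    echo "aspect class: $(( value / 10 )), steepness class: $(( value % 10 ))"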

Implementation in QGIS

For all of this to work in QGIS, you need to generate and then reclassify a slope and an aspect analysis from the same DTM. You can do this using outputs from the raster Terrain Analysis plugin (that’s the point-and-click way), or you can build a model in the Processing Toolbox (that’s the visual programming way). I personally prefer the model approach now that I’ve finally had a moment to understand how they work (that’s a topic for another post), but one way or the other you need to get to this point.

Regardless of the approach you take (manual or toolbox), once you’ve got your two output rasters you need to reclassify them and then combine them. Here’s the mapping that I used to reclassify the two rasters as part of a model. You would copy these lines into text files and then use the GRASS GIS reclassify geoalgorithm while specifying the appropriate reclassification file.

Aspect-Reclassify.txt

0.0 thru 22.499 = 10
22.5 thru 67.499 = 20
67.5 thru 112.499 = 30 
112.5 thru 157.499 = 40
157.5 thru 202.499 = 50
202.5 thru 247.499 = 60
247.5 thru 292.499 = 70
292.5 thru 337.499 = 80
337.5 thru 360.5 = 10

Slope-Reclassify.txt (for slope measured as a percentage)

0.0 thru 4.999 = 0
5.0 thru 14.999 = 2
15.0 thru 29.999 = 4
30.0 thru 44.999 = 6
45.0 thru 100.0 = 8

So that’s a 5-class steepness classification, but you could easily set up more (or fewer) if you needed them.
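
For reference, if you were working in a GRASS session directly rather than through the Processing Toolbox, the equivalent calls would look roughly like this (the input layer names are illustrative):

    # Reclassify the aspect and slope rasters using the rules files above
    r.reclass input=aspect output=aspect_reclass rules=Aspect-Reclassify.txt
    r.reclass input=slope output=slope_reclass rules=Slope-Reclassify.txt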

Once you’ve reclassified the two rasters it’s a relatively simple matter of raster layer addition: add the reclassified slope raster to the reclassified aspect raster and you should get numbers in the range 10–88.
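
If you’d rather script this step than click through it, gdal_calc.py (which ships with GDAL, and therefore with most QGIS installs) can do the addition – the file names here are again illustrative:

    # Sum the two reclassified rasters into the final aspect-slope raster
    gdal_calc.py -A aspect_reclass.tif -B slope_reclass.tif --outfile=aspect_slope.tif --calc="A+B"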

Here’s the model that I set up (as I said above, more on models in another post):

Aspect-Slope-Model

Specifying a Colour Map

Taking the ‘Aspect Slope Map’ output, all we need to do now is specify a colour map. I took the colours posted by ESRI in the colour wheel (as opposed to the ones specified in the text) and converted them to hexadecimal since that was the easiest way to copy-paste colours. I think, however, that I’ve ended up with a slightly ‘muddier’ set of colours than are in the original Crater Lake set as you’ll see with my ‘Sussex Aspect-Slope Map’ below:

Sussex Aspect Slope Map

And, finally, the QGIS style sheet file is here (sorry about the zip format but .QML is not a recognised style type):

Aspect Slope Style – Close to Original.qml

Wrap-Up

I’m sure that this style sheet could be further improved (and I may even try to do so myself, though I’d also welcome submissions from anyone with some time on their hands), but at least this gives users an easy way to combine representations of slope and aspect in a single map using a reclassification scheme that is simple to extend/truncate according to analytical or representational need. Enjoy!


The Future of Geocomputation

On Friday 18 December, we hosted a workshop on ‘the future of geocomputation’ involving over 30 researchers from across the UK and Ireland. We’re still working to synthesise and write up the discussions that made up the second half of the workshop, but below are the presentations that kicked off the day. Some of the tweets from the day are embedded below, but for more see our Storify for the day or search #fogeocomp.

Introduction

To set the context for the day’s discussion we argued that the future of geography is cheap – cheap hardware and software, cheap data and code, and ‘cheap’ (by which we mean simple) interaction with sophisticated geographical models.


Chris Brunsdon’s Keynote

In his opening keynote, Chris set out an ambitious agenda for geocomputation that called for a deeper understanding of geographical processes, data visualisation (with provocative images from his caricRture library) and, most challenging of all, Approximate Bayesian Computation (ABC).

Chris highlighted some of the problems with GIS and argued that geographers need to rediscover coding for reproducibility, for flexibility and for openness in their research.


Alison Heppenstall’s Keynote

Alison’s talk rounded out Chris’ preliminaries, highlighting the role that Agent-Based Models (ABMs) could play in deepening our understanding of spatial processes by bringing aspects of the real world and human behaviour into our computational models. We see these as complementary to, rather than competing with, the more statistical aspects explored by Chris.

Alison also highlighted that, despite the large number of platforms that have now been developed for agent-based modelling, and its growing use across academia, its adoption and use in policy-making has been limited. Alison put this down to the great uncertainties that often remain and argued for an increased focus on model calibration and verification.


Alex Singleton’s Keynote

Finally, Alex wrapped up with his thoughts on how we can train the next generation of students in the tools and concepts that they will need to get to grips with the issues explored by Alison and Chris.

Alex pointed out how ‘point-and-click’, button-pushing GIS classes in undergraduate degree programmes fail to teach anything about the process of data analysis and research, and called for us to move beyond ‘sleepy’ geography curricula.


Discussion

After the keynotes we broke into smaller groups to discuss ‘Training the Next Generation’, ‘Data, Tools & Processes’ and ‘Setting an Agenda for the Next 10 Years’. We’re still working on summarising and synthesising the discussions from these groups, so look out for that soon.

Before finishing and continuing discussions over wine, Andy Evans from the University of Leeds outlined plans for the Geocomputation 2017 conference. We’re looking forward to that already, but there’s more to come in 2016 first!


Avoiding Email for Academics

How many times do you return from field work or a holiday to find that most of your first day is spent deleting emails that are no longer applicable or were never relevant to begin with? Or, worse, you are asked to address issue ‘x’ but have no history or documentation to explain how ‘x’ became a problem, what solutions have been considered, or even why you are the one to solve it! Asana and Slack can help with that.

What are they?

In a previous post I pointed to some ways that peer programming could be adapted for module development; now I want to turn my attention to developer tools that help programmers avoid sending email in the first place. In most cases, these are cloud-based tools that try to make up for some of the basic failures of email:

  1. It is just about impossible to collaborate on a long, complex document via email since you have no versioning control and it can become very, very difficult to know which is the latest version.
  2. Only people who participated in the initial exchanges have access to a full discussion history. Anyone joining later either needs to be forwarded the entire history, or they basically have to jump in blind.
  3. A lot of email traffic amounts to “Have you done this?”, “No, not yet, as I thought you were doing it”, “Oh, ok, now I have”… Does all of this actually need an email? What if someone else wants to comment?
  4. Email is not particularly ‘live’ in that you can’t see someone else’s reply while still composing your own.
  5. Email does not categorise easily – sure, you can categorise by sender, header, or keyword, but that’s all on you. Why not ask the sender to apply some ‘tags’ to help you filter out the ‘chat’ from the ‘this information is critical to the success of the project’?

Slack

Slack is designed to help keep lines of communication open within teams, but without that generating a flood of emails mixing in everything from cat photos to “Have you seen what ‘y’ have done? We need to get on top of this right away!” The service is a kind of ingenious blend of IRC, Twitter, Email, and SharePoint. Yes, I really did write that sentence sober.

So how does it work? When you join Slack you join one or more ‘teams’ which are the basic ‘unit’ of interaction on Slack. You can join many different teams, but the messages of one team remain private to that team, and to keep you from accidentally posting to the wrong team you have to ‘switch teams’ in order to post content to other teams.

Screenshot from King’s Geocomputation Team channel.

Within each team you can have any number of ‘channels’. So that’s the IRC bit: each channel has a purpose, and this can be everything from ‘deadlines’ to ‘random’ (for the cat photos, naturally). Using the Twitter-familiar hashtags you can categorise your messages/posts into one or more channels. There’s also the ‘@’ sign to draw an item to a team member’s attention, and so on. So that all cuts back on cc’ing and to-ing-and-fro-ing, and more importantly: you’ll never get the entire history included in the reply-to-all.

The final, SharePoint-y bit is what makes Slack into a fully-fledged business app: using service integrations you can automatically consume data published across a variety of services (github, Twitter, blogs, Dropbox, etc.) and route it to a particular channel. So if you want to track a competitor’s press releases you can do that. If you want to share a Dropbox document for collaborative editing you can do that too.

Oh, and there are clients for every major O/S: Mac, Windows, iOS, Android, etc. so you can keep abreast of things whether you’re at your computer or away from your desk.

Asana

Asana is a much more complex service built around task management: at its most basic it’s a kind of shared checklist. But by organising things into projects, checklists and sub-checklists, and assigning them to individuals or groups, as well as giving them due dates, you can set up some pretty complex flows of dependencies. In fact, the wonderfully-named Instagantt has even come up with a way to turn your Asana project into that dreaded – but still expected by most research councils – waterfall chart.

Project overview page from Asana.

Again, as far as I’m concerned this is all about reducing or eliminating email: you can set up Asana to only watch certain tasks, to send you a summary of what has changed, alert you to what you have due next week, etc. Or you can tell it to shove off and leave you alone. So there’s a pull, not push option if you really dislike email reminders. Most importantly, you no longer need to ‘chase’ people just to find out the status of a job, you can just see if they’ve marked it as completed or not. You can also categorise tasks, linking them to one or more projects or other tasks, so some quite arcane structures can result.

As you’d expect, you can attach files, comment on tasks, reassign tasks, split them up, merge them, and so forth all within the rich web interface. Like Slack there are service integrations that allow you to add more functionality and to automatically integrate things like bug tracking and such, but I’ve not made use of those. As you’d expect these days there are also mobile versions for iOS and Android – they’re a little more limited as some functionality is missing, but more than good enough for keeping tabs on things while out of the office (or for catching up when you get back if you really do manage to switch off).

Relevance to Teaching & Administration

If you’ve had to use Moodle – the Open Source ‘blackboard’ app – then you’ll know that its 2-way communications functionality is rather poor: it’s fine for blasting the students with a message about exams or a change to the readings, but it’s not exactly encouraging of course-related ‘chat’ as it comes across as rather clunky (you need to navigate through several pages to even get to the messaging functionality for a given module) and very formal.

We’re going to be experimenting with Slack for our new Geocomputation pathway, which starts in a week’s time, in the hope that it encourages students to support one another while also giving us a way to spot recurring problems or questions (and to avoid replying multiple times via email to those recurrent problems). By enabling more fluid conversation, and giving us a way to channel communications into appropriate categories, we’re hoping that students will be more open about the challenges they’re facing, which will put us in a better place to help them learn to program (as well as do statistics and spatial analysis!).

Of course, it could get hard on us if the students started to expect replies at 2a.m., but that’s where we hope the more open structure of Slack will come to the rescue: chances are, quite a few of our students will be working late to master Python and submit the requisite code on time, so they can help one another in a way that’s a lot more similar to what they’d encounter in the real world (where Slack is used by a lot of software shops) than what they’ve normally had to deal with in Moodle.

Asana is less obviously relevant for students, and here I’m using it mainly for organising our thinking and planning on one of the departmental committees. We’ll see how well that goes, but early indications are pretty positive: no one has complained to me that the site was unusable or nonsensical, and I’m hoping that my colleagues will find it useful to have categorised ‘to do’ lists and a clearer understanding of who is working on what, and what issues are outstanding (e.g. unassigned or unresolved). So, with luck, it will again mean less email, more effective communication, and a higher level of ‘output’.

Conclusion

I’m finding both of these tools really helpful for managing both the ‘meta’ part of teaching and also for grant/project management work. If you have other favourite tools for unblocking your inbox then feel free to share!

[1] There are plenty of other services, of course, but these are the ones I’m familiar with.


Peer Programming for Academics

As we prepare to teach the first year of the GSA pathway, we’ve been experimenting with techniques more commonly used in software development to see if they can help us to deliver quality and integration in our new modules right from the start. This post will explore the logic of Pair Programming.

What is Pair Programming?

In the traditional software development arena applications are designed by a group of experts; they then hand a set of requirements over to a programmer who heads off to their desk to write the code that will meet those requirements. If the requirements are sufficiently well thought-out and extensive then delivering code that meets those requirements means the project is a success.

There’s just one problem: how often has anything complex been sufficiently well thought-out that individuals, working in isolation, have been able to deliver something integrated and feature-complete on the first go? Actually, there’s a second problem: it is also possible to write something that fully meets the requirements, but doesn’t meet the needs of the application or the organisation. While I work away on ‘my’ bit, I miss a major issue that was also hidden from the application’s designers because no one has an eye any longer on the ‘big picture’ of what the application is supposed to actually do.

It’s into this breach that Agile-derived pair programming steps as a way both to keep developers looking at the big picture, and to enable individuals to access the type of practical knowledge that is only formed through long or diverse experience. Sometimes called peer programming (which is a rather nice terminological link to academia), pair programming matches a ‘driver’ who focuses on the tactical aspects of task completion with an ‘observer’ or ‘navigator’ who “continuously and actively observes the driver’s work, watching for defects, thinking of alternatives, looking up resources, and considering strategic implications” (Williams et al., 2000). In other words, the driver has someone looking over their shoulder… but in a constructive way.

Start them early…

Issues in Pair Programming

Can it work? Programmers aren’t known for their tolerance of being supervised or managed during programming tasks, so there are a number of techniques designed to make this a more constructive experience: for instance, the pair switch roles regularly so that each person ‘drives’ for a while and then puts on the strategic thinking hat for a bit. And repeat. The constant role-switching means that both programmers have an opportunity to do both types of thinking, which builds up practical knowledge and also helps to ensure that many more possible approaches to a problem are considered.

So, by pairing old hands with novices, pair programming encourages the sharing of ‘best practice’ and yields immediate and frequent feedback during development. That said, you don’t ordinarily pair very experienced programmers with complete novices because the knowledge gap is too wide; it’s more common to pair novices with intermediates, or intermediates with experts, the idea being that the more experienced person still remembers having to learn what their ‘junior’ is trying to understand, at the same time as it gives them a chance to systematise their own experience through teaching.

Obviously, some level of social aptitude/sensitivity and trust is also going to be important here, but somehow many Agile firms have managed to make it work. Interestingly, developers actually report finding the process quite enjoyable, while businesses report 40% faster turnaround, more efficient code, and fewer defects (ibid.). And it has been noted that pair programming works best on challenging tasks that call for creativity and ‘sophistication’ (Lui and Chan, 2006).

Applications in Academia

These types of benefits are clearly relevant for thinking about teaching and administration in academia where there is often a poor understanding, especially amongst new hires, of the objectives of a particular task, its rationale, and the range of viable solutions. So while none of us involved in the GSA pathway would claim to be experts in either module design or programming, we thought that pairing would be useful for new module development because we could cover each others’ ‘weaknesses’ while also talking out the overall strategy of the modules themselves.

So far, the results are really promising: although two of us had to invest fully 1.5 days working together (and switching offices since no one else can use my Kinesis Contour keyboard) to develop a week-by-week teaching plan that incorporated pre-class readings, in-class concepts, and practical work, I feel that the result is looking much better – more integrated and with an obvious appreciation of what concepts need to be covered in which weeks in order to bring the students to the final assessment – than if we’d tried to each tackle ‘our’ bit independently. We also brought more ideas and resources to bear on how we might teach each concept and came up with what I think are really good ideas for testing student learning.

Personally, I’ve found it so productive that, where remotely practical, I’m thinking of inflicting it on every team-taught module I’m involved in. That said, like all things it’s probably best in moderation and may be most valuable during the planning stage, less so during the “I need to create my PowerPoint slides” stage.

So that’s peer programming – or, in this case, peer planning – and we’ll try to post about the other techniques and tools we’ve experimented with over the coming months.

References

  1. Lui, K.M. and Chan, K.C.C. (2006), ‘Pair Programming productivity: Novice–novice vs. expert–expert’, International Journal of Human-Computer Studies, 64(9):915–925.
  2. Williams, L., Kessler, R.R., Cunningham, W. and Jeffries, R. (2000), ‘Strengthening the Case for Pair Programming’, IEEE Software, July/August 2000, pp.19–25.

‘GeoCUP’: Linux System-on-a-Key for Geospatial Analysis Project

We have received funding to develop a system for managing and distributing a full Linux system-on-a-key to students on our new undergraduate pathway. We are looking for an Informatics student (PhD, MSc, or BSc) to research, recommend, develop and test an appropriate solution that meets our needs. Read on for more information.

Background

This Autumn, the Department of Geography is launching an innovative new undergraduate ‘pathway’ in Geocomputation and Spatial Analysis (GSA). The pathway responds to a recognised gap not only in our own module offerings, but across the offerings of UK universities as a whole: the need for geographers with the programming skills to process ‘big geo-data’ using Free and Open Source Software (FOSS) and able to tackle pressing geographical challenges in commercial, governmental, and third-sector data analysis and visualisation.

Effective delivery of this pathway will require students to store and manipulate large data sets, to install and manage new ‘code libraries’ and applications on-demand and as-needed, and to be able to collaborate flexibly on- and off-line across multiple platforms (mobile, personal, and institutional). Within the constraints of managed IT infrastructure these needs can only be met through the use of ‘bootable’ USB flash drives that provide a platform on which open-source geocomputation and spatial analysis tools can be hosted and run.

To meet this need, this project will develop the GeoComputation USB Platform (GeoCUP). GeoCUP will allow students to manage and run a Linux-based operating system over which they have full administrative control. This capability is integral to successful learning on the GSA pathway as the innovative nature of student assignments and independent projects requires the use of compiled open source software libraries and tools.

This project therefore seeks to research, configure, develop, and test a management strategy to support this bootable USB flash drive approach so that it: i) enhances student experience of the College’s computing environment; ii) minimises the maintenance demands on staff as this approach cannot be supported by central IT; and iii) creates opportunities for other staff to deploy a similar system when flexibility and agility in computing are called for.

Swiss Army knife with USB key (swissarmy365.co.uk).

Objectives

There are several overarching objectives for how GeoCUP will improve the student learning experience:

  1. An operating system over which students have full control will allow them to maintain and customise their individual instance of GeoCUP to suit their personal computing needs. As the students develop competence in programming and analytical techniques, they will begin to pursue separate, distinct challenges requiring the ability to compile and install code libraries, or even entirely new applications, on-the-fly. This is impossible to achieve in a traditional, tightly-managed computing environment.
  2. We will be able to maintain and update the ‘master version’ of GeoCUP so that incoming students to the pathway will always be working with the most up-to-date system possible. In addition, should a student lose a USB drive or suffer some other type of data loss, we will be able to quickly provide them with a fully-functioning and up-to-date version of GeoCUP from which to recover. We will also be able to enforce data-protection requirements such as the use of encrypted partitions to ensure that the USB flash drives are unusable and inaccessible without the student’s password.
  3. GeoCUP will be configured with the full set of programming support tools needed to ensure the development of computational (spatial) data analysis skills, including not only Enthought Canopy and QGIS, but also the open collaboration and development tools used by technology firms such as PayPal and Google. Many of the required tools are not available at all through managed IT systems; these include: the GitHub versioning tool; the Postgres+PostGIS spatial database; the routino routing application; the RStudio IDE; Dropbox; and the Slack collaboration tool, amongst others. Our intention is to promote students’ employability by grounding their experience in a realistic computing environment as used by commercial and other organisations.

As a result of the ‘real world’ environment GeoCUP will provide, incidental – but by no means insignificant – benefits to student experience, including:

  1. The ‘Slack’ collaboration system functions on all computing platforms, including all major mobile ones, and creates a series of ‘channels’ across which students and staff can communicate in a way that more closely mirrors student preferences: content (including code) is ‘pushed’ in real-time to all devices, can be categorised using hashtags, and serves as an instantly-searchable archive of interactions. This complements the 1-to-1 and 1-to-many format of email and the KEATS ‘broadcasting’ tool, and is expected to encourage dynamic peer support and collaboration, while avoiding repeated “Can you tell me…” messages to staff.
  2. The GitHub version control platform is now the de facto standard for collaborative programming projects in all sectors. It also brings the additional benefit of mitigating data loss in the event of corruption, loss of a USB flash drive, or other unforeseen events. We will therefore be reinforcing for students the importance of integrating code-management into their workflow.
  3. Students will also be able to take advantage of more open, platform-independent cloud-computing resources such as Dropbox and Amazon Web Services (AWS), which is not possible on the existing Microsoft-based SharePoint solution.

More Information

The selected researcher will be paid in accordance with King’s College London guidelines. Project work can begin immediately and must be complete by late August.

For more information about the project timeline and for expressions of interest (by Thurs 25 June), please contact Jonathan Reades or James Millington in the Department of Geography.