CUSP London Seminar: Dani Arribas-Bel

This past Thursday we were really lucky to catch Dani Arribas-Bel, Senior Lecturer in Geographic Data Science at the University of Liverpool and major contributor to PySAL, on his way back home following two weeks’ teaching in the Caribbean. Dani kindly agreed to give a talk in two parts on “Infusing Urban and Regional analysis with Geographic Data Science” (‘GDS’), which we summarise below. As one of the first CUSP London-branded seminars, it was great to see so many Urban Informatics staff and students there (and even a few from UCL’s CASA!).

Sexiest job of the 21st century…

Geography & Computers

The first half of Dani’s talk covered highlights from a recently-published paper in Geography Compass titled “Geography & Computers: Past, Present, and Future” (an author pre-print is available via KCL’s Institutional Repository); in it, Dani and KCL’s Jon Reades link shifts in computing power and access to shifts in the ways in which geographers use computers to ‘do’ geography.

The basic contention is that there have been three waves of change that they (we) summarise as: 1) a computer in every institution (50s–70s); 2) a computer in every office (80s–00s); 3) a computer in every thing (10s–). We don’t need to revisit the article in full here since highlights are available in a previous blog post, but Dani’s focus was on the links to ‘data science’, the ‘sexiest job of the 21st century’.

This led through to a discussion of ‘data-driven methods’ which, to a geographer, can sound like putting the cart before the horse. However, it’s important to keep in mind that we, as researchers, have little to no control over how the kinds of data underpinning a (geographic) data science are created and therefore need to adapt our approach to the data, and not the other way around.

I particularly appreciated Dani’s observation on the importance of data processing/handling as part of this shift: sometimes dismissed as ‘mere cleaning’, this stage is critical to ensuring that the data is both well-understood (shows what we think it shows) and fit-for-purpose (does what we want it to do).

I’ve seen the term ‘feature engineering’ pop up in my own news feeds with increasing regularity and that has a nice ring to it (it’s engineering, not cleaning!) but it doesn’t quite capture the full scope of what good data science really entails. And it also doesn’t take into account the ‘baking’ of geo-data that is really required to ensure methods and models are appropriate.

Dani wrapped up this section with a discussion of how GDS can serve as the interface between geographers and data scientists, supporting the co-production of systems (a.k.a. tools), methods (spatially aware ML), and epistemologies (ways of knowing that are appropriate to these types of data).

Applications of Geographic Data Science

The second half of Dani’s talk covered a work-in-progress using a large building data set from Spain to delineate urban and employment boundaries. This nicely illustrated one of the key concepts elaborated in the first half of the talk: the importance of data-driven methods in geographical data science.

The question Dani and his co-authors are exploring is how one can meaningfully delimit the spatial extent of urban areas and economic activity with the minimum number of prior assumptions about spatial configuration or ‘auxiliary geographies’; by this we mean using other steps or data, such as rasterisation or regional boundaries, to constrain the process to our preconceived notions of what the answer ‘should be’.

The issues with rasterisation and the Modifiable Areal Unit Problem (MAUP) are well-known, but what do you do when you have 15 million data points to cluster and can no longer load the data set into memory? This is what we mean by data-driven methods: Dani’s exciting addition (which prompted a good deal of questioning from the audience) is a way to make an existing algorithm work in a large-data context while also working around what I feel is an important conceptual flaw in that algorithm, giving you insights into the robustness of your results!

Such a method is not without theory, nor without empirical input: Dani and his colleagues use research findings on commuting distances and employment to provide essential parameters. I’m not able to share additional details at this stage, but I’m really looking forward to seeing this algorithm ‘in the wild’ since it addresses a number of issues that I have with some work that I’m (slowly) undertaking…

Understanding Gentrification through ML

Although it has taken rather a long time to see the light of day, our just-published paper is one of the reasons I love my job: drawing on a mix of data science and deep geographical knowledge, we look at the role that new Machine Learning (ML) techniques – normally seen as just a ‘black box’ for making predictions – can play in helping us to develop a deeper understanding of gentrification and neighbourhood change. For those of a ‘TL;DR’ nature (or without the privilege of an institutional subscription!), we wanted to share some of our key ideas in a more accessible format.

MoDS: Mapping Knowledge with Data Science (MSc + PhD Studentship)

Although we had some great responses to our initial call, we’re still looking for the ‘right’ candidate for this fully-funded studentship, which is open to both undergraduate finalists and completing Masters students. The project involves the application of data science techniques (text-mining, topic modelling, graph analysis) to a large, rich data set of 450,000+ PhD theses in order to understand the evolving geography of academic knowledge production: how are groundbreaking ideas produced and circulated, and how do researcher mobility and institutional capacity shape this process?

We’re looking for a great candidate (see ‘pathways’ below) with a demonstrable interest in interdisciplinary research – you will be working in collaboration with the British Library at the intersection between geography, computer science, and the humanities, and this will present unique challenges (and opportunities!) that call for resourcefulness, curiosity, and intellectual excellence.

MoDS: Mapping Knowledge with Data Science

I’m really excited to announce the latest addition to our growing stable of computational geography research: a fully-funded ESRC studentship involving the application of cutting-edge techniques (text-mining, topic modelling, graph analysis) to a large, rich data set of 450,000 PhD theses in order to understand the evolving geography of academic knowledge production: how are groundbreaking ideas produced, circulated, and ultimately succeeded, and how do issues such as researcher mobility and institutional capacity shape this process?

We’re looking for a stellar candidate (either undergraduate or Masters-level) with a demonstrable interest in interdisciplinary research – you will be working at the intersection between disciplines and this will present unique challenges (and opportunities!) that call for resourcefulness, curiosity, and intellectual excellence.

Project Overview

The British Library manages EThOS, the national database of UK doctoral theses, which enables users to discover and access theses for use in their own research. But the almost complete aggregation of metadata about more than 450,000 dissertations also enables us to begin asking very interesting questions about the nature and production of knowledge in an institutional and geographic context across nearly the entire U.K., and this anchors the project in quintessentially social science questions about the impact of individuals, work, and mobility on organisations and cultures.

However, textual data of this scale is interpretable and navigable only through ‘distant reading’ approaches; so although it remains rooted in the interests and episteme of the social sciences, the research involves genuinely interdisciplinary work at the interfaces with both the natural sciences and the (digital) humanities! At its heart, this project is therefore an exciting example of ‘computational social science’ (Lazer et al. 2009) in that it involves the application of cutting-edge computational techniques to large, rich data sets of human behaviour.

Ultimately, this project seeks to understand changes in the U.K. geography of academic knowledge production over time and across two or more disciplines. All applicants are therefore expected to demonstrate an interest in the underlying social science research questions and (at a minimum) basic competence in programming. Additionally, the successful applicant for the 1+3 route would be expected to successfully complete King’s MSc Data Science programme, while the successful +3 applicant would be expected to demonstrate a degree of existing facility with core analytical approaches.

For more information on the project, please see here.

Studentship type

1+3 (1 year Masters + 3 year PhD) or +3 (PhD only), subject to candidate’s existing academic/professional background. For applicants with a social science background we are suggesting King’s MSc Data Science programme. For applicants with a natural science background we will need to discuss how best to achieve a grounding in the social sciences.

Application deadline

31 January 2018

Feature Talk: Prof. Sergio Rey talks about PySAL, Open Source & Academia

We’re really pleased to announce that on Wednesday, 22 February, Professor Sergio Rey, of the School of Geographical Sciences and Urban Planning at Arizona State University, will be discussing the Python Spatial Analysis Library (PySAL). His talk will provide an overview of PySAL and illustrate key components of the library, drawing on examples from regional inequality dynamics and urban analysis.

Future plans for PySAL and related projects will also be outlined. Lessons learned in directing a distributed, open source project will be shared with a particular emphasis on the challenges and opportunities found at the intersection of open source and the academy.

The talk will be followed by drinks and a chance to speak informally with Prof. Rey, or just to mingle and chat with other researchers.


Wednesday 22 February at 5:30pm


Room S-3.20, Strand Building, WC2R 2LS

Download the flyer: serge-rey-talk-22-february.

The Full Stack: Tools & Processes for Urban Data Scientists

Recently, I was asked to give talks at both UCL’s CASA and the ETH Future Cities Lab in Singapore for students and staff new to ‘urban data science’ and the sorts of workflows involved in collecting, processing, analysing, and reporting on …