Linking Data in Sydney

 

By Geoff Browell, Head of Archives Services

I was fortunate to attend the biennial Linked Open Data,
Libraries, Archives, Museums summit in early July in Sydney, Australia. I
played a very small role in setting it up, as a member of the organising
committee. The conference is an opportunity for archivists, librarians, museum
curators, information professionals and IT experts to meet and discuss the
latest developments in Linked Data among higher education, heritage and
‘memory’ institutions worldwide. Delegates have the chance to hear about
successful (and unsuccessful) projects, take part in targeted discussions on
the future of the technology, and strike up new collaborations. The event
features the ‘Challenge’ – an open competition for the best application of
Linked Data in a cultural setting.  The
summit adopts the ‘un-conference’ format without pre-prepared papers, at which
relevant issues can be aired and debated and sub-groups convened to address
specific topics.

View this graph of attendees: https://graphcommons.com/graphs/0f874303-97c2-4e53-abc6-83a13a1a2030

What is Linked Data?

Linked Data is a way of structuring online and other data to
improve its accuracy, visibility and connectedness. The technology has been
available for more than a decade and has mainly been used by commercial
entities such as publishing and media organisations including the BBC and
Reuters.  For archives, libraries and
museums, Linked Data holds the prospect of providing a richer experience for
users, better connectivity between pools of data, new ways of cataloguing
collections, and improved access for researchers and the public.

It could, for example, provide the means to unlock research
data or mix it with other types of data such as maps, or to search digitised
content including books and image files and collection metadata. New, more
robust, services are currently being developed by international initiatives
such as Europeana which should make its adoption by libraries and archives much
easier. There remain many challenges, however, and this conference provided the
opportunity to explore these.
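To make the idea concrete: Linked Data expresses statements as subject–predicate–object ‘triples’, so that records in different systems can point at the same entities. The sketch below is purely illustrative – the identifiers and vocabulary prefixes are invented – but it shows how following links between triples can answer a question that a flat catalogue record cannot.

```python
# A toy illustration of the Linked Data model: every statement is a
# subject-predicate-object "triple". All identifiers are hypothetical.
triples = [
    ("archive:letter42", "dc:creator", "person:churchill"),
    ("person:churchill", "foaf:name", "Winston Churchill"),
    ("archive:letter42", "dc:subject", "subject:ww2"),
    ("subject:ww2", "skos:prefLabel", "World War, 1939-1945"),
]

def objects_of(subject, predicate):
    """Return every object linked from `subject` via `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# Follow a link: who created letter42, and what is that person's name?
creator = objects_of("archive:letter42", "dc:creator")[0]
print(objects_of(creator, "foaf:name"))  # ['Winston Churchill']
```

Because the creator is an identifier rather than a free-text string, any other dataset using the same identifier is automatically connected to this record.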

The conference comprised a mix of quick-fire discussions,
parallel breakout sessions, 2-minute introductions to interesting projects, and
the Challenge entries.

[photo: Work in progress at the LODLAM summit]

Quick-fire points from delegates

  • Need for improved visualisation of data (current
    visualisations are not scalable or require too much IT input for archivists and
    librarians to realistically use)
  • Need to build Linked Data creation and editing
    into vendor systems (the Step change model which we pursued at King’s Archives
    in a Jisc-funded project)
  • Exploring where text mining and Natural Language
    Processing overlap with LOD
  • World War One Linked Data: what next? (less of a
    theme this time around as the anniversary has already started)
  • LOD in archives: a particular challenge?
    (archives are lagging libraries and galleries in their implementation of Linked
    Data)
  • What will be the next ‘Getty vocabularies’ – a popular
    vocabulary suite that can encourage use of LOD?
  • Fedora 8 and LOD in similar open source or
    proprietary content management systems (how can Linked Data be used with these
    popular platforms?)
  • Linked Data is an off-putting term implying a
    data-centric set of skills (perhaps Linked Open Knowledge as an alternative?)
  • Building a directory of cultural heritage
    organisation LOD: how do we find available data sets? (such as Linked Open
    Vocabularies)
  • Implementing the Europeana Data Model: next steps
    (stressing the importance of Europeana in the Linked Data landscape)
  • Can we connect different entities across
    different vocabularies to create new knowledge? (a lot of vocabularies have
    been created, but how do they communicate?)

 

Day One sessions

OASIS Deep Image Indexing (http://www.synaptica.com/oasis/).

This talk showcased a new product called OASIS from
Synaptica, aimed at art galleries, which facilitates the identification,
annotation and linking of parts of images. These elements can be linked
semantically and described using externally-managed vocabularies such as the
Getty suite of vocabularies or classifications like Iconclass. This helps
curators do their job. End users enjoy an enriched appreciation of paintings
and other art. It is the latest example of annotation services that overlay useful
information and utilise agreed international standards like the Open Annotation
Data Model and the IIIF standard for image zoom.

We were shown two examples: Botticelli’s The Birth of Venus
and Holbein’s The Ambassadors for impressive zooming of well-known paintings
and detailed descriptions of features. Future development will allow for
crowdsourcing to identify key elements and utilising image recognition software
to find these elements on the Web (‘find all examples of images of dogs in 16th
century public works of art embedded in the art but not indexed in available
metadata’).

This product mirrors the implementation of IIIF by an
international consortium that includes leading US universities, the Bodleian,
BL, Wellcome and others. Two services have evolved which offer archives the
chance to provide deep zoom and interoperability for their images for their
users: Mirador, and the Wellcome’s Universal Viewer (http://showcase.iiif.io/viewer/mirador/).
These get around the problem of having to create differently sized derivatives
of images for different uses, and of having to publish very large images on the
internet when download speeds might be slow.
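The convenience comes from the IIIF Image API's URL syntax: the client states the region, size, rotation, quality and format it wants, and the server renders just that on demand. A minimal sketch follows – the base URL and image identifier are invented for illustration.

```python
# Sketch of the IIIF Image API URL pattern that viewers such as Mirador
# use to request only the region and size they need, instead of relying
# on pre-generated derivatives. The base URL below is hypothetical.
def iiif_url(base, identifier, region="full", size="full",
             rotation="0", quality="default", fmt="jpg"):
    """Build a IIIF Image API request: {region}/{size}/{rotation}/{quality}.{format}"""
    return f"{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

# A small thumbnail of the whole image (max 200x200, preserving aspect)...
print(iiif_url("https://example.org/iiif", "venus", size="!200,200"))
# ...or a full-resolution detail of one area (x,y,w,h in pixels).
print(iiif_url("https://example.org/iiif", "venus", region="1000,500,400,400"))
```

The same source image thus serves thumbnails, deep-zoom tiles and detail crops without any pre-computed copies.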

Digital New Zealand

Chris McDowall of Digital New Zealand explored how best to
make LOD work for non-LOD people. Linked Open Data uses a lot of acronyms and
presumes a level of technical knowledge of systems that should not be
taken for granted. This is a particular bugbear of mine, which is why this talk
resonated. Chris’ advocacy of cross developer/user meetups also chimed with my
own thinking: LOD will never be properly adopted if it is assumed to be the
province of ‘techies’. Developers often don’t know what they are developing
because they don’t understand the content or its purpose: they are not
curators.

He stressed the importance of vocabulary cross-walks and the
need for good communication in organisations to make services stable and
sustainable. Again, this chimed with my own thinking: much work needs to be
done to ‘sell’ the benefits of Linked Data to sceptical senior management.
These benefits might include context building around archive collections,
gamification of data to encourage re-use, and serendipity searches and prompts
which can aid researchers. Linked Data offers truly targeted
searching, in contrast to the ‘faith-based technology’ of existing search
engines (a really memorable expression).

He warned that the infrastructure demands of LOD should not
be underestimated, particularly from researchers making a lot of simultaneous
queries: he mooted a pared down type of LOD for wider adoption.

Chris finished by highlighting a number of interesting use
cases of LOD in Libraries as part of the Linked Data for Libraries (LD4L) project,
a collaboration between Harvard, Cornell and Stanford (https://wiki.duraspace.org/pages/viewpage.action?pageId=41354028). See also
Richard Wallis’ presentation on the benefits of LOD for libraries: http://swib.org/swib13/slides/wallis_swib13_108.pdf

Schema.org

Richard Wallis of OCLC explored the potential of Schema.org,
a growing vocabulary of high level terms agreed by the main search engines to
make content more searchable. Schema.org helps power search result boxes one
sees at the top of Google search return pages. Richard suggested the creation
of an extension relevant to archives to add to the one for bibliographic
material. The advantage of schema.org is that it can easily be added to web
pages, resulting in appreciable improvement in ranking and the possibility of
generating user-centred suggestions in search results. For an archive, this
might mean that a Google user searching for the papers of Winston Churchill is
offered related options such as booking tickets to a talk about the
papers, or viewing Google Maps information showing the opening times and
location of the archive.
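Schema.org markup is typically embedded in a page as JSON-LD inside a script tag. The fragment below is a hypothetical sketch of what an archive might publish – the organisation, property choices and event are all invented for illustration, and an archives extension of the kind Richard proposed would supply more specific types.

```python
import json

# Hypothetical JSON-LD a repository might embed in its web pages so that
# search engines can surface location and events beside a search result.
# All values are invented; property choices are illustrative only.
record = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example University Archives",
    "url": "https://archives.example.org/",
    "location": {"@type": "Place", "address": "London"},
    "event": {
        "@type": "Event",
        "name": "Talk: the Churchill papers",
        "startDate": "2015-09-01",
    },
}

# The snippet a web page would carry in its <head> or <body>.
html_snippet = ('<script type="application/ld+json">'
                + json.dumps(record) + "</script>")
print(html_snippet)
```

Because the markup lives alongside the normal HTML, it can be added to existing catalogue pages without re-engineering the underlying system.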

The group discussion centred on the potential elements (would
the extension refer to theses, research data, or university systems that contain
archive data, such as finance and student information?), and on the need for use
cases setting out potential benefits. I agreed to be part of an
international team, convened through the W3C, to help set one up.

[photo: Shakespeare window at the State Library of New South Wales]

Dork shorts/Speedos –
these are impromptu lightning talks lasting a few minutes, which highlight a
project, idea or proposal. View here:
http://summit2015.lodlam.net/about/speedos/

Highlights:

Cultuurlink (http://cultuurlink.beeldengeluid.nl/app/#/): Introduction by Johan Oomen

This Dutch service facilitates the linking of different
controlled vocabularies and thesauri, and helps address the questions faced by
many cultural organisations: ‘which thesauri do I use?’ and ‘how do I avoid
reinventing the thesaurus wheel?’. The service allows users to upload a SKOS
vocabulary, link it with one of four supported vocabularies and visualise the
results.

The service helps different types of organisation to connect
their vocabularies, for example an audio-visual archive with a museum’s
collections. The approach also allows content from one repository to be
enhanced or deepened through contextual information from another. The example
of Vermeer’s Milkmaid was cited: enhancing the discoverability of information
on the painting held in the Rijksmuseum
in Amsterdam through connecting the collection data held on the local museum
management system with DBPedia and with the Getty Art and Architecture
Thesaurus. This sort of approach builds on the prototypes developed in the last
few years to align vocabularies (and to ‘Skosify’ data – turn it into Linked
Data) around shared Europeana initiatives (see http://semanticweb.cs.vu.nl/amalgame/).
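In SKOS terms, alignments like the Milkmaid example are expressed as mapping statements (skos:exactMatch, skos:closeMatch) between concepts in different vocabularies. The toy sketch below uses invented identifiers to show the principle: a lookup against one vocabulary can be widened to its aligned concepts elsewhere.

```python
# Toy vocabulary alignment in the SKOS spirit. All identifiers below are
# invented for illustration; real alignments would use full URIs.
mappings = [
    ("museum:milkmaid", "skos:exactMatch", "dbpedia:The_Milkmaid_(Vermeer)"),
    ("museum:milkmaid", "skos:closeMatch", "av-archive:vermeer-documentary"),
    ("museum:oil-paint", "skos:closeMatch", "aat:oil-paint"),
]

def aligned(concept):
    """Return every concept mapped from `concept`, keyed by mapping type."""
    result = {}
    for subj, pred, obj in mappings:
        if subj == concept:
            result.setdefault(pred, []).append(obj)
    return result

# Everything another institution has asserted about the same concept:
print(aligned("museum:milkmaid"))
```

A search interface could then pull contextual material from each aligned source, which is exactly the enrichment pattern the Rijksmuseum example describes.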

Research Data
Services project: Introduction by Ingrid Mason

This is a pan-Australian research data management project
focusing on the repackaging of cultural heritage data for academic re-use.
Linked Data will be used to describe a ‘meta-collection’ of the country’s
cultural data, one that brings together academic users of data and curators. It
will utilise the Australia-wide research data nodes for high speed retrieval (https://www.rds.edu.au/project-overview
and http://www.intersect.org.au/).

Tim Sherratt on
historians using LOD

This fascinating short explained how historians have been
creating LOD for years – and haven’t even known they were doing it –
identifying links and narratives in text as part of the painstaking historical
process. How can Linked Data be used to mimic and speed up this historical
research process? Tim showed a working example and a step by step guide is
available: http://discontents.com.au/stories-for-machines-data-for-humans/
and listen to the talk: http://summit2015.lodlam.net/2015/07/10/lod-book/

Jon Voss on
historypin

Jon explained how the popular historical mapping service,
historypin, is dealing with the problem of ‘roundtripping’ where heritage data
is enhanced or augmented through crowdsourcing and returned to its source. This
is of particular interest to Europeana, whose data might pass through many
hands. It highlights a potential difficulty of LOD: validating the authenticity
and quality of data that has been distributed and enriched.

Chris McDowall of
Digital New Zealand

Chris explained how to search across different types of data
source in New Zealand, for example to match and search for people using
phonetic algorithms to generate sound alike suggestions and fuzzy name
matching: http://digitalnz.github.io/supplejack/.
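Soundex is the classic phonetic algorithm of the kind Chris described – I do not know which algorithm Supplejack itself uses, so the implementation below is a generic sketch of the ‘sound-alike’ idea: a name is reduced to its first letter plus three digits encoding consonant groups, so differently spelled names that sound similar collide.

```python
# Generic sketch of phonetic matching (Supplejack's actual approach may
# differ): classic Soundex reduces a name to a letter plus three digits.
def soundex(name):
    """Classic Soundex: first letter + three digits for consonant groups."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    name = name.lower()
    first = name[0].upper()
    digits = []
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            digits.append(code)
        if ch not in "hw":        # h and w do not reset the previous code
            prev = code if ch not in "aeiouy" else ""
    return (first + "".join(digits) + "000")[:4]

print(soundex("Catherine"), soundex("Kathryn"))  # C365 C365
```

Two spellings that produce the same code can be offered to the user as ‘sound-alike’ suggestions, even though a literal string match would miss them.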

Axes Project (http://www.axes-project.eu/): Introduction from Martijn Kleppe

This €6 million EU-funded project aims to make
audio-visual material more accessible; it has been trialled with thousands of
hours of video footage from the BBC and with the broadcaster’s expert users. Its purpose is to help users
mine vast quantities of audio-visual material in the public domain as
accurately and quickly as possible. The team have developed tools using open
source frameworks that allow users to detect people, places, events and other
entities in speech and images and to annotate and refine these results. This
sophisticated tool set utilises face, speech and place recognition to zero-in
on precise fragments without the need for accompanying (longhand) metadata. The
results are undeniably impressive – with a speedy, clear, interface locating
the parts of each video with filtering and similarity options. The main use for
the toolset to date is with film studies and journalism students but it
unquestionably has wider application.

The Axes website also highlights a number of interesting
projects in this field. Two stand out: http://www.axes-project.eu/?page_id=25,
notably Cubrik (http://www.cubrikproject.eu/),
another FP7 multinational project which mixes crowd and machine analysis to
refine and improve the searching of multimedia assets; and the PATHS prototype (http://www.paths-project.eu/): ‘an interactive personalised tour guide through
existing digital library collections. The system will offer suggestions about
items to look at and assist in their interpretation. Navigation will be based
around the metaphor of a path through the collection.’ The project created an
API, User Interface and launched a tested exemplar with Europeana to
demonstrate the potential of new discovery journeys to open access to
already-digitised collections.

Loom project (http://dxlab.sl.nsw.gov.au/making-loom/): Introduction from Paula Bray of State Library of New South Wales

The NSW State Library sought to find new ways of visualising
their collections by date and geography through their DX Labs, an experimental
data laboratory similar to BL Labs, which I have worked with in the UK. One
visually arresting example plots the proportions of collections relevant
to particular geographical locations in the city of Sydney. Greeted by
approving gasps from the audience, it superimposed an iceberg graphic
onto a map to show how much of the material about each place had been
digitised and how much was yet to be digitised – a striking way of communicating the
fragility of some collections and the work still to be done to make them
accessible to the public.

LODLAM challenge

Nineteen entries were received: http://summit2015.lodlam.net/challenge/challenge-entries/

  1. Open Memory Project. This Italian entry
    won the main prize. It uses Linked Data to re-connect victims of the Holocaust
    in wartime Italy. The project was thought provoking and moving and has the
    potential to capture the public imagination.
  2. Polimedia is a service designed to
    answer questions from the media and journalists by querying multi-media
    libraries, identifying fragments of speech. It won second prize for its
    innovative solution to the challenge of searching video archives.
  3. LodView goes LAM is new Italian
    software designed to make it easier for novices to publish data as Linked Data.
    A visually beautiful and engaging interface makes this a joy to look at.
  4. EEXCESS is a European project to
    augment books and other research and teaching materials with contextual
    information, and to develop sophisticated tools to measure usage. This is an
    exciting, ambitious, project to assemble different sources using Linked Data to
    enable a new kind of publication made up of a portfolio of assets.
  5. Preservation Planning Ontology is a
    proposal for using Linked Data in the planning of digital preservation by
    archives. It has been developed by Artefactual Systems, the Canadian company
    behind ATOM and Archivematica software. This made the shortlist as it is a good
    example of a ‘behind the scenes’ management use of Linked data to make
    preservation workflows easier.

A selection of other
entries:

Public Domain City
extracts curious images from digitised content. This is similar to BL Labs’
Mechanical Curator, a way of mining digitised books for interesting images and
making them available to social media to improve the profile and use of a
collection.

Project Mosul uses
Linked Data to digitally recreate damaged archaeological heritage from Iraq. A
good example of using this technology to protect and recreate heritage damaged
in conflict and disaster.

The Muninn Project
combines 3D visualisations and printing using Linked Data taken from First
World War source material.

LOD Stories is a
way of creating story maps between different pots of data about art and
visualising the results. The project is a good example of the need to make
Linked Data more appealing and useful, in this case by building ‘family trees’
of information about subjects to create picture narratives.

Get your coins out of
your pocket
is a Linked Data engine about Roman coinage and the stories it
has to tell – geographically and temporally. The project uses nodegoat as an
engine for volunteers to map useful information: http://nodegoat.net/.

Graphity is a
Danish project to improve access to digitised historical Danish newspapers and
to enhance them with maps and other content using Linked Data.

Dutch Ships and
Sailors
brings together multiple historical data sources and uses Linked
Data to make them searchable.

Corbicula is a way
of automating the extraction of data from collection management systems and
publishing it as Linked Data.

[photo: delegates at the summit]

Day two sessions

Day two sessions focused on the future. A key session led by
Richard Wallis explained how Google is moving from a page ranking approach to a
triple confidence assertion approach to generating search results. The way in
which Google generates its results will therefore move closer to the LOD method
of attributing significance to results.

Highlights

  • Need for a vendor manifesto to encourage systems
    vendors such as Ex Libris, to build LOD into their systems (Corey Harper of New
    York University proposed this and is working closely with Ex Libris to bring
    this about)
  • Depositing APIs/documentation for maximum re-use
    (APIs are often a weak link – adoption of LOD won’t happen if services break or
    are unreliable)
  • Uses identified (mining digitised newspaper
    archives was cited)
  • Potential piggy-backing from Big Pharma
    investment in Big Data (massive investment by drugs companies to crunch huge
    quantities of data – how far can the heritage sector utilise even a fraction of
    that?)
  • Need to validate LOD: the quality issue – need
    for an assertion testing service (LOD won’t be used if its quality is
    questionable. Do curators (traditional guardians of quality) manage this?)
  • Training in Linked Data needs to be addressed
  • Need to encourage fundraising and make LOD
    sustainable: what are we going to do with LOD in the next ten years? (Will the
    test of the success of Linked Open Data be if the term drops out of use when we
    are all doing it without noticing? Will 5 Star Linked Data be realised? http://5stardata.info/)

Summary

There were several key learning points from this conference:

  • The divide between technical experts and policy
    and decision makers remains significant: more work is needed to provide use
    cases and examples of improved efficiencies or innovative public engagement
    opportunities that the technology provides
  • The re-use and publication of Linked Data is
    becoming important and this brings challenges in terms of IPR, reliability of
    APIs and quality of data
  • Easy-to-use tools and widgets will help spread
    its use, avoiding complicated and unsustainable technical solutions that depend
    on project funding
  • Working with vendors to incorporate Linked Data
    tools in library and archive systems will speed its adoption
  • The Linked Data community ought to work towards
    the day Linked Data is business as usual and the term goes out of use

Personalisation: Who do you think you are?

Geoff Browell, Head of Archives Services

I attended a two day conference at the University of Brescia
in northern Italy in late April: the 13th International Innovations
in Education Colloquium. This was a chance for delegates to learn about the opportunities afforded by
‘personalisation’ – the collection or delivery of information tailored to the
needs of individuals. The focus of the talks at this conference was
personalisation in the fields of healthcare and higher education, with the dual
objectives of improving patient care and making the student experience richer,
more creative and rewarding. Some fifty delegates were brought together from
these sectors drawn from a dozen countries as far afield as the UK, Australia
and Thailand. This was an eclectic community
that included leading dentists, maxillo-facial cancer surgeons, an expert
on conflict resolution, a specialist in 3D printing and an expert on student
interactions with online resources.

The hosts – as always – were the Brescia Medical and Dental
School, and we were grateful throughout for the care and attention provided by
many of the delightful student volunteers, including collecting delegates from
airports up to 60 miles distant and even preparing a magnificent homemade pasta
lunch!

The colloquium began with an icebreaker session at which
delegates were asked to draw and reinterpret examples of Australian flora –
in my case a pine cone (very badly). The session had a deeper purpose: the art
facilitator, Jen Wright, is currently completing a PhD on the role of fine art
in improving virtual learning systems for cancer surgeons – and our experience
provided some valuable feedback on how art can help the practice of surgery.

The keynote was provided by Eeva Leinonen, formerly of
King’s College London, and now Deputy Vice Chancellor of the University of
Wollongong in New South Wales. The university focuses on technology enhanced learning and
has recently developed a number of popular MOOCS and Open Educational Resources.
One of their key objectives is the creation of personalised learning support
for students – and this requires the collection of more detailed and meaningful
analytics to improve strategy, university management and explore the
motivations of students in a holistic way. Recent work has included a
comprehensive student survey on the ethical limits of data collection (how
far are students willing to share their data?) and on the potential of data collection
to improve teaching (for example, near real-time data on attendance at
lectures), to identify reasons for drop-out, and thus to improve the retention of
students. The objective is to create for each student a truly personalised plan
analogous to a personalised medical treatment plan in a clinical setting. This
will help demonstrate to a student that they are valued, that their unique
contribution to the university is being recognised, and that interventions to
improve their learning experience can be acted on promptly.  

The second keynote, from Dieter Schonwetter of the University
of Manitoba, Canada, in honour of the late Bruce Elson, explored the role of
legacy and immortalisation post mortem as a type of personalisation, not least
in the digital sphere, but also in students’ research, writing and friendships.
A student’s connections, relationships and extra-curricular experiences are as
valuable as their formal learning in contributing to a lasting legacy.
Professor Schonwetter’s talk was supplemented by Professor Emeritus Margaret
Cox in a moving tribute to Bruce Elson and his work in the field of dental
haptics. Margaret drew attention to the hapTEL project at King’s – designed to
improve the teaching of dental students (http://www.haptel.kcl.ac.uk/).

A presentation by Dr Eva Dobozy of Curtin University,
Western Australia, explored the phenomenon of online ‘lurking’ by students. I
was unfamiliar with the term, which refers to a type of online student
behaviour in which students fail to engage in online or blended learning and instead
are passive ‘watchers’ who decline to participate in the opportunities which new
learning technologies afford or to take part in online discussions – the
equivalent of those classroom students who never contribute but sit
passively at the back. We were shown the Learning Activity Management System
(LAMS) developed by Macquarie University to allow educators to monitor such
behaviour in online learning environments.  

The afternoon symposium comprised a packed series of talks
and presentations showcasing several new technologies. Professor Kenneth Eaton
unveiled the SHIELD project, responding to an EU Horizon 2020 call aimed at
personalising health and care. The project, should it be funded, will seek to
support the development of digital devices to monitor the health of the elderly
at home, train healthcare workers and support carers – saving money and
preserving the dignity of the old.

The North and South Culture Cafe Project, managed by the
universities of Hull and Southampton, was a series of talks aimed at
challenging stereotypes of the north and south of England, which often came
into being in the literature of the nineteenth century: https://twitter.com/northsouth2017

An important talk from Marika Guggisberg of Curtin
University, Western Australia, explored the impact on mental health of sexual
violence, concluding that it is under-reported and often goes unrecognised.
Marika proposed the development of new methods of intervention and prevention.

The next talk introduced GRAPHIC – a serious online game
used by dental students as part of their practical education. It drew attention
to the importance of such games in professional training. GRAPHIC-1 and its
follow-up, GRAPHIC-2, have been used by students at King’s College and in
Thailand to test the suitability of oral health programmes in simulated situations.
The project has concluded that serious games have a role to play in
professional training to augment, but not replace, face to face teaching.

The next presentation concerned the delivery of
personalisation at scale – for large numbers of students – delivered by David
Gibson, Director of Learning Engagement at Curtin University in Western
Australia. He explained how algorithms have been developed which can recommend
learning materials or tasks based on data generated by other learners, or which
analyse large quantities of information relating to learner demographics and
personalise the learner experience based on these data. David proposed a mixed
approach using semi-automated personalised learning using machine learning
algorithms but which are adapted and shaped by the real experience of learners
and teachers.

The final workshop was delivered by John Burgess, a
consultant working in the construction industry and a professional adjudicator,
who provided invaluable advice on methods and means of personal development that
encourage innovation and a sort of personalised risk-taking.

Overall, this was a
fascinating conference, which mixed together practitioners from many different
environments to think more deeply about providing more nuanced and intelligent
teaching and care for university students and patients alike.