Next in the ‘A-Z’ series is L for Linkage. The use of linked data provides an exciting avenue for future research. This blog post outlines what data linkage is, its advantages and challenges, and the power of using it collaboratively.

Celestine, 1st year PhD Student & Research Assistant







What is Data Linkage?

Data linkage is the process of bringing together information about the same person from two or more distinct sources, to build a richer, more detailed picture and enhance existing data. Traditionally, this has involved linking information from multiple sources (e.g., medical records, administrative data) to individual-level data (e.g., research questionnaire data).

Examples of the types of data frequently involved in record linkage for research.


What are the advantages of data linkage in research?

Linkage provides many key benefits. First, it allows researchers to access large amounts of data without needing to undertake new data collection. This enables longitudinal research studies to focus their limited data collection space on information not included in routine records.

Second, data linkage allows for the integration of richer sources of data about research participants, creating a more accurate description of their lives.

Some individuals may find it difficult to participate in research, meaning that not everyone is represented in research findings. It can also be difficult to remember specific details accurately when completing questionnaires (e.g., the dates of specific treatments). Data linkage is one tool for learning about hard-to-reach populations and hard-to-remember information, which can make samples more representative and findings more generalisable.

Importantly, data linkage is also more convenient for study participants. For example, rather than asking people to report specific details of their medical history, researchers can take this information straight from official records.


What are the challenges of linkage in research?

Despite the numerous advantages, linkage also comes with a unique set of challenges. Missing data can be a problem with official records, as they cannot capture information on individuals who do not interact with services. For example, many sensitive issues, such as drug use, self-harm or mental ill-health, are prevalent in society but often go unreported to services, so they are under-recorded in official records.

Challenges can also arise during the linkage process itself. Linkage takes place by connecting identifiers (such as name, date of birth and address) across two or more datasets. Where identifying information is missing or insufficient, records are not always linked accurately: true matches can be missed, or records belonging to different people can be wrongly joined.
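To make the identifier-matching step more concrete, here is a minimal sketch of deterministic (exact-match) linkage in Python. The records, field names and the `link` and `linkage_key` functions are hypothetical, invented purely for illustration. Real linkage services typically use more sophisticated, often probabilistic, matching and work within secure environments, so this is not how any particular study links its data. It illustrates the problem described above: when identifying information is missing or recorded inconsistently, records may fail to link.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    """A hypothetical record carrying the identifiers used for linkage."""
    record_id: str
    name: str
    date_of_birth: Optional[str]  # "YYYY-MM-DD", or None if missing
    postcode: Optional[str]


def normalise(text: Optional[str]) -> Optional[str]:
    """Lower-case and remove whitespace so trivial formatting differences don't block a match."""
    if text is None:
        return None
    return "".join(text.lower().split())


def linkage_key(record: Record) -> Optional[Tuple[str, str, str]]:
    """Build a key from name, date of birth and postcode.

    Returns None when any identifier is missing: with exact matching,
    an incomplete record cannot be linked safely.
    """
    parts = (normalise(record.name), record.date_of_birth, normalise(record.postcode))
    if any(part is None or part == "" for part in parts):
        return None
    return parts


def link(study: List[Record], health: List[Record]) -> Tuple[Dict[str, str], List[str]]:
    """Pair study records with health records that share an identical key.

    Returns (links, unlinked): links maps study record_id -> health record_id,
    and unlinked lists study record_ids with no unique match.
    """
    index: Dict[Tuple[str, str, str], List[str]] = {}
    for rec in health:
        key = linkage_key(rec)
        if key is not None:
            index.setdefault(key, []).append(rec.record_id)

    links: Dict[str, str] = {}
    unlinked: List[str] = []
    for rec in study:
        key = linkage_key(rec)
        candidates = index.get(key, []) if key is not None else []
        if len(candidates) == 1:  # accept only unambiguous matches
            links[rec.record_id] = candidates[0]
        else:
            unlinked.append(rec.record_id)
    return links, unlinked


# Hypothetical example data
study = [
    Record("S1", "Ann Smith", "1990-04-01", "AB1 2CD"),
    Record("S2", "Raj Patel", None, "EF3 4GH"),          # missing date of birth
    Record("S3", "Jon Brown", "1985-12-09", "IJ5 6KL"),  # spelled "John" in the health data
]
health = [
    Record("H1", "ann smith", "1990-04-01", "AB12CD"),
    Record("H2", "Raj Patel", "1978-07-15", "EF3 4GH"),
    Record("H3", "John Brown", "1985-12-09", "IJ5 6KL"),
]
links, unlinked = link(study, health)
print(links)     # {'S1': 'H1'}
print(unlinked)  # ['S2', 'S3']
```

Exact matching like this keeps false links low, but it misses true matches whenever an identifier is absent or recorded slightly differently, as with S2 and S3 here. Probabilistic approaches trade some of that certainty for more matches.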

To protect confidentiality, linked data are stored in secure data linkage environments. These environments have comprehensive guidelines covering the required data access approvals, researcher accreditation, and the physical spaces in which research can take place. This governance can be challenging and time-consuming for researchers to navigate.


Linkage in Longitudinal Population Studies

Longitudinal population studies are research projects that collect information from a group of people repeatedly over time. These studies help to answer questions about development and health across the lifespan. Many longitudinal population studies have completed or are undertaking record linkage, including the GLAD study, TwinsUK, the Avon Longitudinal Study of Parents and Children, and the Twins Early Development Study. As part of the linkage, studies provide participants with information about what is involved, and participants decide whether or not they want to be included.


The UK Longitudinal Linkage Collaboration

Although linkage is a powerful tool for individual research projects, combining linked data from many studies can further increase our ability to answer important questions. The UK Longitudinal Linkage Collaboration (UK LLC) is a national interdisciplinary research resource that brings together data from over 20 longitudinal UK studies. Within the UK LLC framework, research data from over 280,000 participants have been linked to NHS electronic health records and environmental records. The UK LLC governance is structured so that contributing studies retain control over which participants' data are used for which research projects, thus protecting the vital trust relationship between studies and their participants.

By joining information from many different studies, the UK LLC increases the sample sizes available for research. These larger samples increase the diversity of people included in research and make it possible to study rare conditions. Find out more about accessing data through the UK LLC here:


Celestine Lockhart
