Challenges and risks associated with the use of linked data and machine learning to address the opioid crisis

July 2021
Co-authors: Professor Matthew Hickman, Dr Sebastiano Barbieri, and Professor Louisa Degenhardt



Clinical risk prediction is abundant in the medical literature. Much less has been written about how the field of public health, and specifically research on substance use disorders, could similarly benefit from these approaches. With the opioid crisis, there is an immediate need to accurately estimate overdose risk and to improve understanding of how treatments and interventions can be delivered to people with opioid use disorder in a way that reduces that risk. In a recently published review in The Lancet Digital Health, we examined the potential for predictive analytics and routinely collected administrative data to be used to evaluate how overdose could be reduced among people with opioid use disorder.

By combining results from new and existing systematic reviews, the review describes the use of data linkage in research into opioid overdose to date, and considers the potential for predictive modelling, including machine learning, to prevent and monitor opioid overdoses. The review also discusses the potential challenges and risks associated with the use of linked data and machine learning in reducing mortality and other harms related to opioid use. The key points are summarised below.


The process of identifying, matching, and merging data that correspond to the same unit (person) from multiple sources, known as data linkage, allows more comprehensive cohorts of people with opioid use disorder to be established and ensures that multiple indicators of potential risk factors can be operationalised. Given the increasing evidence of major syndemics in many countries, future studies should ideally aim to construct these kinds of data resources to facilitate analyses that address questions about, and directly inform, public health actions. However, the increasing use of linked data for research purposes, and their sharing with agencies external to those that originally collected the data, underscores the need for their use and accessibility to be guided by regulatory environments with codes of practice for data security and information governance. In some instances, these environments will require legislative change. Improvements in the accessibility of individually linked data for the research community could therefore, in many cases, be facilitated by government-led initiatives and strategies for data linkage.
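
As a rough illustration of the identify, match, and merge steps described above, the following Python sketch links records from two sources using a simple deterministic key. It is a minimal sketch under stated assumptions, not the method used in the review: the field names (`name`, `dob`), the source names, and all records are hypothetical, and real linkage projects typically rely on probabilistic matching carried out in secure environments by dedicated linkage units.

```python
from collections import defaultdict


def link_key(record):
    """Build a simple deterministic match key (assumed fields: name, dob)."""
    return (record["name"].strip().lower(), record["dob"])


def link_records(*sources):
    """Group records from several named sources under a shared match key."""
    linked = defaultdict(dict)
    for source_name, records in sources:
        for record in records:
            linked[link_key(record)].setdefault(source_name, []).append(record)
    return dict(linked)


# Hypothetical, made-up records for illustration only.
treatment = [
    {"name": "Alex Smith ", "dob": "1980-02-01", "episode": "OAT start 2019-05"},
    {"name": "Jo Lee", "dob": "1975-11-23", "episode": "OAT start 2020-01"},
]
hospital = [
    {"name": "alex smith", "dob": "1980-02-01", "event": "overdose admission 2020-03"},
]

cohort = link_records(("treatment", treatment), ("hospital", hospital))

for key, by_source in cohort.items():
    # Show which sources contributed records for each linked person.
    print(key, sorted(by_source))
```

In this toy cohort, one person appears in both sources and one appears only in the treatment data, which is the kind of combined view that lets multiple risk indicators be examined for the same individual.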

Machine learning model biases

An emerging area of interest has been the use of machine learning methods for developing prediction models. Although these methods are often claimed to perform better in prediction tasks than traditional regression, owing to their data-driven approach and ability to identify patterns in data, prediction models developed with machine learning have rarely been integrated into clinical practice, and even more rarely into public health interventions. One concern relating to their implementation is that a model may not provide equal benefit to every population subgroup. This could be a consequence of cognitive biases among investigators or of biases implicit in the training data used to develop the model. Although various measures in the literature purport to assess different dimensions of fairness, it is generally not possible to satisfy multiple measures simultaneously, so the selection of fairness goals needs to be guided by the context.
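
To make the point about competing measures concrete, the sketch below computes two commonly used quantities, sensitivity and positive predictive value, separately for two subgroups. It is only an illustration under stated assumptions: the function name `subgroup_rates`, the group labels, and the outcome and prediction values are all made up, and the choice of these two measures is ours rather than the review's.

```python
def subgroup_rates(y_true, y_pred, groups):
    """Return sensitivity and positive predictive value for each subgroup."""
    counts = {}
    for truth, pred, group in zip(y_true, y_pred, groups):
        c = counts.setdefault(group, {"tp": 0, "fp": 0, "fn": 0})
        if pred and truth:
            c["tp"] += 1  # flagged and had the event
        elif pred and not truth:
            c["fp"] += 1  # flagged but no event
        elif truth and not pred:
            c["fn"] += 1  # event missed by the model
    rates = {}
    for group, c in counts.items():
        events = c["tp"] + c["fn"]
        flagged = c["tp"] + c["fp"]
        rates[group] = {
            "sensitivity": c["tp"] / events if events else None,
            "ppv": c["tp"] / flagged if flagged else None,
        }
    return rates


# Made-up outcomes (1 = overdose event) and model flags for two subgroups.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = subgroup_rates(y_true, y_pred, groups)
for group, values in rates.items():
    print(group, values)
```

In this toy example the model has higher sensitivity in group B but higher positive predictive value in group A, so which group is "better served" depends on the measure chosen.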

Population over individualisation

To date, predictive modelling exercises in the context of opioid overdose prevention have largely focused on evaluations of individualised risk. There is debate about whether a more appropriate goal for public health outcomes, such as opioid overdose, would be to focus on the population level (e.g., on the affected subgroup as a whole) rather than the individual level. One example would be exploring how to optimise opioid agonist treatment (OAT), as a method for reducing overdoses, through changes in delivery and adjunct service provision. Such approaches to improving OAT access and retention could provide a population-based approach to preventing opioid-related harms while also overcoming concerns over potential unintended consequences of individualised risk predictions. Increasing the application of design principles from causal inference and emulated trials to the analysis of observational data would also help to address questions about the comparative effectiveness of interventions and treatments, and would provide an empirical basis for establishing evidence-based treatment guidelines.

System capacity limitations

Ultimately, the clinical value of a prediction model is established not by its output, but by the health-care system’s ability to respond and deliver the appropriate care. Clinicians and service providers can only care for people accessing services within the framework of available systems. Limitations in the availability and accessibility of effective interventions for opioid use disorder will determine the pathways of care that can be provided. For this reason, integrating risk models will not be a panacea for reducing opioid-related harms. Rather, evaluating existing services, and the ability of those systems to be scaled up to meet the needs of people with opioid use disorder, is crucial both for reducing harms related to opioid use and for facilitating the evaluation and implementation of novel models of care for addressing the opioid crisis.


Establishing large-scale platforms for data linkage will strengthen collaboration efforts for monitoring, evaluating, and responding to events that are related to opioid use. Predictive modelling and emulated trials applied to big (linked) data offer an opportunity to explore how public health could be improved through evaluating opioid use disorder prognoses, predicting treatment response, and personalising treatment recommendations. Prevention of harms through evidence-based, targeted approaches will require several important areas to be considered in future research agendas.

Citation: Bharat, C., Hickman, M., Barbieri, S., & Degenhardt, L. (2021). Big data and predictive modelling for the opioid crisis: existing research and future potential. The Lancet Digital Health, 3(6), e397-e407. DOI: 10.1016/S2589-7500(21)00058-3