Why good data are fundamental to equity, trust and transformational change in healthcare
In today’s rapidly evolving healthcare landscape, data is not just a tool but a critical foundation for driving equitable, trustworthy, and transformational change. This webinar explores the essential role that high-quality data plays in creating a strong foundation for impactful real-world innovations, while also addressing the challenges and obstacles in effectively collecting, sharing, and utilizing health data at population level.
Experts:
Jessica Morley, Postdoctoral Research Associate, Digital Ethics Center (DEC), Yale University
Joe Zhang, Head of Data Science for the London AI Centre and Secure Data Environment (SDE) programme
Transcription:
My name is Helen Serrano. I'm one of the clinical editors here at BMJ, and it is my pleasure to welcome you to one of our Future Health webinars, which is part of a series we've been running since the spring in the run-up to an event we're holding. So today's webinar is about data: why good data are fundamental to equity, trust and transformational change in healthcare.
And I know a lot of us are involved in healthcare in many ways, but this is really about how data is underpinning everything we do, and how all the big things we want to do are dependent on data. So I'm delighted to welcome to the webinar series Jess Morley and Joe Zhang, who are going to speak on this subject.
Just as we go through, just to let you know, we will be taking questions in the chat or the Q&A box. Please use both of those, and I will put your questions to Joe and Jess at the end of the talk. There are also going to be some other links coming up in the chat box, so keep an eye on everything going on there.
I think for now, there's nothing left for me to do but to hand over to Jess and Joe. Thank you. Thank you so much, Helen. And thanks everyone for joining us, wherever you are. So Joe is back home in London and I am here in Connecticut, so we are doing both old England and New England at the same time, but it is raining in both places, so we're good.
The aim of today is to walk through about four different things, which hopefully you can see. So we're going to start by talking about why health data are so important for transformational change, at least from a policy perspective, and also from a little bit of a realist perspective.
Then we'll talk about why leveraging health data for secondary purposes is so challenging. Then we'll move on to what the potential ethical consequences are if we get things wrong. And the last little bit will be how we get it right.
And as Helen said, we'll hopefully have time for questions at the end. And you'll be very relieved to know that Joe and I are doing this together, so you don't have to listen to me talk the entire time. So that's a relief for all involved. Okay. So starting off: why do we care? Why has this interest in the use of health data, and particularly the use of AI, peaked so extremely in the last couple of years, and definitely since the arrival of ChatGPT last year?
At the most basic level, at least from a policy-rhetoric perspective, it's because healthcare is in crisis. I don't think I need to tell anybody on this call, or anybody who has interacted with the NHS or indeed any other healthcare system in the world, that things are quite dire. People are struggling, and the healthcare system is really struggling to cope with the sort of systemic problems you can see on the screen: we've got more complex patients, there are more complex treatments, the costs are skyrocketing, and yet there are diminishing returns. And then of course there was this small thing, which I don't know if people remember, called COVID-19, which threw an extra spanner in the works and really added to this already existing complexity and to the struggle that people were facing.
If anybody is not aware of the fact that we are spending more and getting less, one of the clearest indicators of that is that we have a declining life expectancy in the UK, and that hasn't recovered, even in the wake of COVID. What we have started to see, at least in policy rhetoric (this is what you will see in documents that come out from everywhere from the Department of Health and Social Care all the way through to think tanks), is this sort of rhetorical narrative that by using data and AI, we can basically take these massive complex models and throw a whole bunch of data at them, and by doing that, we can see when people are going to become unwell. So we can predict when they are going to become unwell in the future.
The idea then being that we can intervene earlier and prevent people becoming unwell, and that we can also target treatments to them more specifically, so that they are more effective. This is the idea that you are making healthcare more preventative, predictive, personalized, and participatory.
That participatory P is the one that people are sometimes less familiar with, but it essentially comes from the idea that you, as the patient or as the user, are collecting more data on yourself, whether that be through your Apple Watch, through another wearable, or, increasingly, through ambient monitoring.
So you are more involved. And then the logic being that by doing this, by making medicine more predictive, preventative, personalized, and participatory, we can achieve this triple aim. Sometimes it's called the quadruple aim. But the idea is that we can simultaneously increase population health and improve the experience of care and reduce per capita cost.
If you've heard me speak before, you will know that I think this is utter nonsense: we cannot achieve this triangle, and we in fact get a wonky triangle where we optimize one or maybe even two of those three things, but it's very unlikely that we'll get all three. But regardless, this is where policy interest is coming from, and it's where investment interest is coming from.
And so this is really why we care about the use of data and AI in the healthcare service. I will talk later on about why I think this is a bit challenging, particularly from an equity perspective. But first, I'm going to let Joe do his part of the work, which is talking about why it's so difficult to get this right.
Amazing. Thanks very much, Jess. This is slightly more of a technical, maybe architectural, overview of why things are so difficult, and why we definitely aren't there yet in terms of our data and achieving all of these broad goals related to personalized healthcare, precision medicine, research, and life sciences.
This is a bit of work that Jess and I did together a couple of years ago, following on from a review that Jess did as well, called the Goldacre Review, talking about where data was flowing and how we should flow it better within the NHS. And one of the issues that has really been a hallmark of NHS data strategy in the past 10 to 20 years is, not quite an inability, but a real difficulty in translating from broad strategic goals into actual technical, architectural decisions on the ground which can make data better and support these aims. So one of the things we tried to do was add a bit of transparency to this, and a bit of objective evidence to maybe help support strategy. And we did this by mapping.
I think it turned out to be around 60,000 individual data flows, starting from NHS providers at the bottom, that is, the hospitals and the GP surgeries where data is generated at source after clinical interactions with patients. You can see these as little dots along the bottom of this graph here on the left.
And you can see very faint lines flowing upwards towards the dots in the middle, which are entities and organizations that are extracting data from these hospital trusts and GP providers, often putting it into databases. This data then flows up to other organizations that use it, often for research or public health purposes, but often also to companies which take the data and sell it upwards
to other companies and other users, who then use it in things like AI model development, software-as-a-medical-device development, and indeed life sciences research and drug development. In doing this, I think we came to quite a few conclusions. One of the obvious ones is a lack of transparency in this whole piece.
It's very difficult to know exactly what's been happening with your data. It's almost impossible to know and track exactly who is using your data and for what purposes. But actually there are a few others, especially in the context of what's happening in the NHS at the moment, which is building things like secure data environments and new research environments for the data.
One of the premises of secure data environments is bringing the researcher to the data, rather than physically transferring data around to different places to bring the data to the researcher, which is also a premise of the Goldacre Review. And I think one of the messages there is that actually our data is absolutely everywhere, sitting on thousands, if not tens of thousands, of hard drives around the world, and that horse has absolutely bolted.
The second one is, there's a lot of talk now about the value of NHS data, and yes, NHS data is incredibly valuable. It's valuable at a baseline because we are one of the very few systems where now, because of the digital maturity we have, almost every interaction with a patient is captured in some sort of digital system. But if you look at the value of all the entities which are using the data up here, or brokering the data upwards, the value is actually being generated in the brokerage and in the use, in R&D or software as a medical device, rather than in the data flow itself, and they're not able to differentiate between different uses.
Joe, I think you're glitching a bit. So I'm going to... do you want to take a commercial break?
Joe, you are buffering quite extremely. Do you want to try and take... Hello? Hi Joe. This is great, but you are buffering quite a lot. Do you want to try taking your camera off, and see if that makes it a bit smoother? Otherwise I'll take over talking for a sec. Oh, okay.
Joe has, in fact, temporarily left the building, so let me take over and drive for a minute. Essentially, what Joe was just saying is that when we mapped all of these data flows, we found that everything is everywhere, all over the place, and as much as the NHS is now trying to rejig this and bring everything back together.
There is a challenge in the fact that the horse has already bolted. And then Joe was moving on to talk about the fact that, as much as there is talk about the value of NHS data, actually trying to calculate that is extremely difficult. And indeed, the vast majority of that value sits outside the NHS: it sits with data brokers, and they are using that data for R&D.
Joe, are you back? Sorry about that, guys. Can you hear me? Am I still buffering? My internet went down, at a very bad moment. So hopefully you got some of that. But the last point I was going to make about this slide was that even though this landscape appears incredibly rich, and we have a huge amount of data flowing outwards, actually 80 percent, maybe even up to 90 percent, of the data with the most value and most information about patients isn't flowing at all, because it remains locked behind our systems in primary care and in secondary care. And we'll talk about that a bit more on the next slide, please.
Thank you. So this is a bit of work that, again, Jess and I are doing, also doing some mapping, but taking the opposite approach. Instead of mapping every single individual flow, we're mapping a general framework of what NHS data architecture looks like. And I think one of the big messages from here is that data isn't just data. There are many different types of data. They all have different purposes, a different provenance for their generation, and different use cases downstream. And it's not just a matter of getting more data into one place: it's also a question of getting the right data, and getting the right data out from source systems.
So to give an example, circle number one on the left, with the green icons in the middle showing the different types of source system from which data can be generated and extracted, is primary care. We have a duopoly of primary care vendors in the country, TPP and EMIS, and actually their data, at least the structured data, so the clinical codes, has been accessible at a population level for 10, almost 20, years now.
And we can see a lot of rich flows in that area, going to at least a dozen research databases, some of which are public, many of which are private, from which big population GP datasets can be used for research, or indeed brokered upwards to other users. What we don't have, and where you'll see no flows coming out, are the unstructured data within GP surgeries.
So actually, when a patient sees a GP, the GP will put some of that information into clinical codes, by putting a code into a system for something like high blood pressure or hypertension. But the majority of information about any consultation will get typed out into the clinical notes as free text, and we'll talk more about that later.
You can also see a big gap at number two, and that is our social care and community care records. A lot of these things happen in settings where we don't have electronic health records, or where the systems that we do have are very old and don't really have any data flows that could come out of them.
Number three, with the blue icons in the middle, is our hospital systems. At last count, we had more than a hundred different types of electronic health system in the UK, used across different hospitals for different purposes. The trust where we're working now to build data pipelines has 30 different legacy systems.
And most of these systems have data that is almost inaccessible, or at least very difficult to get out, which means that for the past more than 20 years, probably up to 30 years, the way that we capture information in hospital has been through a process called clinical coding. And this is actually its own medical profession.
So clinical coders will sit in the hospital and read through a patient's notes, including everything which happens as an inpatient and as an outpatient, and they will extract concepts by hand and turn those into clinical codes to represent what happened: for example, did the patient have a heart attack, or a pneumonia, or a chest infection? These get turned into clinical codes and passed upwards into the NHS.
And the primary reason for doing this is so we can keep track of how patients move through hospital, what they've had done to them, including procedures, and it's also used to inform payments back to hospital from the NHS. But this is a manual layer. So this is all created through humans who are reading through notes and abstracting them.
If instead you look towards number four, and indeed all the other systems, our electronic flow of data from hospital systems is very minimal. And that's simply a function of there being hundreds of different data extraction methods in lots of different places, from more than 100 different systems, and no capability to build these pipelines at source in the hospitals to get this data out.
And what this really shows is that if we develop from the top down, we can build as many research environments as we like to hold NHS data, but unless we do the work at source to improve the architecture and the technology, the data flowing upwards is still going to be the same data that we've had for 10, 20 years.
Next slide, please, Jess. So this has a direct impact on the quality and the usefulness of the data that we're able to use for all of the themes which are in the public eye right now, things including the value of data in itself, the use of data for life sciences and research, and indeed the use of data for AI predictive modelling and personalized healthcare.
And we can summarize this across different types of technological problem. We touched on free text, which we call unstructured documents. This is the stuff we write as letters, or as descriptions of a patient encounter: symptoms, examination findings. And these are found in systems just as free-text notes.
But also in the types of documents you'd find on your laptop or desktop, so PDF files and doc files. And these are not coded, which means it's very difficult, at least traditionally, to put these through a computer and know exactly what they say without having a human actually read through the documents.
Now, this is something that we can actually bridge through language AI; it's probably one of the most exciting areas of application for models such as GPT. But traditionally, what this has meant is that very useful information, the things which really capture detail about patients, is very difficult to surface, and it can't be used in research or in model development.
To give you an example: if we have a predictive model in the UK for predicting, let's say, cancer risk, it's going to be using data from the structured record, so the stuff that the GP has put in clinical codes. And two patients who look identical in the codes, who may have had a cancer diagnosis at the same time, for example, might actually be very different.
Because that difference is captured in how the GP has written about the symptoms, what the GP has found on examination, or indeed information that's written in test results. And unless that information can be surfaced, we cannot represent patients in enough detail. Secondly, the fact that the vast majority of the data we have available is driven by clinical codes means that it's also driven by the factors which drive how we code our patients.
So coding is there for a purpose, and it's not there for research. Coding is sometimes there for audit purposes, but often also for payments. So GPs will often code according to what is required by certain frameworks, such as the Quality and Outcomes Framework, as well as others which define how we track GP performance and how we offer payment to GP practices.
Similarly, in hospitals, we code in order that payments can be made to hospitals based on certain results or certain events and procedures which have happened. And these are things which change all the time, which means that the datasets made available for AI development and research are biased by incentives and disincentives which have nothing to do with the patient and nothing to do with the clinical context.
In addition, coding takes time. So if you have a GP who's sitting down with a patient with very complex comorbidities (maybe they're from another country and don't speak English very well), and the appointment is lasting 15, 20, 25 minutes, then within that consultation the GP is going to have far less time to put codes on the system which really reflect what's going on, because the priority is getting down all the details by typing them out into free text. Which means that we do see a bias in data quality: patients from more deprived backgrounds and from ethnic minority backgrounds tend to have worse quality coding and poorer data quality on the system.
And if you try and train predictive models on that, that bias is going to be embedded into your predictive modelling. We've touched on the problem of data being locked behind electronic systems. There's a lot of software where data is stored purely to serve the front end, that is, to support what clinicians see when they use the software.
That could be spread across hundreds, if not thousands, of tables in the back end, and actually getting that data into a usable form for analysis is very difficult. Vendors can bridge this problem: they can offer ways to access the data, and they can offer access to cleaner data. But what you will often find is that this comes at a really large cost, even though it's the patient, the clinical teams and the clinical organization who partake in the data generation, because the data is being generated in a proprietary system.
To access that data in a form that's easier for research, you might be paying half a million pounds to get a connection and to get clean data out. And finally, and we also touched on this, for 10, 20 years the NHS has always built from the top down: building platforms, building research infrastructure, building environments that people can log into.
But it's always the same data that tends to be shuffled around, because unless we start to build from the bottom and build the architectures within the trust, then the data that we use for research, development and AI is still the same data, and it's still of insufficient quality. And that has an impact on the findings and the AI products that we do create on such data.
Slide please, Jess. So we can talk about predictive analytics a little bit. So predictive analytics or predictive algorithms are algorithms which are trained on patient data to recognize patterns in patient data. And they can use these patterns that have been learned to make future predictions on data that they haven't seen before.
Let's say new patients. Which means that any time you adopt this methodology, the algorithms will always reproduce any mistakes or any biases found in the data that they've been trained on. And that is ultimately the biggest bottleneck for anything to do with prediction and anything to do with artificial intelligence.
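To make that concrete, here is a minimal, invented simulation of how a bias in the training data becomes a bias in the model. Two groups have the same true illness rate, but one group's illness is coded only half the time (as in the coding-time example from earlier), so any risk score learned from the codes under-estimates that group's risk. All the numbers, and the "model" itself (just the coded prevalence), are assumptions for illustration.

```python
import random

random.seed(0)

TRUE_RATE = 0.30  # both groups are truly ill at the same 30% rate

def coded_prevalence(n_patients: int, coding_rate: float) -> float:
    """Fraction of patients whose illness actually appears as a clinical code.
    A model trained on coded data can only 'see' this, never the true rate."""
    coded = sum(
        1
        for _ in range(n_patients)
        if random.random() < TRUE_RATE and random.random() < coding_rate
    )
    return coded / n_patients

# Group A: well coded. Group B: only half of true cases ever get a code
# (e.g. time-pressured consultations, language barriers).
risk_a = coded_prevalence(20_000, coding_rate=1.0)
risk_b = coded_prevalence(20_000, coding_rate=0.5)

print(f"learned risk, group A: {risk_a:.2f}")  # close to 0.30
print(f"learned risk, group B: {risk_b:.2f}")  # close to 0.15
```

The point is that group B looks half as ill as it really is, purely as an artefact of data quality, and a model trained faithfully on those codes will reproduce exactly that gap in its predictions.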
The most widely deployed algorithms in the UK are types of algorithm we call risk stratification. What these do is predict across large populations of patients, usually in primary care, and they try to predict the risk of hospital admission for any patient in the next year. There's not very much evidence that risk stratification works.
We do have an increasing amount of evidence to show that risk stratification doesn't work. There have been two high-quality studies in the UK which look at risk stratification in prospective pathways, so deployed in a pathway for future patients. And they've both found that when you use risk stratification, you actually increase the workload of GPs, you increase the amount that patients go to hospital, and you don't improve any outcomes. Jess and I, with a colleague, Chris Oddy, recently did a systematic review of all the risk stratification algorithms we could find in the literature. And we looked at performance in evaluation and in development, in datasets in different places.
And then we looked at what happened when you deploy these algorithms into the real world. And we found that even where an algorithm performs well within a dataset situation (so you've trained it on a dataset and validated it on a different part of that data), that performance simply doesn't translate and doesn't carry over into real-world deployment.
And we've done some tests in the UK population. We've looked at before and after of risk stratification deployment across different ICBs in the UK, and we've not found any signal to suggest that this type of intervention is improving hospital admission rates. And one of the reasons for this is that the data is simply not good enough.
The data is not good enough, and it's also too unpredictable. Because if you think about how we code the type of data used for these models in the UK, the coding will often change over time, and it will also often change from practice to practice. A lot of these algorithms are actually developed on United States data, where things are coded very differently.
Because that represents claims data, which is used in health insurance, and the patterns of coding are very different from how our GPs in primary care would code things and define conditions and features. And in the graphic on the far right, you can see a concept known as drift, which means that over time, codes and the meaning behind codes will drift. That means that if you try and use the same algorithm that you trained a few years ago, it will drift.
You can see this in the charts on the right, labelled QAdmissions drift and QRISK2 drift. These are called calibration curves: the curved lines represent how well the algorithm is performing, and what you want is for the line to be as close to the middle diagonal as possible.
And you can see that when the algorithms were developed, in 2009 or in 2013 for QAdmissions, the calibration curve is relatively close to the diagonal, but over time that performance will drift if the algorithm isn't changed. And there have been similar studies looking at how algorithms perform across different locations, such as GP surgeries.
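For anyone unfamiliar with calibration, the idea behind those curves can be sketched in a few lines of Python. The risks and outcomes below are invented: an unchanged model that predicts 25% admission risk is checked against a development cohort where 25% really are admitted, and then against a later cohort where the real-world rate has drifted to 50%. (A real calibration curve repeats this comparison within bins of predicted risk; this sketch uses a single bin to show the principle.)

```python
def calibration_gap(predicted_risks, outcomes):
    """Mean predicted risk minus observed event rate.
    0 means perfectly calibrated on average; a large gap means drift."""
    mean_predicted = sum(predicted_risks) / len(predicted_risks)
    observed_rate = sum(outcomes) / len(outcomes)
    return mean_predicted - observed_rate

# The (unchanged) model predicts 25% admission risk for everyone.
preds = [0.25] * 8

# At development time, 2 of 8 patients are admitted: perfectly calibrated.
outcomes_at_development = [1, 1, 0, 0, 0, 0, 0, 0]
# Years later, coding and case mix have shifted and 4 of 8 are admitted.
outcomes_after_drift = [1, 1, 1, 1, 0, 0, 0, 0]

print(calibration_gap(preds, outcomes_at_development))  # 0.0
print(calibration_gap(preds, outcomes_after_drift))     # -0.25
```

The gap of -0.25 is the drift: the model now under-predicts risk by 25 percentage points, even though nothing about the model itself changed, only the world underneath it.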
And we find there's a huge amount of variability there, because the data simply isn't good enough or complete enough, or the data means different things depending on where you're standing in time and space. Slide, please, Jess. And this is probably the most widely implemented AI algorithm in the world.
And this is the Epic sepsis model. So, for context, Epic is one of the largest electronic health record vendors in the world. They're very big in America, across hundreds and hundreds of hospitals, and they're in five or six hospitals in the UK. It's a relatively easy-to-use EHR system, which means it's quite popular.
But Epic also hold a lot of clinical data, and what they are also doing is developing algorithms on top of this data. So this is an algorithm for detecting sepsis early, that is, detecting very severe infections in the patient population moving through the emergency department. It was developed quite a while ago on a small group of trusts, and it was rolled out to hospitals that used Epic.
And at that time, the studies where it was developed and validated showed very good performance. Then there was a study in 2021, which was the first external and independent evaluation of this algorithm, and it found that the model performed very poorly in the setting where it was externally validated. The fact that it was being used in hundreds of hospitals across the United States raised concerns for this team that the performance simply wasn't there, and that it was potentially introducing harm rather than doing good.
And Epic took the model, they retrained it, they revalidated it, and the performance improved. Then quite recently, a few months ago, there was another external validation study in one of the New England Journal of Medicine journals, and it found that the performance, again in external validation, was far below what Epic had promised in their original validation studies. And when using the algorithm where it would be really useful, so where a clinician hadn't spotted the sepsis, the metric for scoring it dropped to 0.47, which is actually worse than guessing. And this is an algorithm which is in hundreds of hospitals across the world and has been actively used. But this again is evidence that there are contextual problems, and issues over time, which mean that it's quite likely the data being used to serve the algorithm for prediction just isn't good enough, or isn't representative of where the algorithm was trained at source. Slide, please, Jess.
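To unpack "worse than guessing": the discrimination metric typically reported for models like this is the AUROC, which is 0.5 for random guessing and 1.0 for a perfect ranking, so a value like 0.47 means the model ranks true sepsis cases below non-cases slightly more often than not. Here is a minimal rank-based AUROC with invented labels and scores (assuming, for illustration, that the reported figure is an AUROC-style score):

```python
def auroc(labels, scores):
    """Probability that a randomly chosen positive outranks a randomly
    chosen negative, ties counting half: the Mann-Whitney form of AUROC."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0, 0, 0]                     # 3 sepsis cases, 5 non-cases
perfect = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1, 0.1, 0.0]    # cases all score higher
inverted = [0.1, 0.2, 0.1, 0.9, 0.8, 0.7, 0.6, 0.5]   # cases score LOWER

print(auroc(labels, perfect))   # 1.0
print(auroc(labels, inverted))  # 0.0
```

Anything below the 0.5 of a coin-flip, as in the inverted example, means the ranking is actively pointing the wrong way, which is why a deployed score of 0.47 is so alarming.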
Please, Jess. And when we go to a system where there's many data sets around and many companies or many teams or many research organizations who are developing models on these fixed data sets and then trying to deploy them into real world systems, then the mismatch happens. And we find that performance is never as representative or never as good as we can expect it to be.
And this is, I think, around 2 billion worth of company value, in terms of VC or private equity raises, or indeed stock market valuation. One of the big features behind all of these companies is that they've put AI at the core of what they do, and often their products simply do not translate into impact, and then revenue, when you put them into the real world. And fundamental to all of this is the fact that the data we have in different systems, and particularly in the NHS at the moment, simply isn't good enough or detailed enough. Thanks, Jess. Awesome. Thanks, Joe. That was very fun. I felt like I really enjoyed driving the slides.
But okay, so now we have seen why we think, or at least why policy believes, that the use of data is extremely exciting from the perspective of transforming care. And we have also seen why that might be a bit more challenging than it has been sold, from a technical perspective.
And so now I'm going to depress you further, before we build you back up, by talking about the fact that transformation can be both positive and negative. We have to remember where we are trying to get to, so that we all drive in the same direction towards the type of transformation that we want, as opposed to the type of transformation that might just happen because we are simply driving along the motorway without ever checking the map. So this is my first point. Joe did the technical; I'm going to do the philosophical. Often when we hear these types of discussions about data in transforming healthcare, what you will get is something that's called data-driven medicine.
People will say, oh, we're going to have data-driven healthcare. I think even some of the most recent documents that have come out, even from our new government, have said that we want to turn the NHS into the greatest data-driven healthcare system in the world. Now, what I want to stress is that this is more than just data driven.
Healthcare has, in fact, always been data driven. That data hasn't necessarily been in a digitized or electronic form, but data has always driven healthcare. Even if you go right the way back to the ancient Greeks, and we're talking Aristotle and Hippocrates, they are writing about the fact that we should list and look at all of the symptoms of a patient.
That is exactly the same logic as data-driven medicine. And even if we look at when AI first became a thing, so we're talking about 1954, there is a systematic review paper talking about the fact that statistical modelling might be better than clinical doctors at doing diagnostics.
So this idea that medicine might be data driven is not new; we've just become a little bit better at branding it as such. But there is a bigger shift happening, therefore: it's not just about taking a paper-based system and making it electronic. And where I say 20th century versus 21st century, I'm playing slightly fast and loose with my dates: I'm really talking about up until about 2010, and then from about 2010 onwards. But if we take the 20th century model, what we tried to do, and I am by no means, before people come at me, by no means saying that this model was perfect and that there were not flaws, but what we tried to do was develop a healthcare system that was evidence based.
You go out and do a clinical trial; that clinical trial generates the evidence that the drug or the treatment is safe and effective. That evidence is then combined, crucially, with the clinician's expertise, and that combination of expertise and evidence is used to drive the patient's care, with the patient put at the centre.
If we look at things like the NHS Constitution, it says very clearly that the patient is placed at the centre of care. So patients are involved in the decision making, both directly, in the sense that patients should give informed consent and be talked to about what they are being involved in, and indirectly, from the experience perspective: the clinician is not just saying, on paper this is the most effective drug.
They are also saying: for this particular patient, based on their life, their circumstances, their own priorities, I believe this drug is better for them, even if it is not the most clinically effective. It's also built on a one-to-one relationship: a dynamic between a patient and a clinician that is a therapeutic relationship.
We have seen that dissolve slightly over time with the removal of your named practitioner, but there has always been this drive for a therapeutic relationship between the patient and the clinician. And we've also had a very narrow model of trust.
And what do I mean by that? I mean that we know who is responsible for each part of the care pathway, and therefore, if something goes wrong, there is a way of going back and tracking it. We also know each of the components that link and lead to good trust in the healthcare system, and we know how to tweak and improve them, because the number of players in that space has traditionally been very small.
Now, what we are seeing, and Joe has already highlighted this extremely well in talking about the architectural diagrams and so on, is that in this drive towards data-driven healthcare, we are actually shifting to an entirely different system. We're moving from an evidence-based model to an algorithm-based model.
So rather than this combination of the patient, the experience of the clinician, and the evidence generated in research environments, we are taking big datasets that might be decontextualized and messy, in the way Joe has just described; we are letting the algorithm train on them; and then we are letting the algorithm decide or drive what care should look like.
It's also not about the patient; it's about the digital twin of the patient. When our algorithm is making a decision or a recommendation, say the risk prediction tools that Joe has just talked about, or sepsis alerts, no one is physically touching and looking at the physical patient in that moment in time.
It is all being made on the patient's digital or data twin, how they are represented in the data space. And those two things are not necessarily one and the same, partly for the technical reasons Joe has talked about, there are often errors, and partly because the patient does not have a great deal of control over that representation.
If anybody ever wants to experiment with this, one of the simplest ways of bringing it home is to go into your Google profile and see who Google thinks you are based on your searches. I did this several years ago as an experiment with my personal Google account and my work one, and my personal one was pretty close.
It knew I was female, thought I was a lot younger than I was, but it said, essentially: this is a woman who's interested in things like Taylor Swift. Pretty accurate. Whereas my work one, because all I do all day is look up things to do with data, healthcare and tech, actually thought I was a man and made a wide range of assumptions about me that were not accurate.
So that's just a silly example of the fact that your data twin might not match your reality. And then there are all of the problems Joe talked about, the fact that clinical coding is incentivized, which might push those two things even further apart. And what we increasingly record in wearables and so on is also not necessarily a one-to-one match, because those devices are not designed to work equally well for every single person. And then we're shifting from a one-to-one relationship to a many-to-many relationship. We no longer have one clinician and one patient. Instead, what we have is a big dataset of many different types of patients that is used as a comparator to drive decisions about the data patient representing the person in front of us.
This is why I say that data-driven medicine is less like personalized healthcare and more like targeted advertising. And we also have many different companies involved, as, again, Joe has flagged. This causes a problem of distributed trust. We now have multiple different partners involved: who collects the data, who cleans it, who curates it, who decides what algorithm is going to be used, who decides how it's going to be trained, how it's going to be validated, how it's going to be evaluated.
Then it comes into contact with the clinicians using it. This whole system is completely disrupted. We no longer know what the components of trust are, and we no longer know who controls each of those components or how to go back and fix things if they go wrong. And the biggest issue is that this is all happening inside a largely ungoverned black box.
When we hear the phrase black box, what most people think we're talking about is whether we can look under the hood of the algorithm itself, I don't know why I'm miming a hood, there we go, and see what it actually looks like, what it's making its decisions based on, what factors are associated in that particular model.
Whereas what I want to stress is that the entire system is a black box. We don't know, as I've just said, where these different points of accountability lie. And the law is largely broken: everything from data protection law through to medical device law is really struggling to keep up with all of this.
I'm not saying that people aren't trying. They absolutely are. The MHRA has a fantastic, long roadmap of what it's trying to do with AI as software as a medical device, but everything is struggling. And this is causing a really fundamental shift in what we consider good-quality or high-quality care, for these reasons. The last one is the most important in terms of the title of this webinar.
First of all, we are outsourcing knowledge about the body, and we are changing what counts as evidence of illness and evidence of its absence. It's no longer me, as a person, going to my doctor and saying: I don't feel well, or my child is not really acting like themselves, or something's a bit off, it just doesn't feel right. It is instead what is happening in an algorithmic system. If there is a signal in the data that you're unwell, then great, everyone is going to try to treat you. But if there is no signal in the data that you are unwell, yet you do not feel well, what does that mean?
And it's also undermining a concept, a meta-right, the right not to know, which is well discussed in the field of genetics and genomics but, for some reason, not discussed nearly so well in the field of predictive modelling. Patients, individuals, have a right to say: actually, I don't want to know that in five years' time I am at risk of developing this disease or this particular type of cancer, especially if there's nothing that can be done to reduce that risk, because that is a form of psychological harm; it damages that person's self-integrity. So people should have the right not to know.
But what we are instead seeing is that this stuff is running predictively all the time. And then we're seeing statements saying we want to push-notify people through, for example, the NHS App and tell them they should come in for screening. And whilst there are enormously good intentions behind that type of logic, we also have to remember that it is changing some of the dynamics of what we consider to be caring. It's disrupting the fundamentals of care for the reasons I've just talked about. We now have new power dynamics: we have third-party private providers increasingly involved in the clinical or therapeutic relationship, and we have things like ChatGPT that might eventually be used for triage.
That thing is a mimic machine. It is not capable of empathy or genuine semantic understanding. So that's devaluing the ethics of what we consider to be caring. And then, as we've talked about, there are enormous amounts of baked-in bias in these models. It's not just a case of whether we have enough data on all of the different ethnicities or genders in the healthcare system.
There are multiple types of bias built in, with their own complexities, and this cannot be overcome just by solving the digital exclusion problem. Sometimes what we see, particularly when these transformational types of statements are made, is: oh, we're going to improve access to care,
because where there's no longer a doctor we can put an algorithm in place, and the fact that people do not necessarily have devices is fine because we'll give everybody in the care system an iPad and it will all be fine. That is an oversimplistic, reductionist interpretation of what it means to have equitable access to care when there are all of these sources of bias underneath.
And so what I want to stress is what we are at risk of. This is what I mean about transforming care: transforming it into what, exactly? We don't have sensible conversations about where we want to go and where we want to end up. It's a bit like saying we're going to shoot for the stars and decide which planet to land on when we get there, as opposed to putting in place a very careful map of where we want to go.
Because if we don't think very carefully about these transformations, we may end up with a two-tiered system: on one side, the worried well, people like myself, really, who generate extremely high-quality data about themselves and therefore have very good algorithms built around their care, but who don't actually need the healthcare system that much, I'm pretty healthy, I'm young, it's not really a problem; and on the other, the ignored sick, the people who are not in the data space and are therefore not necessarily considered. Anybody on this call might have heard of the inverse care law, which was developed about 50 years ago.
The inverse care law says that the availability of good care varies inversely with need. And I think what we are now beginning to see is the creation of an inverse data quality law, where the availability of high-quality medical or social care data varies inversely with the need of the population served.
I told you we were going to depress you. So now I'm going to bring you back up for the last three or so minutes, before we open things up to questions, to talk about what we can do to get it right. As a minimum, I think this is about reframing the conversation and making sure we are having very sensible conversations about what it means to transform a healthcare system with the use of data, particularly from an equity perspective.
First of all, we have to recognize that the informational environment we all now live in is a social determinant of health. Just as your wider environment, air quality and so on, determines the quality of your health, so does the information that is generated about you and built into algorithms. Unless we pay attention to that fact, we are at risk of causing harm. We need to recognize that digital health, or AI, or data-driven healthcare, is much more a public health intervention than a personalized one. As much as it is branded as personalized health, as I said at the beginning, this really operates at the public health level; it is about groups.
And so we need to treat it as such. Then we need to focus more narrowly, I think, on information needs rather than information wants. That's a slightly irritating phrase, but all I really mean is this: instead of assuming that more information or more data is always better and will automatically lead to better outcomes, we should be reversing that chain of thought.
What is the outcome we want to achieve? What information do we need in order to achieve it? Where do we get it from? And then we work backwards, instead of: here's a dataset, I'm going to develop a model, and then I'm going to find somewhere to deploy it. That's really all I mean by that phrase.
Okay. So this is just to say that we have to pay attention to where all of this data is coming from. It's not just about the datasets that the NHS controls, because it's not just those datasets that have an impact on patient care. It's also about the data generated on social media, the data that goes into personalized recommender systems and clinical decision support software. It all matters, and we have to pay equal attention to all of it. We need to take some specific actions as a research community, as a clinical community, as a group of people who care about this.
We need to generate evidence of what's actually happening. We need to develop some theory about what is going on and what we can do about it. And we need to pilot some solutions to make sure that what we are doing is what we really want to do. And this slide I won't read to you, don't worry.
Anybody can read this in their own time; I'll put these slides up online after we've talked. But really, I think we have to focus on these areas, everything from epistemic certainty, how do we know what we know, all the way through to designing a system around meaningful accountability.
And now you're all going to breathe a sigh of relief, because that's the end of the slides, and we'll open it up to questions for the last 10 or 12 minutes. Brilliant. Thanks so much, Jess, and Joe, that was absolutely fascinating. And we've got quite a lot of questions in already.
So I'll kick off, just while I'm pulling them together a bit, by reminding you that we've got a survey going for healthcare professionals involved in digital or tech. We're going to put a link to that in the chat now. It's not directly related to this webinar, but we think a lot of people here will want to put their views into it.
So that's going to go in the chat as we go. The first questions that came in were a group of questions, probably for Joe, but Jess, I think you could also weigh in quite helpfully, on the role of vendors, particularly of electronic healthcare records: how much power they have in the system; whether, given their complexity and the role they're taking, they should be considered a medical device; and what should be done about their roles, their regulation and their influence.
So I don't know if you want to start, Jess, and Joe carry on, or the other way round. No, Joe, go. Joe is on mute. If you are talking, Joe, we can't hear you. Can you hear me? Yeah, we can hear you. Did you hear that question?
No. Okay, I'll go. Helen, you were saying that these questions are around how we control the vendors? Is that roughly what you were asking? Yeah, the role of the vendors and the companies that own electronic health record systems.
Joe talked a little about Epic, and obviously there are others, but it's the best example. How much control do they have? Should they have it? And are they regulated in the way that they should be? Oh, okay. Great question. They have an enormous amount of control, in fact probably a lot more than people fully realize.
Everything from what you can record in an electronic record all the way through to things like the order of the drugs that appear in the list when you click to prescribe, which we know has a determining influence on prescribing, is controlled by those vendors.
And the control that we as a system have over that is actually extremely limited. There is some control, in that there are procurement frameworks governing which systems are allowed to be used in the healthcare system, but we don't have an enormous amount of control over the design.
There's very little standardization and very little regulation. For example, an electronic health record is not considered a medical device. It really should be. And, as Joe has put in the chat as well, we also don't have control over who they decide to give data to.
That is also a very big problem, and we have seen issues in the NHS, for example, where the NHS has been charged to effectively get its own data back out of electronic health record providers. And that happens in the US as well. So there really does need to be quite a significant change in the way we govern these systems.
Yeah, it's worth mentioning that we've got a pretty international audience. There are a number of people from the UK, but also many who work and research in other health systems, so I'm sure it's just as true elsewhere. And the reason we were using Epic as the example: Epic has a small market in the UK but a far bigger market internationally.
So everything I've just said is true outside of the UK as well. Yes, great. A slightly UK-focused question, but I think it brings up a more general point as well: Jan Osuto asks about the Emergency Care Data Set, which was designed by clinicians with the Royal College of Emergency Medicine in the UK.
Does it help to have datasets carefully designed by people who are intimately involved with what care does? Does that solve some problems? It definitely helps. To put it in the most glib way: when you are close to working in a care environment, you know the meaning behind the data, not just the data as a representation; you know what it represents in a more meaningful sense. So yes, it absolutely does help. The slight challenge we have there is how we compensate people if we are increasingly asking them to do more of what you would call data work on top of their day job. And so that's sometimes an added complexity.
And so that's sometimes. a sort of added complexity. And that comment I think follows on a little bit to another question we had from Mary Annick Le Pogam, who has pointed out that what is that role of a physician, a doctor? Is it bad? Is it so much worse that they're making a decision based on a black box algorithm than they're making a decision based on just looking at a few scans and not really talking to the patient very well?
Is there a bit, sometimes it's easy to be a bit luddite and go and think we're at the gold standard now, so any difference is different, but no, that's a great question. And that's why I said, when I was talking about slide, I'm not by any means saying what we have now is perfect or indeed that doing any of these shifts is necessarily worse.
In fact, there is evidence that it could be significantly better. It's more that I think we have to be having rational conversations about what this means. The question you're asking is a good and valid one, and I just think we need to be asking those questions more.
We also, I think, need to move away from this tendency that has developed to make clinician-versus-algorithm comparisons. I don't think that's actually very helpful, because it's like trying to compare, the cliché is apples and oranges, but it's potentially more like apples and steak.
These two things are simply not the same, and trying to compare them as though they're supposed to perform in the same ways is a little misleading. That's really helpful. And it's also worth saying that in our whole work around future health at BMJ, we're really noticing the importance of people understanding that they can't get away from data, that their involvement in data and health technology is part of their job, something we need to be ready and prepared for, and something that can help shift the dial on outcomes, equity and all those big important issues.
We had another question, let me just quickly find it. Oh, Joe's already answered it, I knew it had gone somewhere, but I'm going to share it in case people aren't following the Q&A closely, because I think it's quite interesting, and I'm going to try to paraphrase because it's also quite technical.
The area in question is data selection and preprocessing. The summary of the question is: compared with traditional epidemiological models of data, are we throwing away a lot of value in the way we currently look at the data and funnel it up to the right place?
Yeah, so, as Joe has said in the chat, and I think I'm probably echoing Joe's thoughts here, one of my main frustrations in this whole space is that we have developed a tendency, and this is a little of what I mean about comparing the clinician versus the algorithm, to just replicate what we already do in the data space.
What do I already use to make a decision about this, how do I replicate it in the data, and then I funnel everything up towards that. And it's all based, as Brendan has quite correctly said in this question, on phenotypic models that are predefined by our own cognitive concepts. Whereas actually we probably have an enormous amount of freedom, flexibility and value in moving away from those predefined structures and allowing things to develop in a slightly more free-flowing way. That might completely turn on its head the way we think about certain types of diseases.
And I think that's really where the biggest opportunity is. But at the moment we're missing it, because we are cognitively constrained by our existing knowledge. Brilliant. Thank you so much, both of you. I think we'll finish the questions there so we finish on time. I do want to say that, unfortunately, we haven't had a great line to Joe on this webinar, but you can actually hear him speak live, by being in the same room as him, in November, when we have our BMJ Future Health event, where I'm sure he'll be touching on some of these issues and drawing out more interesting points, and we'll take questions there as well. So we really hope to see a lot of the people online here now in real life in November; tickets are open. There should be a slide coming up, I think, Molly, with a link to that event, but if not, you can find it at futurehealth.bmj.com. We will see you at the next webinar or in London in November. And once again, a big thank you to Jess and to Joe for a really interesting Friday afternoon. Thank you.