AI Sees Race Even When Humans Can't [Breakdowns]
Deep Learning can detect a person's race from medical images, even when those images are heavily corrupted.
Hey, it’s Devansh 👋👋
In my series Breakdowns, I go through complicated literature on Machine Learning to extract the most valuable insights. Expect concise, jargon-free, but still useful analysis aimed at helping you understand the intricacies of Cutting Edge AI Research and the applications of Deep Learning at the highest level.
If you’d like to support my writing, consider becoming a premium subscriber to my sister publication Tech Made Simple to support my crippling chocolate milk addiction. Use the button below for a discount.
p.s. you can learn more about the paid plan here. If your company is looking for software/tech consulting- my company is open to helping more clients. We help with everything- from staffing to consulting, all the way to end-to-end website/application development. Message me using LinkedIn, by replying to this email, or on the social media links at the end of the article to discuss your needs and see if we'd be a good match.
I know Wednesdays are reserved for me to share interesting content (the updates post). But I went through a research paper and I wanted to bring it out ASAP. This week’s reading list comes out tomorrow.
Medical Imaging is undeniably one of the most impactful uses of AI. Leveraging AI to enhance medical images, detect possible disease/problematic outgrowths, or even create synthetic high-quality medical images for research can all bring significant value to society. But in an all-too-shocking and unforeseeable turn of events, it turns out that Deep Learning can come with some unexpected and (dare I say) colorful side effects.
The authors of the paper “AI Recognition of Patient Race in Medical Imaging: A Modelling Study” shared some very interesting findings with regard to AI and its ability to accurately classify a patient's race based on their medical images (something experts thought was impossible). In their words, “In our study, we show that standard AI deep learning models can be trained to predict race from medical images with high performance across multiple imaging modalities… AI can accurately predict self-reported race, even from corrupted, cropped, and noised medical images, often when clinical experts cannot”.
These models show great robustness to both noise and extreme cropping- “in most cases, using only one-ninth of the image was sufficient to obtain prediction performance that was almost identical to using the entire image”. Imagine the tears of all those Computer Vision researchers who have tried to reach this level of robustness through all the SOTA techniques. And our giga-chad researchers did this without even sweating.
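To make the cropping claim concrete, here is a minimal sketch (my own illustration, not the authors' code) of what evaluating on one-ninth of an image looks like- split the image into a 3×3 grid and feed the model a single patch:

```python
import numpy as np

def ninth_patches(image: np.ndarray) -> list[np.ndarray]:
    """Split a 2D image into a 3x3 grid of patches.

    Each patch covers one-ninth of the image area, mirroring the
    paper's finding that a single patch was often enough to keep
    race-prediction performance near that of the full image.
    """
    h, w = image.shape
    ph, pw = h // 3, w // 3
    return [
        image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
        for r in range(3)
        for c in range(3)
    ]

# Hypothetical usage with a trained model (names are placeholders):
# for patch in ninth_patches(xray):
#     score = model.predict(preprocess(patch))
```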
While I don’t think the race classification ability of Deep Learning models is much to panic about, this paper shows us some interesting avenues to consider going forward. In this article, I will cover the results, implications, and highlights of the research paper. Given how critical medical AI is, this is not a piece you want to skip out on.
if an AI model relies on its ability to detect racial identity to make medical decisions, but in doing so produced race-specific errors, clinical radiologists (who do not typically have access to racial demographic information) would not be able to tell, thereby possibly leading to errors in health-care decision processes.
- This is a big deal for evaluating the decisions these models drive.
Data Preparation
As with all such papers, the first place we look is the datasets and any notable experiment-design decisions made by the authors. The authors chose the following datasets-
CXP=CheXpert dataset.
DHA=Digital Hand Atlas.
EM-CS=Emory Cervical Spine radiograph dataset.
EM-CT=Emory Chest CT dataset.
EM-Mammo=Emory Mammogram dataset.
EMX=Emory chest x-ray dataset.
MXR=MIMIC-CXR dataset.
NLST=National Lung Screening Trial dataset.
RSPECT=RSNA Pulmonary Embolism CT dataset.
Seems like quite a bit of effort was put into ensuring the diversity of data distributions and situations. Kudos to that-
We obtained public and private datasets that covered several imaging modalities and clinical scenarios. No one single race was consistently dominant across the datasets (eg, the proportion of Black patients was between 6% and 72% across the datasets).
The most significant decision was in how they chose to measure race.
When it comes to race, they rely on self-identification with very broad categories. This is flimsy because there is large genetic diversity amongst people of the same skin color (the genetic diversity within Africa alone is greater than anywhere else in the world, even though most Africans would be lumped in as Black). Genetic classifications would carry over better to the medical domain. The authors acknowledge as much but ultimately chose self-reported race, since racial discrimination happens not at a genetic level but at a skin-color level. I would love to see how things change if we changed this measure of race.
Now that we understand the basic setup, let’s look through the results.
Results
When it comes to straight AUC, the models do well. Very well.
The first instinct of the more experienced among you might be to control for confounding variables. Stratifying across BMI groups, tissue density, age, disease labels, etc. also did not fully account for the performance. They also stacked a few ML algorithms using basic features against the Deep Learning classifiers-
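To make the stratified analysis concrete, here is a rough sketch (mine, with hypothetical column names, not the authors' code) of computing AUC within each BMI group. If race prediction were just a proxy for body habitus, the per-stratum AUCs would collapse toward 0.5- they didn't:

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def auc_by_bmi_stratum(df: pd.DataFrame) -> pd.Series:
    """AUC of a race classifier computed separately per BMI group.

    df is assumed to hold one row per image, with hypothetical columns:
    'bmi', 'true_race' (binary label), and 'pred_prob' (model score).
    Assumes both classes appear in every stratum.
    """
    bins = [0, 18.5, 25, 30, float("inf")]
    labels = ["underweight", "normal", "overweight", "obese"]
    strata = pd.cut(df["bmi"], bins=bins, labels=labels)
    return df.groupby(strata, observed=True).apply(
        lambda g: roc_auc_score(g["true_race"], g["pred_prob"])
    )
```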
Overall, this gives us a strong hint that there is something else at play. Moving on to what I found the most interesting part of the paper- the AI’s ability to catch race, even from very corrupted images.
More strikingly, models that were trained on high-pass filtered images maintained performance well beyond the point that the degraded images contained no recognisable structures; to the human coauthors and radiologists it was not clear that the image was an x-ray at all.
Deep Learning pulls a Detective Conan
The authors ran the input images through filters. Specifically, they used a variety of low-pass filters (which remove high-frequency detail, smoothing/denoising the image) and high-pass filters (the opposite, typically used for sharpening)- watch this video for the details. During the high-pass filter (HPF) stage, the model maintained performance for quite a while, picking up race from significantly corrupted images.
A high performance (up to diameter 100) in the absence of discernible anatomical features was maintained with the addition of a high-pass filter (ie, model performance was maintained despite extreme degradation of the image visually).
You might think that the “absence of discernible anatomical features” is an exaggeration, but take a look at what HPF-100 looks like: it might as well be noise. Detection in this environment is quite impressive and might end up with some interesting applications in encryption and compression.
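If you want to build intuition for what these filters do, here is a minimal numpy reconstruction of the general technique (my sketch of frequency-domain filtering, not the paper's exact code; I'm assuming "diameter" refers to a circular mask in the Fourier domain):

```python
import numpy as np

def frequency_filter(image: np.ndarray, diameter: float, mode: str = "high") -> np.ndarray:
    """Apply a circular low- or high-pass filter in the Fourier domain.

    A low-pass filter keeps only frequencies inside a circle of the
    given diameter (smoothing the image); a high-pass filter keeps
    only frequencies outside it (edges and fine texture). Under my
    reading, HPF-100 corresponds roughly to mode="high", diameter=100.
    """
    f = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    mask = dist <= diameter / 2
    if mode == "high":
        mask = ~mask
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))
```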
The ability to pull race from that level of noise is honestly impressive. But why does that matter? Let’s end with a discussion of some implications of this study.
Why this matters
First and foremost, it is well known that Medical AI has a propensity to lead to racially skewed results.
In 2019, a bombshell study found that a clinical algorithm many hospitals were using to decide which patients need care was showing racial bias — Black patients had to be deemed much sicker than white patients to be recommended for the same care. This happened because the algorithm had been trained on past data on health care spending, which reflects a history in which Black patients had less to spend on their health care compared to white patients, due to longstanding wealth and income disparities.
-Source. All data scientists need domain knowledge. It is criminal how tech/AI teams set out to disrupt sensitive industries while knowing next to nothing about them. This only leads to failure, for everyone.
AI being able to detect race when humans can’t increases the potential for racial inequality manyfold. If Deep Learning can use images to predict race (even after accounting for standard confounding variables), then there is a significant chance that Deep Learning predictors are indirectly utilizing race when making decisions. And people have no way of detecting that. I’ll let you connect the dots (I’m genuinely shocked AI Doomers haven’t made a bigger deal out of this paper).
AI’s extreme robustness to noise and cropping also presents a unique challenge, because it seems to grant immunity to one of the most common counters to this problem-
One commonly proposed method to mitigate the known disparity in AI model performance is through the selective removal of features that encode sensitive attributes to make AI models “colorblind”.
Although this approach has already been criticised as being ineffective, or even harmful in some circumstances, our work suggests that such an approach could be impossible in medical imaging because racial identity information appears to be incredibly difficult to isolate.
That to me is a big deal. A huge deal. I don’t understand why this didn’t get much attention. The technique of obscuring/changing details is used by many engineers in a variety of fields to create AI that is more fair. Take a look at one similar technique leveraged by Amazon to reduce historical bias in their language models-
This paper shows that there is a good chance that it is not enough. Deep Learning agents are picking up associations and correlations that are so deeply embedded in the data that we haven’t even started to consider them. How many other use cases need to be investigated for AI picking up hidden biases and potentially acting upon them? We need a very systematic and thorough investigation, because there are systems out there ruining lives while such phenomena fly under the radar.
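To see intuitively why "just remove the sensitive feature" fails, here is a toy demonstration (entirely synthetic data of my own construction, not from the paper)- a model that is never shown race can still recover it from a correlated proxy feature:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
race = rng.integers(0, 2, n)             # sensitive attribute (synthetic)
proxy = race + rng.normal(0, 0.5, n)     # feature correlated with race
noise = rng.normal(0, 1.0, n)            # unrelated feature

# "Colorblind" setup: race itself is never given as an input feature...
X = np.column_stack([proxy, noise])
clf = LogisticRegression().fit(X, race)

# ...yet the model recovers race far above chance (AUC around 0.9).
print(roc_auc_score(race, clf.predict_proba(X)[:, 1]))
```

In medical images the "proxy" is not one column you can drop- the paper suggests the racial signal is diffused across the whole image, which is exactly why the colorblind approach may be impossible here.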
Before you start panicking- keep in mind that most teams probably aren’t working on critical software with strong, direct impacts on people’s lives. You’re probably forecasting sales, grouping customers into spendy or cheap, etc. If that is the case, similar hidden phenomena will at worst hurt business performance, not actual human lives. If you have a halfway decent pipeline set up, you don’t really need to tear everything apart and rebuild from scratch (the ROI is not worth it). Simple techniques and great experiments will be more than enough.
But when it comes to AI being used to judge people- whether for hiring, credit scoring/financing, medical care, or legal decisions- I would tattoo this paper on your favorite intern’s forehead and make sure to stare at it once an hour. The learnings from this are not something you want to ignore.
I’ve been investigating this in more depth to come up with solutions. If you’d like to speak to me about this, please reach out using the social media links at the end of the article. This is a true societal risk from AI, and we need to work together to solve it.
If you liked this article and wish to share it, please refer to the following guidelines.
That is it for this piece. I appreciate your time. As always, if you’re interested in working with me or checking out my other work, my links will be at the end of this email/post. If you like my writing, I would really appreciate an anonymous testimonial. You can drop it here. And if you found value in this write-up, I would appreciate you sharing it with more people. It is word-of-mouth referrals like yours that help me grow.
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Small Snippets about Tech, AI and Machine Learning over here
AI Newsletter- https://artificialintelligencemadesimple.substack.com/
My grandma’s favorite Tech Newsletter- https://codinginterviewsmadesimple.substack.com/
Check out my other articles on Medium: https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819