Researchers from New York’s Columbia University are using AI and data science to understand why people are distrustful of vaccinations and to help convince them otherwise.

With the US needing to vaccinate between 70% and 90% of its population to achieve herd, or population, immunity, vaccine hesitancy could prove to be the major obstacle to finally emerging from the pandemic.

Over the past two decades, an increasing proportion of the population has come to view vaccines with skepticism, and in many cases, refrained from getting vaccinated altogether. While the procedural, behavioral and access barriers to vaccine uptake have been studied extensively, the emotional, ideological and rhetorical bases for vaccine hesitancy are poorly understood. This limited understanding, combined with public distrust in science and government, is likely to undermine efforts to robustly and quickly vaccinate against COVID-19.

The new project, from Columbia World Projects in partnership with key Columbia faculties, aims to tackle this problem by studying the language used by different groups of people in the US who are reluctant to be vaccinated.

The project will engage vaccine makers, literary scholars, data scientists, political scientists, community leaders and public health officials. It will create the world’s largest public dataset of vaccine-hesitant language in English, collected from online platforms such as Facebook, YouTube and Twitter, which have become primary venues for discussing and disseminating vaccine skepticism and other vaccine-related concerns.

Using the collected data, the project will apply artificial intelligence to develop public messaging that reflects the ways in which people express specific forms of hesitancy. This approach represents a significant effort to use AI to analyze the language of vaccine hesitancy and then use that language to combat vaccine skepticism.

Project lead Rishi Goyal, assistant professor of emergency medicine at the Columbia University Irving Medical Center, who also holds a PhD in comparative literature, explained: “Getting high uptake of vaccines, we believe, is the largest driver to return the world back to some semblance of normalcy. But survey data shows vaccine hesitancy across the US remains high. There’s a growing variety of logic and rhetoric to it, so vaccine hesitancy is not one thing and cannot neatly be understood in simple demographic terms.

“We’re struck by the fact that a lot of people think they know why everybody’s vaccine hesitant. But there hasn’t been a lot of actual data collected on this and sometimes I think data are thought of purely in numerical terms, but we treat language itself as data that we can analyze, interpret, count, and calculate.”

As for leveraging AI, project co-lead Dennis Tenen added: “AI is a very broad and imprecise term. We prefer to say we are using computational methods. When Rishi and I are wearing our hats as comparative literature professors, we observe language, we look for metaphors, we look for figurative language, and we close read. If we want to scale up these close reading insights, we have to leverage computational tools. When we are collecting millions and millions of messages around vaccination, you have to use computational tools to reveal patterns in that data.”

The project is in its early stages but the team are confident of making an impact. “If we can increase vaccine uptake and vaccine confidence overall and show that our methods were able to promote that, that would be incredible,” said Goyal. “Both of us are also excited and interested in engaging with these methods and the approach. Complex problems are often reduced to very simple solutions and simple root causes. I find that very dissatisfying. If we can keep the complexity of individual opinion alive during the project and still make a difference, that would be quite exciting.”