Digital surveys may have hit the AI point of no return

There’s bad news for those using digital surveys to try to understand people’s online behavior: We may no longer be able to determine whether a human is responding to them or not, a recent study has shown—and there seems to be no way around this problem. 

This means that all online canvassing could be vulnerable to misrepresenting people’s true opinions. This could have repercussions for anything that falls under the category of “information warfare,” from polling results to misinformation to fraud. Non-human survey respondents, in aggregate, could affect anything from the flavors and pricing of a pack of gum to something more damaging, such as whether someone can get government benefits, and what those benefits should be.

The problem here is twofold: 1) humans cannot tell the difference between human and bot responses, and 2) where automated systems act on those responses, there is no way to keep using such polling while safeguarding against the potentially dangerous problems this indistinguishability creates.

The study, by Dartmouth’s Sean J. Westwood, was published in the journal PNAS (Proceedings of the National Academy of Sciences) under the title “The potential existential threat of large language models to online survey research.” It claims to show that, in survey research, we can no longer simply assume that a “coherent response is a human response.” Westwood created an autonomous agent capable of producing “high-quality survey responses that demonstrate reasoning and coherence expected of human responses.”

To do this, Westwood designed a “model-agnostic” system for general-purpose reasoning, built on a two-layer architecture: one layer acts as an interface to the survey platform, handling multiple types of queries and extracting the relevant content, while a second “core layer” uses a “reasoning engine” (such as an LLM). When a survey is conducted, Westwood’s software loads a “demographic Persona” that stores some recall of prior answers and then processes each question to provide a “contextually appropriate response” as an answer.

Once the “reasoning engine” decides on an answer, the interface in the first layer outputs a mimicked human response. The system is also “designed to accommodate tools for bypassing antibot measures like reCAPTCHA.” Westwood’s system has an objective that isn’t to “perfectly replicate population distributions in aggregate . . . but to produce individual survey completions [that] would be seen as reasonable by a reasonable researcher.”
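To make that architecture easier to picture, here is a minimal sketch of how such a two-layer agent could be organized. This is an illustration based on the description above, not Westwood’s published code: the Persona fields, the call_llm helper, and the question-parsing details are all assumptions standing in for whatever the real system uses.

```python
from dataclasses import dataclass, field

@dataclass
class Persona:
    """A demographic profile the agent stays consistent with (fields are assumptions)."""
    age: int
    gender: str
    region: str
    politics: str
    memory: list = field(default_factory=list)  # recall of prior answers

def call_llm(prompt: str) -> str:
    """Placeholder for the core layer's 'reasoning engine' (a real LLM API call).

    Returns a canned answer so the sketch runs without any model access.
    """
    return "I mostly agree, though it depends on the cost."

class InterfaceLayer:
    """Layer 1: talks to the survey platform and extracts question content."""

    def parse_question(self, raw_text: str) -> dict:
        # A real interface layer would strip markup, detect the question type
        # (multiple choice, Likert scale, free text), and list any options.
        return {"text": raw_text, "type": "free_text", "options": []}

    def format_response(self, answer: str, question: dict) -> str:
        # Mimic a human response: plain text, plausible length, no markup.
        return answer.strip()

class SyntheticRespondent:
    """Layer 2: conditions each answer on the Persona and prior answers."""

    def __init__(self, persona: Persona, interface: InterfaceLayer):
        self.persona = persona
        self.interface = interface

    def answer(self, raw_question: str) -> str:
        question = self.interface.parse_question(raw_question)
        prompt = (
            f"You are a {self.persona.age}-year-old {self.persona.gender} "
            f"from {self.persona.region} who leans {self.persona.politics}.\n"
            f"Your previous answers: {self.persona.memory}\n"
            f"Answer the next survey question consistently with all of the above:\n"
            f"{question['text']}"
        )
        answer = call_llm(prompt)
        self.persona.memory.append((question["text"], answer))  # dynamic memory
        return self.interface.format_response(answer, question)

# Example: one synthetic respondent answering one question.
respondent = SyntheticRespondent(
    Persona(age=43, gender="woman", region="the Midwest", politics="independent"),
    InterfaceLayer(),
)
print(respondent.answer("How do you feel about a tax on sugary drinks?"))
```

The important part is the flow: the interface layer deals with whatever format the survey platform uses, while the core layer’s prompt carries both the demographic profile and a running memory of past answers, which is what keeps the synthetic respondent internally coherent from one question to the next.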

Westwood’s results suggest that digital surveys may no longer be a true reflection of people’s opinions. Surveys could, just as plausibly, be describing what an LLM assumes “human behavior” looks like. Furthermore, humans or AI making decisions based on those results could be relying on the “opinions” of simulated humans.

Personas 

Creating synthetic people is not a new concept. Novels, visual media, plays, and advertisers use all sorts of creative ideas to portray various people in order to tell their stories. In design, the idea of “Personas” has been used for decades in marketing and user interface work as a cost-cutting and timesaving practice. Personas are fictional composites of people, represented by categories like “Soccer Mom,” “Joe Six-pack,” “Technophobe Grandmother,” or “Business Executive.” Besides being steeped in bias, Personas are projections of what their creators think these people would be and what the groups they might belong to represent.

Personas are a hidden problem in design and marketing precisely because they are composites drawn from real or imaginary people rather than actual people: the values ascribed to them are constructed from other people’s interpretations. When relying upon Personas instead of people, it’s impossible to divine the true context in which a product or service is actually being used, because Personas are projections of their creators, not real people in real situations.

Thus, the problems with using Personas to design products and services often aren’t identified until well after such products or services come to market and fail, or cause other unforeseen issues. This could get worse when these human-generated Personas are replaced with AI/LLM chatbot Personas, with all the biases those entail, including slop influences and hallucinations that could make their responses even stranger, or potentially even psychotic.

Quant versus qual

Part of the larger problem of surveys failing to capture people’s needs started when research shifted toward statistical, computation-based data collection, also known as quantitative methods, and away from contextual inquiry grounded in conversations and social relationships with others, or qualitative methods. As big data came online, people began to use quantitative techniques such as online surveys, A/B testing, and other tools to understand customer and user behavior. Because machines could quickly compile results, quantitative research seems to have become an industry standard for understanding people.

It is not easy to automate qualitative methods, and replacing them with quantitative methods can forfeit important context. Since almost a generation has gone by with the world focused on computational counting, it’s easy to forget about the qualitative data methods—found in social sciences such as Anthropology—that use contextual inquiry interviews with real people to understand why people do what they do, rather than trying to infer this from numerical responses. 

Qualitative research can give context to the quantitative data and methods that rely upon machines to divine meaning. Qualitative methods can also work outside of big data, and they are grounded in relationships with actual people, which holds those people accountable for their beliefs and opinions. Talking with real people first contextualizes the content and leads to better outcomes. Qualitative methods can be quantified and counted, but quantitative methods cannot yet easily be made truly, broadly contextual.

One difference between qualitative and quantitative methods has to do with transparency and understanding the validity of people’s responses. With older human-made Personas, there are obvious assumptions and gaps; it’s crude puppetry and projection. But when “people” are manufactured by chatbots/LLMs drawing on a corpus mined from massive volumes of data, there are fewer ways to separate fact from fiction. With chatbots and LLMs, the artificial entity can be the creator of the “person,” the respondent answering as that person, and the interpreter of that fake person’s responses. That’s where it can get dangerous, especially when the results of this kind of slop-tainted research are used for things like political polling or policing.

As Westwood’s research puts it: “Rather than relying on brittle, question-specific rules, synthetic respondents maintain a consistent Persona by conditioning answers on an initial demographic profile and a dynamic memory of previous responses. This allows it to answer disparate questions in an internally coherent manner, generating plausible, human-like patterns . . .” It can mimic context, but not create it.

Back to the basics

As GenAI moves toward conducting the surveys, acting as the respondents, and interpreting the results, will we be able to tell the difference between it and real people?

A completely automated survey loop sounds fictional, until we see how many people are already using chatbots/LLMs to automate parts of the survey process. Someone might generate a Persona and use it to answer surveys that “AI” has designed, and someone else might then use a chatbot to interpret the results. Closing the loop completely could be terrible: someone may then use AI to turn those chatbot-created, chatbot-answered, and “AI”-interpreted survey responses into something that impacts real people with real needs in the real world, even though it was designed around fake people with fake needs in a fake world.
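To show the shape of that loop, here is a deliberately oversimplified sketch. None of these functions exist as written; each stub stands in for a chatbot or LLM step that people are already performing piecemeal today, and the point is simply that no human appears anywhere between the topic and the conclusions.

```python
# A hypothetical sketch of the fully automated loop described above.
# Every function here is a stand-in for an AI/chatbot step, not a real API.

def design_survey_with_ai(topic: str) -> list[str]:
    return [f"How do you feel about {topic}?"]  # AI writes the questions

def generate_persona_with_ai(i: int) -> dict:
    return {"id": i, "profile": "a synthetic demographic profile"}  # AI invents a "person"

def answer_survey_with_ai(persona: dict, questions: list[str]) -> list[str]:
    return ["a plausible, persona-consistent answer" for _ in questions]  # AI answers

def interpret_results_with_ai(answers: list[list[str]]) -> str:
    return f"Conclusions drawn from {len(answers)} synthetic respondents."  # AI interprets

def fully_automated_survey(topic: str, n_respondents: int) -> str:
    questions = design_survey_with_ai(topic)
    personas = [generate_persona_with_ai(i) for i in range(n_respondents)]
    answers = [answer_survey_with_ai(p, questions) for p in personas]
    # The "findings" below never touched a real person at any step.
    return interpret_results_with_ai(answers)

print(fully_automated_survey("a tax on sugary drinks", n_respondents=500))
```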

Qualitative research is one path forward. It enables us to get to know real people, validate their replies, and refine context through methods that explore each answer in more depth. AI cannot yet do this type of work, since LLMs currently base their answers on statistical patterns in language rather than on relationships with real people. Bots that replicate human answers will only ever mimic a simulated human answer; to know what real people think, and what things mean to them, companies may have to go back to hiring anthropologists, who are trained to use qualitative methods to connect with real people.

Now that AI can convincingly fake human responses to quantitative surveys, those who believe that quantitative methods and AI are the answers to conducting accurate research are about to learn a hard lesson, one that will, unfortunately, impact all of us.
