Terrence Neumann

About me

I am a fifth-year PhD candidate at the University of Texas at Austin studying Information Systems, advised by Maria De-Arteaga and Yan Leng. Throughout my PhD, I’ve been fortunate to collaborate with and learn from great researchers like Sina Fazelpour, Matt Lease, and Maytal Saar-Tsechansky.

My research focuses on trustworthy AI. I am particularly interested in the following research streams: (1) mechanistic interpretability of LLM agents, (2) responsible use of LLMs as silicon subjects for academic and social applications, and (3) algorithmic fairness on social media. I pursue research that bridges machine learning and computational social science to address pressing challenges at the intersection of AI and society.

Above: I have found the NIST AI Risk Management Framework’s Characteristics of Trustworthy AI as a valuable framework for scoping and defining my contributions. Click on a characteristic to explore my work in that area.

2025

Should you use LLMs to simulate opinions? Quality checks for early-stage deliberation

Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour

Forthcoming in Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2025

Abs HTML

The emergent capabilities of large language models (LLMs) have prompted interest in using them as surrogates for human subjects in opinion surveys. However, prior evaluations of LLM-based opinion simulation have relied heavily on costly, domain-specific survey data, and mixed empirical results leave their reliability in question. To enable cost-effective, early-stage evaluation, we introduce a quality control assessment designed to test the viability of LLM-simulated opinions on Likert-scale tasks without requiring large-scale human data for validation. This assessment comprises two key tests: logical consistency and alignment with stakeholder expectations, offering a low-cost, domain-adaptable validation tool. We apply our quality control assessment to an opinion simulation task relevant to AI-assisted content moderation and fact-checking workflows—a socially impactful use case—and evaluate seven LLMs using a baseline prompt engineering method (backstory prompting), as well as fine-tuning and in-context learning variants. None of the models or methods pass the full assessment, revealing several failure modes. We conclude with a discussion of the risk management implications and release TopicMisinfo, a benchmark dataset with paired human and LLM annotations simulated by various models and approaches, to support future research.
From Statistical Patterns Emerge Human-Like Behaviors: How LLMs Learn Social Preferences

Terrence Neumann, and Yan Leng

Working Paper, 2025

Abs

Large language models (LLMs) are increasingly deployed in settings where their decisions carry ethical and interpersonal consequences. Despite being trained solely through next-token prediction, LLMs often display emergent social behaviors and preferences, such as altruism, fairness, and self-interest. Yet the mechanisms underlying these behaviors remain poorly understood, raising critical questions for both theory and deployment: Why do such behaviors emerge, and can they be controlled? This paper provides an initial answer by linking external social behavior to internal model computation. We introduce a two-step framework using Sparse Autoencoders (SAEs) to identify and manipulate latent features associated with human-like preferences. Using the dictator game—a canonical testbed from behavioral economics—we show that certain latent directions in an open-source LLM (Gemma 2 9b-IT) align with altruistic and self-interested concepts. Activating a single “self-interest” feature, for instance, nearly doubles the rate of self-serving decisions (from 35% to 72.5%). These results demonstrate that social preferences in LLMs can be traced and steered via an interpretable latent structure, providing a foundation for more transparent, controllable, and norm-sensitive AI behavior.

2024

Diverse, but Divisive: LLMs Can Exaggerate Gender Differences in Opinion Related to Harms of Misinformation

Terrence Neumann, Sooyong Lee, Maria De-Arteaga, and 2 more authors

arXiv preprint arXiv:2401.16558, 2024

Abs HTML

The pervasive spread of misinformation and disinformation poses a significant threat to society. Professional fact-checkers play a key role in addressing this threat, but the vast scale of the problem forces them to prioritize their limited resources. This prioritization may consider a range of factors, such as varying risks of harm posed to specific groups of people. In this work, we investigate potential implications of using a large language model (LLM) to facilitate such prioritization. Because fact-checking impacts a wide range of diverse segments of society, it is important that diverse views are represented in the claim prioritization process. This paper examines whether a LLM can reflect the views of various groups when assessing the harms of misinformation, focusing on gender as a primary variable. We pose two central questions. (1) To what extent do prompts with explicit gender references reflect gender differences in opinion in the United States on topics of social relevance? and (2) To what extent do gender-neutral prompts align with gendered viewpoints on those topics? To analyze these questions, we present the TopicMisinfo dataset, containing 160 fact-checked claims from diverse topics, supplemented by nearly 1600 human annotations with subjective perceptions and annotator demographics. Analyzing responses to gender-specific and neutral prompts, we find that GPT 3.5-Turbo reflects empirically observed gender differences in opinion but amplifies the extent of these differences. These findings illuminate AI’s complex role in moderating online communication, with implications for fact-checkers, algorithm designers, and the use of crowd-workers as annotators. We also release the TopicMisinfo dataset to support continuing research in the community.
PRISM: A Design Framework for Open-Source Foundation Model Safety

Terrence Neumann, and Bryan Jones

arXiv preprint arXiv:2406.10415, 2024

Abs HTML

The rapid advancement of open-source foundation models has brought transparency and accessibility to this groundbreaking technology. However, this openness has also enabled the development of highly-capable, unsafe models, as exemplified by recent instances such as WormGPT and FraudGPT, which are specifically designed to facilitate criminal activity. As the capabilities of open foundation models continue to grow, potentially outpacing those of closed-source models, the risk of misuse by bad actors poses an increasingly serious threat to society. This paper addresses the critical question of how open foundation model developers should approach model safety in light of these challenges. Our analysis reveals that open-source foundation model companies often provide less restrictive acceptable use policies (AUPs) compared to their closed-source counterparts, likely due to the inherent difficulties in enforcing such policies once the models are released. To tackle this issue, we introduce PRISM, a design framework for open-source foundation model safety that emphasizes Private, Robust, Independent Safety measures, at Minimal marginal cost of compute. The PRISM framework proposes the use of modular functions that moderate prompts and outputs independently of the core language model, offering a more adaptable and resilient approach to safety compared to the brittle reinforcement learning methods currently used for value alignment. By focusing on identifying AUP violations and engaging the developer community in establishing consensus around safety design decisions, PRISM aims to create a safer open-source ecosystem that maximizes the potential of these powerful technologies while minimizing the risks to individuals and society as a whole.

2023

Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?

Terrence Neumann, and Nicholas Wolczynski

In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, 2023

Abs HTML

In recent years, algorithms have been incorporated into fact-checking pipelines. They are used not only to flag previously fact-checked misinformation, but also to provide suggestions about which trending claims should be prioritized for fact-checking - a paradigm called ’check-worthiness.’ While several studies have examined the accuracy of these algorithms, none have investigated how the benefits from these algorithms (via reduction in exposure to misinformation) are distributed amongst various online communities. In this paper, we investigate how diverse representation across multiple stages of the AI development pipeline affects the distribution of benefits from AI-assisted fact-checking for different online communities. We simulate information propagation through the network using our novel Topic-Aware, Community-Impacted Twitter (TACIT) simulator on a large Twitter followers network, tuned to produce realistic cascades of true and false information across multiple topics. Finally, using simulated data as a test bed, we implement numerous algorithmic fact-checking interventions that explicitly account for notions of diversity. We find that both representative and egalitarian methods for sampling and labeling check-worthiness model training data can lead to network-wide benefit concentrated in majority communities, while incorporating diversity into how fact-checkers use algorithmic recommendations can actively reduce inequalities in benefits between majority and minority communities. These findings contribute to an important conversation around the responsible implementation of AI-assisted fact-checking by social media platforms and fact-checking organizations.
Mitigating bias in organizational development and use of artificial intelligence

Proceedings of International Conference on Information Systems, 2023

Abs

We theorize why some artificial intelligence (AI) algorithms unexpectedly treat protected classes unfairly. We hypothesize that mechanisms by which AI assumes agencies, rights, and responsibilities of its stakeholders can affect AI bias by increasing complexity and irreducible uncertainties: eg, AI’s learning method, anthropomorphism level, stakeholder utility optimization approach, and acquisition mode (make, buy, collaborate). In a sample of 726 agentic AI, we find that unsupervised and hybrid learning methods increase the likelihood of AI bias, whereas “strict” supervised learning reduces it. Highly anthropomorphic AI increases the likelihood of AI bias. Using AI to optimize one stakeholder’s utility increases AI bias risk, whereas jointly optimizing the utilities of multiple stakeholders reduces it. User organizations that co-create AI with developer organizations instead of developing it in-house or acquiring it off-the-shelf reduce AI bias risk. The proposed theory and the findings advance our understanding of responsible development and use of agentic AI.

2022

Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms

Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour

In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 2022

Abs HTML

Faced with the scale and surge of misinformation on social media, many platforms and fact-checking organizations have turned to algorithms for automating key parts of misinformation detection pipelines. While offering a promising solution to the challenge of scale, the ethical and societal risks associated with algorithmic misinformation detection are not well-understood. In this paper, we employ and extend upon the notion of informational justice to develop a framework for explicating issues of justice relating to representation, participation, distribution of benefits and burdens, and credibility in the misinformation detection pipeline. Drawing on the framework (1) we show how injustices materialize for stakeholders across three algorithmic stages in the pipeline; (2) we suggest empirical measures for assessing these injustices; and (3) we identify potential sources of these harms. This framework should help researchers, policymakers, and practitioners reason about potential harms or risks associated with these algorithms and provide conceptual guidance for the design of algorithmic fairness audits in this domain.

Background

My path to academia included working as a Data Scientist at the University of Chicago Crime Lab, where I was fortunate to collaborate with exceptional researchers Jens Ludwig and Max Kapustin. Prior to that, I received a MS in Analytics from Northwestern University and a BA in Economics and Mathematics from Indiana University, Bloomington. Outside of research, I love to run (I ran my first 50k in 2024), cook, go to concerts, and play the guitar.

News

Nov 01, 2025	My paper Should You Use LLMs to Simulate Opinions? has been accepted to the AAAI Conference on Artificial Intelligence! I will update you when I hear about whether this will be a poster or presentation in the AI for Social Impact track. See you in Singapore this January!
Oct 21, 2025	My paper Should You Use LLMs To Simulate Opinions? received the Best Paper, Runner Up Award for the ISS Cluster at INFORMS 2025. Thank you to the judges for taking the time to review my paper!
Jul 01, 2025	I have been invited to present at INFORMS 2025 in the Generative AI session within the ISS Cluster! See you in Atlanta.