
Jimmy Breeze

Scaling social science research: A new tool to help researchers turn qualitative data into numbers they can analyze

https://www.linkedin.com/feed/update/urn:li:activity:7430802316263337984/?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAJrHvsBTNSt86Fs5GFHTEcIGOuwz83Dslg

I'm super excited to announce the official release of the GABRIEL package by OpenAI, and an accompanying NBER working paper I wrote with Hemanth Asirvatham and our professor Andrei Shleifer. The goal of GABRIEL is to make it really easy to use LLMs for research, for things like analyzing huge datasets of texts, images, or audio -- or even just the wealth of unstructured data on the Internet. It's designed for anyone who wants to use LLMs as a tool of analysis, even without prior background in using AI at all.

Our paper attempts to validate GABRIEL and to show that LLMs are general reasoning tools. We find LLMs can be used across a wide range of qualitative tasks - just like humans, but often better and at a tiny fraction of the cost (17,000 times cheaper!). In our paper, we use GABRIEL across a series of applications, ranging from measuring the content of congressional speech in the US over time to understanding what's being taught in the history curriculum in every American county. We build the largest dataset of technologies to date using LLMs, and we use it to analyze the evolution of technology adoption since the Industrial Revolution.

Shared by Jimmy Breeze, 1 save total

Jimmy Breeze

Creative approaches to problem solving | Q Community

Q’s toolkit of 25 tried-and-tested methods for creative collaboration and problem-solving

Shared by Jimmy Breeze, 1 save total

Jimmy Breeze

Do You Agree? Do You Strongly Agree? The Effect of the Number of Response Categories on Response Processes and Verification of Substantive Hypotheses

https://www.linkedin.com/posts/marek-muszy%C5%84ski-34b9ba285_do-you-agree-do-you-strongly-agree-the-share-7432028016970637312-35_k/?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAJrHvsBTNSt86Fs5GFHTEcIGOuwz83Dslg

Practical takeaway:
For most surveys, 5–7 point fully labeled scales seem to offer the best balance between measurement quality, respondent effort, and usability — especially in an increasingly mobile-first world. A reminder that in survey design, more is not always better.

Shared by Jimmy Breeze, 1 save total

Jimmy Breeze

Authenticity checks detect AI agents best | Prolific

https://www.linkedin.com/posts/phelim-bradley-84300826_ai-agents-completing-surveys-is-no-longer-share-7432597680356384768-dPOp/?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAJrHvsBTNSt86Fs5GFHTEcIGOuwz83Dslg

AI agents completing surveys is no longer theoretical. Westwood’s paper (2025, PNAS) showed that autonomous agents can convincingly pass as human participants, and we’ve taken that seriously.

Our team ran a controlled study pitting 125 verified human participants against five AI agents (GPT, Claude, Gemini, Perplexity, and a custom adversarial agent built in-house to evade detection). We embedded 7 detection methods into a standard Qualtrics survey to see what actually works.

The results:

* Prolific's bot authenticity check: 100% accuracy (100% sensitivity, 100% specificity)
* Mouse tracking: 95% accuracy (promising, but requires per-survey calibration)
* Qualtrics reCAPTCHA: 94% accuracy, but a higher false-positive rate
* Cognitive traps (visual illusions): 100% sensitivity, but a 29% false-positive rate on real humans
* Consistency checks: 63% accuracy
* Classic attention checks: 60% accuracy (agents actually outperformed humans on these)
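The figures above follow the standard confusion-matrix definitions, treating an AI agent as a "positive" detection. As a minimal sketch (the post does not give per-method counts, so the example uses the study's overall split of 125 humans and 5 agents with a hypothetical perfect detector):

```python
def confusion_metrics(tp: int, fp: int, tn: int, fn: int):
    """Standard detection metrics, treating an AI agent as a 'positive'."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # share of all correct calls
    sensitivity = tp / (tp + fn)                 # agents correctly flagged
    specificity = tn / (tn + fp)                 # humans correctly passed
    false_positive_rate = fp / (fp + tn)         # humans wrongly flagged as agents
    return accuracy, sensitivity, specificity, false_positive_rate

# A detector that flags all 5 agents and none of the 125 humans scores
# perfectly on every metric, matching the top line above.
print(confusion_metrics(tp=5, fp=0, tn=125, fn=0))  # → (1.0, 1.0, 1.0, 0.0)
```

Note how sensitivity and false-positive rate can diverge: a method can catch every agent (100% sensitivity) while still flagging a large share of real humans, as with the cognitive traps above.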

The takeaway is that traditional quality checks were designed for inattentive humans, not AI agents. They're not fit for purpose here. Even our custom adversarial agent (built specifically to avoid detection) was caught by our authenticity checks.

Shared by Jimmy Breeze, 1 save total

Jimmy Breeze

Testing normalization process theory in a randomized trial of mental health clinics implementing digital measurement-based care | Implementation Science | Springer Nature Link

https://www.linkedin.com/posts/carl-may-24b89885_this-very-important-paper-by-nathaniel-williams-share-7430521970720632832-_zSS/?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAJrHvsBTNSt86Fs5GFHTEcIGOuwz83Dslg

This very important paper by Nathaniel Williams, Gregory Aarons, and colleagues in the US offers independent and unambiguous confirmation that implementation strategies that support the mechanisms described in normalization process theory (coherence, cognitive participation, collective action, reflexive monitoring) "predict greater sustainment of complex health interventions over 16 months." It contributes to the growing body of evidence -- in the form of qualitative and quantitative empirical studies and evidence syntheses -- that suggests that the decision to use NPT, and NPT-informed implementation strategies, in implementation research has more than a robust scientific basis. It also has strong catalytic validity.

Shared by Jimmy Breeze, 1 save total

Jimmy Breeze

“Realist” evaluation and the more general “theory‑based” evaluation umbrella often present themselves as distinct forms of evaluation; however, their core assumptions that programmes work (or not)… | Andi Fugard | 27 comments

“Realist” evaluation and the more general “theory‑based” evaluation umbrella often present themselves as distinct forms of evaluation; however, their core assumptions that programmes work (or not) through underlying causal mechanisms that cannot be directly observed are restatements of what sciences already presuppose. It’s rare indeed for an evaluation not to begin with a theory of change; the issue is not whether that’s necessary but how good theories are. We’ve all seen boxology lists in the logic model style.

The full range of evaluation methods can test mechanisms rather than only whether an outcome has been shifted by a programme, and current practice involves qual and quant evidence, e.g., the latter using mediation tests. The logic of “what works best for whom in what context” predates “realist” approaches by a few decades and is used across all methods. Similarly, there’s nothing uniquely “realist” about context-mechanism-outcome triads: it’s a useful way to move from boxology to theory prose.

Advocating “realist” or “theory‑based” evaluation is essentially advocating “scientific evaluation” – a kind of tautology. It may be helpful for some readers to learn about separating ontology from epistemology (the map is not the territory, to paraphrase Korzybski), but I’d suggest that’s best left to the first chapter of an introduction to evaluation rather than something that needs to be restated in every report. There are examples of fields that stick on the “science” term: neuroscience and cognitive science are two examples. But what would unscientific policy evaluation look like? What does it get called? (That’s a genuine question – I have a blog post brewing.)

Shared by Jimmy Breeze, 1 save total

Jimmy Breeze

Claude Container: A Complete Beginner's Guide

PGP guide to Claude Container
-
What Is This and Why Should I Care?

Claude Code is an AI assistant that can read, write, and run code directly on your computer. It is incredibly powerful — it can install software, create files, run scripts, and modify your system. That power is also what makes it risky. If you ask Claude to "clean up my project" and it misunderstands, it could delete files you care about. If a script goes wrong, it could affect other things on your machine.

Claude Container solves this problem. It gives Claude its own isolated computer (a "container") to work in. Claude gets full, unrestricted access inside that container — it can install anything, run anything, break anything — and none of it touches your real computer. Your project files are shared between the container and your computer through a single folder, so you always have access to the work Claude produces.

Think of it like giving Claude its own office with a desk, tools, and a copy machine. Claude can make a mess in that office all day long, and your office stays clean. The copy machine (the shared folder) lets you exchange documents back and forth.
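Concretely, this kind of setup usually maps onto a container with a single bind mount. A minimal sketch of the idea, assuming Docker is the runtime and using a hypothetical image name (`claude-container`), since the post names neither:

```python
# Minimal sketch: build the docker command that gives the agent an isolated
# filesystem, with exactly one host folder shared in as /workspace.
# "claude-container" is a hypothetical image name, not from the post.
def container_command(host_project_dir: str) -> list[str]:
    return [
        "docker", "run", "--rm", "-it",
        "-v", f"{host_project_dir}:/workspace",  # the single shared folder
        "-w", "/workspace",                      # start in the shared folder
        "claude-container",
    ]

# Everything the agent writes under /workspace appears in the host folder;
# everything else it installs or breaks stays inside the container and is
# discarded when the container exits (--rm).
```

Running the returned command (e.g. via `subprocess.run`) starts the sandbox; only the mounted project folder survives the session, which is the "copy machine" in the office analogy.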

Shared by Jimmy Breeze, 1 save total
