Zotero + Claude Project: literature synthesis workflow for systematic reviews
The Reality of AI in Academic Research
LLMs are fluent at producing text that looks like sound science even when the underlying claims are wrong. The failure I see most often is the model rounding neutral or ambiguous evidence up to a confident conclusion — exactly the kind of error that survives a casual read and slips into a literature summary unnoticed. Structural fluency is not scientific validity: an abstract that reads like a real abstract can still misrepresent the evidence it summarizes. That is why I treat every AI-assisted step in a systematic review as something to be verified, never trusted on its face.
When I sit down to write, I don't want magic; I want reliability. I've tested dozens of tools, and most fail when held to the standard of peer review. I rely on my own workflows so that the data I present is accurate and verifiable, and so that any hallucinated claim is caught before it reaches a draft.
The Bottleneck of Systematic Reviews
I remember manually screening papers for my early meta-analyses: weeks of staring at abstracts. Screening is where most of the time goes, and it is also where a structured AI-assisted workflow makes the biggest difference, provided you keep a human in the loop for the borderline calls. The workflow I built is meant to be reproducible rather than dependent on luck, and it relies heavily on using Zotero and Claude together.
Step 1: Capture and Organize
I start entirely in Zotero. I pull everything from PubMed and Embase into a dedicated collection, clean the metadata, and make sure every entry has an abstract. Then I export the collection as a structured file (Zotero can write formats such as CSV and CSL JSON). I do not let AI touch the search itself, because PRISMA reporting depends on a documented, reproducible search and I have to be able to defend my search strategy to reviewers.
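A minimal sketch of how the "every entry has an abstract" check could be automated before upload, assuming a CSV export. The column names (`Title`, `DOI`, `Abstract Note`) are what I would expect in a Zotero CSV file, but treat them as assumptions and confirm them against your own export. The duplicate-DOI check is an illustrative extra, not part of the workflow described above.

```python
import csv
import io

# Assumed column names for a Zotero CSV export; verify against your own file.
ABSTRACT_COL = "Abstract Note"
TITLE_COL = "Title"
DOI_COL = "DOI"


def load_rows(handle):
    """Read an exported CSV from any open text handle into a list of dicts."""
    return list(csv.DictReader(handle))


def missing_abstracts(rows):
    """Return the titles of records whose abstract field is empty or absent."""
    return [
        row.get(TITLE_COL, "")
        for row in rows
        if not (row.get(ABSTRACT_COL) or "").strip()
    ]


def duplicate_dois(rows):
    """Return DOIs that occur more than once, compared case-insensitively."""
    seen, dupes = set(), set()
    for row in rows:
        doi = (row.get(DOI_COL) or "").strip().lower()
        if not doi:
            continue  # records without a DOI cannot be matched this way
        if doi in seen:
            dupes.add(doi)
        seen.add(doi)
    return sorted(dupes)


# Made-up in-memory records standing in for a real export file.
sample = io.StringIO(
    "Title,DOI,Abstract Note\n"
    "Trial A,10.1000/a,Randomised trial of X.\n"
    "Trial B,10.1000/b,\n"
    "Trial A (duplicate),10.1000/A,Randomised trial of X.\n"
)
rows = load_rows(sample)
print(missing_abstracts(rows))  # ['Trial B']
print(duplicate_dois(rows))     # ['10.1000/a']
```

For a real file, the `io.StringIO` stand-in could be replaced with something like `open("export.csv", newline="", encoding="utf-8-sig")`, where the filename is hypothetical and `utf-8-sig` tolerates a byte-order mark if one is present.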
Step 2: Synthesis and Screening
I upload the structured export into a Claude Project and use the custom instructions feature to enforce strict screening criteria. I tell Claude to evaluate each abstract against my specific inclusion and exclusion criteria and to format the output as a table, explicitly stating the reason for exclusion whenever a paper is rejected. I then manually review the marginal decisions: I always check the borderline cases myself, and I never blindly trust the output. I discuss the broader context of this approach in /blog/ai-research-stack-5-tools-that-save-time.
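One way to make the manual pass systematic is to parse the screening table and queue every row that deserves a second look. The sketch below assumes a pipe-delimited table with `id`, `decision`, and `reason` columns and the decision labels `include`, `exclude`, and `unsure`. All of these are hypothetical conventions you would set in your own instructions, not a format Claude produces by default.

```python
# Hypothetical decision labels; use whatever your screening instructions define.
ALLOWED = {"include", "exclude", "unsure"}


def parse_table(text):
    """Parse a pipe-delimited table into dicts keyed by the header row."""
    rows = []
    header = None
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip a markdown separator row such as |----|----|----|.
        if all(c and set(c) <= set("-: ") for c in cells):
            continue
        if header is None:
            header = [c.lower() for c in cells]
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def needs_manual_review(row):
    """Flag rows that a human should check before the decision is accepted."""
    decision = row.get("decision", "").lower()
    reason = row.get("reason", "")
    if decision not in ALLOWED:
        return True  # unexpected or malformed label
    if decision == "unsure":
        return True  # explicitly borderline
    if decision == "exclude" and not reason:
        return True  # exclusion with no stated reason
    return False


# Made-up output for illustration only.
model_output = """
| id | decision | reason |
|----|----------|--------|
| P1 | include  |        |
| P2 | exclude  | wrong population |
| P3 | exclude  |        |
| P4 | unsure   | abstract does not report design |
| P5 | maybe    |        |
"""

flagged = [r["id"] for r in parse_table(model_output) if needs_manual_review(r)]
print(flagged)  # ['P3', 'P4', 'P5']
```

The script only builds the review queue. The judgment on each flagged abstract, and a spot check of the unflagged ones, still belongs to the reviewer.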
Step 3: Verification
Once I have the final list of included studies, I use CiteCheck and the AVR platform at aiforacademic.world/tools to verify the data extraction. I need to be certain that the effect sizes I pull match what the source papers report. I wrote about the dangers of hallucination in /blog/citation-hallucination-ai-writing. I also use structured prompts to help draft the final methodology section, which you can read about in /blog/10-claude-prompts-paper-writing.
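Independently of any tool, the core of this check is a comparison between what the model extracted and what a human read off the paper. The sketch below shows one way to do that comparison. It does not represent how CiteCheck or the AVR platform work. The study IDs, values, and tolerance are made up for illustration.

```python
import math


def compare_extractions(ai_values, manual_values, abs_tol=0.005):
    """Return (study_id, ai, manual) for every disagreement or missing value.

    abs_tol is an assumed tolerance that absorbs rounding to two decimals;
    choose one that matches how your effect sizes are reported.
    """
    mismatches = []
    for study_id in sorted(set(ai_values) | set(manual_values)):
        ai = ai_values.get(study_id)
        manual = manual_values.get(study_id)
        if ai is None or manual is None:
            mismatches.append((study_id, ai, manual))  # extracted by only one side
        elif not math.isclose(ai, manual, rel_tol=0.0, abs_tol=abs_tol):
            mismatches.append((study_id, ai, manual))  # values disagree
    return mismatches


# Made-up effect sizes keyed by hypothetical study IDs.
ai_extracted = {"Smith2019": 0.42, "Lee2021": -0.15, "Khan2020": 1.30}
manually_checked = {
    "Smith2019": 0.42,
    "Lee2021": 0.15,
    "Khan2020": 1.30,
    "Ortiz2022": 0.08,
}

for study_id, ai, manual in compare_extractions(ai_extracted, manually_checked):
    print(study_id, ai, manual)
# Lee2021 -0.15 0.15
# Ortiz2022 None 0.08
```

A sign flip like the one in the made-up `Lee2021` row is the kind of error that reads as plausible in a table, which is why the comparison is numeric and does not rely on eyeballing.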