Research Workflow – Part 7: Research Interpretation Issues

Interpretation is often treated as a soft skill.

Something that happens after the “real work” is done. In reality, interpretation is where the meaning of a study is either preserved—or distorted.

Many flawed papers are not wrong in their methods. They are wrong in what they claim their results mean.

Statistical correctness is not interpretive correctness

A result can be statistically sound and still be misleading. Confidence intervals can be narrow. P-values can be convincing. Models can converge cleanly. And yet, the conclusion may still overreach.

This happens when interpretation ignores:

the limits imposed by design
the blind spots created by measurement
the distortions introduced by bias

Statistics answer narrow questions. Interpretation answers broader ones. Confusing the two is one of the most common—and costly—errors in research writing.

Example: When my statistics were right but my interpretation wasn’t

In my ARM imaging study, I found that ultrasound PR distance correlated strongly with surgical approach (p<0.001, ROC AUC 0.85). My first draft conclusion: “Ultrasound PR distance accurately predicts surgical complexity in anorectal malformations.”

Statistically correct. The correlation existed. The numbers were solid.

My supervisor crossed it out: “What about patients with normal PR distance but complex fistula anatomy? Your ultrasound didn’t see those. Are you saying the measurement predicts complexity, or that it predicts complexity in cases where the anatomy ultrasound can visualize?”

He was right. My data showed:

Ultrasound worked well when anatomy was clear on ultrasound
Ultrasound missed complexity when the determining factor (fistula type) wasn’t visible
The correlation reflected ultrasound’s strengths, not complete predictive validity

Revised conclusion: “Ultrasound PR distance correlates with surgical approach in cases with clearly visualized anatomy, but should be supplemented with contrast studies when fistula anatomy is uncertain.”

The statistics didn’t change. The interpretation became honest.

The pressure to say more than the data allow

Interpretive overreach rarely comes from arrogance. It comes from pressure. Pressure to:

justify the effort invested
make the study feel impactful
satisfy reviewers, supervisors, or journals
connect results to broader narratives

Under this pressure, language starts to drift:

associations begin to sound like causes
proxies are treated as direct measures
limitations are acknowledged, then quietly ignored

The study does not collapse dramatically. It erodes subtly. How language drifts without you noticing

I’ve caught myself making these shifts:

What the data showed:
“Low sacral ratio correlates with higher malformation complexity”

What I wrote first:
“Sacral development influences malformation severity”

The shift is subtle: correlation → influence. One describes association; the other implies mechanism.

Another example:

What I measured:
“Presence of fistula on contrast study”

What I almost wrote:
“Fistula anatomy determines surgical approach”

Again: presence → determines. I measured whether a fistula was there, not whether it caused surgical decisions. Surgeons consider many factors.

The pattern:

Data: Shows X correlates with Y
Draft: X affects Y
Final pressure: X causes Y

Each step feels like small refinement. Cumulatively, they transform description into overreach.

Prevention: Write your results section first. Write conclusions last. If conclusions claim more than results show, something has drifted.

Research interpretation should be anticipated, not improvised

Strong interpretation does not begin in the discussion section. It begins much earlier.

Researchers who interpret well often ask, before collecting data:

What conclusions would be tempting—but unjustified?
What alternative explanations will remain plausible no matter what the results show?
Which findings would genuinely change understanding, and which would simply confirm expectations?

By anticipating these questions early, interpretation later becomes more disciplined—and more credible.

Questions I now ask before data collection

“What will I be tempted to claim if results are positive?”

For my ARM study, I knew I’d be tempted to say: “Ultrasound can replace contrast studies.”

By anticipating this temptation, I designed the study to include contrast studies as comparison. This forced me to see where ultrasound agreed (rectal position) and where it failed (fistula detail).

Result: I couldn’t claim replacement because my own data showed complementarity.

“What alternative explanations will I have no way to rule out?”

Example: If PR distance correlates with surgical approach, is it because:

Distance actually predicts surgical difficulty?
Distance correlates with overall malformation severity (confounding)?
Surgeons use distance as one factor among many (partial explanation)?
Institutional protocols vary by distance cutoffs (practice variation)?

My observational design couldn’t separate these. Anticipating this prevented me from claiming causation I couldn’t establish.

“What would a skeptical reviewer ask?”

I role-played reviewer questions:

“Did you validate ultrasound measurements against surgical findings?” → Added intraoperative validation
“How did you handle cases where ultrasound was unclear?” → Created “indeterminate” category instead of forcing classification
“What about inter-observer reliability?” → Added blinded second measurements

These questions shaped data collection, not just discussion writing.

Saying less, but saying it precisely

One of the clearest markers of research maturity is restraint. Experienced researchers are comfortable saying:

“This suggests, but does not demonstrate…”
“This finding applies only within these conditions…”
“We cannot rule out alternative explanations…”

This is not weakness. It is respect—for the data, and for the reader.

Ironically, studies that claim less often carry more weight. Their conclusions invite trust rather than skepticism.

The conclusion I was proud to narrow

Original ambitious claim:
“Ultrasound provides accurate preoperative classification of anorectal malformations and can guide surgical planning.”

This sounded impressive. It suggested clinical practice could change based on my study.

What the data actually supported:
“Ultrasound PR distance correlates with surgical approach in tertiary referral cases with moderate-to-high malformations where anatomy is clearly visualized. It supplements but does not replace contrast studies for complete anatomical classification.”

This is narrower. More conditional. Less impactful-sounding.

But two things happened:

Reviewers trusted it. One wrote: “The authors appropriately limit claims to what their data support rather than overgeneralizing.”
It opened collaboration. A radiologist reached out: “Your honest limitations are exactly why we want to design a multi-center prospective study with you.”

The narrower claim invited engagement. The broad claim would have invited skepticism or dismissal.

The lesson: Precision in interpretation signals integrity. Overreach signals inexperience.

Using AI to check interpretation drift

AI can help catch interpretive overreach—if you use it deliberately.

Prompt I use before finalizing conclusions:

Here are my study results: [paste results section]

Here are my conclusions: [paste conclusion section]

Are there places where my conclusions claim more than the results show?
Where does my language shift from describing associations to implying causation?
What alternative explanations am I not acknowledging?

What this catches: Language drift you’re too close to see, causal claims embedded in subtle phrasing, conclusions that leap beyond data.

What it misses: Domain-specific reasonableness (AI doesn’t know surgical decision-making), appropriate inference levels for your field, strategic choices about emphasis.

Use AI as a mirror to see your own thinking, not as an authority on what’s acceptable.

Closing the workflow loop

By the time interpretation is reached, the shape of the study has long been fixed.

Design has set the boundaries. Measurement has defined what is visible. Bias has shaped what is distorted.

Interpretation is simply where all of these earlier decisions become visible to the reader.

This is why research workflow is not linear. It is cumulative.

Each step constrains the next. And no step can be treated as an afterthought.

How earlier decisions showed up in my interpretation

Design choice (retrospective, single-center):
→ Interpretation constraint: “Findings may not generalize to other practice settings”

Measurement choice (PR distance only initially):
→ Interpretation constraint: “Cannot assess complete anatomical classification”

Bias recognized (referral enrichment for complex cases):
→ Interpretation constraint: “Performance in simple cases remains unknown”

Each earlier decision created an interpretive boundary I couldn’t cross honestly.

But because I recognized these boundaries early, I didn’t struggle with discussion writing. I knew what I couldn’t claim before I tried to claim it.

The principle: Good interpretation is 80% decided before analysis begins.

A framework for disciplined interpretation

Before writing discussion, ask:

What did I actually measure?
(Not what I hoped to measure—what did the data capture?)
What would I need to have measured to claim X?
(If I want to claim causation, did I manipulate variables? If I want to claim generalizability, did I sample broadly?)
What remains unexplained even if my hypothesis is correct?
(Alternative mechanisms? Confounders? Context-dependency?)
Would I believe this claim if someone else made it with my exact same data?
(Removes ego from evaluation)

Then write conclusions that:

Stay within those boundaries
Acknowledge what’s uncertain
Specify the population/context where claims apply
Name alternative explanations explicitly

This isn’t about being pessimistic. It’s about being precise.

Where this leaves us

Good research does not come from mastering isolated techniques.

It comes from making a series of honest decisions, each informed by the limits of the previous one.

This is what a functional research workflow looks like:

clarity before ambition
feasibility before elegance
restraint before exaggeration

Everything else—tools, software, even AI—comes later.

What this series has covered

We’ve walked through the cumulative nature of research decisions:

Research questions: Must be answerable with available resources
Study design: Sets boundaries for what you can claim
Measurement: Defines what becomes visible
Bias: Shapes what gets distorted
Interpretation: Makes all prior decisions transparent

At each stage, workflow thinking means asking: “What does this decision make possible—and impossible—later?” This discipline doesn’t make research easier. But it makes it more honest.

And in the end, honesty is what allows research to be trusted—and built upon.

Next steps:

The workflow principles covered in this series apply regardless of tools. Next, we face the difficulties in academic writing, especially for non-natives. I will analyze and solve this in the next series, as well as introduce tools can support better workflow—if chosen and used appropriately.

In the next series, we’ll examine how AI tools fit into academic writing and research processes: not as replacements for thinking, but as instruments that amplify careful workflow when used with judgment.

Further resources:

For templates on anticipating interpretation challenges and checking conclusion validity, see the Research Workflow section.

Research Workflow – Part 7: Interpretation Is Where Most Research Quietly Breaks