Evidence Framework

How we evaluate the evidence

Every recommendation on evidage carries an explicit evidence level (Level 1–4). “Works” versus “doesn’t work” is a poor binary, because the underlying research sits on a spectrum. A claim like “red wine is healthy” carries very different weight depending on whether it comes from a test-tube study, a 10-year cohort of 50,000 people, or a double-blind randomized trial. We make that difference visible at the top of every article.

The 4-level scale — at a glance

Level | Evidence strength | What it is based on
Level 1 | Strongest 🥇 | Systematic reviews / meta-analyses of large RCTs, reproduced across independent groups
Level 2 | Strong 🥈 | Multiple RCTs or large prospective cohorts with consistent findings
Level 3 | Moderate 🥉 | Observational studies or small RCTs; suggestive, not yet decisive
Level 4 | Limited 🔬 | Animal models, in vitro studies, case reports; no or minimal human trials

Important distinction: “evidence strength” and “recommendation strength” are different things. Evidence strength says how solid the research is; the recommendation says what to do about it. For example, smoking cessation has Level 1 (Strongest) evidence, and that evidence supports a strong recommendation against smoking. We tag [Level X (evidence strength)] and [recommendation] separately throughout the site.
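To make the two-tag idea concrete, here is a minimal Python sketch that stores evidence strength and recommendation as separate fields. The class name `Tag`, the `render` method, and the exact wording of the recommendation tag are assumptions for illustration; the site only specifies that the two are tagged separately.

```python
from dataclasses import dataclass

# Labels taken from the 4-level scale above.
LEVEL_LABELS = {1: "Strongest", 2: "Strong", 3: "Moderate", 4: "Limited"}


@dataclass(frozen=True)
class Tag:
    """Hypothetical container keeping the two tags apart."""
    level: int            # evidence strength, 1 to 4
    recommendation: str   # free-text recommendation (format is an assumption)

    def render(self) -> str:
        # Evidence strength and recommendation are rendered as two brackets,
        # so a strong evidence level never implies "do this" by itself.
        label = LEVEL_LABELS[self.level]
        return f"[Level {self.level} ({label})] [Recommendation: {self.recommendation}]"


# Level 1 evidence paired with a recommendation *against* the behavior.
smoking = Tag(level=1, recommendation="strongly against")
print(smoking.render())
```

Keeping the fields separate means a Level 1 tag can sit next to either a "for" or an "against" recommendation without any change to the evidence scale itself.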

The 4 levels in detail

Level 1 — Strongest

Supported by systematic reviews or meta-analyses of large randomized controlled trials (RCTs), replicated across independent research groups. Examples: smoking cessation, regular aerobic exercise, the Mediterranean dietary pattern, 7–9 hours of sleep.

Level 2 — Strong

Supported by multiple RCTs or large prospective cohorts showing consistent findings. Examples: olive oil, yogurt, oily fish, coffee (moderate intake), green tea, matcha, vinegar. Most of our “Foods in Focus” articles sit here.

Level 3 — Moderate

Supported by observational studies or small RCTs — suggestive but not decisive. “Worth trying, but do not overstate” applies here.

Level 4 — Limited

Supported only by animal studies, in vitro work, or case reports. Interesting as a topic, but not yet ready to act on. Most “latest miracle supplements” and “new biohacks” live here.
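The four definitions above can be read as a decision rule that checks the strongest criteria first. The sketch below is one possible encoding, not the site's actual procedure: the field names are hypothetical, and the threshold of two studies for "multiple" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceBase:
    """Hypothetical summary of the studies behind one claim."""
    meta_analysis_of_large_rcts: bool = False  # systematic review / meta-analysis exists
    independently_replicated: bool = False     # confirmed by independent groups
    rct_count: int = 0                         # number of human RCTs
    large_cohort_count: int = 0                # number of large prospective cohorts
    observational_count: int = 0               # other human observational studies
    findings_consistent: bool = False          # studies point the same way


def evidence_level(e: EvidenceBase) -> int:
    """Return 1 (Strongest) to 4 (Limited), testing the strictest rule first."""
    if e.meta_analysis_of_large_rcts and e.independently_replicated:
        return 1
    # "Multiple" is taken to mean at least two (an assumption).
    if e.findings_consistent and (e.rct_count >= 2 or e.large_cohort_count >= 2):
        return 2
    # Any human observational study or RCT, but short of Level 2.
    if e.rct_count >= 1 or e.large_cohort_count >= 1 or e.observational_count >= 1:
        return 3
    # Only animal, in vitro, or case-report evidence remains.
    return 4


small_trial = EvidenceBase(rct_count=1)
print(evidence_level(small_trial))
```

Checking from Level 1 downward means a claim always receives the highest level whose criteria it fully meets, and anything without human studies falls through to Level 4.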

The 4 evaluation axes

  1. Effect size: quantified where possible
  2. Reproducibility: single group vs. multiple independent confirmations
  3. Side effects / risks: benefits and harms always reported together
  4. Study population: healthy adults vs. patients, age, sex, ancestry

How to read our articles

Look first at the [Level X (Strongest / Strong / Moderate / Limited)] tag at the top, then at the separate recommendation tag. Recommendations backed by “Strongest” or “Strong” evidence are worth implementing now. “Moderate” is wait-and-see. “Limited” is interesting but not ready to act on.
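The reading rule above amounts to a small lookup from tag to guidance. As an illustrative sketch only, the function below parses a tag written as `[Level N (Label)]` and returns the matching guidance phrase; the function name `reading_guidance` and the strict tag pattern are assumptions, not part of the site.

```python
import re

LEVEL_LABELS = {1: "Strongest", 2: "Strong", 3: "Moderate", 4: "Limited"}

# Guidance phrases quoted from the reading rule above.
GUIDANCE = {
    "Strongest": "worth implementing now",
    "Strong": "worth implementing now",
    "Moderate": "wait-and-see",
    "Limited": "interesting but not ready to act on",
}

# Assumed tag shape: "[Level 2 (Strong)]".
TAG_PATTERN = re.compile(r"\[Level ([1-4]) \((Strongest|Strong|Moderate|Limited)\)\]")


def reading_guidance(tag: str) -> str:
    """Map an evidence tag to its guidance phrase, rejecting malformed tags."""
    match = TAG_PATTERN.fullmatch(tag.strip())
    if match is None:
        raise ValueError(f"not a recognised evidence tag: {tag!r}")
    level, label = int(match.group(1)), match.group(2)
    # Guard against mismatches such as "[Level 3 (Strongest)]".
    if LEVEL_LABELS[level] != label:
        raise ValueError(f"level {level} does not match label {label!r}")
    return GUIDANCE[label]


print(reading_guidance("[Level 3 (Moderate)]"))
```

The guidance describes how much weight the evidence can bear; whether the advice is to adopt or avoid something still comes from the separate recommendation tag.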

Last updated: May 2026