ERSA: Evidence Quality, Bias, and How They Affect Theory Ratings

Part 1: The Evidence Quality Hierarchy

The quality of evidence matters enormously in determining where a theory sits on the ERSA scale. Not all evidence is created equal. A single high-quality randomized controlled trial (RCT) provides stronger evidence than 50 anecdotal case reports.

The Evidence Pyramid: From Strongest to Weakest

                    /  Systematic Reviews & Meta-Analyses (SR/MA)  \
                   /                                                \
                  /         Randomized Controlled Trials (RCTs)      \
                 /                                                    \
                /     Well-Designed Cohort & Case-Control Studies      \
               /                                                        \
              /      Lower Quality Observational Studies                 \
             /                                                            \
            /              Case Series, Case Reports                       \
           /                                                                \
           /__________________Expert Opinion & Anecdotes______________________\

Study Design Quality Rankings

Highest Quality (Most Resistant to Bias)

  1. Systematic Reviews with Meta-Analysis (SR/MA)

    • Combines multiple high-quality studies
    • Rigorous inclusion/exclusion criteria
    • Accounts for heterogeneity and publication bias
    • Quality: Can be High, Moderate, or Low depending on included studies
  2. Randomized Controlled Trial (RCT)

    • Random allocation minimizes selection bias and confounding
    • Blinding reduces observer bias
    • Can be High (well-designed) or Low (poorly designed) quality
    • Gold standard for intervention studies
  3. Well-Designed Prospective Cohort Study

    • Follows participants over time
    • Can measure dose-response relationships
    • Lower risk of selection bias than case-control
    • Better than retrospective designs
  4. Well-Designed Case-Control Study

    • Useful for rare diseases
    • Retrospective nature increases bias risk
    • More prone to recall bias than cohort studies

Medium Quality

  1. Lower-Quality Observational Studies
    • Cross-sectional surveys
    • Retrospective analyses
    • Studies with inadequate control of confounding
    • High risk of selection bias

Lower Quality (Most Prone to Bias)

  1. Case Series / Case Reports

    • Describes pattern across cases
    • No control group
    • High susceptibility to bias
    • Useful for hypothesis generation but weak confirmation
  2. Expert Opinion

    • Based on experience and judgment
    • No systematic data collection
    • Prone to cognitive biases
    • Lowest evidence level

Part 2: How Evidence Quality Affects ERSA Ratings

The same number of studies can produce very different ERSA levels depending on study quality.

Example 1: A Hypothesis With 10 Studies

Scenario A: All High-Quality RCTs

  • 10 randomized controlled trials
  • All well-designed, low risk of bias
  • Sample sizes adequate (500+ participants each)
  • Results consistent (all show effect in same direction)
  • Effect sizes moderate to strong

ERSA Impact:

  • All studies in “Highest Quality” category
  • Evidence composite score: 32-36/36
  • ERSA Likely: 5.5-6.5 (Robust theory with predictive power)
  • Interpretation: “The evidence is strong and consistent across multiple well-designed trials”

Scenario B: Mix of Low-Quality Studies

  • 10 observational studies
  • Small sample sizes (20-50 participants)
  • High risk of confounding (few variables controlled)
  • Results mixed (some support, some contradict)
  • Large effect sizes reported (suspicious)

ERSA Impact:

  • Most studies in “Lower Quality” category
  • Evidence composite score: 12-16/36
  • ERSA Likely: 3.0-3.5 (Emerging theory, mixed evidence)
  • Interpretation: “The evidence is weak and inconsistent; confounding likely explains apparent effects”

Scenario C: Single High-Quality RCT + 9 Low-Quality Studies

  • 1 well-designed RCT showing no effect
  • 9 observational studies showing effect

ERSA Impact:

  • Highest quality study contradicts others
  • Evidence composite score: 18-22/36
  • ERSA Likely: 3.5-4.0 (Emerging, needs replication in high-quality designs)
  • Interpretation: “Strongest evidence contradicts the apparent pattern; weak evidence may reflect bias not true effect”
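
The three scenarios above imply a rough mapping from a 0-36 evidence composite to an ERSA band. A minimal Python sketch of that mapping, assuming the composite is nine Bradford Hill criteria scored 0-4 each (the cut-offs and names below are inferred from the worked scenarios, not an official ERSA rule):

```python
# Hypothetical helper: the 0-36 composite above reads as nine Bradford Hill
# criteria scored 0-4 each. Cut-offs are inferred from the three scenarios.

BH_CRITERIA = {"strength", "consistency", "specificity", "temporality",
               "gradient", "plausibility", "coherence", "experiment", "analogy"}

def composite_score(scores):
    """Sum Bradford Hill criterion scores (each 0-4, up to nine criteria)."""
    for name, value in scores.items():
        if name not in BH_CRITERIA or not 0 <= value <= 4:
            raise ValueError(f"bad criterion/score: {name}={value}")
    return sum(scores.values())

def ersa_band(composite):
    """Indicative ERSA range for a 0-36 evidence composite."""
    for cutoff, band in [(32, (5.5, 6.5)), (24, (4.5, 5.5)),
                         (18, (3.5, 4.0)), (12, (3.0, 3.5))]:
        if composite >= cutoff:
            return band
    return (1.0, 3.0)  # sparse or anecdotal evidence

# Scenario A (ten strong RCTs, composite ~34) lands in the 5.5-6.5 band;
# Scenario B (weak mix, composite ~14) lands in 3.0-3.5.
```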

Part 3: Key Sources of Bias in Evidence

Bias Type 1: Selection Bias

What it is: Participants are selected in a way that systematically differs between groups

Example: Hand-washing study in 1800s

  • Hospital A: Women giving birth who agreed to hand-washing protocol vs. those who refused
  • Selection Bias: Women who agreed to hand-washing might have been more health-conscious generally
  • Therefore: Improved outcomes might reflect their health-consciousness, not hand-washing

How it affects ERSA:

  • Selection bias detected → Bradford Hill “Consistency” score reduced (2-3/4 instead of 3-4/4)
  • If selection bias severe → ERSA might drop 0.5-1.0 levels
  • Example: Hand-washing study moved from ERSA 3.5 → ERSA 3.0 after selection bias identified

How to detect:

  • Were study groups truly comparable at baseline?
  • Were there systematic differences in who participated?
  • Did some participants drop out? (Differential dropout = selection bias)
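
One concrete baseline-comparability check is a standardized mean difference (SMD) on each baseline covariate. A sketch with invented ages for the hand-washing example; a common rule of thumb flags |SMD| > 0.1 as meaningful imbalance:

```python
from math import sqrt
from statistics import mean, stdev

def standardized_mean_diff(group_a, group_b):
    """Cohen's d-style SMD for one baseline covariate across two arms."""
    pooled_sd = sqrt((stdev(group_a) ** 2 + stdev(group_b) ** 2) / 2)
    return (mean(group_a) - mean(group_b)) / pooled_sd

# Invented baseline ages: the hand-washing group is much younger than the
# refusers, so the arms were not comparable at baseline.
ages_washing  = [52, 55, 49, 61, 58, 54]
ages_refusers = [70, 68, 73, 66, 71, 69]
smd = standardized_mean_diff(ages_washing, ages_refusers)
imbalanced = abs(smd) > 0.1  # rule-of-thumb threshold for imbalance
```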

Bias Type 2: Confounding

What it is: A third variable influences both the exposure and outcome, creating false association

Example: Coffee consumption and heart disease

  • Observation: Coffee drinkers have higher heart disease rates
  • Confounding variable: Smokers drink more coffee AND smoking causes heart disease
  • True situation: Coffee doesn’t cause disease; smoking does

How it affects ERSA:

  • Uncontrolled confounding → Bradford Hill “Specificity” and “Strength” scores reduced
  • If major confounder not measured → ERSA drops 0.5-1.5 levels
  • If confounder identified and controlled → ERSA impact minimized

Example from real science:

  • Early studies suggested hormone replacement therapy (HRT) prevented heart disease
  • Confounding: Women who took HRT were healthier, wealthier, had better overall health behaviors
  • Large RCT showed no benefit when confounding controlled
  • ERSA of claimed HRT benefit dropped from ~6.0 → ~2.5 once the confounding was recognized

How to detect/control:

  • Randomization (RCTs eliminate confounding through randomization)
  • Statistical adjustment (measure confounders and mathematically control for them)
  • Stratification (separate analysis by confounder levels)
  • Matching (select participants similar on confounder)
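
The coffee/smoking example can be made concrete with a stratified analysis. The counts below are invented so that the crude risk ratio is 2.5 while the smoking-stratified risk ratios are 1.0, meaning the entire apparent effect is confounding:

```python
# Invented counts for the coffee / heart-disease example:
# stratum -> exposure -> (cases, total)
data = {
    "smoker":     {"coffee": (24, 80), "no_coffee": (6, 20)},
    "non_smoker": {"coffee": (1, 20),  "no_coffee": (4, 80)},
}

def risk(cases, total):
    return cases / total

# Crude analysis: pool everyone, ignoring smoking status.
crude = {
    exp: risk(sum(data[s][exp][0] for s in data),
              sum(data[s][exp][1] for s in data))
    for exp in ("coffee", "no_coffee")
}
crude_rr = crude["coffee"] / crude["no_coffee"]  # 2.5: coffee "looks" harmful

# Stratified analysis: compare within each smoking stratum.
stratum_rrs = [risk(*data[s]["coffee"]) / risk(*data[s]["no_coffee"])
               for s in data]  # 1.0 in both strata: no real association
```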

Bias Type 3: Information Bias / Measurement Error

What it is: Inaccurate or biased measurement of exposure, outcome, or confounding variables

Subtypes:

3a. Recall Bias (Retrospective Studies)

  • Asking people to remember events from the past
  • People with disease often remember exposures better (trying to explain their illness)
  • People without disease forget details

Example:

  • “Did you use mobile phones heavily 10 years ago?”
  • People with brain cancer: “Yes, I remember using it constantly”
  • People without cancer: “I don’t remember, maybe sometimes”
  • Bias: Apparent link between phones and cancer created by differential recall

Impact: ERSA drops 0.5-1.0 if recall bias likely

3b. Observation/Measurement Error

  • Equipment malfunctions
  • Observer bias (seeing what you expect to see)
  • Non-standardized measurement procedures

Example:

  • Measuring blood pressure without proper technique
  • Blood pressure varies with time of day, stress level, arm position
  • Measurement error introduced that makes associations appear stronger or weaker than true value

Impact:

  • Measurement error typically REDUCES ability to detect true effects
  • But can sometimes INCREASE apparent effects if error is systematic
  • ERSA drops if measurement error likely (0.2-0.5 levels)

3c. Outcome Assessment Bias

  • The person measuring outcomes knows which group participant is in
  • Unconsciously biases measurement toward expected result

Example:

  • Teacher assesses whether a “gifted” child performed well vs. “typical” child
  • Same performance rated higher for gifted child
  • Unblinded assessment introduces bias

Impact: ERSA drops 0.3-0.8 if outcome assessor was unblinded

How to detect:

  • Was measurement standardized?
  • Were outcome assessors blinded to treatment assignment?
  • Was there quality control on measurement?

Bias Type 4: Publication Bias

What it is: Published literature is biased toward positive results; negative results often go unpublished

Why it happens:

  • Journals preferentially publish positive findings
  • Researchers more likely to submit positive studies
  • Funding agencies emphasize positive outcomes

Example:

  • Suppose 50 researchers test if homoeopathy works
  • 45 find no effect (negative results)
  • 5 find apparent positive effect (by chance)
  • Only the 5 positive studies get published
  • Reader sees 5/5 published studies as positive
  • But true success rate: 5/50 = 10% (just chance)

How it affects ERSA:

  • If publication bias likely → Bradford Hill “Consistency” score reduced
  • ERSA drops 0.5-1.5 depending on bias magnitude
  • Systematic reviews/meta-analyses actively search for unpublished studies to counter this

How to detect:

  • Funnel plots (graphical method to detect asymmetry)
  • Calculate “file drawer number” (how many unpublished negative studies would be needed to change conclusion?)
  • Search for unpublished studies (dissertations, conference presentations, registered trials)
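
The "file drawer number" has a classic closed form (Rosenthal's fail-safe N): given the z-scores of k published studies, N = (sum of z / 1.645)^2 - k unpublished null studies (averaging z = 0) would pull the combined one-sided result below significance. A sketch with invented z-scores:

```python
def fail_safe_n(z_scores, z_alpha=1.645):
    """Rosenthal's file-drawer number: unpublished null studies needed to
    pull the combined one-sided result below significance."""
    k = len(z_scores)
    return (sum(z_scores) / z_alpha) ** 2 - k

# Five published "positive" studies, each only just significant.
published_z = [2.0, 2.1, 1.8, 2.3, 1.9]
n_fs = fail_safe_n(published_z)
# Rosenthal's benchmark treats n_fs < 5k + 10 as fragile to the file drawer.
fragile = n_fs < 5 * len(published_z) + 10
```

Here about 33 hidden null studies would overturn the conclusion, under the 5k + 10 benchmark a fragile result.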

Bias Type 5: Allocation Concealment Failure

What it is: Researchers or participants know in advance which treatment group they’ll be in, allowing manipulation

Example:

  • Study predicts which children will benefit most from intervention
  • Researchers unconsciously assign these “best candidates” to treatment group
  • Positive outcomes reflect initial selection, not intervention

Impact:

  • Major threat to RCT validity
  • ERSA drops 1-2 levels if allocation not properly concealed
  • Example: An RCT without allocation concealment is often no better than an observational study

Part 4: GRADE Framework: How to Adjust Evidence Quality

GRADE (Grading of Recommendations Assessment, Development and Evaluation) provides a systematic approach to adjusting evidence quality based on specific factors.

GRADE Domains for DOWNGRADING Evidence

Domain 1: Risk of Bias

  • Were participants randomly allocated? (RCTs)
  • Was allocation concealed?
  • Were outcome assessors blinded?
  • Was blinding possible? (Some outcomes are objective and don’t need blinding)
  • Did participants complete study? (Attrition/dropout bias)

Downgrade by 1 level: Some limitations in study design/execution
Downgrade by 2 levels: Serious/multiple limitations

Example Application:

  • High-quality RCT with well-designed methodology: Risk of bias = NO downgrade
  • RCT without outcome assessor blinding on objective outcome: NO downgrade (blinding not needed for objective measurement)
  • RCT without allocation concealment: Downgrade by 1 level
  • RCT with 40% dropout rate: Downgrade by 2 levels (serious bias risk)

Domain 2: Inconsistency (Variability Across Studies)

  • Do results vary widely between studies?
  • Is variation explained by study-level factors?

Downgrade by 1 level: Moderate inconsistency; some variation but general direction consistent
Downgrade by 2 levels: Serious inconsistency; studies contradict each other

Example Application:

  • 5 RCTs showing effect size of 1.2, 1.3, 1.1, 1.4, 1.2: NO downgrade (consistent)
  • 5 RCTs showing effect size of 0.5, 1.5, 2.0, 0.3, 1.8: Downgrade by 1 level (moderate inconsistency)
  • Some studies show benefit, others show harm: Downgrade by 2 levels (serious inconsistency)

Domain 3: Indirectness (Does Evidence Answer the Question?)

  • Do studies use same populations, interventions, outcomes as clinical question?
  • Are results from different setting that might not generalize?

Downgrade by 1 level: Somewhat indirect (studies in hospitals but question is community)
Downgrade by 2 levels: Very indirect (studies in young adults but question is elderly)

Example Application:

  • Question: Does aspirin prevent heart attack in women?
  • Evidence: Studies mostly in men
  • Downgrade by 1 level (results may not apply to women; sex differences in response possible)

Domain 4: Imprecision (Wide Confidence Intervals)

  • Are confidence intervals wide, crossing the line of no effect?
  • Sample size adequate?
  • Number of events adequate (for rare outcomes)?

Downgrade by 1 level: Moderate imprecision; confidence intervals somewhat wide
Downgrade by 2 levels: Serious imprecision; confidence intervals very wide or cross line of no effect

Example Application:

  • Effect size 1.5 with 95% CI [1.4-1.6]: High precision → NO downgrade
  • Effect size 1.5 with 95% CI [0.8-2.2]: Moderate precision (crosses toward small effects) → Downgrade by 1 level
  • Effect size 1.5 with 95% CI [0.2-2.8]: Low precision (crosses into harm) → Downgrade by 2 levels
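
These judgements can be encoded as a small rule. The thresholds below are invented to mirror the three worked examples (a ratio effect size with null at 1.0); official GRADE guidance frames imprecision around optimal information size and appreciable benefit/harm, so treat this strictly as a sketch:

```python
def imprecision_downgrade(ci_low, ci_high, null=1.0):
    """Illustrative GRADE-style imprecision rule for a ratio effect size.
    Thresholds are invented to mirror the worked examples above."""
    if not (ci_low < null < ci_high):
        return 0  # CI excludes the null: precise enough
    if ci_low < 0.5 and ci_high > 2.0:
        return 2  # CI spans appreciable harm AND appreciable benefit
    return 1      # CI crosses the null modestly

dg_narrow   = imprecision_downgrade(1.4, 1.6)  # high precision: 0 levels
dg_moderate = imprecision_downgrade(0.8, 2.2)  # crosses null: 1 level
dg_wide     = imprecision_downgrade(0.2, 2.8)  # harm to benefit: 2 levels
```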

Domain 5: Publication Bias

  • Evidence of bias toward publication of positive results?

Downgrade by 1 level: Suspected publication bias
Downgrade by 2 levels: Likely publication bias (more than half of studies probably unpublished)

Example Application:

  • Funnel plot shows missing negative studies: Downgrade by 1 level
  • Search for unpublished studies reveals 20 unpublished negative studies vs. 5 published positive: Downgrade by 2 levels

GRADE Domains for UPGRADING Evidence (Primarily Non-Randomized Studies)

RCTs typically start as “High” and are downgraded. Non-randomized studies typically start as “Low” but can be upgraded.

Domain 1: Strength of Association

  • Is the effect size large (2-fold increase) or very large (3-fold increase)?
  • Large effects are less likely to result from confounding

Upgrade by 1 level: Large effect (2-fold or greater)
Upgrade by 2 levels: Very large effect (3-fold or greater)

Example:

  • Observational study shows smoking increases lung cancer risk 10-fold
  • This very large effect is unlikely to result from residual confounding
  • Upgrade by 2 levels (from Low → High evidence)

Domain 2: Dose-Response Gradient

  • Does increasing exposure lead to increasing effect?
  • Dose-response is strong evidence for causation

Upgrade by 1 level: Dose-response demonstrated

Example:

  • No cigarette smoking: 1% lung cancer rate
  • 1-10 cigarettes/day: 5% rate
  • 11-20 cigarettes/day: 12% rate
  • 20+ cigarettes/day: 25% rate
  • Clear dose-response → Upgrade evidence
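
A dose-response check is just a monotonicity test over ordered exposure levels. Using the illustrative smoking rates above:

```python
# Illustrative lung cancer rates (%) by smoking exposure, from the text.
dose_response = [("none", 1), ("1-10/day", 5), ("11-20/day", 12), ("20+/day", 25)]

rates = [rate for _, rate in dose_response]
monotone = all(a < b for a, b in zip(rates, rates[1:]))

# GRADE: a demonstrated dose-response gradient supports a 1-level upgrade.
upgrade_levels = 1 if monotone else 0
```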

Domain 3: Opposing Plausible Confounding or Bias

  • Are there plausible confounders that would work AGAINST the observed association?
  • If confounders would create bias toward the null rather than away from it, the observed effect likely understates the true effect

Upgrade by 1 level: Plausible opposing confounding present

Example:

  • Observation: People who exercise have lower heart disease rates
  • Confounding worry: Wealthy people exercise more AND wealthy people have better healthcare
  • But: Wealth should improve health outcomes, making exercise appear LESS protective (bias toward null)
  • Fact that exercise still appears protective despite this bias → Upgrade evidence
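
Putting starting levels, downgrades, and upgrades together, the overall GRADE ladder can be sketched as a small function (the four levels are standard GRADE; the function and its arguments are invented for illustration):

```python
GRADE_LEVELS = ["very low", "low", "moderate", "high"]

def grade_quality(design, downgrades=0, upgrades=0):
    """Overall GRADE ladder: RCTs start High, non-randomized studies start
    Low; net downgrades/upgrades move along the four levels (clamped)."""
    start = 3 if design == "rct" else 1
    final = min(max(start - downgrades + upgrades, 0), len(GRADE_LEVELS) - 1)
    return GRADE_LEVELS[final]

# A well-run RCT stays High; one with serious bias drops two levels to Low;
# a very large observational effect (e.g. 10-fold) can upgrade Low to High.
well_run_rct = grade_quality("rct")
biased_rct = grade_quality("rct", downgrades=2)
smoking_cohort = grade_quality("observational", upgrades=2)
```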

Part 5: Detailed Examples of How Evidence Quality Affects ERSA

Example 1: Cranberry Juice for Urinary Tract Infections (UTIs)

Initial Claims (ERSA 1.5)

  • Anecdotal reports: “I drank cranberry juice and my UTI went away”
  • Mechanism plausible: Cranberries contain proanthocyanidins that prevent bacterial adhesion
  • Evidence: Case reports, no studies yet

Bradford Hill Profile (ERSA 1.5):

  • Strength 0/4 (anecdotal only)
  • Consistency 0/4 (no systematic studies)
  • Specificity 1/4 (specific claim but not tested)
  • Plausibility 2/4 (mechanism proposed)
  • Coherence 1/4 (doesn’t integrate with existing knowledge)

Early Studies (ERSA 2.5)

  • Several small studies conducted
  • Sample sizes: 20-100 women
  • Methods: Observational, not randomized
  • Results: Some found benefit, others found minimal effect
  • Problems: Selection bias (who chose cranberry?), confounding (other behaviors affecting UTI risk)

Bradford Hill Profile (ERSA 2.5):

  • Strength 1/4 (weak effects in some studies)
  • Consistency 1/4 (mixed results)
  • Specificity 1/4 (not clear who benefits)
  • Experiment 0/4 (no RCTs yet)
  • Plausibility 2/4 (mechanism still theoretical)

Higher-Quality Studies (ERSA 3.5-4.0)

  • Better-designed RCTs (200-400 participants)
  • Blinded design (participants didn’t know if getting cranberry or placebo)
  • Longer follow-up (6-12 months)
  • GRADE Assessment of these studies:
Study                 | Design               | Quality Issues                                      | GRADE Quality
Smith et al. (2015)   | RCT                  | No allocation concealment, 10% dropout              | Moderate
Johnson et al. (2017) | RCT                  | Well-designed, randomized, blinded, minimal dropout | High
Lee et al. (2016)     | Observational cohort | No randomization, potential confounding             | Low
Meta-analysis (2020)  | SR/MA of 12 RCTs     | Moderate heterogeneity, some publication bias       | Moderate

Bradford Hill Profile with Quality Weighting (ERSA 4.0):

  • Strength 2/4 (modest effect when shown; high-quality RCTs show smaller effects than low-quality studies)
  • Consistency 2/4 (high-quality studies less consistent than low-quality)
  • Specificity 2/4 (works better for prevention than treatment; better in certain populations)
  • Experiment 3/4 (multiple RCTs now available, but not universally supportive)
  • Plausibility 2/4 (mechanism studied but not fully understood)

Key Insight About Quality:

  • Low-quality observational studies often showed larger cranberry benefits
  • High-quality RCTs showed smaller benefits
  • This discrepancy suggests low-quality studies had selection bias or confounding overestimating effect
  • ERSA 4.0 reflects that effect is real but smaller than initially appeared
  • Evidence quality → ERSA positioning

Current Consensus (ERSA 4.2)

  • Cranberry juice shows modest benefit for UTI prevention
  • Most useful in women with recurrent UTIs
  • Effect smaller than antibiotic prophylaxis

Example 2: Hormone Replacement Therapy (HRT) and Heart Disease

This is a dramatic example of how evidence quality completely changed the ERSA rating.

Initial Status (ERSA 6.0-6.5, 1990s)

  • Decade of observational studies
  • All showed: Women on HRT had 30-50% lower heart disease rates
  • Mechanism clear: Estrogen improves cholesterol, blood vessel function
  • Consensus: HRT recommended for menopausal women to prevent heart disease

Evidence Quality (ERSA 6.0 era):

  • Study Design: All observational (cohort studies)
  • Sample sizes: Large (50,000-100,000 women)
  • Risk of Bias: MODERATE-TO-HIGH
  • Confounding: Wealthy women more likely to use HRT; wealthy women have better healthcare
  • Selection bias: Health-conscious women chose HRT
  • GRADE Assessment: LOW to MODERATE quality
    • Risk of Bias: Downgrade 2 levels (severe confounding likely)
    • Inconsistency: No (all studies showed same direction)
    • Indirectness: No (direct population)
    • Imprecision: No (large studies, narrow confidence intervals)
    • Publication bias: Possible (studies showing no effect less likely published)

Bradford Hill Profile (ERSA 6.0, but was overestimated):

  • Strength 3/4 (large effect size observed)
  • Consistency 3/4 (multiple studies replicated)
  • Specificity 3/4 (specific effect in women 45-70 years old)
  • Experiment 0-1/4 (no RCTs available)
  • Plausibility 3/4 (mechanism well-understood)
  • Coherence 3/4 (fits with cardiovascular physiology)

Critical Problem: Experiment score was 0/4 — no high-quality RCTs confirming observational findings

The Game-Changer: The WHI RCT (2002)

Women’s Health Initiative was large, multi-center RCT:

  • 16,000 women randomized
  • Half received HRT; half received placebo
  • Allocation concealed; participants blinded
  • 5-year follow-up
  • GRADE Quality: HIGH

Results:

  • Contrary to observational studies
  • HRT did NOT prevent heart disease
  • Actually showed slight increase in cardiovascular events
  • Breast cancer risk increased

Evidence Quality Recalibration:

  • Observational studies had severe confounding (wealthy, health-conscious women chose HRT)
  • These confounders → better health outcomes through multiple pathways, not HRT
  • High-quality RCT removed confounding through randomization
  • Revealed truth: HRT doesn’t prevent heart disease

Bradford Hill Profile (ERSA 2.5-3.0, post-2002):

  • Strength 0/4 (effect disappeared in RCT; previous observational effect was illusory)
  • Consistency 1/4 (RCT contradicts observational studies; indicates bias in observational work)
  • Experiment 3/4 (large RCT showed no benefit)
  • Plausibility 2/4 (mechanism exists but effect doesn’t occur in reality; mechanism less important than empirical evidence)

ERSA Shift: ERSA 6.0 → ERSA 2.5 (a drop of 3.5 levels!)

The Lesson:

  • Observational studies had large effect sizes and multiple studies
  • But lack of high-quality experimental evidence was a critical weakness
  • GRADE framework correctly identified publication bias and confounding risk
  • One high-quality RCT overturned decade of observational research
  • Study DESIGN matters more than study QUANTITY for establishing causation

Example 3: Aspirin for Primary Prevention of Heart Disease

This example shows more nuanced evidence quality effects.

Initial Enthusiasm (ERSA 4.5, 1990s)

  • Observational studies: People taking aspirin had fewer heart attacks
  • Mechanism: Aspirin prevents platelet aggregation
  • Clinical logic: If aspirin works after heart attack, should work before

Early RCTs (ERSA 3.5-4.0)

  • Several RCTs in 1990s
  • Design: Reasonable but variable quality
  • Results: Modest benefit in some, no benefit in others
  • GRADE Issues: Inconsistency (studies contradict); publication bias (positive studies published, negative less likely)

Specific Quality Issues:

Trial            | Sample Size | Design Issues                               | Results
ISIS-2 (1988)    | 17,000      | Post-MI population (not primary prevention) | Massive benefit
PHS (1989)       | 22,000      | Primary prevention, mostly men              | Modest benefit
WOSCOPS (1998)   | 6,000       | Primary prevention in high-risk men         | Modest benefit
AAA Trial (2002) | 3,000       | Primary prevention in older adults          | NO benefit
ARRIVE (2010)    | 7,600       | Primary prevention in high-risk men         | NO benefit

The Problem:

  • Large trials in actual primary prevention (healthy people) showed WEAK or NO benefit
  • Benefits claimed in retrospective studies and trials in post-MI populations
  • Different populations show different results

Bradford Hill Assessment:

  • Specificity VERY IMPORTANT here: Effect depends heavily on who takes it
  • Post-heart-attack patients: Clear benefit (ERSA 7-8)
  • High-risk men: Modest benefit (ERSA 4-5)
  • Average healthy people: No detectable benefit (ERSA 1-2)

GRADE Downgrading:

  • Indirectness: Studies in post-MI don’t directly answer question about primary prevention
  • Inconsistency: Different populations show different results
  • Imprecision: In healthy populations, confidence intervals include possibility of harm

Final ERSA Status:

  • Aspirin in primary prevention: ERSA 3.5-4.0 (weak evidence for modest benefit in high-risk subgroups)
  • Aspirin in secondary prevention (post-MI): ERSA 7.5 (strong evidence for clear benefit)

The Lesson:

  • Specificity matters enormously
  • Same intervention can be ERSA 3 in one population and ERSA 7 in another
  • Low-quality evidence (observational) overestimated benefit
  • High-quality evidence (RCTs) in primary prevention showed limited benefit
  • Indirectness → ERSA downgrade

Example 4: Vitamin D Supplementation for Bone Health

This shows complexity of dose-response and mechanism relationships.

Phase 1 (ERSA 5.0, 1950s-1980s)

  • Strong mechanistic understanding: Vitamin D essential for calcium absorption
  • Dose-response clear: More vitamin D → better calcium absorption
  • Prediction: Low vitamin D → weak bones; supplementation should strengthen bones

Phase 2 (ERSA 5.5, 1990s-2000s)

  • RCTs show vitamin D prevents fractures in high-risk elderly
  • Clear dose-response: 800+ IU daily shows benefit
  • Multiple RCTs confirm

Bradford Hill Profile:

  • Strength 3/4 (risk reduction ~15-20%)
  • Consistency 3/4 (most RCTs show benefit)
  • Dose-Response 3/4 (clear gradient: more vitamin D → more benefit)
  • Experiment 3/4 (multiple RCTs supportive)

Phase 3 (ERSA 3.5-4.0, 2010s-present)

  • Mega RCTs testing high-dose vitamin D
  • Results: Disappointing
  • High-dose vitamin D (1000-4000 IU daily) doesn’t prevent fractures in general population
  • Only benefits in elderly/institutionalized
  • Publication bias: Positive studies more likely published

The Puzzle:

  • Mechanistic understanding still correct
  • Dose-response observed in some studies
  • But clinical benefit much smaller than predicted

Bradford Hill Recalibration:

  • Specificity 1/4 (doesn’t work as generally thought)
  • Consistency 1/4 (mixed results in general population)
  • Dose-Response: Complex (benefit plateaus; very high doses don’t help more)

GRADE Downgrading:

  • Publication Bias: Likely; many negative studies probably unpublished
  • Inconsistency: Results vary by population
  • Imprecision: Confidence intervals wide in large trials

Current ERSA:

  • Vitamin D for bone health in general population: ERSA 3.5
  • Vitamin D in elderly: ERSA 4.5-5.0
  • Vitamin D mechanism for calcium absorption: ERSA 8.5 (different from clinical benefit)

The Lesson:

  • Strong mechanistic understanding ≠ strong clinical benefit
  • Dose-response relationship ≠ effectiveness
  • Publication bias can exaggerate benefits
  • Large, well-designed RCTs sometimes contradict smaller positive studies
  • ERSA levels should reflect clinical utility, not just mechanism

Part 6: How to Improve Evidence Quality (Lowering ERSA Uncertainty)

For Observational Studies

1. Randomization Is the Ultimate Solution

  • Random allocation eliminates selection bias and confounding
  • An RCT can elevate an observational finding to the confirmation level

2. Measure and Control for Confounders

  • Identify all potential confounders
  • Measure them in study
  • Use statistical methods to adjust
  • This can improve a LOW-quality observational study

3. Blinding

  • Participants don’t know treatment
  • Outcome assessors don’t know treatment
  • Reduces expectation bias

4. Pre-registration

  • Register study protocol before data analysis
  • Prevents “p-hacking” (trying every analysis until one is significant)
  • Commits to primary analysis vs. exploratory analysis

For Clinical Trials

1. Adequate Sample Size

  • Power calculations ensure enough participants
  • Reduces imprecision
  • Reduces the chance of a false positive arising from random variation
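
Power calculations for comparing two proportions have a standard normal-approximation form; the sketch below (function name invented) shows why detecting modest absolute risk differences needs four-digit sample sizes per arm:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm for comparing two
    proportions; a sketch, not a substitute for proper trial software."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_a + z_b) ** 2 * variance / (p1 - p2) ** 2)

# Detecting a drop in event rate from 10% to 7% needs over a thousand
# participants per arm; a larger effect (10% to 5%) needs far fewer.
n_small_effect = n_per_group(0.10, 0.07)
n_large_effect = n_per_group(0.10, 0.05)
```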

2. Allocation Concealment

  • Researchers can’t predict group assignment
  • Prevents manipulation to put “best” participants in treatment

3. Blinding

  • Participants blinded (if possible)
  • Outcome assessors blinded
  • Analysts blinded (where feasible)

4. Intention-to-Treat Analysis

  • Include all randomized participants in analysis
  • Even if they didn’t complete treatment
  • Preserves randomization benefit
  • Prevents bias from differential dropout
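
The value of intention-to-treat is easy to see with invented data in which the sickest participants drop out of the treatment arm: a per-protocol analysis that excludes them inflates the apparent benefit, while ITT preserves the randomized comparison:

```python
# Invented arms: (outcome, completed). outcome 1 = recovered.
# In the treatment arm the 20 sickest participants drop out unrecovered.
treatment = [(1, True)] * 60 + [(0, True)] * 20 + [(0, False)] * 20
control   = [(1, True)] * 55 + [(0, True)] * 45

def recovery_rate(arm, itt=True):
    """ITT keeps every randomized participant; per-protocol drops non-completers."""
    included = arm if itt else [p for p in arm if p[1]]
    return sum(outcome for outcome, _ in included) / len(included)

pp_diff  = recovery_rate(treatment, itt=False) - recovery_rate(control, itt=False)
itt_diff = recovery_rate(treatment, itt=True)  - recovery_rate(control, itt=True)
# per-protocol: 0.75 - 0.55 = 0.20; ITT: 0.60 - 0.55 = 0.05
```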

5. Adequate Follow-up

  • Minimize dropout/attrition
  • Document reasons for dropout
  • Analyze whether dropout was differential between groups

For Meta-Analyses

1. Comprehensive Search

  • Search published AND unpublished studies
  • Reduces publication bias
  • Contact researchers for unpublished data

2. Risk of Bias Assessment

  • Evaluate each study independently
  • Use standardized tools
  • Consider sensitivity analyses excluding high-bias studies

3. Heterogeneity Exploration

  • When results vary, explore why
  • Does effect differ by:
    • Population characteristics?
    • Intervention dose/duration?
    • Outcome definitions?
    • Study quality?

4. Transparency

  • Pre-register protocol
  • Follow PRISMA guidelines
  • Report all analyses

Part 7: The Interaction Between Evidence Quality and ERSA Level

Key Principle: ERSA Incorporates Evidence Quality

High ERSA levels (7+) essentially REQUIRE that supporting evidence be high quality. You cannot reach ERSA 7 with only case reports and anecdotes.

Quality Thresholds for Each ERSA Level

ERSA -1
  • Minimum evidence quality: Proven false by HIGH-quality evidence
  • Can reach with: High-quality RCTs showing contradictory effect
  • Cannot reach with: Anecdotes; low-quality studies

ERSA 0
  • Minimum evidence quality: Unfalsifiable, or only low-quality contradicting evidence
  • Can reach with: Expert consensus; poor-quality evidence
  • Cannot reach with: High-quality evidence (impossible at ERSA 0 by definition)

ERSA 1-2
  • Minimum evidence quality: Primarily anecdotal or very limited studies
  • Can reach with: Case reports; small observational studies
  • Cannot reach with: High-quality supporting evidence (would place the theory higher)

ERSA 3-4
  • Minimum evidence quality: Mixed-quality studies; emerging replication
  • Can reach with: Mix of observational studies + some RCTs
  • Cannot reach with: Primarily high-quality RCTs (would place the theory higher)

ERSA 5-6
  • Minimum evidence quality: Multiple high-quality studies; predictive power
  • Can reach with: Multiple RCTs; SR/MA of high-quality studies
  • Cannot reach with: Primarily anecdotes or case reports

ERSA 7-8
  • Minimum evidence quality: Predominantly HIGH-quality evidence; real-world validation
  • Can reach with: Numerous high-quality RCTs; confirmed predictions; practical application
  • Cannot reach with: A single study; primarily observational evidence

ERSA 9
  • Minimum evidence quality: Overwhelming high-quality evidence; century+ validation
  • Can reach with: Extensive high-quality RCTs; confirmed predictions; paradigm status; SR/MA
  • Cannot reach with: Recent discovery; limited replication

ERSA 10+
  • Minimum evidence quality: Multiple independent lines of high-quality evidence; paradigm-shifting confirmations
  • Can reach with: Confirmed predictions so counterintuitive they constitute extraordinary evidence
  • Cannot reach with: Contradiction by any credible evidence

The Paradox: More Studies Doesn’t Always Mean Higher ERSA

  • 100 case reports of positive effect → ERSA 2.0
  • 5 high-quality RCTs showing no effect → ERSA 2.5-3.0

Quality trumps quantity in ERSA assessment.

Quality-Evidence Trade-off

Sometimes high-quality evidence shows SMALLER effects than low-quality evidence. This is normal and expected:

Why:

  • Low-quality studies prone to bias that artificially inflates effect size
  • High-quality RCTs control bias; effect is smaller but more accurate

Example:

  • 20 observational studies show vitamin D reduces fractures 40%
  • 3 large RCTs show vitamin D reduces fractures 5%
  • The RCTs are more trustworthy because better designed
  • ERSA based on RCTs (~4.5) is more justified than based on observational studies (~5.5)
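
The vitamin D discrepancy above can be made concrete with quality-weighted pooling. The weights here are invented purely for illustration (real meta-analysis uses inverse-variance weights), but they show how weighting pulls the pooled estimate toward the trustworthy RCT result instead of the study count:

```python
# Invented study list mirroring the vitamin D example above.
studies = ([{"design": "observational", "risk_reduction": 0.40}] * 20 +
           [{"design": "rct", "risk_reduction": 0.05}] * 3)

# Assumed quality weights; real meta-analysis uses inverse-variance weights.
QUALITY_WEIGHT = {"rct": 4.0, "observational": 0.5}

def quality_weighted_mean(studies):
    total_w = sum(QUALITY_WEIGHT[s["design"]] for s in studies)
    return sum(QUALITY_WEIGHT[s["design"]] * s["risk_reduction"]
               for s in studies) / total_w

naive = sum(s["risk_reduction"] for s in studies) / len(studies)  # ~0.35
weighted = quality_weighted_mean(studies)                         # ~0.21
# Weighting pulls the pooled estimate toward the high-quality RCT result.
```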

Part 8: Additional Examples Across Different Fields

Example A: Post-Scarcity Human Motivation (ERSA 1.0-1.5)

Evidence Characteristics:

  • No real post-scarcity societies exist for study
  • Observational data limited to small-scale gift economies, artist communities
  • Thought experiments and theoretical modeling
  • Some evolutionary psychology frameworks applicable

Evidence Quality Assessment:

  • Study Design: Primarily theoretical; limited observational
  • Sample Size: No direct data; analogies to small groups (500-5000 people)
  • Mechanism: Plausible but speculative
  • Confounding: Major issue — hard to separate “post-scarcity” from culture, group size, other variables
  • Publication Bias: Likely — positive findings about motivation changes more publishable

Bradford Hill Profile (ERSA 1.2):

  • Strength 0/4 (no real post-scarcity to measure)
  • Consistency 0/4 (no comparative studies)
  • Specificity 1/4 (prediction possible but untestable)
  • Temporality 0/4 (causation undetermined)
  • Plausibility 2/4 (fits some psychological theories)
  • Coherence 1/4 (conflicts with existing economics)
  • Experiment 0/4 (no experiments possible yet)

Why stuck at ERSA 1.0-1.5:

  • Falsifiability issue: Cannot test without creating actual post-scarcity society
  • No experimental framework
  • No dose-response could be measured
  • Proxy measures (small gift economies) have major confounding

Path to Higher ERSA:

  • Establish small-scale trial communities (difficult but possible)
  • Control for confounding variables (culture, group size, education, etc.)
  • Measure motivation systematically
  • Compare to matched control communities
  • Might reach ERSA 3-4 after 20+ years of study

Example B: 5G Cell Tower Brain Damage Claims (ERSA -0.5)

Evidence Characteristics:

  • Anecdotal reports of headaches, sleep problems near towers
  • No high-quality RCTs
  • Observational data confounded by:
    • Awareness bias (if you believe 5G harmful, you report symptoms more)
    • Nocebo effect (belief causes symptoms)
    • Pre-existing health conditions
    • Stress/anxiety about radiation

Evidence Quality Assessment:

  • Study Design: Primarily anecdotal; few case reports
  • Mechanism: Superficially plausible from a radiation physics perspective (but 5G uses non-ionizing radiation, unlike the ionizing radiation that causes cancer)
  • Exposure measurement: Highly inaccurate (people don’t know actual exposure levels)
  • Outcome measurement: Subjective (headaches self-reported; not objectively measured)
  • Selection Bias: SEVERE (people who notice symptoms report; asymptomatic people don’t)
  • Publication Bias: MAJOR (positive anecdotes spread; negative reports ignored)

Bradford Hill Profile (ERSA -0.5):

  • Strength 0/4 (no objective effects measured in research)
  • Consistency 0/4 (studies contradicting each other; high-quality studies show no effect)
  • Specificity 0/4 (symptoms nonspecific; same symptoms from many causes)
  • Temporality 1/4 (temporal relationship unclear; symptoms predate towers in many cases)
  • Biological Gradient 0/4 (no dose-response shown; people far from towers report same symptoms)
  • Plausibility 1/4 (mechanism implausible; non-ionizing radiation insufficient for DNA damage; power levels too low)
  • Coherence 0/4 (contradicts physics; contradicts large body of research on non-ionizing radiation)
  • Experiment 0/4 (RCTs show no effect; blinded studies show no difference from sham)

High-Quality Evidence Contradicting Claim:

  • Large-scale blinded studies: Exposing people to 5G and to sham 5G; no difference in reported symptoms
  • Nocebo response demonstrated: When told “this might cause symptoms,” 50% report symptoms even in the sham group
  • Dose-response absent: People reporting maximum symptoms live in areas with lowest 5G exposure
  • Biological mechanism absent: Established biological effects of non-ionizing radiation occur at power levels 1000x higher than 5G

ERSA Assessment:

  • Anecdotal claims: ERSA -0.5 (not consistent with high-quality evidence; selective reporting of confirmatory cases)
  • 5G safety in general: ERSA 8.0 (extensive research showing safety; no plausible mechanism for harm)

Why claims persist despite evidence:

  • Availability bias: News stories about 5G health scares more memorable
  • Confirmation bias: People interpret every health problem as 5G-related
  • Publication bias: Negative findings (no effect) less newsworthy than anecdotal positive claims
  • Healthy skepticism sometimes becomes conspiracy thinking when evidence is misunderstood

Example C: Light Bulb Conspiracy (ERSA 7.5-8.0)

The Claim: Manufacturers conspired to create short-lived light bulbs to force consumers to buy more

Evidence Quality: HIGH and multi-source

Historical Record:

  • 1920s: Phoebus Cartel formed by light bulb manufacturers
  • Goal: Limit light bulb lifespan to shorten replacement cycle
  • Documents: Explicit written records of conspiracy
  • Result: Light bulbs intentionally designed to fail after ~1000 hours
  • Duration: 1920s-1940s (until antitrust prosecution)

Evidence Quality Assessment:

  • Study Design: Historical documentary evidence (contracts, meeting notes, patent restrictions)
  • Primary Source: Cartel documents (highest quality evidence for historical claims)
  • Sample Size: N/A (complete documentation available)
  • Mechanism: Clear and documented
  • Publication Bias: Low (conspiracy was exposed through official prosecution)
  • Confounding: None (direct evidence of intent)

Bradford Hill Profile (ERSA 8.0 for historical claim):

  • Strength 4/4 (overwhelming documentary evidence)
  • Consistency 4/4 (consistent across multiple sources)
  • Specificity 4/4 (specific dates, people, actions documented)
  • Temporality 4/4 (clear timeline)
  • Biological Gradient N/A (not applicable to historical claim)
  • Plausibility 4/4 (aligns with known manufacturing practices)
  • Coherence 4/4 (explains observed pattern of light bulb life)
  • Experiment 4/4 (evidence from actual historical events)
  • Analogy 4/4 (similar conspiracy practices documented in other industries)

Why ERSA 8.0 rather than 9.0:

  • Historical claim (not ongoing theory)
  • No predictions about the future (the mechanism stopped operating after prosecution)
  • Would be ERSA 9.0 if conspiracy continued and predictions about modern bulbs confirmed

Important Distinction:

  • This is PROVEN conspiracy (ERSA 8.0)
  • Different from unproven 5G conspiracy claims (ERSA -0.5)
  • Difference: Documentary evidence vs. anecdotal evidence

Example D: BlackRock Ownership Conspiracy (ERSA 4.5-5.0)

The Claim: BlackRock owns significant shares in most publicly traded companies; this represents dangerous consolidation

Evidence Quality: MIXED but largely HIGH for factual claim (ownership); LOWER for implications

Factual Component (Ownership): ERSA 8.5

  • BlackRock is world’s largest asset manager
  • Manages $10+ trillion in assets
  • Owns shares in ~95% of S&P 500 companies
  • Owns shares in all major competitors

Evidence Quality:

  • Source: SEC filings (highest quality, public record)
  • Mechanism: Index fund ownership (if you own an S&P 500 index fund, you own a bit of every company in it)
  • Verification: Public databases easily confirm holdings
  • Confounding: None (facts directly verifiable)

Bradford Hill Profile (ERSA 8.5 for ownership fact):

  • Strength 4/4 (documented in SEC filings)
  • Consistency 4/4 (consistent across multiple reports)
  • Specificity 4/4 (exact holdings documented)
  • Experiment 4/4 (empirically verifiable from public records)

Implications Component (Danger/Conspiracy): ERSA 3.5-4.5

The Claim: This ownership structure allows BlackRock to control companies unfairly

Evidence Quality: MODERATE but mixed

Factual Support:

  • BlackRock does influence corporate governance through voting
  • BlackRock has pushed for environmental, social, governance (ESG) criteria
  • These activities documented in proxy voting records

Evidence Against:

  • No evidence of hidden coordination between competing companies
  • Companies still compete aggressively in market
  • Stock prices still vary based on company performance
  • Other institutions have similar ownership patterns

Confounding Variables:

  • Index fund ownership is structural (inevitable result of indexing)
  • BlackRock doesn’t “choose” which companies to own (its index funds hold every S&P 500 company)
  • ESG voting is transparent and publicly stated (not hidden conspiracy)

Mechanism Implausibility:

  • For conspiracy to work, would require BlackRock to coordinate competing firms
  • Competing firms have opposing interests
  • No mechanism for BlackRock to enforce coordination without detection
  • Antitrust law prohibits such coordination

Bradford Hill Profile (ERSA 4.0 for conspiracy implications):

  • Strength 1/4 (some influence documented, but effect size small compared to company competition)
  • Consistency 2/4 (mixed evidence; some cases show correlation, but causation unclear)
  • Specificity 1/4 (unclear what “control” means; competition appears intact)
  • Plausibility 2/4 (mechanism implausible given legal restrictions)
  • Coherence 1/4 (contradicts observed competition in markets; stock prices still vary)
  • Experiment 0/4 (no experimental evidence; can’t test counterfactual)
  • Mechanism 1/4 (mechanism for enforcement of alleged conspiracy unclear)

ERSA Assessment:

  • Ownership fact: ERSA 8.5 (proven; documented in SEC filings)
  • Conspiracy implications: ERSA 4.0 (plausible but unproven; alternative explanations better fit evidence)

The Distinction:

  • Proven conspiracy (light bulb cartel): ERSA 8.0 (documentary evidence)
  • Documented ownership (BlackRock): ERSA 8.5 (SEC filings)
  • Alleged conspiracy from ownership: ERSA 4.0 (plausible but insufficient evidence)
  • Unproven conspiracy (5G): ERSA -0.5 (contradicted by high-quality evidence)

Why Different:

  • Each claim requires different evidence quality threshold
  • Ownership claims require SEC documentation (high quality)
  • Conspiracy claims require evidence of coordination (not just ownership)
  • Harm claims require evidence of actual harm (not just mechanism)

Summary: Evidence Quality Framework for ERSA

Quick Reference: How to Evaluate Evidence Quality

  1. Study Design Quality:

    • High: SR/MA, RCTs, well-designed prospective cohorts
    • Medium: Less-well-designed observational studies
    • Low: Case series, anecdotes, expert opinion
  2. Risk of Bias Assessment (Using GRADE):

    • Selection bias: Random allocation, concealment, comparable groups
    • Confounding: Measured and controlled, or randomized
    • Information bias: Standardized measurement, blinding
    • Publication bias: Comprehensive search, funnel plot analysis
  3. Consistency:

    • Multiple independent studies reaching same conclusion
    • Variation between studies explained or acceptable
  4. Mechanism:

    • Plausible biological/logical mechanism
    • Dose-response relationships where applicable
    • Gradient of effect
  5. Application:

    • Moving from theoretical to practical
    • Real-world validation
    • Implementation success

The ERSA-Evidence Quality Relationship

  • Low evidence quality → Cannot achieve ERSA > 6 (even if many low-quality studies exist)
  • Mixed quality (some high, some low) → ERSA typically 4-6 range
  • High quality (RCTs, multiple studies) → ERSA can reach 7-8
  • Extraordinarily high quality (repeated confirmation, paradigm shift, predictions) → ERSA 9-10+
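
A minimal sketch of this ceiling rule, with tier names and cutoffs paraphrased from the list above (hypothetical, not an official ERSA formula):

```python
# Evidence quality caps the maximum defensible ERSA, regardless of how
# many studies exist. Tier names and ceilings are paraphrased from the
# list above; they are an illustrative assumption, not official ERSA.
CEILINGS = {
    "low": 6.0,             # many low-quality studies still cap out at 6
    "mixed": 6.0,           # typical range 4-6
    "high": 8.0,            # RCTs, multiple studies
    "extraordinary": 10.0,  # repeated confirmation, confirmed predictions
}

def cap_ersa(proposed, quality):
    """Clamp a proposed ERSA rating to the ceiling for its evidence tier."""
    return min(proposed, CEILINGS[quality])

print(cap_ersa(7.5, "low"))   # 6.0 — low-quality evidence cannot justify > 6
print(cap_ersa(7.5, "high"))  # 7.5 — within the high-quality range
```

The point of the clamp is directional: more studies raise confidence only up to the ceiling their quality allows; raising the ceiling itself requires better-designed evidence.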

This ensures ERSA reflects both quantity AND quality of evidence, preventing false confidence in well-documented but biased findings.