ERSA: Evidence Quality, Bias, and How They Affect Theory Ratings
Part 1: The Evidence Quality Hierarchy
The quality of evidence matters enormously in determining where a theory sits on the ERSA scale. Not all evidence is created equal. A single high-quality randomized controlled trial (RCT) provides stronger evidence than 50 anecdotal case reports.
The Evidence Pyramid: From Strongest to Weakest
                 /   Systematic Reviews & Meta-Analyses (SR/MA)   \
                /       Randomized Controlled Trials (RCTs)        \
               /    Well-Designed Cohort & Case-Control Studies     \
              /        Lower-Quality Observational Studies           \
             /            Case Series, Case Reports                   \
            /____________Expert Opinion & Anecdotes____________________\
Study Design Quality Rankings
Highest Quality (Most Resistant to Bias)
- Systematic Reviews with Meta-Analysis (SR/MA)
  - Combines multiple high-quality studies
  - Rigorous inclusion/exclusion criteria
  - Accounts for heterogeneity and publication bias
  - Quality: Can be High, Moderate, or Low depending on included studies
- Randomized Controlled Trial (RCT)
  - Random allocation minimizes selection bias and confounding
  - Blinding reduces observer bias
  - Can be High (well-designed) or Low (poorly designed) quality
  - Gold standard for intervention studies
- Well-Designed Prospective Cohort Study
  - Follows participants over time
  - Can measure dose-response relationships
  - Lower risk of selection bias than case-control
  - Better than retrospective designs
- Well-Designed Case-Control Study
  - Useful for rare diseases
  - Retrospective nature increases bias risk
  - More prone to recall bias than cohort studies
Medium Quality
- Lower-Quality Observational Studies
  - Cross-sectional surveys
  - Retrospective analyses
  - Studies with inadequate control of confounding
  - High risk of selection bias
Lower Quality (Most Prone to Bias)
- Case Series / Case Reports
  - Describes patterns across cases
  - No control group
  - High susceptibility to bias
  - Useful for hypothesis generation but weak confirmation
- Expert Opinion
  - Based on experience and judgment
  - No systematic data collection
  - Prone to cognitive biases
  - Lowest evidence level
Part 2: How Evidence Quality Affects ERSA Ratings
The same number of studies can produce very different ERSA levels depending on study quality.
Example 1: A Hypothesis With 10 Studies
Scenario A: All High-Quality RCTs
- 10 randomized controlled trials
- All well-designed, low risk of bias
- Sample sizes adequate (500+ participants each)
- Results consistent (all show effect in same direction)
- Effect sizes moderate to strong
ERSA Impact:
- All studies in “Highest Quality” category
- Evidence composite score: 32-36/36
- ERSA Likely: 5.5-6.5 (Robust theory with predictive power)
- Interpretation: “The evidence is strong and consistent across multiple well-designed trials”
Scenario B: Mix of Low-Quality Studies
- 10 observational studies
- Small sample sizes (20-50 participants)
- High risk of confounding (few variables controlled)
- Results mixed (some support, some contradict)
- Large effect sizes reported (suspicious)
ERSA Impact:
- Most studies in “Lower Quality” category
- Evidence composite score: 12-16/36
- ERSA Likely: 3.0-3.5 (Emerging theory, mixed evidence)
- Interpretation: “The evidence is weak and inconsistent; confounding likely explains apparent effects”
Scenario C: Single High-Quality RCT + 9 Low-Quality Studies
- 1 well-designed RCT showing no effect
- 9 observational studies showing effect
ERSA Impact:
- Highest quality study contradicts others
- Evidence composite score: 18-22/36
- ERSA Likely: 3.5-4.0 (Emerging, needs replication in high-quality designs)
- Interpretation: “Strongest evidence contradicts the apparent pattern; weak evidence may reflect bias not true effect”
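The composite-score-to-ERSA relationship used in these three scenarios can be sketched as a small interpolation helper. This is purely illustrative: the function name and anchor points are my own, chosen to match the scenarios above, not an official ERSA formula.

```python
def ersa_from_composite(score: float) -> float:
    """Piecewise-linear interpolation from a 0-36 evidence composite
    score to an approximate ERSA level (hypothetical anchors)."""
    # (composite midpoint, ERSA midpoint) anchors from Scenarios A-C
    anchors = [(0, 1.0), (14, 3.25), (20, 3.75), (34, 6.0)]
    if score <= anchors[0][0]:
        return anchors[0][1]
    for (x0, y0), (x1, y1) in zip(anchors, anchors[1:]):
        if score <= x1:
            return round(y0 + (y1 - y0) * (score - x0) / (x1 - x0), 2)
    return anchors[-1][1]  # cap at the top anchor
```

With these anchors, a composite of 14 (Scenario B's midpoint) lands at ERSA 3.25 and a composite of 34 (Scenario A) at 6.0, matching the ranges above.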
Part 3: Key Sources of Bias in Evidence
Bias Type 1: Selection Bias
What it is: The way study participants are chosen systematically differs between groups
Example: Hand-washing studies in the 1800s
- Hospital A: Women giving birth who agreed to hand-washing protocol vs. those who refused
- Selection Bias: Women who agreed to hand-washing might have been more health-conscious generally
- Therefore: Improved outcomes might reflect their health-consciousness, not hand-washing
How it affects ERSA:
- Selection bias detected → Bradford Hill “Consistency” score reduced (2-3/4 instead of 3-4/4)
- If selection bias severe → ERSA might drop 0.5-1.0 levels
- Example: Hand-washing study moved from ERSA 3.5 → ERSA 3.0 after selection bias identified
How to detect:
- Were study groups truly comparable at baseline?
- Were there systematic differences in who participated?
- Did some participants drop out? (Differential dropout = selection bias)
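One concrete way to check whether groups were comparable at baseline is the standardized mean difference (SMD); absolute values above roughly 0.1 are a common flag for meaningful imbalance. A minimal sketch (the example ages are invented):

```python
from math import sqrt
from statistics import mean, stdev

def standardized_mean_difference(group_a, group_b):
    """Cohen's-d-style SMD: difference in means divided by the
    pooled standard deviation. |SMD| > ~0.1 often flags imbalance."""
    pooled_sd = sqrt((stdev(group_a) ** 2 + stdev(group_b) ** 2) / 2)
    return (mean(group_a) - mean(group_b)) / pooled_sd

# Invented baseline ages for the two arms of a hypothetical trial
treatment_ages = [52, 54, 49, 51, 55, 53]
control_ages = [51, 53, 50, 52, 54, 52]
smd = standardized_mean_difference(treatment_ages, control_ages)
```

Here the SMD comes out around 0.18, a modest imbalance worth reporting even though the raw means differ by well under a year.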
Bias Type 2: Confounding
What it is: A third variable influences both the exposure and outcome, creating false association
Example: Coffee consumption and heart disease
- Observation: Coffee drinkers have higher heart disease rates
- Confounding variable: Smokers drink more coffee AND smoking causes heart disease
- True situation: Coffee doesn’t cause disease; smoking does
How it affects ERSA:
- Uncontrolled confounding → Bradford Hill “Specificity” and “Strength” scores reduced
- If major confounder not measured → ERSA drops 0.5-1.5 levels
- If confounder identified and controlled → ERSA impact minimized
Example from real science:
- Early studies suggested hormone replacement therapy (HRT) prevented heart disease
- Confounding: Women who took HRT were healthier, wealthier, had better overall health behaviors
- Large RCT showed no benefit when confounding controlled
- ERSA of HRT benefits dropped from ~5.0 → 2.5 after realization
How to detect/control:
- Randomization (RCTs eliminate confounding through randomization)
- Statistical adjustment (measure confounders and mathematically control for them)
- Stratification (separate analysis by confounder levels)
- Matching (select participants similar on confounder)
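The coffee/smoking example can be verified with a little arithmetic. The probabilities below are invented for illustration: disease risk depends only on smoking, yet the crude (unstratified) comparison makes coffee look nearly twice as risky, while within each smoking stratum the coffee risk ratio is exactly 1.

```python
P_SMOKE = 0.3
P_COFFEE = {"smoker": 0.8, "nonsmoker": 0.3}     # smokers drink more coffee
P_DISEASE = {"smoker": 0.20, "nonsmoker": 0.05}  # risk depends only on smoking

def risk_given_coffee(drinks_coffee):
    """P(disease | coffee status), marginalizing over smoking."""
    numerator = denominator = 0.0
    for status, p_status in (("smoker", P_SMOKE), ("nonsmoker", 1 - P_SMOKE)):
        p_cof = P_COFFEE[status] if drinks_coffee else 1 - P_COFFEE[status]
        numerator += p_status * p_cof * P_DISEASE[status]
        denominator += p_status * p_cof
    return numerator / denominator

# Crude comparison: coffee looks harmful even though it does nothing;
# stratifying by smoking would give a risk ratio of exactly 1.
crude_rr = risk_given_coffee(True) / risk_given_coffee(False)
```

This is the signature of confounding: the association vanishes once the analysis is done within levels of the confounder.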
Bias Type 3: Information Bias / Measurement Error
What it is: Inaccurate or biased measurement of exposure, outcome, or confounding variables
Subtypes:
3a. Recall Bias (Retrospective Studies)
- Asking people to remember events from the past
- People with disease often remember exposures better (trying to explain their illness)
- People without disease forget details
Example:
- “Did you use mobile phones heavily 10 years ago?”
- People with brain cancer: “Yes, I remember using it constantly”
- People without cancer: “I don’t remember, maybe sometimes”
- Bias: Apparent link between phones and cancer created by differential recall
Impact: ERSA drops 0.5-1.0 if recall bias likely
3b. Observation/Measurement Error
- Equipment malfunctions
- Observer bias (seeing what you expect to see)
- Non-standardized measurement procedures
Example:
- Measuring blood pressure without proper technique
- Blood pressure varies with time of day, stress level, arm position
- Measurement error introduced that makes associations appear stronger or weaker than true value
Impact:
- Measurement error typically REDUCES ability to detect true effects
- But can sometimes INCREASE apparent effects if error is systematic
- ERSA drops if measurement error likely (0.2-0.5 levels)
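Classical measurement-error theory quantifies this attenuation: random error shrinks an observed correlation by the square root of each variable's reliability. A short sketch (the example numbers are arbitrary):

```python
from math import sqrt

def attenuated_correlation(true_corr, reliability_x, reliability_y):
    """Classical attenuation formula: the observed correlation equals
    the true correlation times sqrt(reliability_x * reliability_y),
    where reliability = true-score variance / observed variance."""
    return true_corr * sqrt(reliability_x * reliability_y)

# A true correlation of 0.5 measured with reliabilities 0.7 and 0.8
observed = attenuated_correlation(0.5, 0.7, 0.8)
```

With these reliabilities the observed correlation drops to about 0.37, which is why random (non-systematic) error usually hides true effects rather than creating false ones.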
3c. Outcome Assessment Bias
- The person measuring outcomes knows which group participant is in
- Unconsciously biases measurement toward expected result
Example:
- Teacher assesses whether a “gifted” child performed well vs. “typical” child
- Same performance rated higher for gifted child
- Unblinded assessment introduces bias
Impact: ERSA drops 0.3-0.8 if outcome assessor was unblinded
How to detect:
- Was measurement standardized?
- Were outcome assessors blinded to treatment assignment?
- Was there quality control on measurement?
Bias Type 4: Publication Bias
What it is: Published literature is biased toward positive results; negative results often go unpublished
Why it happens:
- Journals preferentially publish positive findings
- Researchers more likely to submit positive studies
- Funding agencies emphasize positive outcomes
Example:
- Suppose 50 researchers test if homoeopathy works
- 45 find no effect (negative results)
- 5 find apparent positive effect (by chance)
- Only the 5 positive studies get published
- Reader sees 5/5 published studies as positive
- But true success rate: 5/50 = 10% (just chance)
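The arithmetic behind this scenario can be checked directly. A sketch using the 50-study, 5% significance numbers from the example:

```python
from math import comb

ALPHA = 0.05       # conventional significance threshold
N_STUDIES = 50     # researchers all testing a true-null hypothesis

# Expected false positives among 50 null studies: 50 * 0.05 = 2.5
expected_false_positives = N_STUDIES * ALPHA

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

# Chance that 5 or more of the 50 null studies look "positive"
p_five_or_more = prob_at_least(5, N_STUDIES, ALPHA)
```

Getting 5 chance "positives" out of 50 null studies is not even unlikely (probability around 10%), so a literature that publishes only those 5 looks unanimous while the underlying reality is pure noise.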
How it affects ERSA:
- If publication bias likely → Bradford Hill “Consistency” score reduced
- ERSA drops 0.5-1.5 depending on bias magnitude
- Systematic reviews/meta-analyses actively search for unpublished studies to counter this
How to detect:
- Funnel plots (graphical method to detect asymmetry)
- Calculate “file drawer number” (how many unpublished negative studies would be needed to change conclusion?)
- Search for unpublished studies (dissertations, conference presentations, registered trials)
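The "file drawer number" has a standard closed form (Rosenthal's fail-safe N); a minimal sketch:

```python
from math import floor

def fail_safe_n(z_scores, z_alpha=1.645):
    """Rosenthal's fail-safe N: how many unpublished null results
    (mean z = 0) would be needed to drag the combined one-tailed
    result below significance (z_alpha = 1.645 for p = .05)."""
    return floor(sum(z_scores) ** 2 / z_alpha ** 2 - len(z_scores))

n_hidden = fail_safe_n([2.0] * 5)  # five published studies, each z = 2.0
```

Five published studies at z = 2.0 would need about 31 hidden null studies to overturn, so a small fail-safe N relative to the number of research groups in a field is a warning sign.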
Bias Type 5: Allocation Concealment Failure
What it is: Researchers or participants know in advance which treatment group they’ll be in, allowing manipulation
Example:
- Study predicts which children will benefit most from intervention
- Researchers unconsciously assign these “best candidates” to treatment group
- Positive outcomes reflect initial selection, not intervention
Impact:
- Major threat to RCT validity
- ERSA drops 1-2 levels if allocation not properly concealed
- Example: An RCT without allocation concealment is often no better than an observational study
Part 4: GRADE Framework: How to Adjust Evidence Quality
GRADE (Grading of Recommendations Assessment, Development and Evaluation) provides a systematic approach to adjusting evidence quality based on specific factors.
GRADE Domains for DOWNGRADING Evidence
Domain 1: Risk of Bias
- Were participants randomly allocated? (RCTs)
- Was allocation concealed?
- Were outcome assessors blinded?
- Was blinding possible? (Some outcomes are objective and don’t need blinding)
- Did participants complete study? (Attrition/dropout bias)
Downgrade by 1 level: Some limitations in study design/execution
Downgrade by 2 levels: Serious/multiple limitations
Example Application:
- High-quality RCT with well-designed methodology: Risk of bias = NO downgrade
- RCT without outcome assessor blinding on objective outcome: NO downgrade (blinding not needed for objective measurement)
- RCT without allocation concealment: Downgrade by 1 level
- RCT with 40% dropout rate: Downgrade by 2 levels (serious bias risk)
Domain 2: Inconsistency (Variability Across Studies)
- Do results vary widely between studies?
- Is variation explained by study-level factors?
Downgrade by 1 level: Moderate inconsistency; some variation but general direction consistent
Downgrade by 2 levels: Serious inconsistency; studies contradict each other
Example Application:
- 5 RCTs showing effect size of 1.2, 1.3, 1.1, 1.4, 1.2: NO downgrade (consistent)
- 5 RCTs showing effect size of 0.5, 1.5, 2.0, 0.3, 1.8: Downgrade by 1 level (moderate inconsistency)
- Some studies show benefit, others show harm: Downgrade by 2 levels (serious inconsistency)
Domain 3: Indirectness (Does Evidence Answer the Question?)
- Do studies use same populations, interventions, outcomes as clinical question?
- Are results from different setting that might not generalize?
Downgrade by 1 level: Somewhat indirect (studies in hospitals but question is community)
Downgrade by 2 levels: Very indirect (studies in young adults but question is elderly)
Example Application:
- Question: Does aspirin prevent heart attack in women?
- Evidence: Studies mostly in men
- Downgrade by 1 level (results may not apply to women; sex differences in response possible)
Domain 4: Imprecision (Wide Confidence Intervals)
- Are confidence intervals wide, crossing the line of no effect?
- Sample size adequate?
- Number of events adequate (for rare outcomes)?
Downgrade by 1 level: Moderate imprecision; confidence intervals somewhat wide
Downgrade by 2 levels: Serious imprecision; confidence intervals very wide or cross line of no effect
Example Application:
- Effect size 1.5 with 95% CI [1.4-1.6]: High precision → NO downgrade
- Effect size 1.5 with 95% CI [0.8-2.2]: Moderate precision (crosses toward small effects) → Downgrade by 1 level
- Effect size 1.5 with 95% CI [0.2-2.8]: Low precision (crosses into harm) → Downgrade by 2 levels
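These three example applications can be captured as a small heuristic. The thresholds below are my own reading of the examples, not official GRADE cutoffs:

```python
def imprecision_downgrade(ci_low, ci_high, null_value=1.0, wide=2.0):
    """Heuristic mirroring the examples above: 0 levels if the CI
    excludes the null, 1 level if it crosses the null, 2 levels if
    it crosses the null AND is very wide (width > `wide`)."""
    if not (ci_low <= null_value <= ci_high):
        return 0
    return 2 if (ci_high - ci_low) > wide else 1
```

Applied to the examples: [1.4, 1.6] gives no downgrade, [0.8, 2.2] gives one level, and [0.2, 2.8] gives two.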
Domain 5: Publication Bias
- Evidence of bias toward publication of positive results?
Downgrade by 1 level: Suspected publication bias
Downgrade by 2 levels: Likely publication bias (more than half of studies probably unpublished)
Example Application:
- Funnel plot shows missing negative studies: Downgrade by 1 level
- Search for unpublished studies reveals 20 unpublished negative studies vs. 5 published positive: Downgrade by 2 levels
GRADE Domains for UPGRADING Evidence (Primarily Non-Randomized Studies)
RCTs typically start as “High” and are downgraded. Non-randomized studies typically start as “Low” but can be upgraded.
Domain 1: Strength of Association
- Is the effect size large (a 2-fold increase or more) or very large (3-fold or more)?
- Large effects are less likely to result from confounding
Upgrade by 1 level: Large effect (2-fold or greater)
Upgrade by 2 levels: Very large effect (3-fold or greater)
Example:
- Observational study shows smoking increases lung cancer risk 10-fold
- This very large effect is unlikely to result from residual confounding
- Upgrade by 2 levels (from Low → High evidence)
Domain 2: Dose-Response Gradient
- Does increasing exposure lead to increasing effect?
- Dose-response is strong evidence for causation
Upgrade by 1 level: Dose-response demonstrated
Example:
- No cigarette smoking: 1% lung cancer rate
- 1-10 cigarettes/day: 5% rate
- 11-20 cigarettes/day: 12% rate
- 20+ cigarettes/day: 25% rate
- Clear dose-response → Upgrade evidence
Domain 3: Opposing Plausible Confounding or Bias
- Are there plausible confounders that would work AGAINST the observed association?
- If the confounders would bias the result toward the null rather than away from it, the true effect is likely at least as large as the one observed
Upgrade by 1 level: Plausible opposing confounding present
Example:
- Observation: People who exercise have lower heart disease rates
- Confounding worry: People already in poor health are often advised to start exercising (confounding by indication)
- But: That would make exercise appear LESS protective (bias toward the null)
- The fact that exercise still appears protective despite this bias → Upgrade evidence
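The starting levels, downgrades, and upgrades described in this part can be combined into a toy GRADE calculator. A sketch, with my own function name and a simple 1-4 numeric scale:

```python
LEVELS = {1: "Very Low", 2: "Low", 3: "Moderate", 4: "High"}

def grade_quality(design, downgrades=0, upgrades=0):
    """RCTs start at High (4), observational studies at Low (2);
    apply net down/upgrades and clamp to GRADE's 1-4 range."""
    start = 4 if design == "rct" else 2
    return LEVELS[max(1, min(4, start - downgrades + upgrades))]
```

For example, an RCT with one risk-of-bias downgrade lands at Moderate, while an observational study with a very large effect (two upgrade levels) reaches High.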
Part 5: Detailed Examples of How Evidence Quality Affects ERSA
Example 1: Cranberry Juice for Urinary Tract Infections (UTIs)
Initial Claims (ERSA 1.5)
- Anecdotal reports: “I drank cranberry juice and my UTI went away”
- Mechanism plausible: Cranberries contain proanthocyanidins that prevent bacterial adhesion
- Evidence: Case reports, no studies yet
Bradford Hill Profile (ERSA 1.5):
- Strength 0/4 (anecdotal only)
- Consistency 0/4 (no systematic studies)
- Specificity 1/4 (specific claim but not tested)
- Plausibility 2/4 (mechanism proposed)
- Coherence 1/4 (doesn’t integrate with existing knowledge)
Early Studies (ERSA 2.5)
- Several small studies conducted
- Sample sizes: 20-100 women
- Methods: Observational, not randomized
- Results: Some found benefit, others found minimal effect
- Problems: Selection bias (who chose cranberry?), confounding (other behaviors affecting UTI risk)
Bradford Hill Profile (ERSA 2.5):
- Strength 1/4 (weak effects in some studies)
- Consistency 1/4 (mixed results)
- Specificity 1/4 (not clear who benefits)
- Experiment 0/4 (no RCTs yet)
- Plausibility 2/4 (mechanism still theoretical)
Higher-Quality Studies (ERSA 3.5-4.0)
- Better-designed RCTs (200-400 participants)
- Blinded design (participants didn’t know if getting cranberry or placebo)
- Longer follow-up (6-12 months)
- GRADE Assessment of these studies:
| Study | Design | Quality Issues | GRADE Quality |
|---|---|---|---|
| Smith et al. (2015) | RCT | No allocation concealment, 10% dropout | Moderate |
| Johnson et al. (2017) | RCT | Well-designed, randomized, blinded, minimal dropout | High |
| Lee et al. (2016) | Observational cohort | No randomization, potential confounding | Low |
| Meta-analysis (2020) | SR/MA of 12 RCTs | Moderate heterogeneity, some publication bias | Moderate |
Bradford Hill Profile with Quality Weighting (ERSA 4.0):
- Strength 2/4 (modest effect when shown; high-quality RCTs show smaller effects than low-quality studies)
- Consistency 2/4 (high-quality studies less consistent than low-quality)
- Specificity 2/4 (works better for prevention than treatment; better in certain populations)
- Experiment 3/4 (multiple RCTs now available, but not universally supportive)
- Plausibility 2/4 (mechanism studied but not fully understood)
Key Insight About Quality:
- Low-quality observational studies often showed larger cranberry benefits
- High-quality RCTs showed smaller benefits
- This discrepancy suggests low-quality studies had selection bias or confounding overestimating effect
- ERSA 4.0 reflects that effect is real but smaller than initially appeared
- Evidence quality → ERSA positioning
Current Consensus (ERSA 4.2)
- Cranberry juice shows modest benefit for UTI prevention
- Most useful in women with recurrent UTIs
- Effect smaller than antibiotic prophylaxis
Example 2: Hormone Replacement Therapy (HRT) and Heart Disease
This is a dramatic example of how evidence quality completely changed the ERSA rating.
Initial Status (ERSA 6.0-6.5, 1990s)
- Decade of observational studies
- All showed: Women on HRT had 30-50% lower heart disease rates
- Mechanism clear: Estrogen improves cholesterol, blood vessel function
- Consensus: HRT recommended for menopausal women to prevent heart disease
Evidence Quality (ERSA 6.0 era):
- Study Design: All observational (cohort studies)
- Sample sizes: Large (50,000-100,000 women)
- Risk of Bias: MODERATE-TO-HIGH
- Confounding: Wealthy women more likely to use HRT; wealthy women have better healthcare
- Selection bias: Health-conscious women chose HRT
- GRADE Assessment: LOW to MODERATE quality
- Risk of Bias: Downgrade 2 levels (severe confounding likely)
- Inconsistency: No (all studies showed same direction)
- Indirectness: No (direct population)
- Imprecision: No (large studies, narrow confidence intervals)
- Publication bias: Possible (studies showing no effect less likely published)
Bradford Hill Profile (ERSA 6.0, but was overestimated):
- Strength 3/4 (large effect size observed)
- Consistency 3/4 (multiple studies replicated)
- Specificity 3/4 (specific effect in women 45-70 years old)
- Experiment 0-1/4 (no RCTs available)
- Plausibility 3/4 (mechanism well-understood)
- Coherence 3/4 (fits with cardiovascular physiology)
Critical Problem: Experiment score was 0/4 — no high-quality RCTs confirming observational findings
The Game-Changer: The WHI RCT (2002)
The Women’s Health Initiative was a large, multi-center RCT:
- 16,000 women randomized
- Half received HRT; half received placebo
- Allocation concealed; participants blinded
- 5-year follow-up
- GRADE Quality: HIGH
Results:
- Contrary to observational studies
- HRT did NOT prevent heart disease
- Actually showed slight increase in cardiovascular events
- Breast cancer risk increased
Evidence Quality Recalibration:
- Observational studies had severe confounding (wealthy, health-conscious women chose HRT)
- These confounders → better health outcomes through multiple pathways, not HRT
- High-quality RCT removed confounding through randomization
- Revealed truth: HRT doesn’t prevent heart disease
Bradford Hill Profile (ERSA 2.5-3.0, post-2002):
- Strength 0/4 (effect disappeared in RCT; previous observational effect was illusory)
- Consistency 1/4 (RCT contradicts observational studies; indicates bias in observational work)
- Experiment 3/4 (large RCT showed no benefit)
- Plausibility 2/4 (mechanism exists but effect doesn’t occur in reality; mechanism less important than empirical evidence)
ERSA Shift: ERSA 6.0 → ERSA 2.5 (a drop of 3.5 levels!)
The Lesson:
- Observational studies had large effect sizes and multiple studies
- But lack of high-quality experimental evidence was a critical weakness
- GRADE framework correctly identified publication bias and confounding risk
- One high-quality RCT overturned decade of observational research
- Study DESIGN matters more than study QUANTITY for establishing causation
Example 3: Aspirin for Primary Prevention of Heart Disease
This example shows more nuanced evidence quality effects.
Initial Enthusiasm (ERSA 4.5, 1990s)
- Observational studies: People taking aspirin had fewer heart attacks
- Mechanism: Aspirin prevents platelet aggregation
- Clinical logic: If aspirin works after heart attack, should work before
Early RCTs (ERSA 3.5-4.0)
- Several RCTs in 1990s
- Design: Reasonable but variable quality
- Results: Modest benefit in some, no benefit in others
- GRADE Issues: Inconsistency (studies contradict); publication bias (positive studies published, negative less likely)
Specific Quality Issues:
| Trial | Sample Size | Design Issues | Results |
|---|---|---|---|
| ISIS-2 (1988) | 17,000 | Post-MI population (not primary prevention) | Massive benefit |
| PHS (1989) | 22,000 | Primary prevention, mostly men | Modest benefit |
| TPT (1998) | 5,500 | Primary prevention in high-risk men | Modest benefit |
| AAA Trial (2010) | 3,350 | Primary prevention in older adults | NO benefit |
| ARRIVE (2018) | 12,546 | Primary prevention in moderate-risk adults | NO benefit |
The Problem:
- Large trials in actual primary prevention (healthy people) showed WEAK or NO benefit
- Benefits claimed in retrospective studies and trials in post-MI populations
- Different populations show different results
Bradford Hill Assessment:
- Specificity VERY IMPORTANT here: Effect depends heavily on who takes it
- Post-heart-attack patients: Clear benefit (ERSA 7-8)
- High-risk men: Modest benefit (ERSA 4-5)
- Average healthy people: No detectable benefit (ERSA 1-2)
GRADE Downgrading:
- Indirectness: Studies in post-MI populations don’t directly answer the question about primary prevention
- Inconsistency: Different populations show different results
- Imprecision: In healthy populations, confidence intervals include possibility of harm
Final ERSA Status:
- Aspirin in primary prevention: ERSA 3.5-4.0 (weak evidence for modest benefit in high-risk subgroups)
- Aspirin in secondary prevention (post-MI): ERSA 7.5 (strong evidence for clear benefit)
The Lesson:
- Specificity matters enormously
- Same intervention can be ERSA 3 in one population and ERSA 7 in another
- Low-quality evidence (observational) overestimated benefit
- High-quality evidence (RCTs) in primary prevention showed limited benefit
- Indirectness → ERSA downgrade
Example 4: Vitamin D Supplementation for Bone Health
This shows complexity of dose-response and mechanism relationships.
Phase 1 (ERSA 5.0, 1950s-1980s)
- Strong mechanistic understanding: Vitamin D essential for calcium absorption
- Dose-response clear: More vitamin D → better calcium absorption
- Prediction: Low vitamin D → weak bones; supplementation should strengthen bones
Phase 2 (ERSA 5.5, 1990s-2000s)
- RCTs show vitamin D prevents fractures in high-risk elderly
- Clear dose-response: 800+ IU daily shows benefit
- Multiple RCTs confirm
Bradford Hill Profile:
- Strength 3/4 (risk reduction ~15-20%)
- Consistency 3/4 (most RCTs show benefit)
- Dose-Response 3/4 (clear gradient: more vitamin D → more benefit)
- Experiment 3/4 (multiple RCTs supportive)
Phase 3 (ERSA 3.5-4.0, 2010s-present)
- Mega RCTs testing high-dose vitamin D
- Results: Disappointing
- High-dose vitamin D (1000-4000 IU daily) doesn’t prevent fractures in general population
- Only benefits in elderly/institutionalized
- Publication bias: Positive studies more likely published
The Puzzle:
- Mechanistic understanding still correct
- Dose-response observed in some studies
- But clinical benefit much smaller than predicted
Bradford Hill Recalibration:
- Specificity 1/4 (doesn’t work as generally thought)
- Consistency 1/4 (mixed results in general population)
- Dose-Response: Complex (benefit plateaus; very high doses don’t help more)
GRADE Downgrading:
- Publication Bias: Likely; many negative studies probably unpublished
- Inconsistency: Results vary by population
- Imprecision: Confidence intervals wide in large trials
Current ERSA:
- Vitamin D for bone health in general population: ERSA 3.5
- Vitamin D in elderly: ERSA 4.5-5.0
- Vitamin D mechanism for calcium absorption: ERSA 8.5 (different from clinical benefit)
The Lesson:
- Strong mechanistic understanding ≠ strong clinical benefit
- Dose-response relationship ≠ effectiveness
- Publication bias can exaggerate benefits
- Large, well-designed RCTs sometimes contradict smaller positive studies
- ERSA levels should reflect clinical utility, not just mechanism
Part 6: How to Improve Evidence Quality (Lowering ERSA Uncertainty)
For Observational Studies
1. Randomization Is the Ultimate Solution
- Eliminates selection bias and confounding through random allocation
- An RCT can move an observational finding to the confirmation level
2. Measure and Control for Confounders
- Identify all potential confounders
- Measure them in study
- Use statistical methods to adjust
- This can improve a LOW-quality observational study
3. Blinding
- Participants don’t know treatment
- Outcome assessors don’t know treatment
- Reduces expectation bias
4. Pre-registration
- Register study protocol before data analysis
- Prevents “p-hacking” (trying every analysis until one is significant)
- Commits to primary analysis vs. exploratory analysis
For Clinical Trials
1. Adequate Sample Size
- Power calculations ensure enough participants
- Reduces imprecision
- Reduces the chance of false positives from random variation
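A power calculation for comparing two event rates can be sketched with the standard normal-approximation formula (the 10% vs. 5% example rates are my own):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm for detecting a
    difference between two proportions."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided significance
    z_beta = z.inv_cdf(power)            # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

n = n_per_group(0.10, 0.05)  # detect a drop in event rate from 10% to 5%
```

Halving a 10% event rate already needs over 400 participants per arm at 80% power, which is why tiny trials of modest effects are so often uninformative.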
2. Allocation Concealment
- Researchers can’t predict group assignment
- Prevents manipulation to put “best” participants in treatment
3. Blinding
- Participants blinded (if possible)
- Outcome assessors blinded
- Analysts blinded (where feasible)
4. Intention-to-Treat Analysis
- Include all randomized participants in analysis
- Even if they didn’t complete treatment
- Preserves randomization benefit
- Prevents bias from differential dropout
5. Adequate Follow-up
- Minimize dropout/attrition
- Document reasons for dropout
- Analyze whether dropout was differential between groups
For Meta-Analyses
1. Comprehensive Search
- Search published AND unpublished studies
- Reduces publication bias
- Contact researchers for unpublished data
2. Risk of Bias Assessment
- Evaluate each study independently
- Use standardized tools
- Consider sensitivity analyses excluding high-bias studies
3. Heterogeneity Exploration
- When results vary, explore why
- Does effect differ by:
- Population characteristics?
- Intervention dose/duration?
- Outcome definitions?
- Study quality?
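Heterogeneity is usually quantified with Cochran's Q and the I² statistic. A sketch using the two effect-size sets from the GRADE inconsistency examples in Part 4 (the standard errors are my own illustrative assumption):

```python
def heterogeneity(effects, std_errors):
    """Cochran's Q and I^2 under a fixed-effect model: Q compares each
    study's deviation from the pooled estimate against its precision;
    I^2 is the share of variation beyond chance."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    degrees_freedom = len(effects) - 1
    i_squared = max(0.0, (q - degrees_freedom) / q) * 100 if q > 0 else 0.0
    return q, i_squared

# Consistent vs. inconsistent effect sizes (SE = 0.1 assumed for all)
_, i2_consistent = heterogeneity([1.2, 1.3, 1.1, 1.4, 1.2], [0.1] * 5)
_, i2_inconsistent = heterogeneity([0.5, 1.5, 2.0, 0.3, 1.8], [0.1] * 5)
```

The consistent set yields an I² around 23% (little to explain), while the inconsistent set exceeds 90%, which is exactly the situation that demands subgroup exploration.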
4. Transparency
- Pre-register protocol
- Follow PRISMA guidelines
- Report all analyses
Part 7: The Interaction Between Evidence Quality and ERSA Level
Key Principle: ERSA Incorporates Evidence Quality
High ERSA levels (7+) essentially REQUIRE that supporting evidence be high quality. You cannot reach ERSA 7 with only case reports and anecdotes.
Quality Thresholds for Each ERSA Level
| ERSA Level | Minimum Evidence Quality Required | Can Reach With? | Cannot Reach With? |
|---|---|---|---|
| ERSA -1 | Proven false by HIGH-quality evidence | High-quality RCTs showing contradictory effect | Anecdotes; low-quality studies |
| ERSA 0 | Unfalsifiable or low-quality contradicting evidence | Expert consensus; poor-quality evidence | (Cannot have “high-quality” evidence at ERSA 0 by definition) |
| ERSA 1-2 | Primarily anecdotal or very limited studies | Case reports; small observational studies | Should have high-quality evidence |
| ERSA 3-4 | Mixed-quality studies; emerging replication | Mix of observational + some RCTs | Cannot be primarily high-quality RCTs (would be higher) |
| ERSA 5-6 | Multiple high-quality studies; predictive power | Multiple RCTs; SR/MA of high-quality studies | Primarily anecdotes or case reports |
| ERSA 7-8 | Predominantly HIGH-quality evidence; real-world validation | Numerous high-quality RCTs; confirmed predictions; practical application | Single study; primarily observational evidence |
| ERSA 9 | Overwhelming high-quality evidence; century+ validation | Extensive high-quality RCTs; confirmed predictions; paradigm status; SR/MA | Recent discovery; limited replication |
| ERSA 10+ | Multiple independent lines of high-quality evidence; paradigm-shifting confirmations | Confirmed predictions so counterintuitive they constitute extraordinary evidence | Contradicted by any credible evidence |
The Paradox: More Studies Don’t Always Mean Higher ERSA
- 100 case reports of positive effect → ERSA 2.0
- 5 high-quality RCTs showing no effect → ERSA 2.5-3.0
Quality trumps quantity in ERSA assessment.
Quality-Evidence Trade-off
Sometimes high-quality evidence shows SMALLER effects than low-quality evidence. This is normal and expected:
Why:
- Low-quality studies prone to bias that artificially inflates effect size
- High-quality RCTs control bias; effect is smaller but more accurate
Example:
- 20 observational studies show vitamin D reduces fractures 40%
- 3 large RCTs show vitamin D reduces fractures 5%
- The RCTs are more trustworthy because better designed
- ERSA based on RCTs (~4.5) is more justified than based on observational studies (~5.5)
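Why the RCTs dominate can be made concrete with fixed-effect inverse-variance pooling, where precise studies get large weights. A sketch with hypothetical numbers echoing the example above:

```python
from math import exp, log

def pooled_risk_ratio(risk_ratios, std_errors):
    """Fixed-effect inverse-variance pooling on the log-RR scale:
    each study's weight is 1/SE^2, so precise (large) studies
    dominate the pooled estimate."""
    weights = [1 / se ** 2 for se in std_errors]
    log_pooled = sum(w * log(rr)
                     for w, rr in zip(weights, risk_ratios)) / sum(weights)
    return exp(log_pooled)

# Hypothetical: 20 small observational studies (RR 0.6, SE 0.30)
# vs. 3 large RCTs (RR 0.95, SE 0.05)
pooled = pooled_risk_ratio([0.6] * 20 + [0.95] * 3,
                           [0.30] * 20 + [0.05] * 3)
```

Despite being outnumbered 20 to 3, the precise RCTs pull the pooled estimate to roughly 0.88, far closer to their value than to the observational one.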
Part 8: Additional Examples Across Different Fields
Example A: Post-Scarcity Human Motivation (ERSA 1.0-1.5)
Evidence Characteristics:
- No real post-scarcity societies exist for study
- Observational data limited to small-scale gift economies, artist communities
- Thought experiments and theoretical modeling
- Some evolutionary psychology frameworks applicable
Evidence Quality Assessment:
- Study Design: Primarily theoretical; limited observational
- Sample Size: No direct data; analogies to small groups (500-5000 people)
- Mechanism: Plausible but speculative
- Confounding: Major issue — hard to separate “post-scarcity” from culture, group size, other variables
- Publication Bias: Likely — positive findings about motivation changes more publishable
Bradford Hill Profile (ERSA 1.2):
- Strength 0/4 (no real post-scarcity to measure)
- Consistency 0/4 (no comparative studies)
- Specificity 1/4 (prediction possible but untestable)
- Temporality 0/4 (causation undetermined)
- Plausibility 2/4 (fits some psychological theories)
- Coherence 1/4 (conflicts with existing economics)
- Experiment 0/4 (no experiments possible yet)
Why stuck at ERSA 1.0-1.5:
- Falsifiability issue: Cannot test without creating actual post-scarcity society
- No experimental framework
- No dose-response could be measured
- Proxy measures (small gift economies) have major confounding
Path to Higher ERSA:
- Establish small-scale trial communities (difficult but possible)
- Control for confounding variables (culture, group size, education, etc.)
- Measure motivation systematically
- Compare to matched control communities
- Might reach ERSA 3-4 after 20+ years of study
Example B: 5G Cell Tower Brain Damage Claims (ERSA -0.5)
Evidence Characteristics:
- Anecdotal reports of headaches, sleep problems near towers
- No high-quality RCTs
- Observational data confounded by:
- Awareness bias (if you believe 5G harmful, you report symptoms more)
- Nocebo effect (belief causes symptoms)
- Pre-existing health conditions
- Stress/anxiety about radiation
Evidence Quality Assessment:
- Study Design: Primarily anecdotal; few case reports
- Mechanism: Superficially plausible from a radiation-physics perspective (but 5G uses non-ionizing radiation, unlike cancer-causing ionizing radiation)
- Exposure measurement: Highly inaccurate (people don’t know actual exposure levels)
- Outcome measurement: Subjective (headaches self-reported; not objectively measured)
- Selection Bias: SEVERE (people who notice symptoms report; asymptomatic people don’t)
- Publication Bias: MAJOR (positive anecdotes spread; negative reports ignored)
Bradford Hill Profile (ERSA -0.5):
- Strength 0/4 (no objective effects measured in research)
- Consistency 0/4 (studies contradict each other; high-quality studies show no effect)
- Specificity 0/4 (symptoms nonspecific; same symptoms from many causes)
- Temporality 1/4 (temporal relationship unclear; symptoms predate towers in many cases)
- Biological Gradient 0/4 (no dose-response shown; people far from towers report same symptoms)
- Plausibility 1/4 (mechanism implausible; non-ionizing radiation insufficient for DNA damage; power levels too low)
- Coherence 0/4 (contradicts physics; contradicts large body of research on non-ionizing radiation)
- Experiment 0/4 (RCTs show no effect; blinded studies show no difference from sham)
High-Quality Evidence Contradicting Claim:
- Large-scale blinded studies: Exposing people to 5G and to sham 5G; no difference in reported symptoms
- Nocebo effect demonstrated: when told “this might cause symptoms,” roughly 50% of participants report symptoms even in the sham group
- Dose-response absent: People reporting maximum symptoms live in areas with lowest 5G exposure
- Biological mechanism absent: Established biological effects of non-ionizing radiation occur at power levels 1000x higher than 5G
ERSA Assessment:
- Anecdotal claims: ERSA -0.5 (not consistent with high-quality evidence; selective reporting of confirmatory cases)
- 5G safety in general: ERSA 8.0 (extensive research showing safety; no plausible mechanism for harm)
Why claims persist despite evidence:
- Availability bias: News stories about 5G health scares more memorable
- Confirmation bias: People interpret every health problem as 5G-related
- Publication bias: Negative findings (no effect) less newsworthy than anecdotal positive claims
- Healthy skepticism sometimes becomes conspiracy thinking when evidence is misunderstood
Example C: Light Bulb Conspiracy (ERSA 7.5-8.0)
The Claim: Manufacturers conspired to create short-lived light bulbs to force consumers to buy more
Evidence Quality: HIGH and multi-source
Historical Record:
- 1920s: Phoebus Cartel formed by light bulb manufacturers
- Goal: Limit light bulb lifespan to shorten replacement cycle
- Documents: Explicit written records of conspiracy
- Result: Light bulbs intentionally designed to fail after ~1000 hours
- Duration: 1920s-1940s (until antitrust prosecution)
Evidence Quality Assessment:
- Study Design: Historical documentary evidence (contracts, meeting notes, patent restrictions)
- Primary Source: Cartel documents (highest quality evidence for historical claims)
- Sample Size: N/A (complete documentation available)
- Mechanism: Clear and documented
- Publication Bias: Low (conspiracy was exposed through official prosecution)
- Confounding: None (direct evidence of intent)
Bradford Hill Profile (ERSA 8.0 for historical claim):
- Strength 4/4 (overwhelming documentary evidence)
- Consistency 4/4 (consistent across multiple sources)
- Specificity 4/4 (specific dates, people, actions documented)
- Temporality 4/4 (clear timeline)
- Biological Gradient N/A (not applicable to historical claim)
- Plausibility 4/4 (aligns with known manufacturing practices)
- Coherence 4/4 (explains observed pattern of light bulb life)
- Experiment 4/4 (evidence from actual historical events)
- Analogy 4/4 (similar conspiracy practices documented in other industries)
Why ERSA 8.0 rather than 9.0:
- Historical claim (not ongoing theory)
- No predictions about the future (the mechanism ended with the antitrust prosecution, so there is nothing left to test)
- Would be ERSA 9.0 if conspiracy continued and predictions about modern bulbs confirmed
Important Distinction:
- This is PROVEN conspiracy (ERSA 8.0)
- Different from unproven 5G conspiracy claims (ERSA -0.5)
- Difference: Documentary evidence vs. anecdotal evidence
Example D: BlackRock Ownership Conspiracy (ERSA 4.5-5.0)
The Claim: BlackRock owns significant shares in most publicly traded companies; this represents dangerous consolidation
Evidence Quality: MIXED but largely HIGH for factual claim (ownership); LOWER for implications
Factual Component (Ownership): ERSA 8.5
- BlackRock is world’s largest asset manager
- Manages $10+ trillion in assets
- Owns shares in ~95% of S&P 500 companies
- Owns shares in all major competitors
Evidence Quality:
- Source: SEC filings (highest quality, public record)
- Mechanism: Index fund ownership (if you own an S&P 500 index fund, you own a bit of every company)
- Verification: Public databases easily confirm holdings
- Confounding: None (facts directly verifiable)
Bradford Hill Profile (ERSA 8.5 for ownership fact):
- Strength 4/4 (documented in SEC filings)
- Consistency 4/4 (consistent across multiple reports)
- Specificity 4/4 (exact holdings documented)
- Experiment 4/4 (empirically verifiable from public records)
Implications Component (Danger/Conspiracy): ERSA 3.5-4.5
The Claim: This ownership structure allows BlackRock to control companies unfairly
Evidence Quality: MODERATE but mixed
Factual Support:
- BlackRock does influence corporate governance through voting
- BlackRock has pushed for environmental, social, governance (ESG) criteria
- These activities documented in proxy voting records
Evidence Against:
- No evidence of hidden coordination between competing companies
- Companies still compete aggressively in market
- Stock prices still vary based on company performance
- Other institutions have similar ownership patterns
Confounding Variables:
- Index fund ownership is structural (inevitable result of indexing)
- BlackRock doesn’t “choose” which companies to own (owns all S&P 500 companies)
- ESG voting is transparent and publicly stated (not hidden conspiracy)
Mechanism Implausibility:
- For conspiracy to work, would require BlackRock to coordinate competing firms
- Competing firms have opposing interests
- No mechanism for BlackRock to enforce coordination without detection
- Antitrust law prohibits such coordination
Bradford Hill Profile (ERSA 4.0 for conspiracy implications):
- Strength 1/4 (some influence documented, but effect size small compared to company competition)
- Consistency 2/4 (mixed evidence; some cases show correlation, but causation unclear)
- Specificity 1/4 (unclear what “control” means; competition appears intact)
- Plausibility 2/4 (mechanism implausible given legal restrictions)
- Coherence 1/4 (contradicts observed competition in markets; stock prices still vary)
- Experiment 0/4 (no experimental evidence; can’t test counterfactual)
- Mechanism 1/4 (mechanism for enforcement of alleged conspiracy unclear)
ERSA Assessment:
- Ownership fact: ERSA 8.5 (proven; documented in SEC filings)
- Conspiracy implications: ERSA 4.0 (plausible but unproven; alternative explanations better fit evidence)
The Distinction:
- Proven conspiracy (light bulb cartel): ERSA 8.0 (documentary evidence)
- Documented ownership (BlackRock): ERSA 8.5 (SEC filings)
- Alleged conspiracy from ownership: ERSA 4.0 (plausible but insufficient evidence)
- Unproven conspiracy (5G): ERSA -0.5 (contradicted by high-quality evidence)
Why Different:
- Each claim requires different evidence quality threshold
- Ownership claims require SEC documentation (high quality)
- Conspiracy claims require evidence of coordination (not just ownership)
- Harm claims require evidence of actual harm (not just mechanism)
Summary: Evidence Quality Framework for ERSA
Quick Reference: How to Evaluate Evidence Quality
- Study Design Quality:
  - High: SR/MA, RCTs, well-designed prospective cohorts
  - Medium: Less-well-designed observational studies
  - Low: Case series, anecdotes, expert opinion
- Risk of Bias Assessment (Using GRADE):
  - Selection bias: Random allocation, concealment, comparable groups
  - Confounding: Measured and controlled, or randomized
  - Information bias: Standardized measurement, blinding
  - Publication bias: Comprehensive search, funnel plot analysis
- Consistency:
  - Multiple independent studies reaching same conclusion
  - Variation between studies explained or acceptable
- Mechanism:
  - Plausible biological/logical mechanism
  - Dose-response relationships where applicable
  - Gradient of effect
- Application:
  - Moving from theoretical to practical
  - Real-world validation
  - Implementation success
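The study-design tiers in this checklist can be expressed as a small lookup. The design names and tier labels are taken from the checklist; the shorthand keys and the function itself are an illustrative sketch, not part of any official ERSA tooling.

```python
# Map study designs to the quality tiers named in the checklist above.
# Illustrative only; the keys are shorthand labels chosen for this sketch.
DESIGN_QUALITY = {
    "systematic_review": "high",
    "meta_analysis": "high",
    "rct": "high",
    "prospective_cohort": "high",
    "cross_sectional": "medium",
    "retrospective": "medium",
    "case_series": "low",
    "case_report": "low",
    "anecdote": "low",
    "expert_opinion": "low",
}

def design_tier(design):
    """Return the quality tier for a study design, or 'unknown'."""
    return DESIGN_QUALITY.get(design, "unknown")
```

Returning "unknown" for unlisted designs mirrors the checklist's intent: a design that cannot be classified should not be assumed to be high quality.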
The ERSA-Evidence Quality Relationship
- Low evidence quality → Cannot achieve ERSA > 6 (even if many low-quality studies exist)
- Mixed quality (some high, some low) → ERSA typically 4-6 range
- High quality (RCTs, multiple studies) → ERSA can reach 7-8
- Extraordinarily high quality (repeated confirmation, paradigm shift, predictions) → ERSA 9-10+
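The quality-to-ceiling mapping above can be sketched as a clamping function. The ceiling values are read directly from the bullet list; the function is a hypothetical illustration, since ERSA does not publish an implementation.

```python
# Ceilings taken from the evidence-quality tiers listed above;
# illustrative only, not an official ERSA formula.
ERSA_CEILINGS = {
    "low": 6.0,             # many low-quality studies still cannot exceed 6
    "mixed": 6.0,           # typically lands in the 4-6 range
    "high": 8.0,            # RCTs, multiple high-quality studies
    "extraordinary": 10.0,  # repeated confirmation, confirmed predictions
}

def cap_ersa(raw_score, evidence_tier):
    """Clamp a provisional ERSA score to its evidence-quality ceiling."""
    return min(raw_score, ERSA_CEILINGS[evidence_tier])
```

For example, a theory provisionally scored 7.5 on low-quality evidence would be capped at 6.0, while the same score on high-quality evidence would stand.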
This ensures ERSA reflects both quantity AND quality of evidence, preventing false confidence in well-documented but biased findings.