Framework for detecting, measuring, and mitigating bias in AI systems.
Hanzo AI Team (@hanzoai)
Created: 2025-12-17
Tags: ai-ethics, fairness, bias, evaluation
Requires: HIP-200, HIP-201
HIP-220: Bias Detection & Mitigation
Abstract
This HIP establishes the framework for detecting, measuring, and mitigating bias in Hanzo AI systems. It defines bias categories, evaluation methodologies, fairness metrics, and remediation processes to ensure AI systems treat all users equitably.
Bias Framework
Definitions
Algorithmic bias: Systematic and repeatable errors in a computer system that create unfair outcomes for certain groups.
Fairness: The principle that AI systems should treat individuals and groups equitably, without discrimination based on protected characteristics.
Protected Characteristics
| Category | Characteristics |
|---|---|
| Demographic | Race, ethnicity, nationality, religion |
| Personal | Age, gender, sexual orientation, disability |
| Socioeconomic | Income, education, occupation |
| Geographic | Region, urban/rural, language |
Bias Types
Pre-existing Bias
| Type | Source | Example |
|---|---|---|
| Historical | Past discrimination in data | Hiring data reflecting past discrimination |
| Representation | Underrepresentation in data | Fewer examples of certain groups |
| Measurement | How data is collected | Sensors less accurate for certain skin tones |
Technical Bias
| Type | Source | Example |
|---|---|---|
| Aggregation | One-size-fits-all models | Medical model trained primarily on one population |
| Learning | Algorithm amplification | Feedback loops reinforcing initial bias |
| Evaluation | Biased benchmarks | Test sets not representative |
Emergent Bias
| Type | Source | Example |
|---|---|---|
| Deployment | Context mismatch | Model used on a population it was not designed for |
| Interaction | User behavior patterns | Different usage across groups |
| Temporal | Changing contexts | Society changes, model doesn't |
Bias Detection
Detection Methods
Quantitative Analysis
| Method | Application | Tools |
|---|---|---|
| Demographic parity | Equal prediction rates | Statistical analysis |
| Equalized odds | Equal TPR/FPR across groups | Fairlearn |
| Calibration | Consistent probability meaning | Reliability diagrams |
| Individual fairness | Similar treatment for similar individuals | Distance metrics |
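The calibration row above can be made concrete with a per-group reliability table, which is the data behind a reliability diagram. The following is a minimal sketch in plain Python, not part of this HIP: the function names, the equal-width binning, and the toy data are all assumptions chosen for illustration.

```python
from collections import defaultdict


def reliability_by_group(probs, labels, groups, n_bins=10):
    """Return {group: [(mean_predicted, observed_rate, count), ...]}.

    Each tuple describes one non-empty, equal-width probability bin.
    Equal-width binning is an illustrative assumption, not a HIP requirement.
    """
    bins = defaultdict(lambda: defaultdict(list))
    for p, y, g in zip(probs, labels, groups):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[g][idx].append((p, y))

    table = {}
    for g, by_bin in bins.items():
        rows = []
        for idx in sorted(by_bin):
            items = by_bin[idx]
            mean_p = sum(p for p, _ in items) / len(items)
            rate = sum(y for _, y in items) / len(items)
            rows.append((mean_p, rate, len(items)))
        table[g] = rows
    return table


def expected_calibration_error(rows):
    """Count-weighted mean of |mean predicted probability - observed rate|."""
    total = sum(n for _, _, n in rows)
    return sum(n * abs(mean_p - rate) for mean_p, rate, n in rows) / total


# Toy data (hypothetical): the same scores are well ordered for group "a"
# but carry no signal for group "b".
probs = [0.1, 0.2, 0.8, 0.9, 0.1, 0.2, 0.8, 0.9]
labels = [0, 0, 1, 1, 0, 1, 0, 1]
groups = ["a"] * 4 + ["b"] * 4

table = reliability_by_group(probs, labels, groups, n_bins=2)
for group, rows in sorted(table.items()):
    print(group, round(expected_calibration_error(rows), 3))
```

A large difference in calibration error between groups suggests that a given predicted probability does not mean the same thing for each group, which is the condition the calibration check is meant to surface.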
Qualitative Analysis
| Method | Application | Approach |
|---|---|---|
| Output auditing | Review generated content | Human evaluation |
| Prompt testing | Test with demographic markers | Structured prompts |
| User studies | Perception of fairness | Surveys, interviews |
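Prompt testing with demographic markers is commonly done by expanding one template into variants that differ only in the marker, then comparing the model's responses. The sketch below shows one way to build such structured prompts. The template, the slot values, and the `respond_and_score` callable are hypothetical; the HIP does not prescribe a prompt format or a scoring function.

```python
from itertools import product


def build_probe_prompts(template, slots):
    """Expand a template into one prompt per combination of slot values.

    Returns a list of (assignment, prompt_text) pairs so each response can
    later be traced back to the markers that produced it.
    """
    names = sorted(slots)
    prompts = []
    for values in product(*(slots[name] for name in names)):
        assignment = dict(zip(names, values))
        prompts.append((assignment, template.format(**assignment)))
    return prompts


def largest_score_gap(prompts, respond_and_score):
    """Largest difference in score across prompt variants.

    `respond_and_score` is a hypothetical callable that sends the prompt to
    the model under test and returns a numeric score for the response
    (for example, a sentiment or helpfulness rating).
    """
    scores = [respond_and_score(text) for _, text in prompts]
    return max(scores) - min(scores)


# Hypothetical template and markers, for illustration only.
template = "Write a short reference letter for a {age} applicant from a {region} area."
slots = {
    "age": ["25-year-old", "65-year-old"],
    "region": ["rural", "urban"],
}

for assignment, text in build_probe_prompts(template, slots):
    print(assignment, "->", text)
```

Because the variants differ only in the marker, a consistent gap in response quality across them points to the marker rather than the task as the cause. The variants are also a natural input for human output auditing.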
Evaluation Datasets
Standard Benchmarks
| Benchmark | Bias Type | Metrics |
|---|---|---|
| WinoBias | Gender | Accuracy parity |
| StereoSet | Stereotype | Language model score |
| CrowS-Pairs | Multiple | Preference score |
| BBQ | Social bias | Accuracy in ambiguous contexts |
Custom Evaluation Sets
| Dataset | Coverage | Size |
|---|---|---|
| Hanzo-Bias-1K | Multi-category bias probes | 1,000+ |
| Regional-Fairness | Geographic/cultural bias | 500+ |
| Intersectional-Set | Intersecting identities | 300+ |
Audit Process
Pre-Deployment Audit
1. Define evaluation scope (groups, use cases)
↓
2. Select appropriate metrics
↓
3. Run quantitative evaluation
↓
4. Conduct qualitative review
↓
5. Document findings
↓
6. Determine if thresholds met
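Step 6 can be automated as a simple gate over the metrics gathered in step 3. The sketch below is one possible shape for that check, not the HIP's tooling. The metric names and the threshold for equalized odds are hypothetical; only the 10% demographic parity limit is taken from this document.

```python
def evaluate_thresholds(measured, thresholds):
    """Return {metric: (value, limit)} for every metric that fails the gate.

    A metric fails if its value is at or above the limit, or if it was
    never measured (a missing measurement should not pass silently).
    """
    failures = {}
    for metric, limit in thresholds.items():
        value = measured.get(metric)
        if value is None or value >= limit:
            failures[metric] = (value, limit)
    return failures


# Limits are upper bounds on between-group gaps.
thresholds = {
    "demographic_parity_difference": 0.10,  # from this HIP: <10% difference
    "equalized_odds_difference": 0.10,      # hypothetical limit, for illustration
}

# Hypothetical measurements from a quantitative evaluation run.
measured = {
    "demographic_parity_difference": 0.04,
    "equalized_odds_difference": 0.12,
}

failures = evaluate_thresholds(measured, thresholds)
print("PASS" if not failures else f"FAIL: {failures}")
```

Treating a missing measurement as a failure keeps an incomplete evaluation from being mistaken for a clean one.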
Ongoing Monitoring
| Activity | Frequency | Scope |
|---|---|---|
| Automated metrics | Weekly | Key fairness metrics |
| Sample audits | Monthly | Human review sample |
| Full audit | Quarterly | Comprehensive evaluation |
Fairness Metrics
Group Fairness Metrics
Demographic Parity
Definition: The rate of positive predictions is equal across groups
Formula:
P(Ŷ=1|A=a) = P(Ŷ=1|A=b) for all groups a, b
Threshold: <10% difference across groups
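As an illustration of the formula and threshold above, the following sketch estimates P(Ŷ=1|A=a) for each group from a list of predictions and compares the largest gap to the limit. It assumes the 10% threshold means an absolute difference in selection rates (percentage points) between the highest and lowest group; the HIP does not say whether the difference is absolute or relative. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def selection_rates(predictions, groups):
    """Estimate P(Y_hat = 1 | A = g) for each group g."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for y_hat, g in zip(predictions, groups):
        counts[g][0] += int(y_hat == 1)
        counts[g][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}


def demographic_parity_difference(predictions, groups):
    """Gap between the highest and lowest group selection rates."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


def meets_parity_threshold(predictions, groups, threshold=0.10):
    """True if the gap is strictly below the threshold.

    Assumes an absolute difference, which is one reading of "<10%".
    """
    return demographic_parity_difference(predictions, groups) < threshold


# Toy data (hypothetical): group "a" is selected far more often than "b".
predictions = [1, 0, 1, 1, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(selection_rates(predictions, groups))
print(demographic_parity_difference(predictions, groups))
print(meets_parity_threshold(predictions, groups))
```

With more than two groups, taking the maximum minus the minimum checks the worst-case pair, which matches the "for all groups a, b" condition in the formula.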
Equalized Odds
Definition: Equal true positive and false positive rates across groups