The elusiveness of ethics

Encoding fairness in an unfair world

Agenda

Talk (50 minutes)
Q & A (5 minutes)
Break (5 minutes)
Activity (25-30 minutes)

About me

Slides

summerscope.github.io/slides/elusiveness-of-ethics

Section 1

Setting the scene

Once upon a time, in a
FinTech startup...

Descriptive
-vs-
Normative

Girls wear pink

(Descriptive)

Girls wear pink

(Normative)

The way the world is?

or

The way the world should be?

For example...

We make normative assertions all the time

Think of the drone...

Why should Machine Learning systems be more fair than the data from which they learn?

All data is bias

Some bias is harmful

Computers can't tell the difference

ML systems can harden and amplify harmful bias

Awful AI

https://github.com/daviddao/awful-ai

Why diversity?

On principle: technology should belong to all
People who experience harmful bias are more sensitive to the possibilities
POC, Women, etc., are not inherently more ethical than anyone else

Human bias (implicit or explicit) is scoped, the impact limited in a way that machine bias is not

Issue of scale is also the
promise of scale

Why should we use machines to solve a problem caused by machines?

We need technical solutions to solve technical problems

at scale

For example...

1.2 billion photos per day uploaded to Google Photos in 2017
Manually labelling each one would take well over 1 million people working full time
If 1% of those label decisions were flagged for human review and each review took 3 minutes, it would require 150,000 people working full time (Google currently employs 103,459 people)
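
The review workload can be checked with a quick back-of-envelope calculation. The sketch below takes the three figures from the slide and adds one assumption that the slide does not state: how many hours per day each reviewer spends reviewing. With a full 8-hour day the result is about 75,000 people; the slide's figure of 150,000 corresponds to roughly 4 hours of review per person per day. Either way the conclusion is the same: the workload is on the order of Google's entire headcount.

```python
# Back-of-envelope check of the human review workload.
PHOTOS_PER_DAY = 1_200_000_000   # from the slide
FLAGGED_FRACTION = 0.01          # from the slide
MINUTES_PER_REVIEW = 3           # from the slide


def reviewers_needed(review_hours_per_person: float) -> float:
    """People needed to clear one day of flagged photos within one day.

    `review_hours_per_person` is an assumption, not a figure from the slides.
    """
    flagged = PHOTOS_PER_DAY * FLAGGED_FRACTION
    total_review_hours = flagged * MINUTES_PER_REVIEW / 60
    return total_review_hours / review_hours_per_person


if __name__ == "__main__":
    for hours in (8, 4):
        print(f"{hours} h/day -> {reviewers_needed(hours):,.0f} reviewers")
```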

We need to bake in tests and set up thresholds

Save human attention for the trickiest cases
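
One way to picture this is a confidence threshold that decides which predictions are accepted automatically and which go to a person. This is a minimal sketch: the `Decision` type, the `route_prediction` function, and the 0.9 threshold are all hypothetical choices for illustration, not something from the talk.

```python
from typing import NamedTuple


class Decision(NamedTuple):
    label: str
    route: str  # "auto" or "human_review"


def route_prediction(label: str, confidence: float, threshold: float = 0.9) -> Decision:
    """Accept confident predictions automatically; queue the rest for a human.

    The threshold value is an assumed example and would need tuning per context.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    route = "auto" if confidence >= threshold else "human_review"
    return Decision(label, route)
```

Lowering the threshold saves reviewer time but lets more uncertain decisions through unchecked, so the threshold itself is a normative choice.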

"Machine assisted fairness"

Section 2

Approaches

1. Tooling & Tests

Where might we try to address bias?

Capturing and labelling data
ML design (modeling decisions)
Retroactively after classification

AI Explainability 360
AI Fairness 360
Lime
Aequitas - Bias and Fairness Audit Toolkit
Santa Clara University Ethics Toolkit
Fairness modeling - tradeoffs
Unit tests (?)
CI - "Continuous Inference" (?)
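
To make the "unit tests" idea concrete, here is a sketch of a fairness check written as an ordinary test function that could run in a CI pipeline. Everything in it is an assumption for illustration: the choice of metric (the gap in false positive rate between groups), the tolerance `MAX_FPR_GAP`, and the toy fixture data.

```python
# A hypothetical fairness gate, written in the style of a pytest test.
MAX_FPR_GAP = 0.1  # assumed tolerance; picking it is a judgment call


def false_positive_rate(y_true, y_pred):
    """FP / (FP + TN), computed over the actual negatives."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn) if (fp + tn) else 0.0


def fpr_gap(y_true, y_pred, groups):
    """Largest difference in false positive rate between any two groups."""
    rates = []
    for g in sorted(set(groups)):
        idx = [i for i, value in enumerate(groups) if value == g]
        rates.append(
            false_positive_rate([y_true[i] for i in idx], [y_pred[i] for i in idx])
        )
    return max(rates) - min(rates)


def test_fpr_gap_within_tolerance():
    # Toy fixture; a real test would load held-out predictions instead.
    y_true = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 0, 0, 1, 1, 0, 0, 0, 0]
    groups = ["a"] * 5 + ["b"] * 5
    assert fpr_gap(y_true, y_pred, groups) <= MAX_FPR_GAP
```

A failing test here would block a release in the same way a failing functional test does.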

Special mentions

2A. Academic & Corporate research

FAT (fairness, accountability, transparency)
fairxiv.org
FAT/ML
fatconference.org
AI Now
Oxford Digital Ethics Lab
Gradient Institute
3AI

For example

Trying to remove gender bias from NLP

"Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings"

https://arxiv.org/pdf/1607.06520.pdf

Lipstick on a pig

https://arxiv.org/pdf/1903.03862.pdf

2B. Self Regulation /
Soft Regulation

Standards
Checklists
Policies
Risk assessment frameworks

Special mentions

Principled artificial intelligence visualisation

A good starting paper

The Ethics of AI Ethics - An Evaluation of Guidelines

https://arxiv.org/abs/1903.03425

The Montreal Declaration

montrealdeclaration-responsibleai.com

3. Law / Hard Regulation

GDPR
🦗🦗🦗

More coming soon?

National policy? National agency?
Right to a (human parsable) explanation?
Dataset standardised labels?
FDA-style drug testing? (Wired)

Section 3

Fairness definitions

	Actual Positive	Actual Negative
Predicted Positive	True Positive (TP)	False Positive (FP)
Predicted Negative	False Negative (FN)	True Negative (TN)

	Actual Positive	Actual Negative
Predicted Positive	True Positive (TP) `PPV = TP / (TP + FP)` `TPR = TP / (TP + FN)`	False Positive (FP) `FDR = FP / (TP + FP)` `FPR = FP / (FP + TN)`
Predicted Negative	False Negative (FN) `FOR = FN / (TN + FN)` `FNR = FN / (TP + FN)`	True Negative (TN) `NPV = TN / (TN + FN)` `TNR = TN / (TN + FP)`
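
The eight rates can be computed directly from the four cells of the confusion matrix. The sketch below uses the standard definitions; the `ConfusionMatrix` class name and the example counts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # predicted positive, actually positive
    fp: int  # predicted positive, actually negative
    fn: int  # predicted negative, actually positive
    tn: int  # predicted negative, actually negative

    def rates(self) -> dict:
        """Return the eight rates; assumes every row and column total is non-zero."""
        tp, fp, fn, tn = self.tp, self.fp, self.fn, self.tn
        return {
            "PPV": tp / (tp + fp),  # positive predictive value (precision)
            "FDR": fp / (tp + fp),  # false discovery rate
            "FOR": fn / (tn + fn),  # false omission rate
            "NPV": tn / (tn + fn),  # negative predictive value
            "TPR": tp / (tp + fn),  # true positive rate (recall)
            "FNR": fn / (tp + fn),  # false negative rate
            "FPR": fp / (fp + tn),  # false positive rate
            "TNR": tn / (fp + tn),  # true negative rate
        }


example = ConfusionMatrix(tp=40, fp=10, fn=20, tn=30)  # made-up counts
rates = example.rates()
```

Each pair that shares a denominator sums to 1 (PPV + FDR, FOR + NPV, TPR + FNR, FPR + TNR), which is a handy sanity check.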

Definition based on predicted outcomes

Group fairness

Individuals in protected and unprotected groups have equal probability of being predicted 'positive' by the classifier
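
As a minimal sketch, group fairness can be measured as the difference in positive prediction rate between the protected group and everyone else. The function names and the toy data are assumptions for illustration; note that actual outcomes are not needed at all.

```python
def positive_rate(y_pred):
    """Share of subjects predicted positive."""
    return sum(y_pred) / len(y_pred)


def statistical_parity_difference(y_pred, groups, protected):
    """Positive rate of the protected group minus that of everyone else.

    Zero means the definition is met exactly; assumes both groups are non-empty.
    """
    prot = [p for p, g in zip(y_pred, groups) if g == protected]
    rest = [p for p, g in zip(y_pred, groups) if g != protected]
    return positive_rate(prot) - positive_rate(rest)


# Toy data: "p" marks the protected group, "u" the unprotected group.
y_pred = [1, 0, 1, 1, 1, 0, 0, 0]
groups = ["p", "p", "p", "p", "u", "u", "u", "u"]
gap = statistical_parity_difference(y_pred, groups, protected="p")
```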

Definition based on predicted & actual outcomes

Predictive parity

Individuals in protected and unprotected groups have equal PPV (the probability that a subject predicted positive is actually positive)
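
A sketch of checking predictive parity: compute PPV separately for each group and compare. The helper names and the toy data are hypothetical.

```python
def ppv(y_true, y_pred):
    """Of the subjects predicted positive, the share that is actually positive."""
    outcomes_of_predicted_positive = [t for t, p in zip(y_true, y_pred) if p == 1]
    if not outcomes_of_predicted_positive:
        raise ValueError("no positive predictions, PPV is undefined")
    return sum(outcomes_of_predicted_positive) / len(outcomes_of_predicted_positive)


def ppv_by_group(y_true, y_pred, groups):
    """Map each group to its PPV."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, value in enumerate(groups) if value == g]
        result[g] = ppv([y_true[i] for i in idx], [y_pred[i] for i in idx])
    return result


# Toy data: both groups receive three positive predictions each.
y_true = [1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 1, 1, 0, 1, 1, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
per_group = ppv_by_group(y_true, y_pred, groups)
```

In this toy data both groups get the same number of positive predictions, so group fairness holds, yet a positive prediction is right twice as often for group "a" as for group "b".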

Overall accuracy equality

Individuals in protected and unprotected groups have equal prediction accuracy (the probability that a subject, whether actually positive or negative, is assigned to the correct class)
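
A small sketch with made-up data shows what this definition does and does not capture: two groups can have identical accuracy while making different kinds of mistakes.

```python
def accuracy(y_true, y_pred):
    """Share of subjects whose predicted class matches their actual class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data: group A suffers one false negative, group B one false positive.
group_a = {"y_true": [1, 1, 0, 0], "y_pred": [1, 0, 0, 0]}
group_b = {"y_true": [1, 1, 0, 0], "y_pred": [1, 1, 1, 0]}

acc_a = accuracy(**group_a)
acc_b = accuracy(**group_b)
```

Both groups score 0.75, so the definition is satisfied, even though one group is wrongly denied and the other wrongly approved. Whether those two errors are equally costly depends on the context.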

Similarity-based measures

Fairness through unawareness

No sensitive or protected attributes are used in the decision making process
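
In code, this definition amounts to dropping the protected columns before the model ever sees them. The attribute names below are hypothetical. The sketch also shows the usual caveat: a remaining feature (here, a postcode) may still act as a proxy for what was removed.

```python
# Hypothetical protected attribute names, for illustration only.
PROTECTED = frozenset({"gender", "ethnicity"})


def strip_protected(record: dict, protected=PROTECTED) -> dict:
    """Return a copy of the record without the protected attributes."""
    return {key: value for key, value in record.items() if key not in protected}


applicant = {"income": 52_000, "postcode": "3000", "gender": "f"}
model_input = strip_protected(applicant)
# "postcode" survives and may still correlate with a protected attribute.
```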

Causal reasoning

Counterfactual fairness

A decision is counterfactually fair if the predicted outcome does not depend on the protected attribute (G)
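
A full counterfactual analysis needs a causal model of how the protected attribute influences the other features. As a much cruder stand-in, a "flip test" changes only the protected attribute and checks whether the prediction moves. It catches direct dependence only, not effects that travel through other features. The function name and the toy models below are assumptions for illustration.

```python
def flip_test(predict, records, attribute, values):
    """Return the records whose prediction changes when only `attribute` changes."""
    changed = []
    for record in records:
        outcomes = {predict({**record, attribute: value}) for value in values}
        if len(outcomes) > 1:
            changed.append(record)
    return changed


# Two hypothetical toy models: one reads the protected attribute, one does not.
def biased_model(record):
    return record["income"] > 50 and record["gender"] != "f"


def blind_model(record):
    return record["income"] > 50


records = [{"income": 60, "gender": "f"}, {"income": 40, "gender": "m"}]
flagged_biased = flip_test(biased_model, records, "gender", ["f", "m"])
flagged_blind = flip_test(blind_model, records, "gender", ["f", "m"])
```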

Plus many more...

It tells us that...

We cannot optimise for all measures of fairness simultaneously
We have to pick which ones matter most
Context matters, as do contextual norms
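
A short numeric sketch of why the measures conflict. From the definitions of PPV, FNR and FPR it follows that FPR = p / (1 - p) × (1 - PPV) / PPV × (1 - FNR), where p is the group's base rate of actual positives. So if two groups have different base rates, giving them the same PPV and the same FNR forces their FPRs apart. The numbers below are made up.

```python
def implied_fpr(base_rate: float, ppv: float, fnr: float) -> float:
    """FPR forced by the confusion-matrix definitions for a given base rate, PPV and FNR."""
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)


# Same PPV and FNR for both groups, different (made-up) base rates.
fpr_group_a = implied_fpr(base_rate=0.5, ppv=0.8, fnr=0.2)
fpr_group_b = implied_fpr(base_rate=0.2, ppv=0.8, fnr=0.2)
```

With these numbers group A ends up with a false positive rate of about 0.20 and group B about 0.05, even though predictive parity and equal false negative rates both hold.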

Medical diagnostics
-vs-
Recidivism and parole

We don't (yet) have a single fairness measure to rule them all.

So what!?

No physics 'Theory of everything' (yet)

We still put planes in the air

We build safety mechanisms despite the lack of theoretical coherence

We'd take ML safety more seriously if we could see the wreckage

Fairness definitions cheatsheet

tinyletter.com/summerscope

Section 4

Ways forward

Allow me one more hot take...

Building ML systems in a capitalist, corporate context:

There is no prediction; only intervention

Classification is never descriptive, always normative

Ways forward

New OS Licences?

Hippocratic Licence

Cost sensitive learning

Co-opt these techniques to optimise for fairness?
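
One existing pre-processing technique in this spirit is commonly called reweighing: each training example gets a weight so that, after weighting, the label is statistically independent of group membership. The sketch below is an illustration of that idea, not necessarily what the talk had in mind; the function names and toy data are assumptions.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight each example by P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels within one group."""
    rows = [(y, w) for g, y, w in zip(groups, labels, weights) if g == group]
    return sum(w for y, w in rows if y == 1) / sum(w for _, w in rows)


# Toy data: group "a" has far more positive labels than group "b".
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
weights = reweighing_weights(groups, labels)
```

The resulting weights could then be passed to any learner that accepts per-sample weights, which is the same mechanism cost sensitive learning relies on.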

Continual learning

All production data is validation data

Measure and counteract the human impact of harmful bias?

Category theory for intersectional bias

Eugenia Cheng TED talk

What if we combined a mathematical approach like this with data about outcomes - could we create a fair 'positive bias' model to negate systemic bias?

Design for failure,
not success

End users are experts on themselves
Bake in feedback loops
Plan to improve the algo over time
Plan an 'emergency brake'
Display system confidence
Promise less, not more
(Hire some designers!!)
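
Two of these points, the 'emergency brake' and displaying confidence, can be sketched as a thin wrapper around a model. The `GuardedModel` class, its method names, and the fallback message are all hypothetical.

```python
class GuardedModel:
    """Wrap a model with an off switch and a confidence-aware message."""

    def __init__(self, predict, fallback="needs human review"):
        # `predict` is assumed to return a (label, confidence) pair.
        self._predict = predict
        self._fallback = fallback
        self.enabled = True

    def brake(self):
        """Emergency brake: stop serving model output until re-enabled."""
        self.enabled = False

    def decide(self, features) -> str:
        if not self.enabled:
            return self._fallback
        label, confidence = self._predict(features)
        # Show the confidence next to the label instead of hiding it.
        return f"{label} (confidence {confidence:.0%})"


def toy_predict(features):
    # Stand-in for a real model.
    return "approve", 0.87


model = GuardedModel(toy_predict)
before = model.decide({"income": 60})
model.brake()
after = model.decide({"income": 60})
```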

"Complex systems are the best gift to finger pointing in the history of humanity"

SOURCE: What is wrong with UX podcast

Bystander effect

No invisible work

Empathy is a useful tool but it's not a full coverage test
Personal anecdotes are good to make it human, make it real, avoid psychic numbing

No 'special' work

When you other ethics,
you other ethics

Process conversation starters:

Ethics review at kick-off meeting
Lean ethics canvas
Alternative feedback mechanisms, like an anonymous form
Ethics debrief at retro

Combine

tooling + process + culture

QA for good QA

Is it...

Measurable?
Enforceable?
One-off or iterable?
Showing change over time?
Easy to integrate with existing process?
Easy to integrate with existing codebase?
Before, during, or after deployment?
Aid or blocker?

Put it in the bin

Acknowledging failure

(is inevitable)

"I am an ethical person therefore I build ethical tech"

If you write a bug, it doesn't make you a bad programmer

If you make an ethical mistake, it doesn't make you an unethical person

Let's draw a line in the sand

Claiming moral authority

Claiming moral imperative

Acknowledging complexity

Thank you!

Questions?

summerscope.github.io/slides/elusiveness-of-ethics
