The elusiveness of ethics

Encoding fairness in an unfair world

Agenda

  1. Talk (50 minutes)
  2. Q & A (5 minutes)
  3. Break (5 minutes)
  4. Activity (25-30 minutes)

About me

Slides

summerscope.github.io/slides/elusiveness-of-ethics

Section 1

Setting the scene

Once upon a time, in a
FinTech startup...

Descriptive
-vs-
Normative

Girls wear pink

(Descriptive)

Girls wear pink

(Normative)

The way the world is?

or

The way the world should be?

For example...

We make normative assertions all the time

Think of the drone...

Why should Machine Learning systems be more fair than the data from which they learn?

All data is biased

Some bias is harmful

Computers can't tell the difference

ML systems can harden and amplify harmful bias

Awful AI

https://github.com/daviddao/awful-ai

Why diversity?

  1. On principle: technology should belong to all
  2. People who experience harmful bias are more sensitive to the possibilities
  3. POC, Women, etc., are not inherently more ethical than anyone else

Human bias (implicit or explicit) is scoped; its impact is limited in a way that machine bias is not

Issue of scale is also the
promise of scale

Why should we use machines to solve a problem caused by machines?

We need technical solutions to solve technical problems

at scale

For example...

Google Photos
  • 1.2 billion photos per day uploaded to Google Photos in 2017
  • Manually labelling each one would take well over 1 million people working full time
  • If 1% of those label decisions were flagged for human review and each review took 3 minutes, it would require 150,000 people working full time (Google currently employs 103,459 people)
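
A back-of-the-envelope sketch of that arithmetic; the figures from the slide are used as given, and the hours of focused review per person per day is my own assumption:

  # Rough scale estimate for human review of flagged labels (illustrative only).
  photos_per_day = 1_200_000_000      # photos uploaded per day (2017 figure)
  flagged_fraction = 0.01             # 1% of label decisions flagged for review
  minutes_per_review = 3              # time per human review
  review_hours_per_day = 4            # assumption: ~4 focused review hours per person per day

  reviews_per_day = photos_per_day * flagged_fraction            # 12,000,000
  review_minutes = reviews_per_day * minutes_per_review          # 36,000,000
  people_needed = review_minutes / (review_hours_per_day * 60)
  print(f"~{people_needed:,.0f} full-time reviewers")            # ~150,000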

We need to bake in tests and set up thresholds

Save human attention for the trickiest cases

"Machine assisted fairness"

Section 2

Approaches

1. Tooling & Tests

Where might we try to address bias?

  1. Capturing and labelling data
  2. ML design (modeling decisions)
  3. Retroactively after classification

Special mentions

2A. Academic & Corporate research

For example

Trying to remove gender bias from NLP

"Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings"

https://arxiv.org/pdf/1607.06520.pdf
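
A minimal sketch of the paper's "neutralize" step: project the learned gender direction out of a word vector. The vectors below are made-up 3-d toys, not real embeddings:

  import numpy as np

  def neutralize(word_vec, gender_direction):
      """Remove the component of word_vec that lies along the gender direction."""
      g = gender_direction / np.linalg.norm(gender_direction)
      return word_vec - np.dot(word_vec, g) * g

  he = np.array([1.0, 0.2, 0.0])
  she = np.array([-1.0, 0.2, 0.0])
  programmer = np.array([0.4, 0.9, 0.3])       # hypothetical biased vector
  print(neutralize(programmer, he - she))      # [0.  0.9 0.3]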

Lipstick on a pig

https://arxiv.org/pdf/1903.03862.pdf

2B. Self Regulation /
Soft Regulation

  • Standards
  • Checklists
  • Policies
  • Risk assessment frameworks

Special mentions

Principled Artificial Intelligence visualisation

A good starting paper

The Ethics of AI Ethics - An Evaluation of Guidelines

https://arxiv.org/abs/1903.03425

The Montreal Declaration

montrealdeclaration-responsibleai.com

3. Law / Hard Regulation

  • GDPR
  • 🦗🦗🦗

More coming soon?

  • National policy? National agency?
  • Right to a (human parsable) explanation?
  • Dataset standardised labels?
  • FDA-style drug testing? (Wired)

Section 3

Fairness definitions

                     Actual Positive          Actual Negative
Predicted Positive   True Positive (TP)       False Positive (FP)
Predicted Negative   False Negative (FN)      True Negative (TN)

                     Actual Positive          Actual Negative
Predicted Positive   True Positive (TP)       False Positive (FP)
                     PPV = TP / (TP + FP)     FDR = FP / (TP + FP)
                     TPR = TP / (TP + FN)     FPR = FP / (FP + TN)
Predicted Negative   False Negative (FN)      True Negative (TN)
                     FOR = FN / (TN + FN)     NPV = TN / (TN + FN)
                     FNR = FN / (TP + FN)     TNR = TN / (TN + FP)
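
The same rates computed from raw counts, as a quick reference (plain Python, toy numbers):

  def confusion_rates(tp, fp, fn, tn):
      """Standard rates derived from confusion-matrix counts."""
      return {
          "PPV": tp / (tp + fp),   # precision
          "TPR": tp / (tp + fn),   # recall / sensitivity
          "FDR": fp / (tp + fp),
          "FPR": fp / (fp + tn),
          "FOR": fn / (tn + fn),
          "FNR": fn / (tp + fn),
          "NPV": tn / (tn + fn),
          "TNR": tn / (tn + fp),   # specificity
      }

  print(confusion_rates(tp=40, fp=10, fn=20, tn=30))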

Definition based on predicted outcomes

Group fairness

Individuals in protected and unprotected groups have equal probability of being predicted 'positive' by the classifier
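
A minimal check of this definition on toy data (the groups and predictions are invented):

  def positive_rate(predictions, groups, group):
      """P(predicted positive | group), from parallel lists."""
      preds = [p for p, g in zip(predictions, groups) if g == group]
      return sum(preds) / len(preds)

  preds  = [1, 0, 1, 1, 0, 1, 0, 0]          # 1 = predicted positive
  groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
  print(positive_rate(preds, groups, "a"))   # 0.75
  print(positive_rate(preds, groups, "b"))   # 0.25 -> group fairness violated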

Definition based on predicted & actual outcomes

Predictive parity

Individuals in protected and unprotected groups have equal PPV (the probability that a subject predicted positive is actually positive)

Overall accuracy equality

Individuals in protected and unprotected groups have equal prediction accuracy (the probability that a subject from either the actual positive or actual negative class is predicted correctly)
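
A toy comparison of the two definitions above; the data is invented to show that one can hold while the other fails:

  def group_metrics(y_true, y_pred, groups, group):
      """PPV and accuracy for one group, from parallel lists."""
      rows = [(t, p) for t, p, g in zip(y_true, y_pred, groups) if g == group]
      tp = sum(1 for t, p in rows if t == 1 and p == 1)
      fp = sum(1 for t, p in rows if t == 0 and p == 1)
      correct = sum(1 for t, p in rows if t == p)
      return {"PPV": tp / (tp + fp), "accuracy": correct / len(rows)}

  y_true = [1, 0, 1, 1, 1, 0, 0, 1]
  y_pred = [1, 1, 1, 0, 1, 0, 1, 1]
  groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
  print(group_metrics(y_true, y_pred, groups, "a"))   # PPV 0.67, accuracy 0.50
  print(group_metrics(y_true, y_pred, groups, "b"))   # PPV 0.67, accuracy 0.75
  # Predictive parity holds here; overall accuracy equality does not.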

Similarity-based measures

Fairness through unawareness

No sensitive or protected attributes are used in the decision making process
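
A sketch of the idea (the attribute names are hypothetical). Note that proxies such as postcode can still leak the protected attribute, which is the standard criticism of this definition:

  PROTECTED = {"gender", "ethnicity", "age"}   # hypothetical protected attributes

  def strip_protected(record):
      """Return a copy of a record with protected attributes removed."""
      return {k: v for k, v in record.items() if k not in PROTECTED}

  applicant = {"income": 52000, "postcode": "3000", "gender": "f"}
  print(strip_protected(applicant))   # {'income': 52000, 'postcode': '3000'}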

Causal reasoning

Counterfactual fairness

The predicted outcome would not change in a counterfactual world where the individual's protected attribute (G) took a different value

Plus many more...

Further reading

Fairness Definitions Explained
YOW! Data 2019 - Finn Lattimore - Engineering an Ethical AI System

Unpopular opinion: the entangled nature of the different measures of fairness is important and useful

It tells us that...

  • We cannot optimise for all measures of fairness simultaneously
  • We have to pick which ones matter most
  • Context matters, as do contextual norms

Medical diagnostics
-vs-
Recidivism and parole

We don't (yet) have a single fairness measure to rule them all.

So what!?

No physics 'Theory of everything' (yet)

We still put planes in the air

We build safety mechanisms despite the lack of theoretical coherence

We'd take ML safety more seriously if we could see the wreckage

Fairness definitions cheatsheet

tinyletter.com/summerscope

Section 4

Ways forward

Allow me one more hot take...

Building ML systems in a capitalist, corporate context:

There is no prediction; only intervention

Classification is never descriptive, always normative

Ways forward

New open source licences?

Hippocratic Licence

Cost sensitive learning

Co-opt these techniques to optimise for fairness?
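
One possible reading of that idea, sketched with scikit-learn: weight training samples so that mistakes on the protected group cost more. The data, group labels, and the 3x weight are all invented for illustration:

  import numpy as np
  from sklearn.linear_model import LogisticRegression

  rng = np.random.default_rng(0)
  X = rng.normal(size=(200, 3))                     # toy features
  y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)
  group = rng.integers(0, 2, size=200)              # 1 = protected group (toy)

  weights = np.where(group == 1, 3.0, 1.0)          # assumed cost ratio
  model = LogisticRegression().fit(X, y, sample_weight=weights)
  print(model.score(X, y))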

Continual learning

All production data is validation data
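
One way to act on this: recompute a fairness measure over a rolling window of live predictions and alert on drift. The window size, group labels, and 0.1 gap threshold are arbitrary assumptions:

  from collections import deque

  class FairnessMonitor:
      """Rolling gap in positive-prediction rate between two groups."""

      def __init__(self, window=1000, max_gap=0.1):
          self.events = deque(maxlen=window)   # (group, prediction) pairs
          self.max_gap = max_gap

      def record(self, group, prediction):
          self.events.append((group, prediction))

      def gap(self):
          rates = []
          for g in ("a", "b"):
              preds = [p for grp, p in self.events if grp == g]
              rates.append(sum(preds) / len(preds) if preds else 0.0)
          return abs(rates[0] - rates[1])

      def alert(self):
          return self.gap() > self.max_gap

  monitor = FairnessMonitor()
  for group, pred in [("a", 1), ("a", 1), ("b", 0), ("b", 1)]:
      monitor.record(group, pred)
  print(monitor.gap(), monitor.alert())   # 0.5 True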

Measure and counteract the human impact of harmful bias?

Category theory for intersectional bias

Eugenia Cheng TED talk

What if we combined a mathematical approach like this with data about outcomes - could we create a fair 'positive bias' model to negate systemic bias?

Design for failure,
not success

  • End users are experts on themselves
  • Bake in feedback loops
  • Plan to improve the algo over time
  • Plan an 'emergency brake'
  • Display system confidence
  • Promise less, not more
  • (Hire some designers!!)
"Complex systems are the best gift to finger pointing in the history of humanity"

SOURCE: What is wrong with UX podcast

Bystander effect

No invisible work

Empathy is a useful tool but it's not a full coverage test

No 'special' work

When you other ethics,
you other ethics

Process conversation starters:

  • Ethics review at kick-off meeting
  • Lean ethics canvas
  • Alternative feedback mechanisms, like an anonymous form
  • Ethics debrief at retro

Combine

tooling + process + culture

QA for good QA

Is it...

  • Measurable?
  • Enforceable?
  • One-off or iterable?
  • Showing change over time?
  • Easy to integrate with existing process?
  • Easy to integrate with existing codebase?
  • Before, during, or after deployment?
  • Aid or blocker?

Put it in the bin

Acknowledging failure

(is inevitable)

"I am an ethical person therefore I build ethical tech"
If you write a bug, it doesn't make you a bad programmer
If you make an ethical mistake, it doesn't make you an unethical person

Let's draw a line in the sand

Claiming moral authority

Claiming moral imperative

Acknowledging complexity

Thank you!

Questions?

summerscope.github.io/slides/elusiveness-of-ethics

Activity

A lean ethics canvas

Let's try it!