Module 2: Examples of Ethical Challenges in Machine Learning

This module will provide an overview of ethical challenges in machine learning, highlight guiding principles when implementing machine learning solutions, and present a case study on implementation.

Objectives

  • Introduce the ethical framework developed by USAID
  • Introduce contextually relevant examples of ethical challenges in ML, with a focus on data outputs and outcomes

Sections

Solar Lighting Example slides (PDF)

Learning Objectives

  • Present a case study in which a government/non-governmental organization/social enterprise may want to incorporate machine learning.
  • Discuss pros and cons of implementing ML-based solutions.
  • Highlight important interactions between the ML implementer and an organization.
  • Raise examples of how implementing ML without paying attention to fairness could have negative effects.
  • Recognize that the implementation of algorithms is not an objective process.

Content

We will start by presenting a case study on the evolution of machine learning in the solar lighting industry.

Introduction

In this case study, we take the role of the chief technology officer of a social enterprise that provides solar lighting products in East Africa. The company's mission is to provide affordable lighting solutions to people living in poverty; it started by providing high-quality, inexpensive solar lights as replacements for kerosene lanterns.

Over time, the company has grown and expanded its product offerings to include larger solar home systems, and along the way it has implemented pay-as-you-go models so that households can afford these larger systems. Under the pay-as-you-go model, the company provides the solar lighting equipment as a loaned asset and is repaid over time through mobile money payments until the full value of the asset is recovered.

The company has been meticulous about keeping records of transactions from its user base and, as a result, has access to both demographic information and payment histories for its clients. The information collected includes age, gender, occupation, location, and household income. As the company looks to expand its social impact, it realizes that this data could be analyzed to compute a creditworthiness metric for its customers. It could then provide this metric to banks or microfinance institutions so that those institutions can extend loans to the company's client base.

In-house vs external implementation

Machine learning is a powerful tool for implementing this credit scoring metric; however, the company has no data scientists or machine learning experts on its team who could build the solution. It also does not know how accurate or powerful the resulting algorithm would be, so it does not want to spend the resources to build a full team and instead opts to run a small pilot with some users in Uganda. As a resourceful company with engineering staff, it could either train some of its engineers to implement a machine learning solution using off-the-shelf credit scoring tools or work with a third-party company to implement the solution.

To weigh the pros and cons of each approach, it is important to consider the perspectives of both the machine learning implementer and the organization, and to understand the considerations and complexities that go into developing a solution. Doing it in-house without a trained data scientist would likely mean deploying a black-box solution. While someone with no background in machine learning could get a solution up and running fairly quickly, there are several design nuances that may be overlooked. Having a third-party consultant implement the solution would address many of these issues, though the organization may then lack the in-house capability both to understand how the model is implemented and to maintain it going forward. This approach may also be expensive.

Ethical challenges in implementation

First, historical data may show that certain groups of people have different default rates than others. For example, women may have a lower default rate than men, and the organization will have to decide how to address this (it will likely want to be fair and gender-blind in loan determination). To implement fairness, a naïve implementer may first try fairness through unawareness, which means simply hiding gender information when building the models. Depending on correlations within the data and how relevant gender is to default rates, the models could still effectively infer gender from the remaining features and use it in their predictions.
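
Whether "unawareness" actually hides anything can be checked empirically: train a model to predict the removed attribute from the remaining features. A minimal sketch, assuming a hypothetical customer table with columns such as gender, occupation, and defaulted (file and column names are illustrative, not from the case study):

    # Proxy-leakage check: can the "gender-blind" feature set still
    # predict gender? (File and column names are hypothetical.)
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    df = pd.read_csv("customers.csv")
    X = pd.get_dummies(df.drop(columns=["gender", "defaulted"]))  # "unaware" features
    y = df["gender"]

    # Accuracy well above chance means gender is still recoverable from
    # correlated features such as occupation or location.
    score = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
    print(f"Gender predicted from 'gender-blind' features with accuracy {score:.2f}")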

Second, since the data show a difference in default rates, someone has to actively decide how to correct for it. In the case of loans, different approaches to implementing fairness may also trade off against the accuracy of the algorithms. Third, the choice of algorithm involves trade-offs as well. Some algorithms may be faster at the cost of accuracy; others may be more accurate at the cost of explainability or understandability.
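
These trade-offs can be made concrete by scoring candidate models on both accuracy and a simple fairness measure, such as the gap in predicted default rates between groups. A sketch under the same hypothetical data assumptions as above; the gap used here is one of many possible fairness metrics, not a prescribed one:

    # Compare an explainable model with a more flexible one on accuracy
    # and on a demographic-parity-style gap. (Columns are hypothetical.)
    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("customers.csv")
    X = pd.get_dummies(df.drop(columns=["gender", "defaulted"]))
    y = df["defaulted"]          # assumed 0/1 outcome
    g = df["gender"]

    X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
        X, y, g, test_size=0.3, random_state=0)

    for model in (LogisticRegression(max_iter=1000), GradientBoostingClassifier()):
        pred = model.fit(X_tr, y_tr).predict(X_te)
        rate = {grp: pred[(g_te == grp).to_numpy()].mean() for grp in ("F", "M")}
        print(type(model).__name__,
              f"accuracy={accuracy_score(y_te, pred):.3f}",
              f"predicted-default gap={abs(rate['F'] - rate['M']):.3f}")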

Important takeaway points

First, implementing a machine learning algorithm is not an objective process. In your implementation, you are both designing a technology and making decisions, both of which introduce your biases into the system. To think that outcomes from a computer are objective is a fantasy.

Second, open communication between you and the implementer about your values as an organization and the decisions being made in implementation is critical.

Third, you need a way to audit your data and your algorithms if you want to have a fair system.

Challenges with scaling

Let’s move on, assuming you were able to work with a consultant to build a satisfactory algorithm and demonstrate significant success with your pilot in western Uganda. You now want to scale your model to other parts of Uganda and to East Africa. At this point, it is important to pay attention to the representativeness of your data. Are there large differences between the types of users you have in western Uganda and eastern Uganda? Between the types of users you have in Uganda and Tanzania? You need to make sure that you are collecting representative data as you scale your solution, which involves significant testing and auditing.
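
One lightweight way to start that testing is to compare feature distributions between the pilot region and each new region before deploying. The sketch below applies a two-sample Kolmogorov-Smirnov test to a few hypothetical numeric features; in practice you would also compare categorical features and outcome rates:

    # Compare pilot-region and new-region feature distributions.
    # (File names, column names, and the 0.01 threshold are hypothetical.)
    import pandas as pd
    from scipy.stats import ks_2samp

    pilot = pd.read_csv("western_uganda.csv")
    target = pd.read_csv("eastern_uganda.csv")

    for col in ["age", "household_income", "monthly_payment"]:
        stat, p = ks_2samp(pilot[col].dropna(), target[col].dropna())
        verdict = "DIFFERS - investigate before scaling" if p < 0.01 else "similar"
        print(f"{col}: KS statistic={stat:.2f}, p={p:.3f} -> {verdict}")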

Additionally, you want to make sure that changes within your population do not suddenly invalidate your results. For example, if the government imposed a kerosene tax, would your model still be accurate? How could you build support within your organization to make sure that you can react to such changes?
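
Reacting to change starts with detecting it. A minimal monitoring sketch, assuming a hypothetical log of monthly loan outcomes and an alert band chosen purely for illustration:

    # Flag sudden shifts in realized default rates relative to the pilot
    # period. (File name, column names, and threshold are hypothetical.)
    import pandas as pd

    log = pd.read_csv("loan_outcomes.csv", parse_dates=["month"])
    monthly = log.groupby("month")["defaulted"].mean()

    baseline = monthly.iloc[:6].mean()      # average rate over the pilot months
    for month, rate in monthly.iloc[6:].items():
        if abs(rate - baseline) > 0.05:     # illustrative alert band
            print(f"{month.date()}: default rate {rate:.2%} vs baseline "
                  f"{baseline:.2%} - investigate (policy change? new market?)")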

Discussion Questions

  • If you were to collect data on users when building a credit algorithm, which data would you collect and which data would you not collect?
  • Would you implement the algorithm in-house or outsource it to an implementing firm?
    • What are the pros and cons of each option?
  • What are some other ways that the organization could apply machine learning to their business?

References

“Meet Pulse – BBOXX’s pioneering predictive management platform for distributed energy businesses.” Bboxx, 2019.

Popescu, Adam. “AI Helps Africa Bypass the Grid.” Bloomberg, 11 June 2018. 

Contributions

Content created and presented by Amit Gandhi (MIT).

USAID Appropriate Use Framework slides (PDF - 2.1MB)

Learning Objectives

  • Present the appropriate use framework developed by USAID.
  • Define and be able to apply concepts of relevance, representativeness, value, explainability, auditability, fairness, and accountability/responsibility.

Content

Relevance

  • Is the use of ML in this context solving an appropriate problem?

As ML becomes more of a trend, we see more and more organizations applying it to their work to distinguish themselves from competitors or to increase their appeal to funders, without understanding whether it is the right tool for the problem they are trying to solve.

Representativeness

  • Is the data used to train the ML models appropriately selected?

To evaluate representativeness, an organization should consider whether the ML model uses data representative of the context in which it will be deployed, and which strategies are important for ensuring models can be trained on appropriate data.

Value

  • Does the machine learning model produce predictions that are more accurate than alternative methods? Does it explain variation more completely than alternative models?
  • Do the predicted values inform human decisions in a meaningful way? Are they actionable? Are they timely? Are they delivered to the right people?

Explainability

  • How effectively is the use of machine learning communicated?

It is important to ensure that the application is explained to end-users in a way that effectively communicates how the outcomes were determined. Organizations seeking to apply machine learning outcomes without understanding the nuances of how the models make decisions may use the algorithm outputs inappropriately.
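
For linear models, one starting point is translating learned weights into plain-language statements. A rough sketch, reusing the hypothetical credit-scoring data from the solar lighting case study (black-box models would need a post-hoc explanation tool instead, and real user-facing wording requires far more care than this):

    # Turn logistic regression coefficients into rough plain-language
    # explanations. (File and column names are hypothetical.)
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    df = pd.read_csv("customers.csv")
    X = pd.get_dummies(df.drop(columns=["gender", "defaulted"]))
    y = df["defaulted"]

    model = LogisticRegression(max_iter=1000).fit(X, y)
    for name, coef in zip(X.columns, model.coef_[0]):
        direction = "raises" if coef > 0 else "lowers"
        print(f"Higher {name} {direction} the predicted default risk (weight {coef:+.2f})")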

Auditability

  • Can the model’s decision-making processes be queried or monitored by external actors?

Increasingly, organizations are turning to “black box” machine learning solutions, whose inner workings can range from unintuitive to incomprehensible. It is important that the outputs can be monitored externally to show that the model is fair, unbiased, and does not harm some users more than others.

This may require additional infrastructure, whether it is an institutional or legal framework that requires audits, provides auditors with secure access to data and algorithms, and requires that people act on their findings.

Fairness

  • If used to guide decision-making, has the ML model been tested to determine whether it disproportionately benefits or harms some individuals or groups more than others?

Testing the results of algorithms against protected variables such as gender, race, age, or skin color is key to preventing the adoption of biased algorithms. Does a specific algorithm fail for specific groups of people more often than for other groups of people? Does it misclassify different groups in different directions—or do certain groups have different rates of false positives and false negatives?
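
These questions can be answered directly once predictions and outcomes are logged. A minimal sketch, assuming a hypothetical file of labeled predictions with a group column:

    # False positive and false negative rates broken out by group.
    # (File and column names are hypothetical; labels are 0/1.)
    import pandas as pd
    from sklearn.metrics import confusion_matrix

    results = pd.read_csv("predictions.csv")   # columns: group, y_true, y_pred

    for group, sub in results.groupby("group"):
        tn, fp, fn, tp = confusion_matrix(sub["y_true"], sub["y_pred"],
                                          labels=[0, 1]).ravel()
        print(f"{group}: FPR={fp / (fp + tn):.2f}, FNR={fn / (fn + tp):.2f}")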

It is also important to recognize that accuracy and fairness are not necessarily correlated. An algorithm can be technically accurate but still inconsistent with the values an organization wants to promote when deciding, for example, who should be hired or who should receive medical care. Understanding how these outcomes are derived, and taking steps to mitigate unfair ones, is an important part of ensuring that unfair algorithms are not widely adopted and used.

Accountability/Responsibility

  • If used to guide decision-making, are there mechanisms in place to ensure that someone will be responsible for responding to feedback and redressing harms, if necessary?

For example, an algorithm might be used to assist in diagnosing medical conditions, but the final diagnosis should still come from a trained medical professional. Used on its own, false identifications from the algorithm could harm individuals. However, consider a setting with a shortage of trained medical professionals: does the risk of misdiagnosis outweigh the risk of not treating people? These decisions are complicated, and there is not always a right answer. An accountable setup both ensures that systems are in place to prevent harmful errors and ensures that someone is responsible for identifying and correcting errors.

Discussion Questions

  • What are examples of ML being applied to irrelevant problems in international development?
  • How would you address the issue of representativeness in your data?
  • When might it be more important to choose an algorithm that is more explainable at the cost of accuracy or speed?
  • How would you implement accountability and responsibility in an existing system?

References

Paul, Amy, Craig Jolley, and Aubra Anthony. “Reflecting the Past, Shaping the Future: Making AI Work for International Development.” USAID, Washington (2018).

Contributions

Content presented by Amit Gandhi (MIT).

The guiding principles presented in this section were developed by Amy Paul, Craig Jolley, and Aubra Anthony of the Strategy & Research team within the Center for Digital Development at USAID. See their USAID report (PDF - 4.7MB).