Data Science Ethics

Why are data science ethics important?

Digital advances are producing huge amounts of data in new forms, and computers can now process this data more quickly and make decisions without human oversight. This creates new opportunities, along with many challenges we have not had to consider before.

There are laws that set out important principles on how you can use data. Those working with data should be aware of these laws and always act within them.

Public attitudes to data are changing. Working with data in a way that makes the public feel uneasy, without adequate transparency or engagement, could put your project at risk and also jeopardise other projects. Considering public attitudes and communicating clearly is key.

Some Key Principles

  1. Start with clear user need and public benefit - Data science offers huge opportunities to create evidence for policy making and to make quicker and more accurate decisions. Being clear about the benefit you seek to achieve will help you justify the sensitivity of the data and the methods you want to use. Creating a use case lets you explain how better understanding will benefit individuals. It will also help you to:

    • Consider what risks the benefit justifies (risk to privacy, risk of making mistakes, and risk of negative unintended consequences) and therefore what data and methods you should use.
    • Start to think about what decisions might be taken as a result of the insight.
  2. Use data and tools which have the minimum intrusion necessary - Use the minimum data necessary. Sometimes you will need to use sensitive personal data. You can take steps to safeguard people's privacy, e.g. de-identifying data, aggregating it to higher levels, or using synthetic data. Some data science projects have direct and tangible benefits to individuals, and some will improve policymakers' understanding so that they can develop better policy. Ways to minimise intrusion include:

    • Only use personal data if similar insight or statistical benefit cannot be achieved using non-personal data
    • De-identify individuals or aggregate to higher geographical levels where possible
    • Use synthetic data to get results
    • Query against datasets through APIs rather than having access to the whole dataset

    You must take reasonable steps to ensure that individuals will not be identifiable when you link data or combine it with other data in the public domain. As the number of available datasets grows, now and in the future, it may become easier to link to other open data sources and infer an individual's identity or personal information about them.
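
As a rough illustration of de-identifying and aggregating, the sketch below replaces a direct identifier with a keyed hash and then reports only regional counts, suppressing small groups. The field names, the key, the sample records and the threshold of 3 are all assumptions made for the example. Note that a keyed hash is pseudonymisation rather than full anonymisation, so it does not by itself rule out re-identification through linkage.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret key; in practice it would be stored separately from the data.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a keyed SHA-256 hash (pseudonymisation)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def aggregate_by_region(records, min_count=3):
    """Count records per region; report None for regions below min_count."""
    counts = Counter(r["region"] for r in records)
    return {region: (n if n >= min_count else None) for region, n in counts.items()}


# Made-up records, for illustration only.
records = [
    {"person_id": "A001", "region": "North"},
    {"person_id": "A002", "region": "North"},
    {"person_id": "A003", "region": "North"},
    {"person_id": "A004", "region": "South"},
]

deidentified = [
    {"person_id": pseudonymise(r["person_id"]), "region": r["region"]}
    for r in records
]
summary = aggregate_by_region(deidentified)
```

Suppressing small counts is one simple way to lower the chance that a single person can be picked out of a published table; the right threshold depends on the data and the context.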

  3. Create robust data science models - Good machine learning models can analyse far larger amounts of data far more quickly and accurately than traditional methods. We should think through the quality and representativeness of the data, flag whether algorithms are using protected characteristics (e.g. ethnicity) to make decisions, and think through unintended consequences. Complex decisions may well need the wider knowledge of policy or operational experts. Algorithms learn from large amounts of historical data to make decisions, so poor-quality or unrepresentative data can lead them to reproduce and reinforce bias. Use techniques to spot bias, and then build in corrective measures to remove it.
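
One simple way to spot possible bias is to compare how often each group receives a favourable decision. The sketch below is a minimal example under stated assumptions: the group labels and decisions are made up, and the ratio of lowest to highest rate is only one of several possible fairness measures. A low ratio is a prompt for further investigation, not proof of bias.

```python
from collections import defaultdict


def selection_rates(decisions):
    """Return the share of favourable outcomes per group.

    decisions: iterable of (group, outcome) pairs, where outcome is
    1 for a favourable decision and 0 otherwise.
    """
    totals = defaultdict(int)
    favourable = defaultdict(int)
    for group, outcome in decisions:
        totals[group] += 1
        favourable[group] += outcome
    return {group: favourable[group] / totals[group] for group in totals}


def rate_ratio(rates):
    """Ratio of the lowest to the highest selection rate (1.0 means equal rates)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group received a favourable outcome
    return min(rates.values()) / highest


# Made-up decisions for two hypothetical groups, for illustration only.
decisions = [
    ("A", 1), ("A", 1), ("A", 1), ("A", 0),
    ("B", 1), ("B", 0), ("B", 0), ("B", 0),
]

rates = selection_rates(decisions)
ratio = rate_ratio(rates)
```

Here group A receives a favourable outcome in 3 of 4 cases and group B in 1 of 4, so the ratio is about 0.33. Whether a gap of that size is acceptable is a judgement for policy or operational experts, not something the calculation can settle.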

  4. Be alert to public perceptions - The law tells us what we can do, but ethics tell us what we should do. Ethics become more important when advances in technology are pushing our understanding of the law to its limits. Alongside the law and ethical practice, we need to understand public opinion so we can work out what we should do. It is vital to understand both stated and revealed public opinion about how people would actually want the data you hold about them to be used. Consult with others to work out whether projects are acceptable.

  5. Be as open and accountable as possible - Being open allows us to talk about the benefit of data science. Let people know about the social benefit of your work and the impact it has had on collective or individual social or financial outcomes. Aim to be open about the tools, data and algorithms, unless doing so would jeopardise the aim (e.g. detecting fraud). Make sure there is oversight and accountability throughout the project.

  6. Keep data secure - The public are justifiably concerned about their data being lost or stolen, and you have a responsibility to protect both personal and non-personal classified data, so it is vital to keep it secure. The law (e.g. the Data Protection Act) provides the basis for how data should be collected, shared, processed and deleted.