Jose 𝒥𝒪 Reyes

Machine Learning ★ AWS Community Builder ★ Master of Data Science student @ UNSW ★ Author, fullstackdeveloper.tips

Overview

Introduction
Week 1: Code Completion Tooling
Week 2: Surveillance technology to spy on workers?
Week 3: PredPol: predictive analytics to forecast crime
Week 4: Do data practitioners need professional standards?
Week 5: Criminalise the re-identification of de-identified data
Week 6: What three (3) changes can you make in your own data or analytics practice to become an agent of positive change?

Introduction

Over the next six weeks, I will be writing about topics in data ethics. As a software developer and AI/ML practitioner, I need to put more effort into understanding the ethical implications of the industry I belong to and the work that I do.

Week 1: Code Completion Tooling

In June 2021, GitHub released Copilot, a code completion tool that uses machine learning to write code for you. Marketed as an "AI pair programmer", it is intended to help developers write code faster. Copilot has been trained on billions of lines of code from various sources, including public open source projects on GitHub and other public repositories.

Almost exactly a year later, Amazon released a preview of Amazon CodeWhisperer, a tool similar to Copilot.

As a software developer, I am excited about the possibilities of these tools. I am always on the lookout for tools that can help me write better code. As awesome as these tools are, I am also concerned about their ethical implications. If they are trained on open source code, what does that mean for the source code they generate?

From the point of view of GitHub and Amazon, it makes great business sense, because the tools are built on open source code that is freely available on the web. It is akin to a mining company that extracts resources for free and then sells them for a huge profit.

I am almost certain that the original authors did not intend their code to be used in this manner. A class action has been filed against Copilot on behalf of the millions of GitHub users whose code was used to train the tool. The outcome of this litigation could have significant implications for these products, for the millions of developers already using them, and for the authors of the code used to train them.

Week 2: Surveillance technology to spy on workers?

Let’s start this second week’s reflection by watching this short video. It’s about a company called Humanyze, which uses digital badges to track the movements of employees in the workplace. Calling its technology People Analytics, the company claims that it can help organisations improve productivity and employee engagement. The device hears and knows everything you are doing, for every second you spend in the office.

It is an older video, but this technology is still being used by thousands of organisations today. It raises the question: is this ethical? There are a few ways to dissect this question, but in this instance let’s use a simple framework to help us come up with an answer.

The framework we will use for this ethical dilemma is the deontological framework. It is quite straightforward to apply, as it only requires people to follow the rules and do their duty. With this framework, we don’t even have to think about the consequences of our decisions.

In this scenario, for example, as long as the company can prove that it has the consent of the workers using the device, and that nothing it does is illegal, then the practice is ethical. Humanyze claims that the audio it collects is not listened to for content but is only analysed for patterns of interaction, and that no identifiable information is collected at all. As for the organisations using these badges in their offices, they say their intent is not to put their employees under surveillance, but to help them enjoy their work more.

All the data collected is 100% anonymous. It helps identify and diagnose workplace issues that can affect performance and employee engagement. It can then quantify the costs, opportunities and risks, so that the company can understand the impact of its changes and decisions.

Thousands of organisations have reported significant increases in productivity and employee retention across the board, so they must be doing something right.

What do you think: is it ethical?

Week 3: PredPol: predictive analytics to forecast crime

In this third week, we will be looking at AI and ML away from the context of the tech industry. Let’s see how AI and ML are being used in the public sector, specifically in the field of policing. In LA, the police department has been using a predictive policing tool called PredPol. The tool uses machine learning to predict where and when crimes are likely to occur. This video explains how it works.

On the surface, this looks like a very good practical application of AI and ML. It helps the police department allocate its resources more efficiently and, in turn, helps reduce crime. However, the situation is much more complex than that, as it involves several stakeholders and we need to be mindful of their rights and interests.

The City of LA is a vibrant multicultural city, but it also has a large population of homeless and low-income residents. The LAPD has been using PredPol daily to predict the likely locations where crimes will occur, and it uses this information to send its resources to the areas where they are most needed.

However, a group of citizen activists has been protesting against its use, because they have identified a tendency to misuse this information and target low-income and homeless people. People like Anthony, an ex-offender, complain that they are being unfairly targeted by the police. Even though he has done his time and is now a productive member of the community, the algorithms are still flagging him as a potential criminal, and he is still being added to the crime watch bulletin. We also need to remember why this was done in the first place: there are criminals all over the city that the police department needs to keep an eye on.

In the context of Care Ethics, we need to practise our moral imagination and put ourselves in the shoes of the caregivers (LAPD, PredPol) and the care receivers (the homeless, low-income residents, Anthony). The caregivers, I’m sure, are coming from a good place of wanting to help the community and make it safe for everyone, and the care receivers simply want to live a normal life. Of course, we can’t deny that many criminals roam the city, and it’s a balancing act to keep everyone safe. Having said that, there needs to be a way to police the police, to ensure that they don’t overstep their authority. This is where the citizen activists come in, to ensure that the police are not abusing their power.

Week 4: Do data practitioners need professional standards?

Being part of the tech industry, I have seen the benefits that data can contribute to the advancement of society. I also understand that this industry changes so fast, and that it is difficult to keep up with the latest developments. So I understand the reluctance of many in my industry to have a program for formal professional registration like what you see in industries like medicine and engineering.

Having said that, there are standards like data stewardship and governance, which are typically vendor agnostic, so they are somewhat insulated from the rapid changes in the tech industry. Having a national program around this capability will ensure that there are standards and the people registered professionally in these programs are up to date with the latest developments and best practices. This also means that they are accountable to the industry and the public, and will be held to a higher standard of ethics and professionalism.

Do data practitioners need professional standards?

In a past project, we needed to clone data from a production database for testing purposes. There wasn’t really a violation of the Ethics for Data Projects principles, but the situation would have benefited from more rigour in the process. After the cloning was done and the test environment was stood up, there were no checks from the customer to ensure that everything was in order.

However, reflecting on Principle number 4, “Practice responsible transparency as the default where possible, throughout the entire data life cycle.”, we could have been more transparent and socialised the process we went through to make sure that the data was anonymised and de-identified. This would have helped the customer understand the effort we took to ensure that the data was safe and secure.
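As an illustration of the kind of step that could be shown to a customer, here is a minimal sketch of masking identifier columns when cloning rows into a test environment. It is not the process we actually used on that project. The function names, the field names and the secret are all hypothetical, and a keyed hash like this only pseudonymises the data: whoever holds the secret can still link records, so it is not full anonymisation on its own.

```python
import hashlib
import hmac


def pseudonymise(value: str, secret: bytes) -> str:
    """Return a keyed hash (HMAC-SHA256) of value, truncated to 16 hex characters."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask_rows(rows, id_fields, secret):
    """Copy each row, replacing the listed identifier fields with pseudonyms.

    The input rows are left untouched; non-identifier fields are copied as-is.
    """
    masked = []
    for row in rows:
        clean = dict(row)
        for field in id_fields:
            if clean.get(field) is not None:
                clean[field] = pseudonymise(str(clean[field]), secret)
        masked.append(clean)
    return masked


# Hypothetical production rows and a hypothetical secret kept outside the test environment.
production_rows = [
    {"name": "Alice Example", "email": "alice@example.com", "plan": "gold"},
    {"name": "Alice Example", "email": "alice.work@example.com", "plan": "free"},
]
test_rows = mask_rows(production_rows, ["name", "email"], secret=b"not-a-real-secret")
```

The same input always maps to the same pseudonym, so joins between test tables still work, while the original names and emails never reach the test environment.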

Week 5: Criminalise the re-identification of de-identified data

Now that this new law to criminalise certain types of re-identification of data has been passed, how will this affect you in your professional or personal context? This is in relation to an incident in this Department of Health case, where de-identified Medicare data was relatively easily re-identified.

Working in software development, I’m used to working with application requirements specifications as a means of communicating the requirements of a project to the development team. I’ve had experience on both sides of the fence, as a developer and as a customer working with a sub-contractor to develop a software solution.

As a software developer, I will have to get used to seeing requirements relating to this bill, and I will need to account for the extra effort required to ensure that any de-identified data we hold cannot be re-identified. My clients can rest assured that it is now a criminal offence for a third party to re-identify and transmit this data. Because this is now criminalised, malicious actors will be discouraged from re-identifying data, or risk hefty fines or even jail time.
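One way to make "cannot be re-identified" a little more concrete is to check how many records share the same combination of indirect identifiers. The sketch below is my own illustration of a k-anonymity-style check, not something the bill prescribes. The function names, the field names and the sample records are hypothetical, and passing a check like this does not by itself prove that data is safe from re-identification.

```python
from collections import Counter


def group_sizes(rows, quasi_identifiers):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def rows_at_risk(rows, quasi_identifiers, k=2):
    """Return rows whose quasi-identifier combination appears fewer than k times."""
    sizes = group_sizes(rows, quasi_identifiers)
    return [
        row for row in rows
        if sizes[tuple(row[q] for q in quasi_identifiers)] < k
    ]


# Hypothetical de-identified records: no names, but still some detail.
records = [
    {"birth_year": 1980, "postcode": "2000", "sex": "F"},
    {"birth_year": 1980, "postcode": "2000", "sex": "F"},
    {"birth_year": 1975, "postcode": "2033", "sex": "M"},
]

at_risk = rows_at_risk(records, ["birth_year", "postcode", "sex"], k=2)
print(len(at_risk))
```

Here only the third record is flagged, because nobody else shares its birth year, postcode and sex. A record like that is the easiest to match against another dataset that does contain names.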

Week 6: What three (3) changes can you make in your own data or analytics practice to become an agent of positive change?

There are several things that I can do to become an agent of positive change, even if I am not in a leadership position. People in a position similar to mine often underestimate their own influence. You don’t need to be a manager or a leader to effect positive change in your team, or even in the company you are part of.

I can start with ethical awareness. When building case studies or proofs of concept, which is common in my consulting practice, I could make ethical and regulatory data concepts more prominent. I have done several such activities in the past without any regard for these topics. I have the power to shape the environment I live in and the world around me.

Continuing with the educational endeavour, I have noticed while working with clients and building projects for them that they may need some hand-holding and guidance on the way forward. Ethical intelligence is not often a strong characteristic of companies, and they will need someone to help them understand their responsibility for building a better world alongside their aspirations of increasing profits.

Lastly, if I want to have a greater positive impact on my company and the world around me, I may need to push myself and go for a higher-level position than the one I hold now, one with more leadership opportunities. Becoming a great and ethically aware leader will help others rise, and the company and the community around me will benefit from it.

