Episode 27: Ethics in Data Science

Data science, and the data profession generally, is deeply lacking in a fundamental resource, and this oversight is holding the profession back from becoming mature and trusted! What is this important missing piece? Ethics!

The profession should have taken the application of ethics to customer data, algorithms, and the rest seriously long before now; regardless, we need to start applying ethical principles to everything we do immediately.

This episode introduces the notion of ethics, its application to the data profession, a couple of case studies, and then a challenge for the listeners! Be sure to share your thoughts about the brain teaser in Act 3 in the comment section below!

To keep up with the podcast, be sure to visit our website at datacouture.org, follow us on Twitter @datacouturepod, and on Instagram @datacouturepodcast. And if you’d like to help support future episodes, then consider becoming a patron at patreon.com/datacouture!

Music for the show: Foolish Game / God Don’t Work On Commission by spinmeister, (c) 2014, licensed under a Creative Commons Attribution (3.0) license. http://dig.ccmixter.org/files/spinmeister/46822 Ft: Snowflake

Show Notes:

Welcome to Data Couture, the podcast about data culture at work, at home, and on the go. I’m your host, Jordan Bohall. To stay up to date with everything Data Couture, be sure to like and subscribe down below. Furthermore, be sure to follow us around the internet: @datacouturepod on Twitter, @datacouturepodcast on Instagram, and @datacouturepod on Facebook. Of course, if you’d like to help keep the show going, then consider becoming a patron at patreon.com/datacouture.
Now, on to the show!

Welcome to Data Couture. I’m your host, Jordan, and on today’s episode we’re going to be talking about ethics in data science. Now, before we get to that, there’s one thing I have to mention. Namely, I am attempting to get the Data Couture podcast named podcast of the month over at a wonderful site called Podcastland. Podcastland is a fantastic resource for discovering new podcasts, seeing what everybody else is listening to, and finding podcasts in a specific genre you’re looking for. Of course, they’re not sponsoring this episode in any way. But nevertheless, in order to become that podcast of the month, I need all of my listeners to go over there and vote for the podcast.

Now, I’ve made this easy for everybody: I’ve created a link, datacouture.org/podcastland, that will take you directly to the right page on the Podcastland website to vote for the podcast. So I implore all of you to do that. I will really appreciate it, and it will help me get my podcast in front of more people around the world. Once again, that link is datacouture.org/podcastland.

Now, let’s get to ethics in data science. In the first act, as usual, we’ll be talking about what ethics is, and especially what ethics is in data science. In the second, we’ll be talking about some case studies in industry. And in the third, we’ll be talking about how this plays out in my day-to-day life. So let’s get to it.
Welcome to the first act. In this part, we’re going to be talking about what ethics is. So, for any of my listeners out there, raise your hand if you know what ethics is. I’m just kidding, put your hand back on the steering wheel. You shouldn’t talk and drive, or, I guess, lift your hand up and drive.
By ethics, I mean this from a philosophical standpoint.

And so one way to think about ethics is as principles that govern one’s behavior, or otherwise the conducting of an activity. Similarly, you can think of ethics, when it comes to the field of study, as that branch of philosophy that involves systematizing, defending, and recommending concepts of right and wrong conduct. The field of ethics, along with aesthetics, concerns matters of value, and together they comprise the branch of philosophy called axiology. Now, when I was going to grad school for philosophy, I never called it axiology, but there you go. At the end of the day, we can think of ethics as just a way for us to understand how we should conduct ourselves as humans. Of course, there have been lots of very famous ethicists, philosophers of ethics, throughout the ages.

And all these people, all the way up until modern times, have been deeply concerned with the human condition. For example, Aristotle was all about something called virtue ethics. What that means, and again, this is a very quick characterization of Aristotelian ethics, is that we should align our virtues with something called the doctrine of the mean: if you have two extreme character traits, or two ways of being, say recklessness and cowardice, one should aim for being right in the middle of that particular spectrum, which in this case is courage. And of course, there are quite a few other examples.

When it comes to ethics in data science, we get away from the rigorous, what we call normative, ethical principles, and instead we look at something called applied ethics. What I just spoke about, ethics proper, those are normative theories that tell you the way you ought to do something. Applied ethics is, as the name implies, applying whichever philosophical ethical theory you are particularly engaged with to a particular problem in the world. Now, I’m not going to take sides on which particular philosophical doctrine I subscribe to. However, we will see the way the field itself, the field of data science and the data profession as a whole, is taking up this particular issue: namely, creating ethical algorithms in machine learning, predictive analytics, and artificial intelligence, and handling our customers’ and members’ data in an ethical way.

Of course, there are various other ways that we can apply ethics to the data professions. But at the end of the day, the question comes down to this: how can we create machine learning algorithms that aren’t biased, that don’t unfairly focus on one customer segment while avoiding a completely different customer segment? So, for example, suppose you create a predictive algorithm to determine who would be the best fit for a particular mortgage, and you take into account more than just their credit score, you take into account all sorts of variables about who that person is. Then you put it into production, and you find out: oh, this algorithm is doing great, it’s predicting the sort of people that we want to lend to with very, very high accuracy and precision and repeatability and all these sorts of wonderful things.

It’s giving us this wonderful lift in our actual approval rates. But then you come to find out: oh, there’s this whole group of people that it’s completely avoiding, and we’re denying them loans. Had that algorithm been tuned to not be biased against that certain segment of the population, they would not have been discriminated against, if “discriminated” is even the right word. The point is, they wouldn’t have been removed from the candidacy, the pool of people who would have otherwise been offered mortgages. So that’s a very common question in the ethics of data science at the moment.
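For readers who want to make that concrete, here is a minimal sketch of one common check, demographic parity, which simply compares approval rates across groups. The column names, data, and threshold below are all illustrative assumptions, not anything from a real lending system.

    import pandas as pd

    # Hypothetical scored applications: a model decision plus a protected
    # attribute. All names and values here are made up for illustration.
    applications = pd.DataFrame({
        "group":    ["A", "A", "A", "A", "B", "B", "B", "B"],
        "approved": [1,   1,   0,   1,   0,   0,   1,   0],
    })

    # Approval rate per group: the simplest demographic-parity check.
    rates = applications.groupby("group")["approved"].mean()
    print(rates)

    # Flag a large gap between the best- and worst-treated groups.
    gap = rates.max() - rates.min()
    if gap > 0.2:  # the threshold is a policy choice, not a statistical law
        print(f"Warning: approval-rate gap of {gap:.0%} across groups")

A passing check here is no guarantee of fairness, since there are competing fairness definitions that cannot all hold at once, but a gap this cheap to compute is hard to justify not computing.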

Another one is how we treat our member or customer data, and this is fraught with all sorts of pitfalls. Namely, questions arise like: what does it mean for a member to be able to delete all of the data that we have on them? To give you a bit of insight into why that’s problematic: while we might have a central data warehouse or database or financial core that has all their information in it, chances are there are programs plugged into those systems, third-party systems, that then contain the data. So you say, okay, that’s just a second step: we go through our main data warehouse, and then we go out to all the third-party systems that contain that data, and we can just delete all of the member’s data if they so choose. Well, it gets trickier yet, depending on how the contracts were written for how those third-party vendors and their systems connect up to our data. Those particular vendors, the companies that have given us these third-party solutions, may be sharing data as well. And that’s when it becomes very much a black box as to whether or not someone can truly delete all their data, were they to ask that of a particular company. And so, to bring it back to the ethical piece: how do we treat our members’ data so that they can own it? It is their data. It is about them.

How can we make sure that, if we can’t really delete all of their data, they are at least fully aware of that, and go into the relationship with us knowing this might be the case? Think of the case of Facebook: they offer an option now where you can pull all of your data, and you can delete it. But I guarantee that even though Facebook does a lot of in-house development themselves, they’re using third-party vendors too. And so you see how this trickles out. It doesn’t matter which industry you’re talking about: if you give your data to somebody, chances are it’s going to be very difficult to truly remove your data from them. So as a data profession, as members of that particular area, we need to be very conscientious about how we treat these pieces of data that are so valuable and so necessary for how we run our companies, while also being very fair to our customers or members.
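To sketch why deletion is architecturally hard, here is a toy illustration of the propagation problem described above. Every system name and function here is hypothetical; real vendor integrations rarely expose anything this clean, which is precisely the point.

    # A hypothetical sketch: propagating a member's deletion request.
    # Systems we control can be handled directly; third parties may not
    # support (or let us verify) deletion at all: the black box above.
    internal_stores = ["warehouse", "financial_core", "crm"]
    third_parties = {"vendor_a": True, "vendor_b": False}  # deletion supported?

    def delete_member(member_id):
        unresolved = []
        for store in internal_stores:
            print(f"deleted {member_id} from {store}")
        for vendor, supports_delete in third_parties.items():
            if supports_delete:
                print(f"requested deletion from {vendor}")
            else:
                unresolved.append(vendor)  # data may live on, unverifiably
        return unresolved

    print(delete_member("m-123"))  # ['vendor_b']: the promise we can't keep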

Now, as we move to the second act, we’ll talk about another couple of examples where this plays out, and how some have tried to solve it. Stay tuned.

Now, you might have noticed that I’ve been a bit cagey talking about ethics in data science. But at the end of the day, ethics is such a large and deep and historical field that to truly get to anything meaningful in twenty minutes would be a fool’s errand. And so what I’m trying to do in this episode, especially in the second act, is to give you some examples where these sorts of problems crop up. And even though this episode won’t give you all the tools, I highly encourage you to go look around the internet.

There are so many people currently writing for the data profession, specifically on ethics in data science, that you’ll find some excellent resources to get your feet wet in this very interesting topic. Now, I’m going to speak about two particular case studies for this part of the show: the first is the problem of COMPAS, and the second is the problem of smart meters. Unless you’re a judge or in the legal system in some capacity, I don’t expect you to know what COMPAS, our first case study, is. But COMPAS is one of the ways judges rely on algorithms to assess whether defendants awaiting trial are likely to reoffend. COMPAS is one of several such systems that have been developed, and it’s used in a handful of states.

And what COMPAS does is look at around a hundred different variables. When we train algorithms, we call each column a variable, and a variable can be something like age, or sex, or, in this case, criminal history. The point is, for this COMPAS system, defendants answer all the questions so that these various variables can be filled out, and the COMPAS system outputs a risk score. The risk score ranges from zero to ten: scores from, say, five to seven represent a medium likelihood of reoffending, while eight to ten indicates a high likelihood of somebody reoffending. And so where are the ethical challenges in this? Well, there are three of them that hit you right on the nose. The first is unfair discrimination, the second is reinforcing our human biases, and the third is a lack of transparency.
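As an aside before taking those one at a time: the score-to-band step just described is simple enough to sketch. To be clear, the cutoffs below mirror the episode’s description, not COMPAS’s actual internals, which are proprietary.

    def risk_band(score):
        # Map a 0-10 risk score to the bands described above.
        # These cutoffs are illustrative, not COMPAS's real ones.
        if not 0 <= score <= 10:
            raise ValueError("score must be between 0 and 10")
        if score >= 8:
            return "high"
        if score >= 5:
            return "medium"
        return "low"

    print(risk_band(6))  # -> "medium"

The ethical weight sits not in this trivial final step but in the hundred-odd inputs and the model behind the score.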

And so, for the first one, unfair discrimination due to COMPAS: it’s just like what I said in the mortgage example from the first part of the show. All humans are just as likely to commit various types of crimes; there’s no data to show that any particular gender or any particular race is inherently more likely than another to commit a crime. We’re all equal, we’re on the same playing field. However, because of what’s happened in the past, namely judges having racial biases, the algorithms get trained up with those same biases. So what happens? Well, people with certain racial profiles then get unfairly labeled as highly likely to offend once again. And so the challenge for ethics in data science is to wipe out this unfair discrimination.

We need to have a strong code of ethics so that we don’t do this to our citizens, to anybody going through the judicial system. Now, the second ethical problem is that of reinforcing human biases. For those of you who are data scientists or are in the data profession, you’re well aware that it’s very, very difficult to code out any sort of bias held by the person coding the algorithm.

What happens is that people have internal biases. I have internal biases, you have internal biases. If you don’t think you do, you might need to check yourself, because we all do, as much as we try not to. The fact of the matter is, we all have biases, and as people, as we work on ourselves to become better humans, it’s important that we notice those biases and work to get rid of them.

Now, in the case of machine learning algorithms, in the case of this classification-type algorithm that predicts whether or not somebody is going to be more or less likely to offend again: the people writing the algorithm, in the case of COMPAS, and of course I don’t mean to call out COMPAS specifically, since there are quite a few of these that have been built, but the data scientists or teams of data professionals who built these algorithms, they all have their own biases. And that gets written into the algorithm itself, which means that people who fall prey to the bias of the particular person writing the algorithm, surprise surprise, are going to fall prey to the bias of the algorithm they built. So, for example, if somebody felt that a certain race was more likely to kill again, compared to some other race or other races, then that algorithm is going to be built such that the data in it affirms or confirms that particular person’s biases.

And so that person writing the code will say: oh yeah, this is super accurate, look, it totally predicts, it completely applies to, the cases that I care about. The third ethical dilemma in this particular case of COMPAS is that of a lack of transparency. So I ask, once again, my data scientists in the crowd: raise your hand if you know exactly how your model works. Oh, wait.

Chances are you don’t know how your neural network works, if that’s what you’re using. You probably don’t even know the complete inner workings of your decision trees, or your k-NN models, or your k-means models, or any of the other wonderful algorithms that we regularly employ. What does this mean? This means that there’s no transparency for the lawyer, for the judge, for the defendant, for anybody involved, to truly understand how this risk rating comes up with the conclusion that they are, in fact, going to offend again. And if that’s the case, then how do you argue against it? If it’s widely used, how do you prove that you’re not going to offend again, when the algorithm says you will because of these hundred questions that you filled out? Now, luckily, judges don’t rely solely on these algorithms; they take the score as one piece of evidence along with a bunch of others that come out in the case for the defendant.
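To put a small illustration behind the transparency complaint: even one of the most interpretable model families, a shallow decision tree, takes deliberate extra work to explain. A minimal sketch, assuming scikit-learn is available, with purely synthetic data and made-up feature names:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 4))                  # four made-up questionnaire features
    y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)  # synthetic "reoffend" label

    model = DecisionTreeClassifier(max_depth=3).fit(X, y)

    # Global feature importances are easy to print, but they still don't
    # explain why any one defendant received the score they did.
    for name, importance in zip(["q1", "q2", "q3", "q4"], model.feature_importances_):
        print(f"{name}: {importance:.2f}")

If even a depth-three tree needs this kind of after-the-fact inspection, a black-box ensemble or neural network offers a defendant essentially nothing to argue against.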

So, for now, we’re sitting on somewhat shaky ground. Nevertheless, if the algorithm says one thing but all the evidence points against it, we hope that the logical reasoning and deductive capabilities of our judges will prevail. Even so, that doesn’t take away the impending problem that these sorts of systems imply. To be good data scientists, we have to get clear about the ethics underlying our data science practices. Now, let’s talk about the second case study.

That of smart meters. How many of you have smart meters on your homes? I don’t yet, because I live in the boonies, more or less. But smart meters are an advance in technology that has triggered a shift to having a certain kind of meter for the power that comes to your home. And that’s because power generation has moved from just a couple of very large-scale power plants placed in certain spots around America, or around the world, for my international listeners, to a bunch of small-scale resources. Think of all those windmills that you see when you’re driving down the highway. The smart meters help guarantee the secure operation of these small grids. Now, to be more specific, system operators gain network transparency, and consumers can visualize and optimize their electricity consumption. To me, that’s amazing.

Because I love to measure everything in my life, and I would love to take a deep, deep dive into my power consumption. So smart meters have quite a few benefits, but of course they come with ethical dilemmas. I’ll talk about three of them, actually: privacy, lack of transparency, and finally, consent and power. So, for privacy: maybe you don’t think that electricity consumption is sensitive data, but I guarantee that I could analyze your electricity data and tell when you’re not home. What does that mean? If I’m a bad guy, which, to be clear, I’m not, but bad guys could know that you’re not home. Oh, it’s a great time to go and maybe take some of the electronics off your wall, for example. Right? That’s a problem.

Second is the lack of transparency. Now, the lack of transparency works a little differently in this case, and that is in how electricity is priced and how cost models are built around the sort of electricity that we use. In the case of smart meters, it turns out that the more granular the data is, the more specific, the more fine-grained, the more deeply we track our energy usage, the cheaper the energy rates typically are. And so if you have a very, very fine-grained set of data, chances are your electricity company is going to charge you less than, say, your neighbors who have a different smart meter that doesn’t quite go down to that level of detail. Now, how is that a problem?

Well, it’s an ethical dilemma about power, and about unfairness. For example, it’s very likely that if you happen to be living, say, paycheck to paycheck, you will perhaps be open to giving your electric company that super fine-grained data, or at least to trading away access to fine-grained data about you. Whereas if you didn’t have those sorts of financial problems, those financial situations, then you might be fine with the less granular set of data coming from your smart meter, even though you’d be paying slightly more. And so this is almost a forced imbalance of power. How do we treat the people who are the least well off in the same way as we treat those who are in better positions, economically in this case? We need to figure out how we can provide the same service to everyone across the country without requiring all of this very precious data from them.

So the final piece, the final ethical dilemma, is consent and power. Building off of this lack-of-transparency piece: namely, getting a cheaper rate requires more and more data granularity. However, our customers need to understand and be able to access the data in the first place before giving consent to share it. This creates a potentially unfair power dynamic between the energy companies and their customers. And this is an ethical dilemma because when somebody has more power over you, they can make you do things, for example, or they can jack up rates whenever they want to; they can do all sorts of things that are unethical, that are unjustifiable, just because, say, they want to make a dime. So, so far we’ve looked at two case studies, namely COMPAS and smart meters, and how ethics in the data profession applies to them.
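To make the smart-meter privacy worry concrete before we move on, here is a minimal sketch using made-up hourly readings; real meters report similar interval data, but the numbers and the baseline below are purely illustrative assumptions.

    # Made-up hourly readings in kWh. Long stretches of near-baseline
    # usage suggest nobody is home, which is exactly the privacy concern.
    readings = [0.2, 0.2, 0.3, 0.2, 0.2, 0.2, 0.2, 0.2,   # away or asleep
                0.2, 0.2, 0.2, 0.2, 1.4, 1.8, 2.1, 1.2]   # evening activity

    baseline = 0.3  # assumed idle draw of the fridge, router, etc.
    away_hours = [hour for hour, kwh in enumerate(readings) if kwh <= baseline]
    print(f"Likely away during hours: {away_hours}")

Nothing here requires machine learning; a dozen lines and a guess at the idle load are enough to read occupancy straight off the meter.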

Welcome to the third and final act of Data Couture. Now, the way I’m going to operate for this final piece is to give you a bit of a puzzler, something for you to work out. And what I want you to do, once you’ve come to a conclusion, is to comment below the episode and tell me how you would solve it. Got it? Okay, here’s the problem. Consider the following situation, which is very common in any company that is trying to do highly targeted marketing. What do I mean by highly targeted marketing? I mean the sort of marketing where it’s almost down to the individual person, because you know so much about their particular preferences that you can give them exactly the right offer at exactly the right time, when they’re looking to buy a particular product. Now, how do we do that?

Well, one way to approach that is to write a machine learning model that clusters various customers into various segments. To these segments we can then attach lots of bits of data that give a very robust picture of who those people are. So, for example, out of a customer set of, say, 100,000 people, my clustering algorithm identifies that, oh, here’s a group of people who all seem to like high-end sports cars, enjoy the outdoors, are technically savvy, et cetera. Now, when you apply all this other data to them and actually start looking at some descriptive pieces, looking at their social media feeds, you can more or less see, to a T, exactly who each person is, who I am as a person, for example, and then be able to shoot them a loan offer, say an auto loan, when they start looking at a bunch of cars.
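For the curious, the clustering step just described might look something like the following minimal sketch, assuming scikit-learn; the features, their distributions, and the number of segments are all invented for illustration.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(42)
    # Hypothetical per-customer features: [monthly_spend, site_visits, age]
    customers = rng.normal(loc=[300.0, 12.0, 40.0],
                           scale=[150.0, 6.0, 12.0],
                           size=(1000, 3))

    X = StandardScaler().fit_transform(customers)  # scale so no feature dominates
    segments = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

    # Each customer now carries a segment label that other data, offers,
    # and campaigns can be joined onto.
    print(np.bincount(segments))  # customer count per segment

The ethics question in this act starts exactly here: every extra column joined onto those segment labels makes the targeting sharper and the privacy trade-off steeper.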

Now, the targeting might be difficult in my case, because I’m constantly looking at cars, so the people trying to send an auto loan offer to me might miss the mark. Regardless, the ethical question here, the one that I want you to think about and answer in the comments, is: how can we treat our customers ethically, and treat their data ethically, while also performing these various machine learning and marketing practices? That’s for you to decide. I’m not going to tell you the answer. So, again, leave your answer in the comments, and I will be happy to engage. See you next time.

That’s it for the show. Thank you for listening. If you liked what you’ve heard, then consider leaving a comment or a like down below. To stay up to date on everything Data Couture, be sure to follow us on Twitter @datacouturepod, and consider becoming a patron at patreon.com/datacouture.
Music for the podcast is called Foolish Game / God Don’t Work On Commission by the artist spinmeister, used under the Creative Commons Attribution 3.0 license. Writing, editing, and production of the podcast is by your host, Jordan Bohall.
