Omri Ben-Shahar, "Data Pollution"

Presented at the Legal Challenges of the Data Economy conference, March 22, 2019.


OMRI BEN-SHAHAR: Hi, again, everybody. Thank you for the introduction, and it's a pleasure to be on the panel. I look forward very much to hearing your perspectives. And I want to share with you a project that I've been working on, a paper that I just recently wrote and have not yet published, so I'm very much in the course of thinking about it. And it tries to provide a somewhat different diagnosis of what the problem with data protection is that we need to resolve.

I'll first put forward the main argument and then discuss a little bit about how I got to it. So the main argument is that-- can you hear me? Yeah. Digital data, as we know, runs the digital economy, creates enormous benefits and enormous new services, but also creates harms. But these harms, I think of them not as harms captured by the idea of privacy, but rather as public harms that affect public goods-- that affect environments, rather than individuals.

I call that problem data pollution. Pollution is the idea that comes from the industrial era, that production creates a lot of good products, but also causes harms in the process to environments. Privacy protection does not capture this problem because it focuses on harms only to individuals and something about our private sphere, rather than the harms that are outside our sphere-- a harm to others or to environments.

Data pollution, therefore, is the metaphor that I use, as I said, because it's similar to industrial pollution. And if I convince you that the analogy makes sense, then it gives us a framework to think about solutions. And the solutions would be similar to the ones used in the industrial pollution era, which is environmental law. So I'd like to kind of begin to develop an environmental law for the digital era.

And it has different-- sometimes strikingly different-- prescriptions relative to data protection. So the European GDPR addresses a different problem. Maybe it's a good problem to address. Maybe it's one of the right problems to address, but it's not the problem that I identify, which is a problem to an environment, not to privacy.

OK. As I said, privacy is viewed to be the dominant-- if not the sole-- problem entering the era of data-- not the only problem of data, but data protection is about privacy, harms to the people whose data is taken, and shared, and used. Platforms collect the data and do things with it in a way that could harm the people whose data is taken. That's the concern with privacy law, generally put.

And if it is the problem, then it seems obvious. If we agree that this is the problem-- I don't, but if we agree that this is the problem, it naturally leads us to think about a solution, privacy protection. So for example, to mandate more user control. That's a big element in the GDPR, to give people a more meaningful way to control how their data is used. Because it's a harm to them, they should control it.

Limit the use of data and of databases in a way that will not create privacy harms. And people have suggested compensating the users for the harm from data security breaches. Data about them is leaked, abused, misused. They are harmed individually, so they should receive compensation. That, again, reflects the thought that the problem is, and the solutions in that area are, mainly private.

But I'd like to suggest that there is another sphere. It's pretty obvious-- everybody notices it, but we often don't talk about it as a public problem-- which is data's public harms, data's externalities, if you want to think about it in these terms-- harms to social environments, political environments, informational ecosystems, harms to private interests of other people. I give data about others, so I'm not hurt myself. I cause harms to others. And there is something that I call the precautionary insurance externality. We all pay together to reduce these harms, even though some of them are not related directly to us-- so to maintain a public sphere.

OK. Here's an example. This comes from an app-- a service-- that's called Strava. It's a social network for athletes. You run, or you bike, or you swim with your device, and it follows and shows what you've done through this heat map. And then you post that heat map in the social network. There are millions of people on that website. And you can see where the person is, which is something that people expose about themselves. But you can also begin to see where people generally are.

Look at the Grand Canyon in Arizona. Nobody's held-- you can see people are not doing anything. They are not there, other than on the paths, and the trails, and the area where people are. OK. That's interesting information that we get about some public element of the region, but look at Kabul. Not many people in the city of Kabul are using the Strava app, but a lot of American soldiers are, in that area right there.

Or look at Niger, the US drone base in Africa. It doesn't say. Something here is exposed. And when this was found out, it was realized that this has a public effect. People are sharing something in a way that is harmful to a public interest. Now, not everybody shares the public interest, but the US military, the US security is a public good. And sharing of data created harm. By the way, when the press identified it, they called it a privacy harm. There is another privacy crisis.

And I kind of thought to myself, what is private about this? This is public, not private. So that's an example. But the example that-- sorry-- drove me to think about this project was in the aftermath of the US presidential election in 2016, and the Facebook Cambridge Analytica scandal. Again, when that happened, everybody was saying, there is a privacy problem. And Congress was having hearings about privacy issues, all the data about people that made it possible to target political lies individually.

Is this really a privacy problem? I don't think so. The people who received these political lies and changed their voting patterns, they're not walking around feeling that they were injured. They're probably happy about it. The same thing happened with Brexit with that kind of targeted ads, and probably elsewhere. The people who are affected do not even consider themselves the victims.

That's not the problem. The problem is something much bigger that was corrupted. In the US and in Britain, it was the election, the integrity of the political process. That's one of the most important public goods. To call it a problem of privacy is to mischaracterize the problem. So if you think about some of the problems that the collection of personal data, the creation of databases, and of services based on them lead to, we see that they are not problems that we would regularly deal with through private law.

Contracts-- will people write contracts with the service providers, consumers, users, vis-a-vis service providers that protect the public sphere? No, for the same reason that most people don't take into account the energy efficiency of the product that they buy in the industrial era. The products pollute. People care about it. Some people are green. But in general, they don't behave optimally. That's why we need environmental law, and carbon taxes, and things like this to protect an environment. Contracts will not solve it.

Tort suits-- liability will not solve it because the injury is not to the plaintiff. In a tort suit, you come and you say, I am injured. But if it is the US presidential election system that is injured, tort law will not provide a remedy for that. And disclosure, which is, again, letting people know what is going on, is also almost pathetically failing to make any difference in this case.

It is agonizing to me to see that one of the main issues in the GDPR was more or better disclosure to help people exercise more or better control. Even if that were the problem, it is a solution that is not working. So what can we do? Well, first, start thinking about the real problem-- or start thinking about this as one of the main problems. I don't want to entirely set aside privacy because I don't have skin in this game. I don't know if this is a big problem or small problem.

It might be a huge problem, but here is a different big problem. And if we agree that this is one of the problems to deal with, the social harms from data, then we need regulation for social harms, not laws that treat private harms. We need to do something, and we can get some clues from environmental law.

You see, environmental law came about in the US primarily in the 1970s when it was becoming clear that tort law was failing. People were getting injured from environmental pollution. They were going to court, suing in the courts. And the courts were saying, we can't compensate you because you cannot prove that your disease is related to the pollution in the factory here. Problem with causation.

When people are going now to courts and say, my data was leaked through a data security breach, and I am injured, you know what courts are telling them? Exactly the same. You cannot prove your injury. You can prove your injury, but you cannot prove that it was caused by this particular leak. Maybe it was caused by another leak. Maybe they got your information some other way.

When you read the cases in the US for these lawsuits, there is a striking parallel between the failure of suits in the industrial era and the suits in the data era-- the digital data era-- to compensate victims for their harms. That's why we created environmental law to solve the problem differently, and that is maybe some inspiration for how to deal with the public harms of data.

So I'll talk briefly about three approaches. This is very basic, very kind of preliminary. None of these ideas is worked out for implementation. It's just directions for thinking. If people agree with the problem, then this is the direction where you want to look for solutions.

One type of solution is prohibitions, restrictions on what data can be collected so that it will not cause public harm. So this is similar to what is used under the GDPR, notions of data minimization-- don't collect too much-- data purpose limitation, restrict data transfers, data localization. These are ways that are done that could also affect the public harm. But the problem is that if you limit data collection in these strong, mandatory ways, you might reduce the negative externalities, but also the positive externalities, which could be-- and that's an empirical question-- orders of magnitude greater than the harms.

And so it would be a very risky way to regulate. Sometimes we say that it might harm innovation. We say that. We mean innovation that is good. Medical data can allow medical services that could treat people better, and there have been studies to show how the use of digital databases has led to enormous saving of life. The use of data in insurance leads to enormous saving of life, especially in auto insurance in the US.

So we don't want to wash that away just because there are also harms. We need a solution that is perhaps a little more refined to address exactly the specific harm. And environmental law has some clues on how to go about it, and maybe focus specifically on toxic data. I tried to develop this a little bit more in the article that I wrote. I don't have time to get into those, but I am largely worried about these kinds of solutions.

So then I say why not do-- ask the following question. In environmental law, the most widely accepted solution both politically-- outside the US-- and theoretically among economists and others is a carbon tax-- a tax that is equal to the negative effect of pollution. Why not a data tax? If data causes some harms, and if people give it away without thinking and companies collect as much as they can without thinking, make them think.

And so it's a solution. All right. It would be really difficult to think about the right data tax, and it took generations to begin to come to rough estimates of carbon tax-- the right carbon tax, but here are just a few simple initial steps. Start with a small tax, a penny tax. Web platforms collect everything. I loaded-- I no longer have a flashlight app, but remember the days we used to have a flashlight app on our phones? Why does the flashlight app need to know my contacts, my phone history, all my emails? It doesn't need it for functionality. It's just a light.

Why do Angry Birds need to know everything that I do in the-- very curious birds. That's fine. I can't tell them not to do it. Maybe they'll develop some kind of solution, but pay a penny. Pay something little to make the writers of the program think about it, whether this is data that they can create functional benefits with.

And then begin to increase that little marginal tax so that maybe bigger firms that have huge databases, where we are worried a little bit about some of the things that they do, pay a little bit more. That would be also good for competition. A tax that reflects the sensitivity of data-- so health, sexual behavior, things like this-- where maybe platforms that need it less than others would pay more.
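To make the shape of such a graduated tax concrete, here is a minimal sketch in Python. It assumes a penny base rate per record, as suggested in the talk; every multiplier, threshold, and name below (`SENSITIVITY_MULTIPLIER`, `UNNEEDED_MULTIPLIER`, `size_multiplier`, `Collection`) is a hypothetical placeholder chosen for illustration, not a figure or design from the talk or the paper.

```python
from dataclasses import dataclass

BASE_RATE_CENTS = 1  # the "penny tax" starting point mentioned in the talk

# Hypothetical multipliers: the talk names these dimensions but gives no numbers.
SENSITIVITY_MULTIPLIER = {"ordinary": 1, "sensitive": 5}
UNNEEDED_MULTIPLIER = 2  # data the service does not need for its function


@dataclass
class Collection:
    records: int               # number of user records collected
    sensitivity: str           # "ordinary" or "sensitive" (e.g. health data)
    needed_for_function: bool  # False for a flashlight app reading contacts


def size_multiplier(total_records: int) -> int:
    """Hypothetical flat step (not a true marginal schedule): firms holding
    more than one million records pay double the rate on everything."""
    return 2 if total_records > 1_000_000 else 1


def data_tax_cents(collections: list[Collection]) -> int:
    """Total tax in cents across everything a platform collects."""
    total_records = sum(c.records for c in collections)
    scale = size_multiplier(total_records)
    tax = 0
    for c in collections:
        rate = BASE_RATE_CENTS * SENSITIVITY_MULTIPLIER[c.sensitivity] * scale
        if not c.needed_for_function:
            rate *= UNNEEDED_MULTIPLIER
        tax += rate * c.records
    return tax


# A small flashlight app collecting contact data it does not need.
flashlight = [Collection(records=1_000, sensitivity="ordinary",
                         needed_for_function=False)]
print(data_tax_cents(flashlight), "cents")
```

The point of the sketch is only that the rate rises with database size, with sensitivity, and with how little the service needs the data; the right numbers are exactly the decisions that would have to be made prudently.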

There are decisions to be made that are very-- you know, have to be done prudently, but here is a platform-- or here is a framework-- to begin to take these things into account by taxing them and making the market take them into account, rather than entirely disregard them, as it does now. In a sense-- I have a couple more minutes? In a sense, data tax conflicts with existing data protection notions in the following way.

Under the data pollution paradigm, the data givers-- people who give their data-- are not the ones who need the protection. They are the ones we need protection against. They are polluting. They are using this-- they have to be taxed. They don't directly have to be taxed, but the platforms that collect have to be taxed. It doesn't really matter who pays the tax for the data transfer. One way or another it reflects on the cost for the users.

People are sharing too easily too much data, and a small tax might make them think twice about using data as the currency to pay for services. It also contrasts with the notions that have been developed recently of pay for data. Companies should pay people for the data that they take. Who owns the data? Who should benefit from the data? The pay-for-data paradigm says that the users should get some of the benefits.

In that sense, it doesn't solve any problem. It's a zero sum transfer between companies and their users. They are not going to take into account the effect on third parties, on others, and so it does not reduce-- pay for data does not reduce the underlying activity and does not encourage pollution reduction investment. In that sense, it solves a different problem, not the one I am interested in.

Finally, a third solution from environmental law is liability for spills. When the BP oil spill occurred BP paid. They're still paying. Probably it's already got to something like $40 billion for liability, some of it to individuals. But it wasn't through tort law. They established a fund and allocated money to people according to some rules because tort law would have been entirely messy to solve the problem.

30 years ago when the Exxon Valdez poured oil in Alaska, it took years for them. In the end they paid a billion, a billion and a half. It was very hard to prove damages, and only some of the harm, the livelihood of the fishermen, was compensated, but not the environmental harm. We have the same problem when data is being spilled. In the US a year ago, a year and a half ago, one of the credit scoring agencies, Equifax, lost-- or got hacked, and the most sensitive personal financial data of 130 million Americans went through the dark web.

There were lawsuits. People cannot get recovery from lawsuits because, again, they cannot prove that their harm occurred-- besides, the harm will occur years from now. There is this latency of the harm. So what about liability that comes from public law, where you pay not for the actual harm you caused, but for the exposure you created? Paying liability for exposure is entirely foreign to private law.

That's what environmental law did for environmental harms. That's why we have sometimes criminal laws and other kinds of administrative sanctions for creating harmful exposure. And so I would like to suggest that, in the aftermath of leaks, we would have a rule in which the-- use estimates about average likelihood of causing harms.

The US Consumer Protection Agency, the Federal Trade Commission, has tables that show how likely each person is to be targeted. It's a small likelihood. What would be the magnitude of loss that they would suffer? It's not that big. But once you factor these elements and multiply it by 130 million, Equifax will have to pay a lot of money. And that would be much more effective in deterring and also accurately reflecting the social harm than the schemes we have.
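The arithmetic behind liability for exposure can be sketched in a few lines of Python. Only the 130 million figure comes from the talk; the probability and the average loss below are hypothetical placeholders, not values from the FTC tables, and the function name `exposure_liability` is made up for this illustration.

```python
def exposure_liability(people_exposed: int,
                       harm_probability: float,
                       average_loss: float) -> float:
    """Liability for exposure: expected loss per person times people exposed."""
    if not 0.0 <= harm_probability <= 1.0:
        raise ValueError("harm_probability must be between 0 and 1")
    if people_exposed < 0 or average_loss < 0:
        raise ValueError("people_exposed and average_loss must be non-negative")
    return people_exposed * harm_probability * average_loss


PEOPLE_EXPOSED = 130_000_000          # figure given in the talk
HYPOTHETICAL_HARM_PROBABILITY = 0.01  # assumption: 1% of people are targeted
HYPOTHETICAL_AVERAGE_LOSS = 500.0     # assumption: $500 average loss if targeted

total = exposure_liability(PEOPLE_EXPOSED,
                           HYPOTHETICAL_HARM_PROBABILITY,
                           HYPOTHETICAL_AVERAGE_LOSS)
print(f"Illustrative liability for exposure: ${total:,.0f}")
```

Even with a small likelihood and a modest loss per person, the product grows large once it is multiplied across everyone exposed, and no individual has to prove causation for the amount to be assessed.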

So in conclusion, I think digital data law and protection should not be only about privacy. It should also be about the problem of data pollution. The problem exists. It is not addressed by current regulations. And therefore, it is I think-- oops. Well, I finish there. So thank you.

