Closing Keynote: Jean-Gabriel Ganascia, "Fairness, Justice and Data"

Presented at the Legal Challenges of the Data Economy conference, March 22, 2019.


JEAN-GABRIEL GANASCIA: I fear that I will have far more questions than conclusions. Oh, first thing, there was a small mistake: it was said that I was provost at the University Pierre and Marie Curie, but this university doesn't exist anymore. It merged with Paris-Sorbonne University and is now called Sorbonne University. But OK, it doesn't change anything.

I didn't choose this title. So I will try to speak about fairness, justice, and data. OK, I think it's a very interesting question, and I will try to deal with it, but it's not easy. I'm not sure the notion of fairness is appropriate in this matter. And I think we have a lot of questions about justice. So I will just try to answer this question.

I decided to organize my talk in four main parts, and here is the overview. In the first part I will discuss the word "data", because today was all about data, and this notion was put forward a few years ago; I will discuss it especially through Harari's book, Homo Deus, where he presents this idea. Then, since the notions of fairness and justice are not really obvious, I will recall some references dealing with them. That will also give me the opportunity to mention the work currently under way in the European Union, especially the High-Level Expert Group on AI, and groups like the Toronto group dealing with ethical principles for AI, which I find interesting.

And then this will open onto the third part of my talk, which is about the tension between ethics and epistemology; I will recall some notions about big data and some declarations which can perhaps be discussed. So there will be far more questions in this talk than answers. And I am very sorry about that, because initially I was educated as an engineer, and engineers like solutions.

I have to say that my parents wanted me to be an engineer, but I was also interested in philosophy, and maybe it is the philosophical part of my education that will speak here. So I will ask you some questions. The first thing: you remember this book, Homo Deus, which is, I think, not a scientific book but a very popular one. And I think it is very interesting because it tries to draw a general view of the history of humanity.

He says there are three steps. We can discuss these three steps; he is a historian, but I'm not sure he is always a very good historian, though he is certainly a very talented writer. Still, it is very interesting. He says that in the first step of humanity, humans were lost in the world. Life was very difficult, and they invented gods to explain what the world was and what happened in it.

In the second step, humans began to understand and to master nature. And so the new religion was not the religion of God but the religion of humans: humanism. This is his main idea. And nowadays, and this is why it's interesting for us, he says the world is changing. The world is changing because the machines are more efficient than we are, and so maybe it would be better to leave the decisions to the machines.

He calls this third step dataism. The reason these machines are more efficient is that they use machine learning, and with machine learning they are able to know better than us. I will give you an example. There were many discussions today about data, but I would like to be precise about the use of data, because it is important.

The conclusion is, and maybe it is [LAUGHS] provocative to say, that dataism is the next step: after humanism, it is in a way the end of humanity. So is it true? It's an open question. I hope the answer is no, but [LAUGHS] it's not absolutely certain.

So the first thing I would like to recall is that the way we are dealing with data is not so new. It depends on what you call new, but the technique which is the most fashionable today is what people call deep learning. I would like to recall, without insisting on it, that there are many different techniques in artificial intelligence, and many machine learning techniques.

And in machine learning you have different approaches: unsupervised learning, reinforcement learning, and supervised learning. What is very efficient nowadays is supervised learning. But even within supervised learning, you have many different types.
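The idea of supervised learning mentioned above can be sketched in a few lines: learn from labelled examples, then predict labels for new points. This is a minimal invented illustration (a 1-nearest-neighbour classifier on made-up data), not anything shown in the talk.

```python
# Minimal illustration of supervised learning: a 1-nearest-neighbour
# classifier. The "training" data are labelled examples; prediction
# copies the label of the closest known point. All data are invented.

def predict(train, point):
    """Return the label of the training example nearest to `point`."""
    nearest = min(train, key=lambda ex: sum((a - b) ** 2
                                            for a, b in zip(ex[0], point)))
    return nearest[1]

# Labelled examples: (features, label)
train = [((0.0, 0.0), "benign"), ((0.1, 0.2), "benign"),
         ((1.0, 1.0), "malignant"), ((0.9, 0.8), "malignant")]

print(predict(train, (0.05, 0.1)))   # near the "benign" cluster
print(predict(train, (0.95, 0.9)))   # near the "malignant" cluster
```

The point is only that the algorithm never sees a rule, just labelled examples, which is what distinguishes supervised learning from the other approaches named above.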

For instance, when I was younger I did my thesis, my PhD, on symbolic machine learning, which is one type of machine learning. But there are many other types. And what is important today is neural networks.

So I would like to recall that this notion of neural network is really old; it's not so new, and it comes from cybernetics. The birth of neural networks was in 1943. That is not very old compared, maybe, to economics or to law or to mathematics, but it's quite old. It's not really very new.

It was a paper written by two people, McCulloch and Pitts; Pitts, I think, was at the University of Chicago, and he was a mathematician. So I am sure you all know about him. But it's important to say this. Then, you remember, 1958 saw the first learning procedure. Because this neural network was a model of the brain, a very, very simple model. And note the dates, which matter: the model is from 1943, while the first electronic computer was built in 1946.

So it was not a computer. The idea was to build a model with just some very simple electronic components. You have cells, which correspond to neurons, and you have connections between the cells, which are weighted and which correspond to the connections between neurons, that is, the synapses.

The problem in this first model was how to set the weights of these synapses. And the idea was to automate this with a computer. It took a lot of time. For instance, one of the founders of artificial intelligence, Marvin Minsky, did his PhD on this problem. People eventually succeeded, but only on very, very simple networks with two layers. The problem is that two layers are not universal.

After this, Marvin Minsky wrote a book, Perceptrons, where he showed that the perceptron was restricted to very, very simple functions. We had to wait until 1986 for the generalization of this perceptron procedure into a very efficient machine learning technique. But the machines were very slow at that time, and it was very inefficient.

So people tried to develop other techniques, and especially they tried to understand the mathematical reasons which make a machine learn. They developed, for instance, statistical learning theory, or probably approximately correct (PAC) learning theory, and they built systems on that basis.

And, you know, in France we say that people from Brittany have hard heads; we say they are obstinate. There was a very obstinate guy, Yann LeCun, who was working in the United States. He was one of the people who built these networks, and he wanted to pursue this work. He said: OK, two layers are too restricted, three are not very efficient, it takes time, et cetera.

But now, with very, very efficient computers, he said: we shall generalize to 15 layers and more. And less than 10 years ago he proved that this was very, very efficient, hence the enthusiasm about it.

Why do I recall this story? Just to make clear that this is a very restricted and very old technique, nowadays very efficient, but still just one technique, which explains today's enthusiasm about AI. The second reason people are so enthusiastic is that with this technique it is possible to run with many, many data and to get very good results, which is very important.

You have applications in many domains. We have applications in banks; we have applications in insurance companies, because you can anticipate risk. You have applications in marketing, because you can profile users, and if you want to do advertising this is very important because you can target the advertisements.

It's very important, and the economic model of the big tech companies is based on the use of machine learning. This is the reason people are so interested in it. But I will give an example, because I think it is interesting to show the efficiency of such a system and also the problem with it.

This was a paper published two years ago. It is a system which is trained with photographs of the skin, very simple photographs taken with just a smartphone. And the physicians gave the diagnosis: is this benign or malignant?

And then the machine is trained with this. You have 130,000 pictures, which is not a lot, but is not so easy to gather. What is important, and what I didn't say, is that you train with the photos but also with the labels which were given by the physicians. This is very important, because it is not magic: you have to explain the medicine to the machine.

And the result is very impressive, because you have a system which is able to diagnose the skin with an accuracy better than that of 61 dermatologists, which is very interesting. Then, and this is the question: let us suppose that this kind of system became generalized. What would happen to the dermatologist as a physician? And do you think it is OK just to have the system, since here the system is more efficient than the physician?

Maybe the insurance company will say: as a physician, if you give a diagnosis which is different from the machine's, then you will be responsible for the risk. So these are open, open questions. And behind them there is a question of freedom and of responsibility, and of the future of the profession. It's a real question, a real ethical question. [LAUGHS]

So now I would like to go to the second point, this notion of fairness and justice, which is not totally obvious. What is fairness exactly? I remember, a few days ago, I was exchanging with people in big tech who asked: is our system fair? And OK, is it easy to say? What does it mean exactly that a system is fair? Is it the fact that you don't advantage some people, that it is equal for all types of people?

To discuss this, I would like to recall a very important notion from the philosopher John Rawls, who said: justice is fairness. It's very interesting because two principles are presented in his theory of justice. One principle is the idea that you have equal basic rights and liberties for everybody. It's very interesting because this is the basis of justice as fairness.

But this morning, you remember, we discussed the notion of personalization of law. It means that this massive use of data is in a way contradictory to this principle. I'm not sure; it's an open question. I have no answer, and I would be very happy to discuss it with you.

And the second principle is divided into two separate principles. The first is fair equality of opportunity. So, is it true that with machine learning we have fair equality of opportunity? That's an open question; I'm not really sure the answer is yes. Maybe this is the type of thing we also discussed this morning, because some people say: yes, it's better with artificial intelligence, or with algorithms, to have a statistical view and to see whether some parts of the population have advantages, et cetera. But it's a difficult question.

I have a question, and maybe people will not be happy with it. You agree that it's better to have equal opportunity for men and women, OK. And there is one case where the situation of women is totally unequal: the number of women in jail is really lower than the number of men. [LAUGHS] So this is the question: do we have to correct this kind of thing? [LAUGHS]

OK, why is this important? I will try to explain. Because in fact this is a correlation, not a causation; this is the difference between correlation and causation, but we shall come back to that. And the second principle, the difference principle, is the fact that inequalities must be to the advantage of the most disadvantaged. OK, is that possible with machine learning?

I would like it to be, but I'm not sure that this is what it will do, and I'm afraid that data and the digitization of the world amplify the differences. But this is a real question.

So now I would like to review the existing reflections on ethics and AI. The first one is the Toronto Group. Have you ever heard about this? This is a group at a university in Canada who wanted to write some principles for the ethics of artificial intelligence. And you see here the well-being principle, respect for autonomy, and so on. OK, nobody can complain about these principles. But are they really applicable? What is the meaning of this kind of thing?

The second is very interesting: the High-Level Expert Group on Artificial Intelligence, which was organized by the European Commission, with, I think, 52 experts from different disciplines. They tried to write a report on the ethics of artificial intelligence. I was very happy to read it; I had the first draft in December, and the second version was due in March. I have not yet received the second version; maybe it will come in a few days.

But it was interesting, and what surprised me is that they said: we shall base our principles of AI ethics on bioethics principles. Is that relevant? That's the question. Here are the four main bioethics principles. Beneficence: OK, we understand that a drug has to be beneficial. But is AI to our benefit? It's an open question.

And non-maleficence. In the case of drugs, you have a trade-off between benefit and risk. But is it really relevant for AI? The third is about autonomy. In medicine it's related to the notion of informed consent: it means that you are an autonomous being, and then you have to decide by yourself about the decisions which concern you. But in the case of the ethics of AI, is it relevant?

And I think, as we understood today, it is very difficult. What does it mean, for instance, for you to be autonomous when you buy a car which is connected? That is an example in itself. And I think many examples say: no, it is not relevant.

And the fourth is justice; we shall go back to justice. The only point they added, and I think it is very interesting, is the notion of transparency. Why? Because they say the machine has to explain its conclusions. I think it's a very important point, because if you want to be autonomous, you need to understand why the machine decided such and such a thing. For instance, when a machine gives a physician a diagnosis, it has to say: this is for this reason. Because if you don't have this, you cannot answer.

Then I would like to turn to the principle of justice, which is mentioned here. They said: developers and implementers need to ensure that individuals and minority groups maintain freedom from bias, stigmatization, and discrimination.

So how to ensure this kind of thing? It's very, very difficult. And so I tried to see: what is bias? And this is interesting: a prejudice for or against something or somebody that may result in an unfair decision. So you see, there is a close relation between this notion of bias, this notion of fairness, and this notion of justice. It's very difficult to decouple these things.

And the question of discrimination is also an open question. But it's very difficult. So just keep this in mind, because we shall come back to it. It's very important, because we cannot simply say: yes, it would be very useful to apply this kind of principle.

There is a second thing: you all know the GDPR, many people mentioned the GDPR. I just want to recall that the GDPR is a very, very complex text. I am not a lawyer, and I was lost in this text, but I tried to understand it because I was really directly concerned by it. And I understood, but maybe I am wrong, that there are a few principles.

The first is the principle of finality: it means that an organization must present a legitimate objective for collecting personal data. That means that when people are gathering data, they must say: yes, I am gathering these data for search engine purposes, for instance. And the second principle is the notion of transparency: an organization must notify a user about the collection and sharing of information with third parties.

So this is the idea; I'm not sure it's always possible. A long time ago I tried to ask a phone company for my information, and it was very, very difficult to get, OK. And the third is the respect of personal rights: the user has a right to accept or reject data collection, and they can also ask for the data to be corrected or permanently deleted.

If that is the reality, I would be happy, but I'm not sure it's possible to go, for instance, to Facebook or Google and say: you have to delete this information. And one last thing: we mentioned the notion of personalization of justice, which means that justice has to be open to the individual.

Keeping in mind that I'm not a lawyer at all: the allegory of justice, Lady Justice, is a woman with a blindfold. So does it mean that today we have to remove the blindfold of justice to make it personalized? This is just a parenthesis, an open question; maybe you can ask me about it afterwards. And since the blindfold is an allegory of impartiality, the question is: do you really think that the machines are free of dogmas and biases? This is a real question.

A long time ago, and I have not a lot of time, so just briefly: a long time ago I met a very famous physician in France. I'm sure nobody here knows him, because he was a very old man at that time. I had written my first book on artificial intelligence, and he said: I want to meet you because I am very interested in what the machine can do. And he said: at any given time, scientists have a lot of dogmas; they have some theories in mind, and they all apply the same theories.

Maybe with the machine we can make extraordinary discoveries, because the machine will be free of these dogmas. Is it true? I'm not sure, for many reasons which I don't want to explain now. The first is that the way you gather data, the way you gather examples, influences the result. And the second is the representation of the examples, even if people in big data claim there is none.

So the third point here is about epistemology. It's a caricature of a paper, but you remember, about 10 years ago, this Chris Anderson paper about the end of theory: the data deluge makes the scientific method obsolete. I think it's very interesting because he said that we have entered the petabyte age, the age of huge data, and he said it changes everything.

That's a question: does the sheer volume of data change anything? The second revolutionary claim is that today we are able to deal with data, to process data, without sampling, which is very important because then you can detect very rare events. So the efficiency of the machine plus the big amount of data enables you to do many things. And he says: we can grasp everything without hypotheses and without a prior model.
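The "no sampling" point can be illustrated with a small invented simulation: a handful of anomalies hidden in a million records are certain to appear in a full scan, while a small sample will almost always miss them all. The numbers below are synthetic, chosen only to make the contrast visible.

```python
import random

# Sketch of the "no sampling" argument: 5 anomalies hidden among a
# million records always show up in a full scan, but a 0.1% sample
# will usually miss every one of them. All data here are synthetic.

random.seed(0)
N = 1_000_000
anomalies = set(random.sample(range(N), 5))      # 5 rare events

full_scan = sum(1 for i in range(N) if i in anomalies)

sample = random.sample(range(N), 1000)           # 0.1% sample
in_sample = sum(1 for i in sample if i in anomalies)

print(full_scan)    # 5: the full scan finds every anomaly
print(in_sample)    # almost certainly 0
```

Before big data tooling, sampling was forced on us; processing everything is what makes the rare event detectable at all.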

Is it true? This is exactly the question of the old physician, Jean Bernard. He told me: yes, with the machine we shall be free of dogmas. I'm not sure it's true, because of this question. The second claim is that no semantic or causal analysis is required. I'm not sure either. And the last point, which is very interesting, is: we are not interested in causation, just in correlation.

Which is interesting, because it can be really misleading. You know, for instance, if there is a correlation between sunscreen and skin cancer, does it mean that sunscreen causes skin cancer? No. [LAUGHS] The fact is that when people go in the sun, they are told to put on sunscreen. So this is a correlation, and correlations can be really misleading. But what I'm interested in is not just that.
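The sunscreen example can be made concrete with a toy simulation: sun exposure (the confounder) drives both sunscreen use and cancer risk, sunscreen itself has no causal effect in the model, and yet sunscreen users show the higher cancer rate. Everything here is invented for illustration.

```python
import random

# Toy confounding model: sun exposure causes both sunscreen use and
# skin-cancer risk. Sunscreen has NO causal effect here, yet it ends
# up positively correlated with cancer. All numbers are invented.

random.seed(1)
data = []
for _ in range(10_000):
    sun = random.random()                              # sun exposure
    sunscreen = 1 if sun > 0.5 else 0                  # sunny -> sunscreen
    cancer = 1 if random.random() < sun * 0.2 else 0   # risk from sun only
    data.append((sunscreen, cancer))

rate_with = sum(c for s, c in data if s) / sum(1 for s, c in data if s)
rate_without = sum(c for s, c in data if not s) / sum(1 for s, c in data if not s)

print(rate_with > rate_without)   # True: correlation without causation
```

A purely correlational analysis of `data` would conclude sunscreen is dangerous; only the causal model (which variable drives which) shows why that is wrong.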

I would like to put in parallel this notion of big data, this theory of big data which many people discuss, and the principles which were presented before. The first claim: no sampling, grasping everything without any prior model. You just get the data, then you analyze the data automatically, and you get results.

And I would like to compare this with the GDPR. There is a compromise to find, if not a complete opposition. So what shall we do? Maybe in Europe we shall apply the GDPR and not deal with big data, while in other parts of the world, for instance China or the United States, people will be more pragmatic and will not apply the GDPR, and we will be less efficient.

So this is a real question, you see, and this is just a schematic to give you an idea. The second claim: no semantic or causal analysis is required, correlation is enough. The problem is that, remember, this report of the High-Level Expert Group on Artificial Intelligence says we need to make fair decisions, with no discrimination. But to be sure that there is no discrimination, you need to add something. You need to add information.

For instance, if you want to be sure that you have equal treatment of men and women, you need to have some model of the fact [LAUGHS] that you have half men and half women in the population. And it's the same for everything. So the problem is that if you have no model, you will have a very, very difficult problem. To correct a bias, you need to know what the bias is, and so you need to have a model. Which is strange. [LAUGHS]
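The point above, that detecting a bias requires adding information the raw data do not carry, can be sketched in a few lines: only once each decision is annotated with a group, and once we know what equal treatment would look like, does the disparity become measurable. The decisions and groups below are invented for illustration.

```python
# Sketch: a bias is only visible once you add a model of the
# population (group membership, and the expectation that selection
# rates should be equal). The decision data here are invented.

decisions = [("woman", 1), ("woman", 0), ("woman", 0), ("woman", 0),
             ("man", 1), ("man", 1), ("man", 1), ("man", 0)]

def selection_rate(group):
    """Fraction of positive decisions for one group."""
    outcomes = [d for g, d in decisions if g == group]
    return sum(outcomes) / len(outcomes)

gap = selection_rate("man") - selection_rate("woman")
print(gap)   # 0.5: visible only because we attached group labels
```

Without the group column, the same list of 0s and 1s contains no discrimination at all; the bias exists only relative to the model we brought in.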

I would like to add something about the data themselves. Not only do we have this problem, but the data are not all true. We have much wrong data, many fake news. And if you just gather all the data, you will have a problem: you sometimes need to correct the data. And you also have manipulation of data with artificial intelligence, this kind of thing: deep fakes. And the new vulnerabilities that many people mention, which I don't want to go into: you may have interference with the democratic process.

I would like to conclude, because I have not a lot of time, and this is the end of the day; you must be very tired now. I think the world is changing. You have what I call a re-ontologization. What does it mean? It means that the concepts which are the basis of society are changing.

And I really like this painting; do you know it? It's by a Belgian painter, Magritte, and the title is very interesting: The Human Condition. I think that in a way we have a new human condition. You see, it looks like a copy of the world, and this is exactly what happens with data: it looks to be a copy of the world, and it is a new human condition. But in fact, it has a lot of consequences on our life.

So, as I mentioned before, the weaving of society, what makes the social fabric, is evolving with the digitization of the world. And here are some examples. Friendship: this is a very old notion, and in antiquity, for instance, Aristotle dealt with it in, I don't know how you say it in English, the Nicomachean Ethics, yes. Aristotle said friendship is very important for ethics.

So nowadays, with social networks, is your friendship the same? That's an open question. And the second question: does this friendship on the social networks influence traditional friendship? And the same goes for other notions, for instance reputation. You remember, in France we have a singer, [INAUDIBLE], with a very nice song, "The Bad Reputation". It means that in the village, when you are not doing what everybody expects from you, your reputation is disastrous.

The Chinese are doing exactly the same: this is the social score, the score of reputation. So is it really the same? These are open questions, and I think these are the real ethical questions nowadays: how can we rebuild ethics taking account of all this evolution? The notion of confidence with the blockchain, the notion of money, the notion of work, the issue of sovereignty: all are evolving. I have no time to go more in depth into all these things.

The conclusion, and I agree with many people on this, is that privacy is certainly not the only problem; there are many other problems. Here I want to reduce them to at least three, because originally, as I said before, I was educated as a physicist, as an engineer. And as an engineer you know that if you have two bodies, the equilibrium is easy to evaluate; but with more than two bodies, especially three bodies, it's unstable.

And I think this is the problem today. We have at least three, and maybe more, requirements, and here I call it a trilemma, which is not a dilemma but its generalization to three requirements, which are all seen today as necessary. The first is privacy, important especially in Europe, but I am sure it is the same in the United States and in many countries. It is not exactly the same in China. I was working with Korean people, who have a different view of privacy, but I think it is important even there.

So privacy is important for freedom, and on the other hand, transparency is central for many reasons, especially political ones. We think that we need to be transparent, but the problem is that you cannot both be transparent and respect privacy. Some people, some activists, say: you need transparency for the powerful and privacy for the powerless. The problem is that the same people have a social position where they need to be transparent and, at the same time, they are private persons.

And it's not just the President of the Republic: not just a minister, but even a journalist, you want to know what he did; or even the teacher at your children's school, you want to be sure that his private sexual life is in conformity with his activity. So this is very important. And the third point, which I think is central, is the notion of security. Why? Because you want, say, to be protected against terrorism, or to protect your health. And the problem is that if you are totally transparent, you are exposed, which is a real problem.

And if you totally protect privacy, then you have many people who will exchange information covertly, and if you want to find terrorists, you need some infringement of privacy. So this is my conclusion, though I am not sure it is a conclusion. I just want to say that these ethical issues of data, and more generally of artificial intelligence in the digital world, are really complex, and we have to build together a conceptual framework to deal with these questions. Thank you.
