Dominique Cardon, "Algorithmic Personalization: Sociological and Ethical Issues"

Presented at the Legal Challenges of the Data Economy conference, March 22, 2019.

Transcript

DOMINIQUE CARDON: Thank you very much. Thank you for the invitation. Excuse me for my very French English. Also, I'm very happy to be here as a sociologist-- I'm not a lawyer-- so I will give a talk about the sociological and ethical issues of a new form of calculation, one closely linked to the discussion we had with Sendhil Mullainathan, the keynote speaker at the beginning of the day.

My idea is that we are entering a new kind of world in which both computation and data are changing. And they are changing so dramatically that the perspective I defend-- public examination of the input data, the training process, and the choice of outcomes of a machine learning system-- was something we could do with static statistical data sets.

But that is changing with the new digital data that I study. So my idea is to describe this new world that is coming, and the many new ethical issues raised by this new configuration of personalized algorithmic computation. That's what I'm going to describe.

First, I want to begin with a starting point: a very traditional sociological and historical description of the way we have computed and described society with economic, sociological, and demographic tools.

Let's say that we have items and users, and we need to find correlations, regularities, or relations between different kinds of individuals and the distribution of their consumption, ratings, or any other use of the items. Part of the way we describe society in sociology, demography, economics, and many other social sciences rests on the idea that we create categorical systems in order to represent both the items and the individuals in the system.

The birth of statistics, which is closely linked to the building of states and states' measurement of their populations, is tied to the idea that we could describe society with categories. Those are symbolic categories that describe and fix items and people, so that we can observe regularities between the categorical system describing individuals and the categorical system describing the items whose distribution we want to observe.

That is the history of statistics-- the history of the description of our society. And we have produced many descriptions of our society with this idea that items, goods, practices, and behaviors are regularly distributed in society.

Just one very famous example, from the French sociologist Pierre Bourdieu. In La Distinction, he produced this kind of graph. If you look at the top of it, you can see regularities-- correlations, associations-- between whisky, chess, and private-sector executives.

So the idea is that we can understand society through the links between categories, which are very important in the way we represent it. For sociologists this has always mattered, because these global distributions or correlations between two categories describing people or items also become representations-- common sense-- for everyone in our society.

We know that there are regularities in our society that help us understand and describe it in everyday life. Another important aspect is that our normative understanding of discrimination and distribution necessarily involves this categorical representation: we need age, sex, gender, and income level in order to show or describe discrimination or inequality in access to different goods. So these categorical systems are very important in our society.

But our societies protest against the idea of being tied to categorical systems. With the individualization of society, we are against standardization and averaging. We always say that fixed identities such as sex, gender, age, and occupation are a very bad way of describing what people really are in their own singularity, in different contexts-- that our identity is something more multiple, more complex, changing throughout our lives and throughout the day.

In a certain way, we could say there is an expectation in our society that we do not want to be described by the categorical systems that appear everywhere in statistical, sociological, and economic descriptions of society. So my idea is that the new algorithms coming with big data and the new digital world try to offer an answer to this individualization expectation: people do not want to be calculated by means of a categorical system.

This new paradigm can be described with three properties. The first is that, with new digital services, we try to increase data granularity: we describe society not with categories but with elemental, granular, atomistic descriptions of individuals' behavior. We can see this with items. In web services, we no longer assume that a piece of music should be described by its general genre; it can be described by many different labels in an algorithmic system.

If you look at the Netflix recommender system-- and it has changed since-- you can observe nearly 80,000 micro-genres, created by humans and algorithms in order to describe the different properties of the items and to produce more precise and accurate recommendations for different users.
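The shift from one coarse genre per item to many fine-grained labels can be sketched in a few lines. This is a minimal illustration with hypothetical labels, not Netflix's actual system: with granular labels, item similarity becomes a graded overlap rather than a binary same-genre test.

```python
# Coarse, categorical description: one genre per film.
coarse = {"film_a": "drama", "film_b": "drama"}

# Granular description: many micro-genre labels per film (hypothetical labels).
granular = {
    "film_a": {"drama", "1980s", "strong-female-lead", "courtroom"},
    "film_b": {"drama", "1980s", "heist", "ensemble-cast"},
}

def jaccard(x, y):
    """Overlap of two label sets, between 0 (disjoint) and 1 (identical)."""
    return len(x & y) / len(x | y)

# Under the coarse system the two films are simply "the same" (both drama);
# the granular system yields a graded similarity instead: 2 shared labels
# out of 6 distinct ones.
print(jaccard(granular["film_a"], granular["film_b"]))
```

The point of the sketch is only that granularity turns a yes/no category match into a continuous score a recommender can rank with.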

With all the new data coming from the digital world, we are transforming content-- the idea of a symbolic description of content-- into physical properties of the objects. Today, in deep learning techniques, images are not described by the kind of content a human could observe in them when you do image recognition. You just use the values of the picture: the red, green, and blue value of each pixel of the image.
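What "using the pixel values themselves" means can be shown concretely. This is a minimal sketch with a random toy image, not any particular model's input pipeline: the image becomes a height x width x 3 array of red, green, and blue values, flattened into a plain numeric vector with no symbolic description attached.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny 4x4 "image": each pixel is three integers (red, green, blue) in 0..255.
image = rng.integers(0, 256, size=(4, 4, 3))

# Flatten into a raw feature vector: 4 * 4 * 3 = 48 numbers, no labels,
# no genres, no symbolic categories -- this is what the model consumes.
features = image.reshape(-1)
print(features.shape)  # (48,)
```

A deep network then learns its own internal representations from these raw numbers, which is exactly why its predictors are so hard to inspect from outside.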

In recommendation systems for music, you don't use the traditional categorical system of musical genres; you use the sound signal itself, recorded from the music. And it's the same with users. The shift that appears with the digital recording of new data is that we no longer describe users with traditional profiling categories-- sex, gender, territory, income level. We capture traces of the behavior of the different users.

We don't profile individuals; we try to record the tracks of all their actions-- clicks, location, reading speed on Kindle. It's very impressive when you look at how the new real-time bidding systems for programmatic advertising are emerging.

They don't have to know you. They don't need your income level, your sex, your age, or any of those traditional categories of description-- sensitive data under the GDPR. They just want to know what you have done: the traces of your navigation on the web.

So it's a very important challenge. And it is also a challenge that calls for implicit data rather than explicit data coming from the individual. This is a huge tendency in digital studies and in the engineering of new algorithmic techniques.

The idea is that when you compute data coming from subjective declarations-- say, from the users themselves-- it can be inaccurate, and the prediction rate is not very good. It is better to have implicit traces of the user's behavior. You need to work with the truly behavioral signal of how individuals actually behave.

Alex Pentland, one of the main advocates of this idea that data should come from the implicit interactions, actions, and behavior of the user, says it very clearly: "These data tell a story of everyday life by recording what each of us has chosen to do. And this is very different from what is put on Facebook; postings on Facebook are what people choose to tell each other, edited according to the standards of the day."

We need data other than the things people post on Facebook. And those traditional categorical systems-- political or economic labels such as bourgeoisie, working class, Democrat or Republican-- are often inaccurate stereotypes. We don't need to describe society from the top with those kinds of labels. We need to find other information, and that information consists of implicit traces of the digital behavior of the user.

This happens very clearly in the RecSys community-- the community in computer science that produces recommender systems. For a long time they used explicit feedback: on a YouTube video, you say, yes, I like it. But you could say that you like a video without ever really having watched it.

The idea of the algorithm now is not to use explicit information-- representations coming from the user-- but implicit feedback. And the implicit signal for the YouTube recommender is watch time: the time you spend viewing the video from beginning to end. If you stop watching in the middle, the score will be lower than if you watch to the end.
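The watch-time idea can be sketched with toy numbers. This is a minimal illustration, not YouTube's actual scoring: each video is scored by the fraction of it the user actually watched, and videos are ranked by that implicit signal instead of by declared likes.

```python
# (video, seconds_watched, duration_seconds) -- a hypothetical viewing log.
log = [
    ("video_a", 30, 600),   # stopped almost immediately
    ("video_b", 540, 600),  # watched nearly to the end
    ("video_c", 300, 600),  # stopped in the middle
]

def implicit_score(watched, duration):
    """Watch-time ratio in [0, 1]: the implicit signal -- no 'like' needed."""
    return min(watched / duration, 1.0)

# Rank by how much of each video was actually watched.
ranking = sorted(log, key=lambda row: implicit_score(row[1], row[2]), reverse=True)
print([video for video, _, _ in ranking])  # ['video_b', 'video_c', 'video_a']
```

Notice that a "like" on video_a would have counted fully as explicit feedback, while the implicit signal reveals the user abandoned it after 30 seconds.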

The second change in this cultural transformation of computation and data with digital algorithms is deep learning. Deep learning is also a machine learning technique, but one in which you cannot describe or observe all the different predictors playing a role in the outcome of the system.

Deep learning techniques come out of a very important shift in the history of artificial intelligence. I won't recount that history here, but there has been a competition between two paradigms. One is symbolic: you try to program rules into the machine with logic and symbols, and you then describe the different rules inside the system. With connectionist techniques, you just compute over very granular data.

And if you trace the history of artificial intelligence, one very fascinating and in fact very strange thing is that we call these new deep learning techniques "artificial intelligence," even though the history of artificial intelligence since its birth in 1956 was symbolic.

If you look at the history, we had a connectionist moment with cybernetics; then artificial intelligence became a symbolic conception of the computer system. And the new spring-- because there are springs and winters in artificial intelligence-- is, in a certain way, a return to cybernetic systems, with the idea of the adaptive loop.

That is what is happening with our new algorithmic techniques. We no longer fix objectives in advance. The objective comes from the changing structure of individuals' behavior, and it can change for each individual.

So we can have an adaptive system in which the link I click on a Google search results page changes the way the system will rank links for my next query-- and it changes for me, but differently for another person. So the idea of a public examination of the outcome of the algorithm is becoming more and more complex in this kind of system. And it's--
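The adaptive loop just described can be sketched in miniature. This is a toy illustration with hypothetical users and links, not Google's ranking: each user's clicks update that user's own weights, so the same query ends up ranked differently for different people, and the "outcome" to examine is different for everyone.

```python
from collections import defaultdict

base_rank = ["link_1", "link_2", "link_3"]        # shared starting ranking
clicks = defaultdict(lambda: defaultdict(int))    # clicks[user][link] counts

def record_click(user, link):
    # The feedback loop: behavior feeds back into the model for that user.
    clicks[user][link] += 1

def rank_for(user):
    # Re-rank by this user's own click counts; ties keep the base order
    # because Python's sort is stable.
    return sorted(base_rank, key=lambda link: -clicks[user][link])

record_click("alice", "link_3")
record_click("alice", "link_3")
record_click("bob", "link_2")

print(rank_for("alice"))  # ['link_3', 'link_1', 'link_2']
print(rank_for("bob"))    # ['link_2', 'link_1', 'link_3']
```

There is no single fixed ranking left to audit: the output depends on each user's accumulated traces, which is exactly why public examination of such systems becomes difficult.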

LUBOMIRA ROCHET: Dominique, I'm sorry. I hate to do that, but only two minutes left.

DOMINIQUE CARDON: Two minutes. That's the reason why all these new algorithmic systems are linked to the idea of optimizing the behavior of each user through the personalization systems that appear today.

So, to conclude-- I'll skip that-- granularity of data; real-time adaptive models appearing in the new deep learning techniques; and the use of behavioral traces that are, in a certain way-- in artificial intelligence this word has often been used-- subsymbolic. They sit below the level of the symbolic categorization that we could judge and examine in a public setting.

So my conclusion is that perhaps transparency of algorithmic systems won't be enough to allow a close examination of the system. It can produce chaotic and indecipherable decisions. It can create a kind of filter bubble around users, denying them access to other choices they could have made if they had used their own subjective representations. Because the system is tied directly to their practical traces of behavior-- clicks, navigation, and things like that-- they won't have the possibility of clicking on different objects or subjects that could have been proposed by the recommender system. So my conclusion is that it is also necessary to protect users' autonomy of choice in this new world of algorithms. Thank you.

[APPLAUSE]
