How to build an Image Anonymizer for GDPR-Compliant tasks with TF2?

Many computer vision tasks require images taken in the wild (i.e. roads, events, etc.), but building a dataset for human-behavior-related tasks can be tricky. GDPR does not allow storing pictures of identifiable people taken without their consent, which is only fair, you might say!

It would be a shame to limit AI applications because of GDPR, right? At Picsell.ia, we are dedicated to helping people build better CV models, so it was pretty obvious for us to build an anonymizer that could help you build GDPR-compliant, human-related datasets!

Summary

  1. What does a GDPR-Compliant picture look like?
  2. How to build a robust face-detector?
  3. How to integrate the anonymizer in your data acquisition pipeline?

1. What does a GDPR-Compliant Image look like?

dataset to anonymize.jpeg

Well, let’s meet Tom (don’t worry, I found Tom on Pexels, so he won’t mind).

As you can see, 100% of his face is visible, which is not quite GDPR compliant…

To be compliant, more than 50% of his face should be hidden, preferably the top half, so that his eyes are covered.

dataset hide eyes.jpeg
Like this :) Now Tom is Mister Nobody.
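The more-than-half rule can be checked mechanically. Here is a minimal sketch, assuming face and blur regions are given as (top, left, bottom, right) pixel tuples; the 50% threshold is this article’s rule of thumb, and the helper names are hypothetical.

```python
def covered_fraction(face, mask):
    """Fraction of the face box area that lies inside the mask box.

    Both boxes are (top, left, bottom, right) tuples in pixels.
    """
    # Intersection rectangle of the two boxes (may be empty).
    top, left = max(face[0], mask[0]), max(face[1], mask[1])
    bottom, right = min(face[2], mask[2]), min(face[3], mask[3])
    inter = max(0, bottom - top) * max(0, right - left)
    face_area = (face[2] - face[0]) * (face[3] - face[1])
    return inter / face_area if face_area > 0 else 0.0


def top_mask(face, fraction):
    """Box covering the top `fraction` of the face (where the eyes are)."""
    top, left, bottom, right = face
    return (top, left, top + (bottom - top) * fraction, right)


def hides_enough(face, mask, min_hidden=0.5):
    """Rule of thumb from the text: strictly more than half the face is hidden."""
    return covered_fraction(face, mask) > min_hidden


face = (40, 100, 140, 180)  # a 100 x 80 px face
print(hides_enough(face, top_mask(face, 0.5)))  # False: exactly half
print(hides_enough(face, top_mask(face, 0.7)))  # True
```

Hiding the top of the box rather than any arbitrary half is what covers the eyes, which is why the mask always starts at the face’s top edge.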

2. How to build a robust Face-Detector?

By now, you should have an idea of how we developed our anonymizer:

  1. Building a face-detector
  2. Identifying the top part of the face (more than 50% of it, eyes included)
  3. Blurring it
  4. Re-writing the picture
Why build our own face detector?

I’m sure you have seen plenty of tutorials on how to train a face detector with OpenCV or similar tools. These algorithms work well on close-up pictures, but have you tried them in the wild? Well… the results are not great. Above all, now that the world is living with Covid-19, face detection algorithms need to adapt to masked faces.

Speaking of masked faces, to build our dataset we used a Face Mask Dataset annotated by Humans in the loop at the beginning of the pandemic. It is composed of 6,000+ pictures of people with or without masks, and you can find it on our Dataset Hub.

datalake faces.png

With Picsellia, you can quickly access this dataset and see the label distribution: here we have a total of 10,000+ annotated faces in the wild.

dataset repartition faces.png
But we don’t want to build another face mask detector…

No, we don’t! We need to tweak this dataset a bit to make it suitable for face detection. To do so, we can simply create a new version of the dataset and merge all the labels into a single one: FACE.

merge datasets.png

We let the platform work for a bit… then VOILÀ, we have a dataset of 10,000+ annotated faces.

datasets details.png
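On Picsellia the merge happens in the platform. If you keep annotations on your own side, the same operation is a plain relabeling. This sketch assumes a hypothetical format where each annotation is a dict with a "label" key; the image names, label names and boxes below are made up for illustration.

```python
from collections import Counter


def merge_labels(annotations, new_label="FACE"):
    """Return a copy of the annotations where every label becomes `new_label`.

    Boxes and any other fields are kept untouched; only the class changes.
    """
    return [{**ann, "label": new_label} for ann in annotations]


def label_distribution(annotations):
    """Count how many annotations carry each label."""
    return Counter(ann["label"] for ann in annotations)


# Hypothetical annotations: label names and box format are illustrative only.
annotations = [
    {"image": "img_001.jpg", "label": "mask", "box": (12, 30, 60, 70)},
    {"image": "img_001.jpg", "label": "no_mask", "box": (80, 20, 130, 65)},
    {"image": "img_002.jpg", "label": "mask", "box": (5, 5, 50, 40)},
]

print(label_distribution(annotations))                # Counter({'mask': 2, 'no_mask': 1})
print(label_distribution(merge_labels(annotations)))  # Counter({'FACE': 3})
```

Returning a new list rather than mutating the input mirrors the "new dataset version" idea: the original masked/unmasked labels stay available.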

The pictures are really diverse, but here is one example:

dataset face example.png

Now let’s train our Model

OK, before training, let’s think a bit about what we want to achieve. We want a model that can anonymize pictures at high speed, but also with high confidence scores, because we cannot afford to manually tune the confidence threshold all the time.

We will also aim for a high recall score: a missed face is the costly error here, and it would be pointless to anonymize only one person in a picture that contains several.
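As a reminder, precision = TP / (TP + FP) and recall = TP / (TP + FN), where TP, FP and FN are true positives, false positives and false negatives. A short worked example with made-up counts shows why recall is the number to watch for anonymization:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up scene with 5 faces. The model blurs only 1 face and nothing else:
# precision is perfect, yet 4 people remain identifiable.
print(precision_recall(tp=1, fp=0, fn=4))  # (1.0, 0.2)

# Another model blurs all 5 faces plus 2 background patches:
# precision drops to about 0.71, but recall is 1.0 and nobody is identifiable.
print(precision_recall(tp=5, fp=2, fn=0))
```

A false positive only blurs a patch of background, while a false negative leaves a face exposed, so the trade-off favors recall.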

OK, so let’s take a look at Picsellia’s Model Hub, where you can find ready-to-train, TensorFlow-based computer vision architectures:

picsellia model hub.png

To learn how to launch multiple trainings with different architectures on Picsellia, I invite you to read our previous article. I won’t go into details here because it’s quite… simple, I might say.

For this anonymizer, we chose to use an EfficientDet-d2 for its convergence speed and accuracy.

Here are the results logged in Picsellia.

training graphs for our EfficientDet.png

training metrics for our EfficientDet.png

Our training is somewhat noisy, but we managed to obtain a fairly good mAP, so we’ll use this model as the base for our anonymizer.

Let’s download our saved model in order to build our anonymizer

Now that our model is trained and exported, we can download it and use it locally. To do so, just go to the artifacts of your experiment and download the saved_model.zip file.

experiments artifacts.png
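Extracting the downloaded archive into your project can be scripted with the standard library. This is a sketch under an assumption: we don’t know the exact folder layout inside saved_model.zip, so the hypothetical helper below looks for the entry named saved_model.pb (the graph file every TensorFlow SavedModel directory contains) and returns the directory holding it.

```python
import zipfile
from pathlib import Path


def extract_saved_model(zip_path, project_root="."):
    """Extract the archive into project_root and return the SavedModel directory.

    The internal layout of the archive is an assumption, so we search the
    archive entries for `saved_model.pb` instead of hard-coding a path.
    """
    root = Path(project_root)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(root)
        pb_files = sorted(
            name for name in archive.namelist() if name.endswith("saved_model.pb")
        )
    if not pb_files:
        raise FileNotFoundError(f"no saved_model.pb found in {zip_path}")
    return (root / pb_files[0]).parent


# Usage (file name as downloaded from the experiment artifacts):
# model_dir = extract_saved_model("saved_model.zip")
```

Returning the detected directory saves you from guessing the nesting level when you later load the model.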

OK, now that we have a robust face detector, we can build an anonymizer really quickly.

anonymizer (part 1).png
Available here

First, let’s import some packages and disable all the warnings from TensorFlow. Who really wants to see warnings?
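The original code lives in the image above; one common way to silence TensorFlow is sketched here, not necessarily the one used in it. The key detail is that the `TF_CPP_MIN_LOG_LEVEL` environment variable is read when TensorFlow is first imported, so it must be set before the import. The wrapper function name is hypothetical.

```python
import os
import warnings


def import_tensorflow_quietly(cpp_log_level="2"):
    """Import TensorFlow with most of its log output silenced.

    TF_CPP_MIN_LOG_LEVEL is only read when TensorFlow is first imported, so it
    must be set before the import: "1" hides INFO, "2" also hides WARNING,
    "3" also hides ERROR messages from the C++ backend.
    """
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = cpp_log_level
    # Hide Python-level warnings (e.g. DeprecationWarning) raised by dependencies.
    warnings.filterwarnings("ignore")
    import tensorflow as tf  # imported lazily, after the variable is set

    # Python-side TensorFlow messages go through tf.get_logger().
    tf.get_logger().setLevel("ERROR")
    return tf


# Usage, at the very top of your script:
# tf = import_tensorflow_quietly()
```

Level "2" is a middle ground: routine INFO and WARNING noise disappears, but real errors still show up.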

You should place your saved_model directory at the root of your project (don’t worry, code will be given below).

Let’s load the saved_model and declare a pre_process function.
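As a sketch of this step, assume the model was exported with the TensorFlow Object Detection API: its TF2 detection SavedModels take a uint8 image batch of shape [1, height, width, 3] and return a dict including `detection_boxes` (normalized [ymin, xmin, ymax, xmax]) and `detection_scores`. The function names and the "saved_model" path are assumptions; TensorFlow is imported inside the functions so the pre-processing can be tried on its own.

```python
import numpy as np


def load_detector(saved_model_dir="saved_model"):
    """Load the exported SavedModel (TensorFlow imported lazily)."""
    import tensorflow as tf

    return tf.saved_model.load(str(saved_model_dir))


def pre_process(image):
    """Turn an H x W image array into a uint8 batch of shape [1, H, W, 3].

    Assumes pixel values are already in the 0-255 range.
    """
    image = np.asarray(image)
    if image.ndim == 2:  # grayscale: replicate into 3 channels
        image = np.stack([image] * 3, axis=-1)
    elif image.shape[-1] == 4:  # RGBA: drop the alpha channel
        image = image[..., :3]
    if image.ndim != 3 or image.shape[-1] != 3:
        raise ValueError(f"expected an RGB image, got shape {image.shape}")
    return np.expand_dims(image.astype(np.uint8), axis=0)


def detect(detector, image):
    """Run the detector on one image and return NumPy boxes and scores.

    Boxes use the Object Detection API convention: normalized
    [ymin, xmin, ymax, xmax] coordinates.
    """
    import tensorflow as tf

    outputs = detector(tf.convert_to_tensor(pre_process(image)))
    boxes = outputs["detection_boxes"][0].numpy()
    scores = outputs["detection_scores"][0].numpy()
    return boxes, scores


# Usage:
# detector = load_detector("saved_model")
# boxes, scores = detect(detector, image)  # image: H x W x 3 uint8 array
```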

Now you are just a few lines away from an anonymizer: we only need to keep the bounding boxes with a high confidence score, say 0.5 and above, and blur the top 70% of each detected face.

anonymizer (part 2).png
Available here
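For readers who can’t open the image, here is a NumPy-only sketch of this last step. It assumes normalized [ymin, xmin, ymax, xmax] boxes with matching scores, as produced by Object Detection API models, and uses a simple box blur (a moving average computed with a summed-area table) as a stand-in for whatever blur the original code applies; an OpenCV Gaussian blur would work just as well. The 0.5 threshold and 70% fraction come from the text; the kernel size is an assumption.

```python
import numpy as np


def box_blur(region, kernel=31):
    """Average every pixel over a kernel x kernel window (edges replicated).

    region: H x W x C array.
    """
    k = kernel if kernel % 2 else kernel + 1  # odd size keeps the window centered
    pad = k // 2
    h, w = region.shape[:2]
    padded = np.pad(region.astype(np.float64), ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    # Summed-area table with a leading zero row/column: sat[i, j] = padded[:i, :j].sum()
    sat = np.pad(padded.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0), (0, 0)))
    window_sums = sat[k:k + h, k:k + w] - sat[:h, k:k + w] - sat[k:k + h, :w] + sat[:h, :w]
    return np.round(window_sums / (k * k)).astype(region.dtype)


def anonymize(image, boxes, scores, min_score=0.5, top_fraction=0.7, kernel=31):
    """Blur the top part of every face detected with score >= min_score.

    image: H x W x 3 uint8 array; boxes: N x 4 normalized [ymin, xmin, ymax, xmax].
    """
    out = image.copy()
    h, w = image.shape[:2]
    for (ymin, xmin, ymax, xmax), score in zip(boxes, scores):
        if score < min_score:
            continue  # low-confidence detection: leave it alone
        top = max(int(round(ymin * h)), 0)
        bottom = min(int(round((ymin + (ymax - ymin) * top_fraction) * h)), h)
        left = max(int(round(xmin * w)), 0)
        right = min(int(round(xmax * w)), w)
        if bottom <= top or right <= left:
            continue  # degenerate box after clipping
        out[top:bottom, left:right] = box_blur(out[top:bottom, left:right], kernel)
    return out


# Usage with hypothetical detections:
# anonymized = anonymize(image, boxes, scores)
```

Working on a copy keeps the original frame intact, which is handy if you want to compare before/after while tuning the threshold.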
dataset anonymized.jpeg

There you go !

You can find the code here.

The Picsellia platform is in open beta right now, with only a few seats left! So why not give it a try now?