How to Build an Image Anonymizer For GDPR Compliant Tasks With TF2

A lot of computer vision tasks require images taken in the wild (e.g. roads, events, etc.), but building a dataset for tasks related to human behavior can be tricky. As you probably know, GDPR restricts the storage of pictures of identifiable people taken without their consent.

It would be a shame to limit AI applications because of GDPR, right? At Picsellia, we are dedicated to helping people build better computer vision models, so it only made sense for us to build an anonymizer to help you build GDPR-compliant human-related datasets.

What Does a GDPR-Compliant Image Look Like?

dataset to anonymize.jpeg

Well, let’s meet Tom (don’t worry, I found Tom on Pexels, so he won’t mind).

As you can see, 100% of his face is visible, which is not quite GDPR compliant.

In fact, to be compliant, more than 50% of his face should be hidden, preferably the top 50%, so that his eyes are covered.

dataset hide eyes.jpeg

Like this!

How To Build a Robust Face-Detector?

By now, you should have an idea of how we developed our anonymizer:

  1. Building a face-detector
  2. Identifying the top 60% of the face
  3. Blurring it
  4. Re-writing the picture
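
Step 2 is simple box arithmetic. Here is a minimal sketch, assuming a face box given in pixel coordinates as (xmin, ymin, xmax, ymax) with the origin at the top-left corner of the image. The helper name `top_face_region` and this box layout are assumptions made for illustration.

```python
def top_face_region(box, fraction=0.6):
    """Return the top `fraction` of a face box.

    `box` is (xmin, ymin, xmax, ymax) in pixels, origin at the top-left,
    so the top of the face is the side with the smaller y value.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    xmin, ymin, xmax, ymax = box
    height = ymax - ymin
    # Keep the full width and only shorten the box from the bottom.
    return (xmin, ymin, xmax, ymin + int(round(fraction * height)))


# A face 100 pixels wide and 200 pixels tall: the top 60% is 120 pixels tall.
print(top_face_region((100, 50, 200, 250), fraction=0.6))  # (100, 50, 200, 170)
```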

Why Build Your Own Face Detector?

I’m sure you’ve seen a bunch of tutorials on how to train a face detector with OpenCV or something similar. These algorithms work well for close-up pictures, but have you tried them in the wild? Well... the results are not that good. And most importantly, now that the world is living with Covid-19, face detection algorithms need to adapt to the new reality of everyone wearing masks.

Speaking of masks, to build our dataset, we used a face mask dataset annotated by humans in the loop at the beginning of the pandemic. It’s composed of 6000+ pictures of people wearing masks or not; you can find it in our Dataset Hub.

datalake faces.png

With Picsellia, you can quickly access this dataset and see the label distribution. Here we have a total of 10,000+ annotated faces in the wild.


dataset repartition faces.png

But we don’t want to build another face mask detector.

We will need to tweak this dataset a bit to make it suitable for face detection. To do so, we can simply create a new version of the dataset and merge all the labels into one: FACE.
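
The platform does this merge for you, but the idea is easy to sketch in plain Python. The annotation format and the label names below are hypothetical and are not Picsellia's actual export format.

```python
def merge_labels(annotations, target="FACE"):
    """Return a copy of the annotations with every label replaced by `target`."""
    return [{**annotation, "label": target} for annotation in annotations]


# Hypothetical annotations; the label names are made up for this example.
annotations = [
    {"image": "street_01.jpg", "label": "with_mask", "bbox": [34, 20, 80, 90]},
    {"image": "street_01.jpg", "label": "without_mask", "bbox": [120, 25, 170, 95]},
]
merged = merge_labels(annotations)
print(sorted({a["label"] for a in merged}))  # ['FACE']
```

The bounding boxes are left untouched: only the class name changes, so every annotated face becomes a training example for the single FACE class.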

merge datasets.png

We let the platform work a bit, and voilà, we have a dataset of 10,000+ annotated faces.

datasets details.png

The pictures are really diverse but here is one example:

dataset face example.png

Now, Let’s Train Our Model

Before training, we must think about what we want to achieve. We wanted a model that anonymizes at high speed but also detects faces with high confidence scores, because we can't afford to tune the confidence threshold manually all the time.

We will also aim for high recall, since it'd make no sense to anonymize only some of the people in a picture.
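
To make these goals measurable, here is a minimal sketch of how precision and recall are computed for a detector at a given confidence threshold. The counts are made up for illustration; a missed face lowers recall, while a false alarm lowers precision.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute detection precision and recall from raw counts."""
    detected = true_positives + false_positives  # everything the model flagged
    actual = true_positives + false_negatives    # every face really present
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Made-up counts: 9 faces found correctly, 1 false alarm, 3 faces missed.
precision, recall = precision_recall(9, 1, 3)
print(precision, recall)  # 0.9 0.75
```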

Let’s take a look at Picsellia’s Model Hub, where you can find ready-to-train TensorFlow-based computer vision architectures:

picsellia model hub.png

To understand how to launch multiple training runs with different architectures on Picsellia, I invite you to read our previous article.

For this anonymizer, we chose an EfficientDet-D2 for its convergence speed and accuracy.

Here are the results logged in Picsellia.

training graphs for our EfficientDet.png
training metrics for our EfficientDet.png

Our training curves are somewhat noisy, but we managed to obtain a fairly good mAP, so we’ll use this model as the base for our anonymizer.

Let’s download our saved model in order to build our anonymizer.

Now that our model is trained and exported, we can download it and use it locally. To do so, you just need to go to the artifacts of your experiment and download the saved_model.zip file.

experiments artifacts.png

Ok, now that we have a robust face detector, we'll be able to build an anonymizer really quickly.

anonymizer (part 1).png

Available here

First, let’s import some packages and disable all the warnings from TensorFlow (who wants to see warnings, really?).
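
One common way to do this is shown below, as a sketch rather than the exact code used here. `TF_CPP_MIN_LOG_LEVEL` controls TensorFlow's C++ log output and has to be set before TensorFlow is imported; `silence_tensorflow` is an illustrative helper name.

```python
import logging
import os
import warnings


def silence_tensorflow():
    """Quiet TensorFlow's logs and Python warnings. Call before importing TensorFlow."""
    # "3" hides INFO, WARNING and ERROR messages coming from the C++ backend.
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
    # Hide Python-level warnings (deprecations and the like).
    warnings.filterwarnings("ignore")
    # TensorFlow's Python logger is registered under the name "tensorflow".
    logging.getLogger("tensorflow").setLevel(logging.ERROR)


silence_tensorflow()
```

Call `silence_tensorflow()` first and run `import tensorflow as tf` only afterwards, otherwise the environment variable has no effect on the messages printed at import time.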

You should place your saved_model directory at the root of your project (don’t worry, code will be given below).

Let’s load the saved_model and declare a pre_process function.
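
A minimal sketch of these two pieces, assuming the export follows the TensorFlow Object Detection API convention of taking a uint8 batch of shape (1, height, width, 3). The function bodies and the `load_detector` name are illustrative and may differ from the code linked in this article.

```python
import numpy as np


def pre_process(image):
    """Turn an H x W (x C) image array into a uint8 batch of shape (1, H, W, 3)."""
    image = np.asarray(image)
    if image.ndim == 2:  # grayscale: repeat the single channel three times
        image = np.stack([image] * 3, axis=-1)
    image = image[..., :3]  # drop an alpha channel if there is one
    return image[np.newaxis, ...].astype(np.uint8)


def load_detector(saved_model_dir="saved_model"):
    """Load the exported detector from the saved_model directory."""
    # Imported here so that pre_process can be used without TensorFlow installed.
    import tensorflow as tf

    return tf.saved_model.load(saved_model_dir)


batch = pre_process(np.zeros((480, 640, 3), dtype=np.uint8))
print(batch.shape)  # (1, 480, 640, 3)
```

With such an export, calling the loaded detector on `tf.constant(batch)` would typically return a dictionary that includes `detection_boxes` and `detection_scores`. Check the signature of your own export before relying on those keys.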

Now you are just a few lines away from having an anonymizer. We only need to keep the bounding boxes with a high confidence score, let’s say 0.5 and above, and blur the top 70% of each detected face.
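
Here is a hedged sketch of that last step. It assumes boxes in the normalized [ymin, xmin, ymax, xmax] layout used by the TensorFlow Object Detection API, and it swaps the blur for a NumPy-only pixelation so the example needs no extra dependency (`cv2.GaussianBlur` would be a more usual choice). The names `pixelate` and `anonymize` are illustrative.

```python
import numpy as np


def pixelate(region, block=8):
    """Replace each block x block tile of the region with its mean color."""
    out = region.copy()
    height, width = region.shape[:2]
    for y in range(0, height, block):
        for x in range(0, width, block):
            tile = region[y:y + block, x:x + block]
            out[y:y + block, x:x + block] = tile.mean(axis=(0, 1)).astype(region.dtype)
    return out


def anonymize(image, boxes, scores, min_score=0.5, top_fraction=0.7):
    """Pixelate the top part of every face detected with enough confidence."""
    out = image.copy()  # leave the input image untouched
    height, width = image.shape[:2]
    for (ymin, xmin, ymax, xmax), score in zip(boxes, scores):
        if score < min_score:
            continue  # skip low-confidence detections
        # Convert normalized coordinates to pixels, keeping only the top part.
        top = max(0, int(round(ymin * height)))
        bottom = int(round((ymin + top_fraction * (ymax - ymin)) * height))
        left = max(0, int(round(xmin * width)))
        right = int(round(xmax * width))
        out[top:bottom, left:right] = pixelate(out[top:bottom, left:right])
    return out


# Made-up image and detections, just to show the call.
image = np.random.default_rng(0).integers(0, 256, size=(100, 100, 3), dtype=np.uint8)
result = anonymize(image, boxes=[[0.2, 0.2, 0.6, 0.6]], scores=[0.9])
print(result.shape)
```

With a real detector, `boxes` and `scores` would come from the first element of the model's `detection_boxes` and `detection_scores` outputs, converted to NumPy arrays.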

anonymizer (part 2).png

Available here

dataset anonymized.jpeg

And there you go!

You can find the code here.

If you'd like to try out Picsellia yourself, book a quick call and request a trial here!
