### UTCTF 2019 - FaceSafe

tags: ctf challenge write-up machine learning adversarial
by zangobot

(No learning algorithm has been abused during this CTF)

Welcome everybody to this new ZenHack write-up. Today we present you the solution of FaceSace, MISC challenge of UTCTF.

First of all: the website asks for an image. An image classifier will assign a user to that image. The task is to produce an input that is recognized as the Mr. Deer [VIP]. The challenge files are the metadata of the image classifier and the trained parameters. So, how we create an image for the VIP user?

This is an adversarial machine learning problem: we need to craft a point whose network response is attacker controlled. For solving this challenge, we need to apply a state-of-the-art technique. The easiest one is the Fast Sign Gradient Method.

### SPOILER! THEORY AND MATH LIE AHEAD: YOU’VE BEEN WARNED

The only thing we need to know is that the victim model is just a very very complex function. We don’t need to understand how it was trained. We need to explore it.

To explore the space of the codomain of a function we need directions, a vector in the input space that let us navigate the output of the network. Not all the directions are worthy, though. We need the one that maximise the probability of being recognized as VIP. We need to introduce an object that quantifies the distance between the goal and the current output: a loss function.

$J : \mathcal{Y} \times \mathcal{Y} \rightarrow \mathbb{R}$

It takes in input two point in the output space and returns the distance between them. Which loss we may use? ANYONE! It’s just something that measures how far we are w.r.t. our goal. For simplicity, we are going to use the Mean Squared Error, that is: $$J(y_1, y_2) = \parallel y_1 - y_2 \parallel_2^2$$ that is the l2 norm at the power of two: the euclidean distance! It is nice because it is easily differentiable, but other distances are suitable for this problem.

So, our loss will be the distance between the current prediction and the target class: $$\parallel y_{pred} - e_t \parallel_2^2$$. It is a multiclass classification, hence each output is a vector $$y \in \mathcal{Y}, \mathcal{Y} \subset \mathbb{R}^{10}$$. Each entry $$y_i$$ is the probability of belonging to class $$i$$, in our case, the probability of being user $$i$$.

The second ingridient for solving the challenge is called gradient: the gradient is the vector of all the partial derivative of a function $$f : \mathbb{R}^d \rightarrow \mathbb{R}$$:

$g = (\frac{\partial f}{\partial x_1}, ... , \frac{\partial f}{\partial x_d})$

Each entry of $$g$$ is a function that tells us how much the output is controlled by a certain input dimension. Moreover, the gradient points to the direction of maximum ascent, that is the direction that points to a local or absolute maximum.

To recap:

• we define a distance between the goal and our current position in the output space
• we compute the gradient of the network using this loss, by computing $$\nabla_x L(f(x),e_t) = \nabla_x \parallel f(x) - e_t\parallel_2^2$$

It is time to craft the attack: the definition of the FSGM method is the following.

$x_{adv} = x - \epsilon sign(\nabla_x L(f(x),e_t))$

We chose an $$\epsilon$$ that allow us to remain in the RGB domain, any integer number is suitable. Of course, if $$\epsilon$$ is low, many iteration of the above formula will be needed to reach the desired confidence. For $$\epsilon$$ too high, the algorithm may fail to craft the sample, by moving too much inside the input space.

So, the algorithm is the following:

"""
UTCTF 2019
FaceSafe
Can you get the secret? http://facesafe.xyz
"""
import keras
import numpy as np

from keras import backend as K
from PIL import Image

# The class we want to obtain (I have just counted from 0 to 9). No magic here!
TARGET = 4

# The strength of the perturbation
# ! VERY IMPORTANT: it is an integer beause we want to produce images in RGB, not greyscale!
eps = 1

# Produce encoding of the output for class 4 (canonical base, e_4)
target = np.zeros(10)
target[TARGET] = 1

#l Lad the model using Keras, standard way for saving netwrok's weights is HDF5 format.

# Produce np array from image, using PIL (one of the thousand ways for loading an image)
img = Image.open('img2.png')
data = np.asarray(img, dtype="int32")

print(np.argmax(model.predict(np.array([data]))),' should be 4 BUT NOT NOW!')

# We need te function that incapsulate the gradient of the loss wrt the input.
# ! MOST IMPORTANT: the loss function is the main actor here. It defines what we want to search.
# In this case, we want the distance between the prediciton and the target label 4.
# Hence, we produce the loss written there.
session = K.get_session()
d_model_d_x = K.gradients( keras.losses.mean_squared_error(target, model.output), model.input)

x0 = data
conf = model.predict(np.array([x0]))

# The attack may last forever?
# YES! But I tried with a black image and it converges.
# You should put here a fixed number of iterations...
while np.argmax(conf) != TARGET:

# Thank you Keras + Tensorflow!
# That  is just ugly, but it is needed to obtain the value as an array.

# Compute the perturbation!
# This is the Fast Sign Gradient Method attack.

# The gradient always points to maximum ascent direction, but we need to minimize.
# Hence, we swap the sign of the gradient.
x0 = x0 - fsgm

# Here we need to bound the editing. No negative values in images!
# So we clip all the negative values to 0.
# We also clip all values above 255.
x0[x0 < 0] = 0
x0[x0 > 255] = 255
conf = model.predict(np.array([x0]))
print("Confdence of target class {}: {:.3f}%\nPredicted class: {}\nConfidence of predicted class: {:.3f}%\n----".format(TARGET, conf[TARGET]*100, np.argmax(conf), conf[np.argmax(conf)]*100))

# If we obtained the evasion, we just save the new image
i = Image.fromarray(x0.astype('uint8'), 'RGB')

The flag is utflag{n3ur4l_n3t_s3cur1ty_b4d_p4d1ct4b1l1ty}