Leakage in Adversarial Debiasing as a Consequence of a Multi-Directional Bias in Weight Space
Location
Erasmus University, Mandeville Building, room T18-29B
Date and time
August 30, 2022
15:00 - 16:00
A growing body of research acknowledges the problem of leakage in the adversarial debiasing framework. Leakage refers to the detectability of the protected attribute in allegedly debiased representations or predictions. We formally argue that leakage can be explained as a result of incomplete or improper debiasing if there exists a multi-directional bias in the weight space of the encoder, as indicated by the gradients of potential adversaries. As a solution, we propose proper multi-directional debiasing that builds on the mutual orthogonalization of the adversary gradients involved. We also investigate possible extensions that either attempt to reduce the number of directional restrictions imposed in the encoder weight space or explore the weight space more holistically by encouraging diversity among the adversaries in an ensemble. Our experiments on benchmark and synthetic data indicate that the proposed method consistently outperforms a naive ensemble approach in terms of fairness and leakage. We also find that reducing the number of imposed restrictions improves the leakage and fairness results, whereas encouraging diversity among ensemble members leads to a dramatic deterioration in performance along several dimensions.
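To illustrate the idea of mutually orthogonalizing adversary gradients, here is a minimal sketch in plain Python. It is not the speakers' implementation: the function names (`orthonormalize`, `debiased_update`, `naive_update`), the use of Gram-Schmidt, and the toy three-dimensional "weight space" are all assumptions made for illustration. The sketch removes from an encoder update every direction spanned by the adversary gradients, and contrasts that with projecting out each adversary direction one after another without orthogonalizing them first.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def subtract_component(v, unit):
    # Remove from v its component along the unit vector `unit`.
    c = dot(v, unit)
    return [x - c * u for x, u in zip(v, unit)]

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def orthonormalize(grads, tol=1e-10):
    # Gram-Schmidt (assumed here; the abstract only says "mutual orthogonalization").
    basis = []
    for g in grads:
        v = [float(x) for x in g]
        for b in basis:
            v = subtract_component(v, b)
        n = math.sqrt(dot(v, v))
        if n > tol:  # skip gradients that add no new direction
            basis.append([x / n for x in v])
    return basis

def debiased_update(task_grad, adv_grads):
    # Project the update onto the orthogonal complement of all adversary gradients.
    u = [float(x) for x in task_grad]
    for b in orthonormalize(adv_grads):
        u = subtract_component(u, b)
    return u

def naive_update(task_grad, adv_grads):
    # Project out each adversary direction in turn, without orthogonalizing them.
    u = [float(x) for x in task_grad]
    for g in adv_grads:
        u = subtract_component(u, normalize(g))
    return u

# Hypothetical toy gradients: two non-orthogonal adversary directions.
task_grad = [1.0, 1.0, 1.0]
adv_grads = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]

proper = debiased_update(task_grad, adv_grads)
naive = naive_update(task_grad, adv_grads)
```

In this toy case `proper` has no component along either adversary gradient, whereas `naive` regains a component along the first adversary direction after the second projection. That is one simple way in which handling several bias directions independently can leave residual, detectable information.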