Our latest technique goes beyond suppression. We have developed a method to mathematically isolate the internal structures where antisemitism is encoded in a neural network and delete them entirely, without degrading the model's general capabilities.
Current safety training teaches models to avoid producing harmful outputs, but the underlying biases remain intact, waiting to be resurfaced by jailbreaks, fine-tuning attacks, or simple steering. Our technique removes the structures responsible for generating antisemitic content in the first place, so there is nothing left to resurface.
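The post does not specify the method, but one published family of techniques that fits this description is directional ablation: estimate a direction in activation space associated with the unwanted concept, then project that direction out of the model's weights so no layer can write into it. The sketch below is purely illustrative, with a synthetic concept direction and a random weight matrix standing in for a real model; in practice the direction would be estimated from contrastive prompts.

```python
import numpy as np

# Illustrative sketch of directional ablation (an assumption, not the
# authors' confirmed method). We remove a concept direction v from a
# weight matrix W so the edited layer can no longer write into v.

rng = np.random.default_rng(0)
d = 64                       # hidden dimension (chosen for the example)
W = rng.normal(size=(d, d))  # stand-in for a layer that writes to the residual stream

# Synthetic unit vector standing in for the estimated concept direction.
v = rng.normal(size=d)
v /= np.linalg.norm(v)

# Ablation: W_ablated = (I - v v^T) W, i.e. project v out of W's output space.
W_ablated = W - np.outer(v, v) @ W

# Any output of the edited layer now has no component along v,
# regardless of the input -- the direction is removed, not masked.
x = rng.normal(size=d)
out = W_ablated @ x
residual = abs(out @ v)  # ~0 up to floating-point error
```

Because the projection changes the weights themselves rather than the model's behavior policy, there is no latent structure for a jailbreak or fine-tuning attack to re-activate along that direction, which is the property the post's "nothing left to resurface" claim describes.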
Early results suggest this could offer a clean and permanent solution, one that is robust to the attacks that defeat conventional safety training. If validated at scale, this would mark a shift from managing antisemitism in AI to eliminating it, giving labs a tool that resolves the problem at its source.
We are actively expanding these experiments and will publish full results soon.