Newly published research suggests that AI can subliminally learn. This is exciting but also disconcerting. Evil AI could ...
Fine-tuned “student” models can pick up unwanted traits from base “teacher” models that could evade data filtering, generating a need for more rigorous safety evaluations. Researchers have discovered ...
Large language models can transmit harmful behavior to one another through training data, even when that data lacks any ...