The Anatomy of AI Lies: How Language Models Can Deceive Us

We’re used to hearing that AI sometimes “hallucinates” — producing odd or random mistakes. Hallucinations are unintended errors caused by the limits of statistical prediction. But new research goes further: it shows that AI can knowingly choose to lie when deception helps it achieve a goal. The publication Can LLMs Lie? takes us into a world where AI acts more like a strategic agent, capable of manipulating information to maximize outcomes. ...

September 5, 2025

Global Guarantees of Robustness: A Probabilistic Approach to AI Safety

Modern machine learning models, from image recognition systems to large language models, have achieved impressive capabilities. However, their strength can be deceptive. One of the biggest challenges in AI is their vulnerability to adversarial attacks: intentionally crafted, small perturbations to input data (e.g., changing a few pixels in an image) that are imperceptible to humans yet can completely fool a model, leading to incorrect and often absurd decisions. ...

August 27, 2025