News

MLSN: #10 Adversarial Attacks Against Language and Vision Models, Improving LLM Honesty, and Tracing the Influence of LLM Training Data

  • aogara, Lesswrong.com
  • published date: 2023-09-13 18:03:31 UTC

Published on September 13, 2023 6:03 PM GMT

Welcome to the 10th issue of the ML Safety Newsletter by the Center for AI Safety. In this edition, we cover:

  • Adversarial attacks against GPT-4, PaLM-2, Claude, and Llama 2
  • Robustness against unforesee…
