News

MLSN: #10 Adversarial Attacks Against Language and Vision Models, Improving LLM Honesty, and Tracing the Influence of LLM Training Data

  • aogara, Lesswrong.com
  • published date: 2023-09-13 18:03:31 UTC

Published on September 13, 2023 6:03 PM GMT

Welcome to the 10th issue of the ML Safety Newsletter by the Center for AI Safety. In this edition, we cover:

  • Adversarial attacks against GPT-4, PaLM-2, Claude, and Llama 2
  • Robustness against unforesee…
