promptpressure-evals added to PyPI
Multi-turn behavioral drift detection for LLMs — tone, sycophancy, refusal sensitivity, persona stability
The things benchmarks don't test. Most eval frameworks measure accuracy on known-answer datasets. PromptPressure measures how models behave over susta… [+15038 chars]