News

passwedge added to PyPI

  • source: pypi.org
  • published date: 2026-05-26 18:07:03 UTC

Reliability-science metrics for repeated-attempt evaluation of long-horizon LLM agents: pass@k, pass^k, Bayesian posteriors, RDC/VAF/GDS/MOP.

pass@1 tells you whether a model can do a task once. It says nothing about whether an agent does so consisten…
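To make the gap between "can do it once" and "does it every time" concrete, here is a minimal sketch of the commonly used combinatorial estimators for pass@k and pass^k, plus a Beta posterior mean for the per-attempt success rate. This is not passwedge's API: the function names, argument names, and the uniform Beta(1, 1) prior are assumptions chosen for illustration, and the package may compute these quantities differently.

```python
from math import comb


def _check(n: int, c: int, k: int) -> None:
    # n attempts were run, c of them succeeded, k is the evaluation budget.
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k attempts succeeds) from n attempts, c successes.

    Hypothetical helper: 1 - C(n - c, k) / C(n, k).
    """
    _check(n, c, k)
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate P(all k attempts succeed) from n attempts, c successes.

    Hypothetical helper: C(c, k) / C(n, k).
    """
    _check(n, c, k)
    return comb(c, k) / comb(n, k)


def beta_posterior_mean(n: int, c: int, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Posterior mean of the per-attempt success rate under a Beta(alpha, beta) prior.

    The uniform default prior is an assumption made for this sketch.
    """
    if not (0 <= c <= n):
        raise ValueError("require 0 <= c <= n")
    return (c + alpha) / (n + alpha + beta)


if __name__ == "__main__":
    n, c, k = 10, 7, 3  # illustrative numbers, not measured results
    print(f"pass@{k} = {pass_at_k(n, c, k):.4f}")
    print(f"pass^{k} = {pass_hat_k(n, c, k):.4f}")
    print(f"posterior mean = {beta_posterior_mean(n, c):.4f}")
```

With these illustrative numbers (7 successes in 10 attempts, k = 3), pass@3 is 119/120, about 0.99, while pass^3 is 35/120, about 0.29. The same agent looks nearly perfect under one metric and unreliable under the other.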