News

sforge added to PyPI

  • Source: Pypi.org
  • published date: 2026-07-02 13:58:27 UTC

SForge: Evaluation harness for frontier agents

Overview

EdgeBench is a benchmark of 134 real-world tasks for evaluating how autonomous AI agents learn from real-world environments. Instead of measuring one-shot performance, EdgeBench places agen…