Evaluation data for frontier AI.
Partner with us to author rigorous, machine-verifiable coding tasks that benchmark the world's most advanced AI agents — on real codebases, with real standards.
Trusted by engineers across 20+ countries
The benchmark layer AI labs depend on.
Every task you author is a real coding challenge — grounded in actual code, verifiable by machine, and scored against an 11-criterion quality bar.
Real codebases
Tasks are grounded in actual production-grade code — not toy problems. The agent must understand real code to succeed.
Machine-verifiable
Every task includes a test harness and reference solution. Pass/fail is determined objectively — no human bias in the score.
11-point quality bar
Tasks are reviewed against 11 criteria — Verifiable, Solvable, Fair, Deterministic and more. All must pass. No shortcuts.
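To make "machine-verifiable" concrete, here is a minimal sketch of how a test harness can decide pass/fail with no human judgment. The toy task, the test cases, and the names `reference_solution` and `run_harness` are all hypothetical and only illustrate the idea; this is not the platform's actual harness.

```python
from typing import Callable, Iterable, Tuple


def reference_solution(text: str) -> str:
    """Hypothetical reference solution: collapse whitespace runs into single spaces."""
    return " ".join(text.split())


# Hypothetical test cases as (input, expected output) pairs.
TEST_CASES = [
    ("a  b", "a b"),
    ("  leading and trailing  ", "leading and trailing"),
    ("tabs\tand\nnewlines", "tabs and newlines"),
]


def run_harness(
    candidate: Callable[[str], str],
    cases: Iterable[Tuple[str, str]],
) -> bool:
    """Return True only if the candidate produces the expected output on every case."""
    for given, expected in cases:
        try:
            if candidate(given) != expected:
                return False
        except Exception:
            # A crash counts as a failure, so the verdict stays binary.
            return False
    return True
```

Under these assumptions, `run_harness(reference_solution, TEST_CASES)` passes, while a candidate that returns its input unchanged fails. The verdict depends only on the tests, which is what makes the score objective.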
Evaluation programs
Project Aura
SWE-bench-style coding tasks
Rigorous coding challenges grounded in real code, reviewed within a 24-hour SLA and held to a calibrated difficulty band.
- Languages: Python · TS · Rust · Go
- Payouts: Weekly · Bonuses available
Project Spark
SWE-bench-style coding tasks
The primary program open to new contributors worldwide. Same high quality bar, fast reviews, and weekly earnings.
- Languages: Python · TS · Rust · Go
- Payouts: Weekly · Bonuses available
Project Titan
Long-horizon system-level tasks
Complex, multi-step tasks requiring deep understanding across large codebases. Invite-only for senior contributors.
- Eligibility: Senior only · Invite only
- Payouts: Higher rates
From sign-up to payout.
Create account
Sign up and tell us about your background — languages, years of experience, and the kind of code you work in.
Get program access
Our team reviews your profile and grants access to the program that best fits your expertise level.
Author & submit
Author coding challenges and submit them for review. Verdicts arrive within 24 hours with detailed rubric feedback.
Get paid
Accepted tasks create pending payments automatically. We pay out every Wednesday via your preferred method.
Get paid weekly.
Bonuses on top.
Every accepted task earns you real money, paid out weekly. We also run bonuses so the more you contribute, the more you can earn.
Start earning
11 criteria.
All must pass.
Every task is graded on 11 rubric criteria. A single fail means rejection — this is the bar that makes the data genuinely useful to frontier labs.
All 11 criteria must score Accept or Strong Accept. Tasks that miss even one are returned with detailed feedback.
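The all-must-pass rule can be sketched in a few lines. This is only an illustration: the `Verdict` enum, the `Reject` label for a failing score, and the placeholder criterion names beyond the four named on this page are assumptions, not the actual review rubric.

```python
from enum import Enum
from typing import Dict, List


class Verdict(Enum):
    STRONG_ACCEPT = "Strong Accept"
    ACCEPT = "Accept"
    REJECT = "Reject"  # assumed label for any failing score


PASSING = {Verdict.ACCEPT, Verdict.STRONG_ACCEPT}
REQUIRED_CRITERIA = 11


def failing_criteria(scores: Dict[str, Verdict]) -> List[str]:
    """Names of criteria that did not score Accept or Strong Accept, sorted."""
    return sorted(name for name, verdict in scores.items() if verdict not in PASSING)


def task_accepted(scores: Dict[str, Verdict]) -> bool:
    """A task is accepted only when all 11 criteria pass."""
    if len(scores) != REQUIRED_CRITERIA:
        raise ValueError(f"expected {REQUIRED_CRITERIA} criteria, got {len(scores)}")
    return not failing_criteria(scores)


# Four criteria are named on this page; the rest are hypothetical placeholders.
names = ["Verifiable", "Solvable", "Fair", "Deterministic"]
names += [f"criterion_{i}" for i in range(5, 12)]

scores = {name: Verdict.ACCEPT for name in names}
all_pass = task_accepted(scores)

scores["Fair"] = Verdict.REJECT
one_fail = task_accepted(scores)
missed = failing_criteria(scores)
```

With every criterion at Accept the task passes; a single failing criterion flips the result to rejection, and `failing_criteria` lists what the feedback would need to address.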
Ready to contribute?
Create an account, tell us about your background, and our team will get back to you with program access within 48 hours.