
Member of Technical Staff - Data

Stealth
Department: Finance
Type: Remote
Region: Australia
Location: Australia
Experience: Mid-Senior level
Salary: A$200,000 - A$300,000
Skills: Python, Machine Learning, Forecasting, Statistics, SQL, Postgres, Supabase, TypeScript, Optimization, Evaluation Systems, Calibration, Temporal Leakage, Brier Score, Experiment Design, Production Pipelines

Job Description

Posted on: March 3, 2026

We are one of Australia's two private foundation model labs, currently in stealth. We build AI forecasting systems; our reasoning models beat human superforecasters at prediction tasks. We're backed by Blackbird Ventures and notable angels, including Balaji Srinivasan and the founders of Synthesia and Supabase. The salary range for this role is A$200,000 - A$300,000 per annum.

Our founders include the founder of one of the largest DevOps infrastructure companies in the world, Forbes 30 Under 30 alumni, and the creator of core infrastructure for many quant funds.

The Role

You will own the evaluation and learning loop in our Python research stack and decision-surface workflow.

This is not a "build a model and hand it off" role. There is no data team cleaning features for you. There is no ML platform abstracting away infrastructure. You will be the person who diagnoses why Brier score spiked last week and whether it was calibration drift or resolution loss; who determines if a surface optimization actually improved out-of-sample performance or just overfit; and who builds the leakage prevention infrastructure because nobody else will. You will spend more time asking "is this result real" than "how do I build a fancier model."
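To make the calibration-vs-resolution diagnosis above concrete, here is a minimal sketch of the standard Murphy decomposition of the Brier score (BS = reliability - resolution + uncertainty). This is illustrative only, not our production code; the one-decimal binning scheme is an assumption:

```python
from collections import defaultdict

def murphy_decomposition(forecasts, outcomes):
    """Decompose the Brier score as BS = reliability - resolution + uncertainty.

    `forecasts` are predicted probabilities; `outcomes` are 0/1 resolutions.
    A reliability spike points at calibration drift; a resolution drop means
    the forecasts stopped discriminating, even if calibration held.
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    uncertainty = base_rate * (1 - base_rate)

    # Group observations by forecast value (rounded to one decimal as a toy binning).
    bins = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        bins[round(f, 1)].append(o)

    reliability = sum(
        len(os) * (f - sum(os) / len(os)) ** 2 for f, os in bins.items()
    ) / n
    resolution = sum(
        len(os) * (sum(os) / len(os) - base_rate) ** 2 for os in bins.values()
    ) / n
    return reliability, resolution, uncertainty

rel, res, unc = murphy_decomposition(
    [0.9, 0.8, 0.7, 0.3, 0.2, 0.1], [1, 1, 0, 0, 0, 0]
)
brier = rel - res + unc
```

The identity lets you attribute a week-over-week Brier change to its components instead of guessing.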

You will work at the boundary of research and production, with direct handoff into our TypeScript strategy runtime.

What You’ll Do

  • Build and own point-in-time, leakage-resistant evaluation pipelines for forecasting and trading decisions.
  • Define and enforce rigorous validation: temporal splits, walk-forward tests, segment/regime slicing, and regression benchmarks.
  • Improve calibration quality using robust methods (for example, isotonic or Platt variants) with reproducible model selection and monitoring.
  • Connect forecast quality to execution quality: measure where edge is created, where it is lost, and why.
  • Improve decision-surface optimization pipelines (entry/exit thresholds and policy knobs) using robust objective design and stability checks.
  • Maintain Python-to-TypeScript parity for exported artifacts so research outputs are production-faithful.
  • Convert postmortems into systematic improvements across prompts, models, features, and policy logic.
  • Prototype quickly, then harden into tested, documented components used by the broader team.
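The temporal splits and walk-forward tests above can be sketched as an expanding-window splitter, where every training point strictly precedes every test point. A minimal illustration, assuming observations arrive pre-sorted by time:

```python
def walk_forward_splits(timestamps, n_folds=4, min_train=0.25):
    """Yield (train_idx, test_idx) pairs for walk-forward evaluation.

    Every training index strictly precedes every test index in each fold,
    which is the basic structural defense against temporal leakage.
    Assumes `timestamps` is sorted ascending; `min_train` reserves an
    initial fraction of history before the first test window opens.
    """
    n = len(timestamps)
    start = int(n * min_train)
    fold_size = (n - start) // n_folds
    for k in range(n_folds):
        cut = start + k * fold_size
        end = cut + fold_size if k < n_folds - 1 else n
        # Expanding window: train on everything before the test window.
        yield list(range(cut)), list(range(cut, end))

# Each fold evaluates only on data that was unobservable at training time.
splits = list(walk_forward_splits(list(range(100)), n_folds=4))
```

Real pipelines also need point-in-time feature snapshots, but the split discipline is where most leakage bugs start.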

Requirements

  • Proven track record building robust ML/forecasting evaluation systems end-to-end, not just metric dashboards.
  • Deep understanding of temporal leakage modes and practical defenses in real data systems.
  • Strong applied statistics: calibration, uncertainty, proper scoring rules, experiment design, and significance under noise.
  • Strong Python engineering skills for production-grade research tooling and reproducible pipelines.
  • Solid SQL/Supabase/Postgres fluency for large-scale, time-dependent analytical workflows.
  • Experience with optimization workflows (for example Optuna) and model/policy tuning under realistic constraints.
  • Ability to work with TypeScript engineers to keep runtime behavior aligned with research assumptions.
  • Strong product judgment: prioritize changes that move both forecast reliability and trade performance.
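The optimization-workflow requirement is about tuning under realistic constraints rather than chasing a global argmax. As a dependency-free sketch (a real pipeline might use Optuna), here is a threshold search with a per-fold stability check; `toy_pnl` is a hypothetical objective, not our actual P&L model:

```python
def tune_threshold(folds, pnl, grid=None):
    """Pick an entry threshold by grid search with a stability check.

    `folds` is a list of (probs, outcomes) tuples in time order;
    `pnl` scores one fold at a given threshold (hypothetical objective).
    A threshold that only wins on one fold is likely overfit, so we
    return the median of the per-fold optima rather than the global best.
    """
    grid = grid or [i / 20 for i in range(1, 20)]
    per_fold_best = []
    for probs, outcomes in folds:
        best = max(grid, key=lambda t: pnl(probs, outcomes, t))
        per_fold_best.append(best)
    per_fold_best.sort()
    return per_fold_best[len(per_fold_best) // 2], per_fold_best

def toy_pnl(probs, outcomes, t):
    # Hypothetical: act when p >= t; +1 if the event happens, -1 otherwise.
    return sum((1 if o else -1) for p, o in zip(probs, outcomes) if p >= t)

# Example: two identical folds where acting only above ~0.55 is always right.
probs = [i / 10 for i in range(1, 10)]
outcomes = [1 if p >= 0.6 else 0 for p in probs]
threshold, per_fold = tune_threshold([(probs, outcomes)] * 2, toy_pnl)
```

Reporting the spread of `per_fold` alongside the chosen value is a cheap way to flag unstable optima before they reach production.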

Nice To Have

  • Experience with prediction markets, market microstructure, DeFi, or other on-chain datasets.
  • Experience tying calibration quality directly to capital allocation and risk controls.
  • Familiarity with parity-contract testing across languages and artifact schemas.
  • Experience in high-velocity research environments where correctness and speed both matter.

Why Us

  • Real traction: our live system already outperforms human superforecasters.
  • Frontier technical problem: AI reasoning + forecasting + market execution.
  • Small, technical founding team with high ownership and fast iteration.
  • Backed by top-tier investors and operators.
  • Remote-friendly with Sydney and San Francisco presence.

How To Apply

Send your resume and a brief note covering:

  1. The most rigorous leakage-resistant evaluation system you have built.
  2. A concrete case where improved calibration or decision policy changed real business/trading outcomes.
  3. How you would structure a point-in-time evaluation + optimization loop that is production-faithful across Python research and TypeScript execution.

Optionally, answer a quick-fire question. Given this Brier decomposition:

  • Period A: BS=0.180 Reliability=0.040 Resolution=0.100 Uncertainty=0.240
  • Period B: BS=0.200 Reliability=0.020 Resolution=0.060 Uncertainty=0.240

Which period had better forecasts, and why?

Originally posted on LinkedIn
