00 / Systems

The machine, with the cover off.

AlphaForge is not a bot watching a chart. It is a research process that happens to trade. Six components carry a single decision from raw market data to a costed fill, and every one of them is built to refuse a lie. Here is each, in order, with nothing hidden.

data to factors to portfolio to backtester to overlays to paper loop

Paper and simulation only. No real capital.

01 / Data integrity

Point-in-time, or it does not count.

The data lake is the foundation, and it is built to make a lie expensive. Every bar is stamped with the moment it could actually have been known, to the millisecond. A decision evaluated at one o'clock may read only what existed at one o'clock. One reader enforces point-in-time access on every query, so a leak is not something we hope to avoid. It is something the data layer will not permit. The same store feeds research and live paper trading, so a backtest and a live decision read from one contract, never two that merely resemble each other.

Source / Point-in-time lake / every bar stamped to the millisecond it was knowable
Gate / One reader / enforces as-of access on every query, no leak permitted
Consumers / Research and paper / one contract, shared by research and paper
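The as-of rule above can be sketched in a few lines. This is a minimal illustration, not the AlphaForge reader: the names `Bar`, `PointInTimeReader`, and `knowable_at_ms` are hypothetical, and the store is assumed to be an in-memory list.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Bar:
    symbol: str
    knowable_at_ms: int  # first moment the bar could have been known
    close: float


class PointInTimeReader:
    """Hypothetical reader: every query must carry an as-of timestamp."""

    def __init__(self, bars):
        # Sort once by availability time so each query is a binary search.
        self._bars = sorted(bars, key=lambda b: b.knowable_at_ms)
        self._times = [b.knowable_at_ms for b in self._bars]

    def read(self, symbol, as_of_ms):
        # Only bars knowable at or before as_of_ms are visible to the caller.
        cutoff = bisect_right(self._times, as_of_ms)
        return [b for b in self._bars[:cutoff] if b.symbol == symbol]


bars = [Bar("BTC", 1_000, 100.0), Bar("BTC", 2_000, 101.0), Bar("BTC", 3_000, 99.0)]
reader = PointInTimeReader(bars)
visible = reader.read("BTC", as_of_ms=2_000)  # the bar stamped 3_000 is invisible
```

Because there is no method that reads without `as_of_ms`, a caller cannot reach a future bar by accident. Research and paper code would both go through the same `read` call.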

The universe includes the graveyard.

The universe is survivorship-bias-free. It includes the instruments that died. Listing and delisting are recorded as historical facts, so a backtest cannot quietly drop the names that went to zero and keep only the winners. The graveyard is in the data, by design.
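A minimal sketch of what "the graveyard is in the data" could look like, assuming each instrument carries a listing date and an optional delisting date. The `Listing` record and `universe_as_of` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Listing:
    symbol: str
    listed: date
    delisted: Optional[date]  # None while the instrument still trades


def universe_as_of(listings, day):
    """Names tradable on `day`, including ones that later died."""
    return sorted(
        item.symbol
        for item in listings
        if item.listed <= day and (item.delisted is None or day < item.delisted)
    )


listings = [
    Listing("ALIVE", date(2020, 1, 1), None),
    Listing("DEAD", date(2020, 1, 1), date(2022, 6, 1)),
    Listing("LATE", date(2023, 1, 1), None),
]
early = universe_as_of(listings, date(2021, 1, 1))  # still contains "DEAD"
later = universe_as_of(listings, date(2023, 6, 1))  # "DEAD" is gone, "LATE" has arrived
```

A backtest that builds its universe this way on each decision date holds the names that later went to zero for exactly as long as they were tradable.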

Research and live code share one path.

There is no second implementation that drifts. What the backtester runs is what the paper loop runs. The parity is structural, not a promise to keep two codebases in step.

3.5M+ hourly bars, crypto / from 2020 / 94 instruments, live and delisted / zero look-ahead

24M+ daily bars, equity lake / 8,436 US stocks, survivorship-free / 1997 to 2026 (25 years) / 392K+ point-in-time fundamentals

02 / The factor library

Many signals, measured before they vote.

AlphaForge scores every instrument across 50 factors drawn from the published anomaly literature and from market microstructure. These include momentum (price trends), reversal (short-horizon bounce-back), funding carry (perpetual funding rates read as return), and low-volatility anomalies. It then blends them by how much real predictive content each one carries. The blend is information-coefficient weighted: a factor earns its weight by its measured correlation with future returns, not by how good its story sounds. What does not earn its place is shrunk toward zero. The engine would rather hold cash than trade a signal it cannot defend.

  1. Momentum, two kinds.

    Cross-sectional momentum asks which names lead the pack; time-series momentum asks whether a name is trending relative to its own past. Both are held on a short leash, because momentum is the factor most prone to crowding.

  2. Residual reversal.

    The short-horizon tendency of a name to revert after it moves away from what its peers and its own structure imply it should be.

  3. Funding carry.

    The funding rate paid between longs and shorts on a perpetual, read as a carry signal to harvest rather than noise to ignore.

  4. Low-volatility and low-beta.

    The persistent tendency of calmer, lower-beta instruments to deliver better risk-adjusted returns than their racier peers.

  5. Volatility estimators.

    Yang-Zhang and Parkinson estimators that use the full open-high-low-close bar, not just closes, for a less noisy read on risk.

  6. Liquidity measures.

    Amihud illiquidity and the Corwin-Schultz spread estimator, so the engine knows what a position will actually cost to hold and to exit.
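Two of the estimators named above are short enough to sketch directly. This is a minimal illustration using the textbook definitions of the Parkinson range estimator and the Amihud illiquidity ratio; the function names and input shapes are assumptions. Yang-Zhang and Corwin-Schultz are omitted for brevity.

```python
import math


def parkinson_vol(highs, lows):
    """Per-bar volatility from the high-low range (Parkinson estimator)."""
    squared_ranges = [math.log(h / l) ** 2 for h, l in zip(highs, lows)]
    variance = sum(squared_ranges) / (4.0 * math.log(2.0) * len(squared_ranges))
    return math.sqrt(variance)


def amihud_illiquidity(returns, dollar_volumes):
    """Average absolute return per unit of dollar volume traded."""
    ratios = [abs(r) / v for r, v in zip(returns, dollar_volumes) if v > 0]
    return sum(ratios) / len(ratios)


vol = parkinson_vol(highs=[101.0, 103.0, 102.5], lows=[99.0, 100.5, 101.0])
illiq = amihud_illiquidity(returns=[0.01, -0.02], dollar_volumes=[1e6, 2e6])
```

The range estimator uses information a close-to-close estimate throws away, which is why it tends to be less noisy on the same number of bars. A higher Amihud value means a given amount of trading moves the price more.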

The output is one score per instrument, per hour. A single number that the rest of the system can size, constrain, and fill against.

See how the library is organized

// 50 factors, grouped into families, each weighted by its measured information coefficient, the correlation with future returns. Families with more predictive content carry more weight; the rest are shrunk toward zero.

  • Momentum IC-weighted
  • Residual reversal IC-weighted
  • Funding carry IC-weighted
  • Low-volatility and low-beta IC-weighted
  • Volatility estimators IC-weighted
  • Liquidity measures IC-weighted
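An IC-weighted blend with shrinkage can be sketched as follows. The source does not say how the shrinkage is done, so this sketch assumes a simple soft threshold on each factor's information coefficient, and it uses a Pearson correlation where a rank correlation is also common. All names and the `threshold` value are hypothetical.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 for a constant series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def ic_weights(factor_history, forward_returns, threshold=0.02):
    """Soft-threshold each factor's IC, then normalise so sum(|w|) == 1."""
    shrunk = {}
    for name, values in factor_history.items():
        ic = pearson(values, forward_returns)
        # ICs smaller in magnitude than the threshold are shrunk to zero.
        shrunk[name] = math.copysign(max(abs(ic) - threshold, 0.0), ic)
    total = sum(abs(w) for w in shrunk.values())
    if total == 0:
        return {name: 0.0 for name in shrunk}  # nothing defensible: hold cash
    return {name: w / total for name, w in shrunk.items()}


def blend(exposures, weights):
    """One score for one instrument from its per-factor exposures."""
    return sum(weights[name] * value for name, value in exposures.items())


forward_returns = [0.01, -0.02, 0.03, -0.01, 0.02]
history = {
    "momentum": [1.0, -2.0, 3.0, -1.0, 2.0],  # tracks future returns closely
    "noise": [1.0, 1.0, 1.0, 1.0, 1.0],       # carries no information
}
weights = ic_weights(history, forward_returns)
score = blend({"momentum": 2.0, "noise": 5.0}, weights)
```

In this toy history the uninformative factor ends with zero weight, so its exposure contributes nothing to the score, however large that exposure is.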


50 factors / IC-weighted blend / one score per instrument, per hour

03 / Portfolio and risk

A portfolio, not a pile of trades.

Scores become positions through a construction step that treats correlated bets as one bet. The covariance is estimated with an EWMA that leans on recent data, combined with Ledoit-Wolf shrinkage that pulls a noisy sample matrix toward a stable target, so it stays usable on short history. Positions are then solved by mean-variance optimization through the Clarabel conic solver, with a rank and inverse-volatility fallback for the moments when no clean solution exists.
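The covariance step and the fallback can be sketched without any solver. This sketch makes three simplifying assumptions: returns are treated as zero-mean, the shrinkage target is a scaled identity matrix, and the shrinkage intensity `delta` is passed in by hand, whereas Ledoit-Wolf estimates it from the data. The decay `lam` and all function names are hypothetical, and the Clarabel optimization itself is not shown.

```python
import math


def ewma_cov(returns, lam=0.94):
    """EWMA covariance of zero-mean returns; the most recent row weighs most."""
    n_assets = len(returns[0])
    cov = [[0.0] * n_assets for _ in range(n_assets)]
    weight_sum = 0.0
    for age, row in enumerate(reversed(returns)):
        w = lam ** age
        weight_sum += w
        for i in range(n_assets):
            for j in range(n_assets):
                cov[i][j] += w * row[i] * row[j]
    return [[c / weight_sum for c in r] for r in cov]


def shrink_to_identity(cov, delta):
    """Blend the matrix toward (average variance) * identity."""
    n = len(cov)
    mu = sum(cov[i][i] for i in range(n)) / n
    return [
        [(1.0 - delta) * cov[i][j] + (delta * mu if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


def inverse_vol_weights(cov):
    """Fallback sizing: weight each asset by 1 / volatility, summing to one."""
    inv = [1.0 / math.sqrt(cov[i][i]) for i in range(len(cov))]
    total = sum(inv)
    return [x / total for x in inv]


returns = [[0.01, 0.02], [-0.01, -0.02], [0.01, 0.02]]  # rows are bars, oldest first
cov = ewma_cov(returns)
shrunk = shrink_to_identity(cov, delta=0.2)
fallback = inverse_vol_weights(shrunk)
```

Shrinkage damps the off-diagonal terms and pulls the variances toward their average, which keeps the matrix well-conditioned when there are few bars relative to the number of instruments. The fallback gives the calmer asset the larger weight.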

Construct

Sized by covariance.

EWMA plus Ledoit-Wolf shrinkage gives a usable covariance on short history. Mean-variance optimization through Clarabel solves the book, with a rank and inverse-volatility fallback when no clean solution exists. A volatility-target overlay holds the book at a chosen risk.

Protect

The brakes are the engine.

A drawdown ladder reduces gross exposure in steps as losses accumulate, and a kill switch stands behind it for the case the ladder is not enough. The system does less when the market gives it less, and fails toward safety when something is wrong.
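A drawdown ladder with a kill switch behind it can be sketched as a lookup from current drawdown to a gross-exposure multiplier. The rungs and the kill threshold below are hypothetical values chosen only for illustration.

```python
# (drawdown threshold, gross-exposure multiplier); hypothetical rungs
LADDER = [(0.05, 0.75), (0.10, 0.50), (0.15, 0.25)]
KILL_AT = 0.20  # hypothetical drawdown at which the kill switch trips


def gross_multiplier(equity_curve):
    """Fraction of normal gross exposure allowed at the latest equity mark."""
    peak = max(equity_curve)
    drawdown = 1.0 - equity_curve[-1] / peak
    if drawdown >= KILL_AT:
        return 0.0  # kill switch: flatten the book
    multiplier = 1.0
    for threshold, rung in LADDER:
        if drawdown >= threshold:
            multiplier = rung  # deeper drawdown, lower rung
    return multiplier


calm = gross_multiplier([100.0, 98.0])      # 2% drawdown: full size
stepped = gross_multiplier([100.0, 92.0])   # 8% drawdown: first rung
halted = gross_multiplier([100.0, 75.0])    # 25% drawdown: kill switch
```

This version is stateless, so exposure returns as soon as equity recovers. A production kill switch would more plausibly latch until a human or a separate rule resets it.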

EWMA + Ledoit-Wolf covariance / Clarabel MVO, rank fallback / vol-target overlay / drawdown ladder + kill switch

04 / The truth backtester

Costs that tell the truth.

The backtester is event-driven and built to remove the two ways a backtest usually flatters itself. First, timing: a signal computed at a bar's close is filled at the next bar's open, never on the bar that produced the decision, so there is no look-ahead by construction. Second, cost: spreads, exchange fees, and the funding paid on perpetuals are charged exactly as they would be live. Funding is modeled as the discrete, event-driven cash flow it actually is, not smeared into an average. One transaction-cost authority prices every fill, in research and in paper trading alike, so the return a strategy shows in a backtest is already net of what it would have paid in the market.

Bar close / Signal computed / decision made on what was knowable at the close
Next open / Order filled / never the bar it decided on, no look-ahead by construction
Every fill / One cost authority / spreads, fees, event-driven funding, research and paper alike
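The timing and cost rules above can be sketched as a small loop. This is a simplified illustration, not the AlphaForge backtester: the `Bar` fields, the flat fee and half-spread in basis points, and the choice to settle funding against the bar's close are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    open: float
    close: float
    funding_rate: float = 0.0  # nonzero only on bars where funding settles


def run_backtest(bars, signal, fee_bps=5.0, half_spread_bps=2.0):
    """signal(closes_so_far) returns a target position in units.

    A target decided at one bar's close is filled at the next bar's open.
    """
    cash, position, pending = 0.0, 0.0, None
    cost_rate = (fee_bps + half_spread_bps) / 10_000.0
    closes = []
    for bar in bars:
        # 1. Fill the order decided at the previous close, at this bar's open.
        if pending is not None and pending != position:
            traded = pending - position
            cash -= traded * bar.open
            cash -= abs(traded) * bar.open * cost_rate
            position = pending
        # 2. Funding is a discrete cash flow: longs pay when the rate is positive.
        if bar.funding_rate:
            cash -= position * bar.close * bar.funding_rate
        # 3. Decide at the close, using only closes up to and including this bar.
        closes.append(bar.close)
        pending = signal(closes)
    return cash + position * bars[-1].close  # mark the open position at the last close


bars = [Bar(100.0, 100.0), Bar(101.0, 102.0, funding_rate=0.001), Bar(102.0, 103.0)]
pnl = run_backtest(bars, signal=lambda closes: 1.0)  # always target one unit long
```

The decision made at the first close of 100 is filled at the next open of 101, so the strategy never earns the move it could not have traded. Fees, spread, and the single funding payment all come out of cash at the moment they occur.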

The validation gauntlet

Survive this, or the number means nothing.

A good backtest is easy to fake and easy to fool yourself with. So before any result is trusted it must clear a validation gauntlet. Each test below is a different way of asking the same hard question: would this survive if the world had not been so kind?

  1. Purged walk-forward.

    Train on the past, test on a future the model never saw, with a purge and embargo so no information bleeds across the boundary, then roll forward and repeat.

  2. Deflated and Probabilistic Sharpe.

    A Sharpe is discounted for how many strategies were tried to find it and for the non-normal shape of returns, so a number that survives is one that is unlikely to be luck.

  3. Probability of Backtest Overfitting, via CSCV.

    Combinatorially symmetric cross-validation estimates the chance that the strategy that looked best in-sample is actually no better than the median out-of-sample.

  4. Combinatorially-purged cross-validation.

    Many train and test splits, each purged, so the estimate of performance is not a single lucky path through history.

  5. Honest trial counts.

    We count the number of configurations tested and feed that count into the deflation, rather than pretending we only ever tried one.

  6. A must-beat-baseline gate.

    A strategy must beat a simple, honest baseline by a meaningful margin, or it does not pass. Beating zero is not the bar.
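The first test in the list, purged walk-forward, can be sketched as an index generator. How the purge and the embargo are each applied is not stated in the source, so this sketch assumes both are expressed as a single `gap` of dropped samples between the end of training and the start of the test window. The function name and parameters are hypothetical.

```python
def purged_walk_forward(n_samples, train_size, test_size, gap):
    """Rolling (train, test) index lists with `gap` samples dropped between them."""
    splits = []
    test_start = train_size + gap
    while test_start + test_size <= n_samples:
        train_end = test_start - gap
        train = list(range(train_end - train_size, train_end))
        test = list(range(test_start, test_start + test_size))
        splits.append((train, test))
        test_start += test_size  # roll forward by one test window
    return splits


splits = purged_walk_forward(n_samples=20, train_size=8, test_size=4, gap=2)
first_train, first_test = splits[0]  # train on 0..7, skip 8..9, test on 10..13
```

The gap should be at least as long as the label horizon, so that no training label is computed from prices that fall inside the test window.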

Most strategies never survive a gauntlet this severe. That is the point: what survives, you can trust.
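One piece of the gauntlet, the Probabilistic Sharpe Ratio, has a closed form that is easy to sketch. The function below follows the published formula of Bailey and López de Prado using per-period (not annualised) returns; the function name and the sample data are illustrative only. The deflated variant raises `benchmark_sr` to the Sharpe one would expect from the best of many unskilled trials, which is where the honest trial count enters; that step is not shown.

```python
import math
from statistics import NormalDist


def probabilistic_sharpe(returns, benchmark_sr=0.0):
    """Probability that the true per-period Sharpe exceeds benchmark_sr."""
    n = len(returns)
    mean = sum(returns) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in returns) / n)
    sharpe = mean / std
    skew = sum((r - mean) ** 3 for r in returns) / n / std ** 3
    kurt = sum((r - mean) ** 4 for r in returns) / n / std ** 4  # 3.0 for a normal
    # Skewness and fat tails widen the uncertainty around the observed Sharpe.
    denom = math.sqrt(1.0 - skew * sharpe + (kurt - 1.0) / 4.0 * sharpe ** 2)
    z = (sharpe - benchmark_sr) * math.sqrt(n - 1) / denom
    return NormalDist().cdf(z)


flat = probabilistic_sharpe([0.01, -0.01, 0.01, -0.01])        # zero mean return
good = probabilistic_sharpe([0.02, -0.01, 0.02, -0.01, 0.02])  # positive mean return
```

A zero-mean series gives exactly one half, which is the sense in which the statistic refuses to credit luck. Raising the benchmark lowers the probability for the same track record.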

signal-at-close to fill-at-next-open / event-driven funding / one cost authority / purged WFV + DSR + PBO + baseline gate

05 / The overlays

Conviction, scaled. Exposure, gated.

Meta-labeling scales conviction. It never flips a signal.

A gradient-boosted classifier sits on top of the factor signal as a meta-labeler. It is trained on cost-honest triple-barrier labels: did a position hit its profit target, its stop, or its time limit first, net of costs? Its target is therefore the outcome a trade would actually have had, not a raw forward return that ignores what it cost to get there. Its probabilities are isotonically calibrated, so a stated seventy percent means roughly seventy percent and conviction can be sized against it. Critically, the model is used only to scale conviction up or down. It never flips a signal. The factor engine decides direction; the meta-labeler decides how much to believe it.
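A cost-aware triple-barrier label can be sketched as below. The barrier widths, the flat `cost` fraction, and the rule that a time-limit exit is labeled by the sign of its net return are assumptions made for illustration; the classifier and the isotonic calibration are not shown.

```python
def triple_barrier_label(entry, path, direction, take_profit, stop_loss, cost):
    """Return 1 if the trade ends as a net win, else 0.

    entry        fill price of the position
    path         prices observed after entry, up to the time limit
    direction    +1 for long, -1 for short
    take_profit, stop_loss, cost   fractions of the entry price
    """
    net = -cost
    for price in path:
        net = direction * (price / entry - 1.0) - cost
        if net >= take_profit:
            return 1  # profit barrier touched first
        if net <= -stop_loss:
            return 0  # stop barrier touched first
    # Time limit reached: label by the sign of the net return at expiry.
    return 1 if net > 0 else 0


win = triple_barrier_label(100.0, [101.0, 103.0], +1, 0.02, 0.02, cost=0.001)
loss = triple_barrier_label(100.0, [99.0, 97.0], +1, 0.02, 0.02, cost=0.001)
eaten_by_cost = triple_barrier_label(100.0, [100.05], +1, 0.02, 0.02, cost=0.001)
```

The third case is the reason for labeling net of costs: the price rose, but by less than the round trip cost, so the honest label is a loss.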

A regime gate throttles exposure in adverse states.

A hand-rolled Gaussian hidden-Markov model reads the market's latent regime. The inference is that a sharp shift in the joint behaviour of returns and volatility means the market has changed state, not merely moved, and that an edge measured in one state should not be trusted at full size in another. When the latent state turns adverse, the model throttles gross exposure. It is run filtered, with no lookahead, so its view at any moment uses only what was knowable then. When the market turns hostile, the engine carries less.
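"Run filtered" has a precise meaning that a short sketch makes concrete: the state probability at each step is computed from observations up to that step only. The sketch below is the standard forward recursion for a Gaussian HMM with one-dimensional observations. The two states, their parameters, and the transition matrix are hypothetical and supplied by hand rather than fitted, and the source's model reads returns and volatility jointly, which this simplification does not.

```python
import math


def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def forward_filter(observations, initial, transition, means, stds):
    """Filtered probabilities P(state at t | observations up to t)."""
    n_states = len(initial)
    belief = list(initial)
    filtered = []
    for t, x in enumerate(observations):
        if t > 0:
            # Predict: push the previous belief through the transition matrix.
            belief = [
                sum(belief[i] * transition[i][j] for i in range(n_states))
                for j in range(n_states)
            ]
        # Update: reweight by how well each state explains x, then normalise.
        joint = [belief[j] * gaussian_pdf(x, means[j], stds[j]) for j in range(n_states)]
        total = sum(joint)
        belief = [p / total for p in joint]
        filtered.append(belief)
    return filtered


params = dict(
    initial=[0.5, 0.5],
    transition=[[0.95, 0.05], [0.10, 0.90]],  # state 0 calm, state 1 adverse
    means=[0.001, -0.02],
    stds=[0.01, 0.04],
)
observations = [0.001, 0.002, -0.05, -0.06]
filtered = forward_filter(observations, **params)
p_adverse_now = filtered[-1][1]
```

Truncating the observation list leaves the earlier rows unchanged, which is the no-lookahead property. A smoothed estimate would revise those rows using later data and so could not be used for live decisions.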

Meta-labeler

Scales, never flips.

Triple-barrier labels, a gradient-boosted classifier, isotonic calibration. It sits inside a decision the factor engine has already made and adjusts only how much to believe it.

Regime gate

Carries less when hostile.

A filtered Gaussian hidden-Markov model with no lookahead. When the latent regime turns adverse, gross exposure is throttled down.

Signal / Direction, from the factors / the factor engine decides which way, the overlays never overrule it
Meta-labeler / Conviction, scaled / a calibrated probability sizes how much to believe the signal
Regime gate / Exposure, capped / an adverse latent state throttles gross exposure down
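How the three rows combine can be sketched as one sizing function. The mapping from probability to conviction (`2p - 1`, floored at zero) and the linear regime ceiling are assumptions chosen to make the two guarantees visible; the source does not specify either formula, and the parameter names are hypothetical.

```python
def sized_position(direction, probability, p_adverse, max_gross=1.0, adverse_cap=0.25):
    """Combine direction, calibrated conviction, and a regime ceiling.

    direction    -1, 0, or +1 from the factor engine
    probability  calibrated probability that the trade succeeds
    p_adverse    filtered probability of the adverse regime
    """
    # Conviction lies in [0, 1]. It is never negative, so the sign of the
    # position always comes from `direction`: the overlay cannot flip it.
    conviction = max(0.0, 2.0 * probability - 1.0)
    # The ceiling slides from max_gross (calm) down to adverse_cap (adverse).
    ceiling = max_gross - (max_gross - adverse_cap) * p_adverse
    return direction * min(conviction * max_gross, ceiling)


confident = sized_position(+1, probability=0.9, p_adverse=0.0)
doubtful = sized_position(+1, probability=0.3, p_adverse=0.0)   # flat, not short
throttled = sized_position(-1, probability=0.9, p_adverse=1.0)  # still short, but small
```

A low probability takes the position to zero rather than to the opposite side, and an adverse regime only lowers the ceiling. Neither overlay can produce a position larger than the engine's own maximum.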

Neither overlay is a black box bolted to the front. Both sit inside a decision the factor engine has already made, and both can only make the engine more careful, never more reckless.

triple-barrier labels / gradient-boosted + isotonic calibration / scales conviction, never flips / Gaussian HMM regime gate

06 / The paper loop

The same system, run live, on paper.

Research and paper trading are not two systems that resemble each other. They are one system, run twice. A 24/7 loop walks the live order book to estimate realistic fills, places idempotent orders so a retry can never double-count, reconciles its own ledger against the venue so its books are always honest, and runs off a single authoritative clock so every component agrees on what time it is and what was knowable. State is written down, so a process killed mid-decision resumes exactly where it stopped, with no lost position and no double fill. This is the part that decides whether an edge survives contact with a real market, so it is engineered for the moment things go wrong.

Read / Walk the order book / realistic fills estimated against live depth
Act / Idempotent orders / a retry can never double-count, on one authoritative clock
Reconcile / Books against the venue / crash-safe, resumable, no lost position, no double fill
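Two of the mechanisms above, walking the book and idempotent orders, can be sketched briefly. The book is assumed to be a list of `(price, size)` ask levels sorted from best to worst, and `PaperVenue` is a hypothetical stand-in that deduplicates on a client-supplied order ID. Neither reflects a real exchange API.

```python
def walk_book(levels, quantity):
    """Estimate a market buy against ask levels [(price, size), ...].

    Returns (average fill price, filled quantity). The fill is partial
    when the visible depth is smaller than the requested quantity.
    """
    remaining, cost = quantity, 0.0
    for price, size in levels:
        take = min(remaining, size)
        cost += take * price
        remaining -= take
        if remaining <= 0:
            break
    filled = quantity - remaining
    return (cost / filled if filled else None), filled


class PaperVenue:
    """Hypothetical venue that deduplicates on a client order ID."""

    def __init__(self):
        self.orders = {}

    def submit(self, client_order_id, symbol, quantity):
        # A retry with the same ID returns the original order unchanged.
        order = {"symbol": symbol, "quantity": quantity}
        return self.orders.setdefault(client_order_id, order)


asks = [(100.0, 1.0), (101.0, 2.0)]
avg_price, filled = walk_book(asks, quantity=2.0)  # takes 1 @ 100 and 1 @ 101

venue = PaperVenue()
first = venue.submit("decision-42", "BTC-PERP", 2.0)
retry = venue.submit("decision-42", "BTC-PERP", 2.0)  # e.g. after a timeout
```

Deriving the order ID from the decision rather than from the attempt is what makes a retry safe. If the ID is also written to disk before the order is sent, a restarted process can resubmit without risking a second fill.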

No real capital touches this loop. It is the dress rehearsal, run in full costume, for as long as it takes to earn the right to a live stage.

The same six-stage architecture runs all three algorithms on paper: AlphaForge in crypto funding carry, AlphaMax in US equity momentum, and ALPHAC, their equal-risk combination. The equities layer is trained on 25 years of leak-proof market data across 8,436 survivorship-free US stocks, and breadth research continues. What the live paper record has earned so far, honestly reported, lives on the performance page.

order-book-walking fills / idempotent orders / reconciliation / one authoritative clock / crash-safe, resumable

07 / Chain of custody

One decision, end to end.

Six components, one unbroken chain. Watch a single decision carried from raw market data to a costed fill, each stage handing to the next with nothing lost and nothing assumed.

  1. Stage 01 / Data

    Point-in-time, or it does not count.

    One leak-proof lake, every bar stamped to the moment it was knowable; research and paper read the same contract.

  2. Stage 02 / Factors

    Many signals, measured before they vote.

    Fifty factors registered, blended by measured predictive content, not by how good the story sounds. One score per instrument, per hour.

  3. Stage 03 / Portfolio

    A portfolio, not a pile of trades.

    Scores become positions through a covariance-aware optimizer, held at a chosen risk by a vol-target overlay. A drawdown ladder and a kill switch fail it toward safety.

  4. Stage 04 / Backtester

    Costs that tell the truth.

    Signal at close, fill at the next open, with spreads, fees, and event-driven funding charged exactly as they would be live. Then the validation gauntlet decides whether the result is trustworthy.

  5. Stage 05 / Overlays

    Conviction scaled. Exposure gated.

    A meta-labeler scales how much to believe a signal and never flips it; a regime gate throttles exposure when the market turns hostile. Both can only make the engine more careful.

  6. Stage 06 / Paper loop

    The same system, run live, on paper.

    One 24/7 loop walks the live book, places idempotent orders, reconciles its ledger, and resumes crash-safe. No real capital. The chain is whole, end to end.

08 / The whole machine

Built to refuse a lie, end to end.

That is the whole machine. What it has produced so far, honestly reported, lives on the performance page.