Brier-calibrated multi-thesis ensemble across US, Korea, FX, commodities, and crypto markets. Every prediction carries a full evidence chain — source provenance, per-thesis AUC, confidence intervals, and a deterministic replay guarantee.
Every prediction is traceable. Every thesis is calibrated with empirical data. Every data response is reproducible to the byte.
Outputs Prediction(up_probability, confidence_interval_bps, contributing_signals…) with Brier-derived ensemble weights per thesis. No BUY/SELL output — ever (INV-GS-101).
Turns any thesis into a calibration row — Brier score, AUC, Sharpe, OOS degradation — with explicit IS/OOS split and full reproducibility guarantees.
Every external API response is persisted as a parquet shard + SQLite index + Merkle leaf. Any prediction can be replayed bit-for-bit months later (INV-GS-022).
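As a rough illustration of the replay guarantee, the sketch below hashes stored response bytes into Merkle leaves and folds them into a single root, so that any changed byte changes the root. The function names (`leaf_hash`, `merkle_root`), the 0x00/0x01 prefixes, and the duplicate-last-node rule are assumptions made for this example; they are not taken from glostat's actual storage layout.

```python
import hashlib


def leaf_hash(payload: bytes) -> bytes:
    """Hash one persisted response; the 0x00 prefix marks a leaf (assumed convention)."""
    return hashlib.sha256(b"\x00" + payload).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold leaf hashes pairwise until a single root hash remains."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels (assumed rule)
        level = [
            hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Made-up response bodies standing in for persisted shards.
responses = [b'{"ticker": "AAPL"}', b'{"ticker": "005930"}', b"raw csv bytes"]
root = merkle_root([leaf_hash(r) for r in responses])

# Replaying the same bytes reproduces the root; one changed byte does not.
replayed = merkle_root([leaf_hash(r) for r in responses])
tampered = merkle_root([leaf_hash(r) for r in responses[:2] + [b"raw csv bytez"]])
print(root == replayed, root == tampered)
```

Comparing a stored root against a recomputed one is enough to detect drift; locating the exact shard that changed would additionally need the intermediate hashes.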
MIT license. Fork-friendly. Designed so third-party thesis authors can plug in their own modules and contribute calibration data to the shared table via PR.
Broadcasting to Telegram or by mass email raises ComplianceError permanently and unconditionally (INV-GS-024). Every Prediction carries a non-removable disclaimer field (INV-GS-104).
Every LLM call is pinned to a SHA-256 hash so the prompt graph is auditable across versions. This is backed by 71+ numbered invariants with a 1:1 unit-test mapping.
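One way to picture hash pinning is shown below: a prompt template is hashed, the digest is recorded, and any later edit to the template is rejected until the pin is updated. `PromptPinError`, `prompt_digest`, and `check_pin` are hypothetical names used only for this sketch, and hashing the template text alone (rather than, say, template plus model parameters) is an assumption.

```python
import hashlib


class PromptPinError(RuntimeError):
    """Raised when a prompt no longer matches its pinned digest (hypothetical name)."""


def prompt_digest(template: str) -> str:
    """Return the SHA-256 hex digest of the UTF-8 encoded template."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()


def check_pin(template: str, pinned: str) -> str:
    """Return the template unchanged if it matches the pin, else raise."""
    actual = prompt_digest(template)
    if actual != pinned:
        raise PromptPinError(
            f"prompt drifted: expected {pinned[:12]}, got {actual[:12]}"
        )
    return template


TEMPLATE = "Summarize the filing for {ticker} in three bullet points."
PINNED = prompt_digest(TEMPLATE)  # in practice, recorded alongside the code

check_pin(TEMPLATE, PINNED)  # passes: template unchanged
try:
    check_pin(TEMPLATE + " Be brief.", PINNED)
except PromptPinError as exc:
    print("rejected:", exc)
```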
Free-stack data sources only in MVP — yfinance, SEC EDGAR, Naver Finance, DART, ECOS, KIS, CCXT. Paid sources (Bigdata MCP) are phase-gated behind explicit activation.
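Phase gating can be sketched as an allow-list check that runs before any fetch. The class below is an illustration only: `PhaseGate`, `PhaseGateError`, and the lower-case source keys are hypothetical names, and the real routing logic may differ.

```python
from dataclasses import dataclass, field


class PhaseGateError(PermissionError):
    """Hypothetical error for a paid source requested before activation."""


# Source names follow the README's free/paid split; the keys are made up.
FREE_SOURCES = {"yfinance", "sec_edgar", "naver_finance", "dart", "ecos", "kis", "ccxt"}
PAID_SOURCES = {"bigdata_mcp"}


@dataclass
class PhaseGate:
    activated: set[str] = field(default_factory=set)

    def activate(self, source: str) -> None:
        """Explicitly opt in to one paid source."""
        if source not in PAID_SOURCES:
            raise ValueError(f"{source!r} is not a gated source")
        self.activated.add(source)

    def check(self, source: str) -> str:
        """Return the source name if it may be used, else raise."""
        if source in FREE_SOURCES or source in self.activated:
            return source
        if source in PAID_SOURCES:
            raise PhaseGateError(f"{source!r} is blocked until explicitly activated")
        raise KeyError(f"unknown source {source!r}")


gate = PhaseGate()
print(gate.check("yfinance"))      # free source: allowed by default
try:
    gate.check("bigdata_mcp")      # paid source: blocked by default
except PhaseGateError as exc:
    print("blocked:", exc)
gate.activate("bigdata_mcp")
print(gate.check("bigdata_mcp"))   # allowed after explicit activation
```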
v0.6 labeled these results "8 thesis FAIL" against a rigid Sharpe gate and shut down. v1.0 reframes the same data as the calibration baseline: weak signals carry near-zero Brier weight, and strong signals carry proportional weight.
Brier-derived weights are illustrative; actual values are computed at run time from cache/calibration_table.parquet.
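To make "Brier-derived weight" concrete, here is a minimal sketch of one plausible mapping: each thesis is weighted by how much its Brier score improves on the 0.25 scored by a constant 0.5 forecast, so a coin-flip thesis gets zero weight. The `brier_weights` rule, the baseline choice, and the sample data are assumptions for illustration; the project's actual formula may differ.

```python
def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def brier_weights(scores: dict[str, float], baseline: float = 0.25) -> dict[str, float]:
    """Weight each thesis by its improvement over the baseline (assumed rule)."""
    skill = {name: max(0.0, baseline - s) for name, s in scores.items()}
    total = sum(skill.values())
    if total == 0.0:
        return {name: 0.0 for name in scores}  # nothing beats the baseline
    return {name: s / total for name, s in skill.items()}


# Made-up outcomes and forecasts for three hypothetical theses.
outcomes = [1, 0, 1, 1, 0, 1, 0, 0]
forecasts = {
    "strong": [0.8, 0.2, 0.7, 0.9, 0.3, 0.6, 0.4, 0.1],
    "medium": [0.6, 0.4, 0.6, 0.6, 0.4, 0.6, 0.4, 0.4],
    "coin_flip": [0.5] * 8,
}
scores = {name: brier_score(p, outcomes) for name, p in forecasts.items()}
weights = brier_weights(scores)
for name in forecasts:
    print(f"{name:10} brier={scores[name]:.3f} weight={weights[name]:.3f}")
```

Clamping the skill at zero means a thesis that scores worse than the baseline is ignored rather than given a negative weight.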
The table above reflects the v0.6 baseline only — current main has 21 thesis modules.
Full table and interpretation: docs/CALIBRATION.md
*** marks AUC ≤ 0.51: statistical-noise territory, flagged with a warning in the output (INV-GS-113 X3 / INV-GS-114). This is intentional honesty, not a defect. See docs/KR_SUPPORT.md.
Requires Python ≥ 3.11. No paid API keys needed in default MVP mode.
# Install (PyPI default = v2.0.2+; v1.x are yanked)
pip install glostat

# Mock prediction (no network, bundled fixtures)
glostat predict AAPL --horizon swing_5d --mock

# Live KR prediction (no SEC_USER_AGENT needed — Naver/DART/KIS only)
glostat predict 005930 --horizon swing_5d   # Samsung Electronics (삼성전자)
glostat predict 096770 --horizon swing_5d   # SK Innovation

# Live US prediction (SEC EDGAR requires User-Agent — INV-GS-038)
GLOSTAT_SEC_USER_AGENT="Your Name your.name@example.com" \
  glostat predict AAPL --horizon swing_5d

# Refresh calibration table from cached hindcast reports
glostat calibrate --out cache/calibration_table.parquet

# Universe-wide scan (rank by composite edge, filter significant)
glostat scan --universe kr_kospi200 --significant
from glostat.predictor import predict, load_calibration
from glostat.predictor.types import Prediction

# Build contributions via collect_contributions(); see docs/EXAMPLES.md
cal_table = load_calibration()
prediction: Prediction = predict(
    ticker="AAPL",
    horizon="swing_5d",  # intraday | swing_5d | swing_30d | long_3y
    contributions=contributions,
    cal_table=cal_table,
)

print(f"up_probability = {prediction.up_probability:.3f}")
low_bps, high_bps = prediction.confidence_interval_bps
print(f"CI 1-sigma (~68%) bps = [{low_bps:+.1f}, {high_bps:+.1f}]")
if low_bps <= 0 <= high_bps:
    print("  *** includes 0 — no clear direction (INV-GS-113)")

for c in prediction.contributing_signals:
    if c.direction == "skip":
        continue
    print(f"  {c.name:24} dir={c.direction:4} "
          f"AUC={c.calibration_auc:.3f} n={c.n_samples}")

print(prediction.disclaimer)  # Always non-empty (INV-GS-104)
The infrastructure is independent of which thesis you screen. New thesis modules need calibration data (n ≥ 50, AUC, Sharpe, OOS) attached to the PR (INV-GS-026).
Implement the Thesis protocol in src/glostat/experts/. Return a typed (direction, raw_score, sources) result. See docs/EXAMPLES.md for a working template.
Add a routing entry in data_router.py. The DataRouter enforces phase gating so paid sources stay blocked until you explicitly opt in.
Configure Hindcast, point at a universe, get an IS/OOS report with AUC, Sharpe, and Brier score. Minimum n=50 required (INV-GS-026).
Append the result to calibration_table.parquet. The Brier-weighted ensemble picks the weight automatically at the next recalibration run.
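The hindcast and append steps above can be pictured with a small, dependency-free sketch that splits a forecast history chronologically, scores each half, and produces one calibration row. Everything here is an assumption for illustration: `calibration_row`, the 70/30 split, the column names, and the synthetic data are not the project's Hindcast API or table schema, and writing the row to the parquet file is left out.

```python
import random

MIN_SAMPLES = 50  # minimum sample size stated in the README (INV-GS-026)


def auc(scores: list[float], labels: list[int]) -> float:
    """Chance that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier(probs: list[float], labels: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def calibration_row(name: str, probs: list[float], labels: list[int],
                    is_fraction: float = 0.7) -> dict:
    """Score one thesis on a chronological in-sample / out-of-sample split."""
    n = len(probs)
    if n != len(labels):
        raise ValueError("probs and labels must have the same length")
    if n < MIN_SAMPLES:
        raise ValueError(f"need n >= {MIN_SAMPLES}, got {n}")
    cut = int(n * is_fraction)  # earlier rows are in-sample, later rows are OOS
    is_auc = auc(probs[:cut], labels[:cut])
    oos_auc = auc(probs[cut:], labels[cut:])
    return {
        "thesis": name,
        "n_samples": n,
        "is_auc": is_auc,
        "oos_auc": oos_auc,
        "oos_degradation": is_auc - oos_auc,
        "oos_brier": brier(probs[cut:], labels[cut:]),
    }


# Synthetic history: alternating outcomes and a noisy, mildly informative forecast.
rng = random.Random(0)
labels = [i % 2 for i in range(200)]
probs = [0.5 + (0.1 if y else -0.1) + rng.uniform(-0.3, 0.3) for y in labels]

row = calibration_row("demo_thesis", probs, labels)
for key, value in row.items():
    print(key, value)
```

Splitting by position rather than at random keeps future observations out of the in-sample half, which is the point of reporting OOS degradation separately.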