v2.0 breaking change: Prediction.dca_sizing field removed. All v1.x releases (1.3.0–1.9.1) are yanked from PyPI. pip install glostat now resolves to v2.0.2+. See v2.0 release notes.
v2.0.2  ·  MIT License  ·  Python ≥ 3.11

Evidence-based Probability Predictor for Global Equities

Brier-calibrated multi-thesis ensemble across US, Korea, FX, commodities, and crypto markets. Every prediction carries a full evidence chain — source provenance, per-thesis AUC, confidence intervals, and a deterministic replay guarantee.

$ pip install glostat
Information tool only. GLOSTAT outputs probability distributions with explicit confidence intervals and source provenance. These are not investment recommendations, not securities solicitations, and not financial advice. Past calibration data does not guarantee future predictive performance. Users are solely responsible for their own decisions. This tool is provided for informational purposes only and does not constitute investment solicitation, solicitation to subscribe for securities, or financial advice.
21+ Thesis Modules
71+ Invariants (INV-GS)
5 Markets Covered
836+ Unit Tests
v2.0.2 Current Release

A research framework, not a black box

Every prediction is traceable. Every thesis is calibrated with empirical data. Every data response is reproducible to the byte.

📐

Calibrated Probability Predictor

Outputs Prediction(up_probability, confidence_interval_bps, contributing_signals…) with Brier-derived ensemble weights per thesis. No BUY/SELL output — ever (INV-GS-101).

🔁

Deterministic Hindcast Harness

Turns any thesis into a calibration row — Brier score, AUC, Sharpe, OOS degradation — with explicit IS/OOS split and full reproducibility guarantees.
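
As a rough illustration of what a calibration row measures, the sketch below computes a Brier score and AUC on a chronological in-sample/out-of-sample split. It is a minimal stand-alone example, not the GLOSTAT harness: the function names, the 70/30 split, and the sample data are all assumptions made for illustration.

```python
from statistics import mean


def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return mean((p - y) ** 2 for p, y in zip(probs, outcomes))


def auc(probs, outcomes):
    """Chance that a random 'up' case outranks a random 'down' case (ties count half)."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one up and one down outcome")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def split_is_oos(rows, is_fraction=0.7):
    """Chronological split: the earliest rows are in-sample, the rest out-of-sample."""
    cut = round(len(rows) * is_fraction)
    return rows[:cut], rows[cut:]


# Hypothetical (probability, outcome) pairs, already in time order.
rows = [(0.9, 1), (0.2, 0), (0.6, 1), (0.4, 0), (0.7, 1),
        (0.3, 0), (0.8, 1), (0.6, 0), (0.4, 1), (0.1, 0)]

in_sample, out_sample = split_is_oos(rows)
for label, part in (("IS", in_sample), ("OOS", out_sample)):
    probs, outcomes = zip(*part)
    print(label, "Brier:", round(brier_score(probs, outcomes), 4),
          "AUC:", round(auc(probs, outcomes), 3))
```

Comparing the IS and OOS rows side by side is what makes out-of-sample degradation visible; a thesis that only looks good in-sample shows a clear drop in AUC and a rise in Brier score on the later rows.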

🗄️

Snapshot Broker

Every external API response is persisted as a parquet shard + SQLite index + Merkle leaf. Any prediction can be replayed bit-for-bit months later (INV-GS-022).
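
The replay idea can be sketched with the standard library alone. The example below is an assumption-laden simplification: it stores raw bytes directly in an in-memory SQLite table rather than in parquet shards, and the class name, schema, and Merkle pairing rule are hypothetical rather than taken from GLOSTAT.

```python
import hashlib
import sqlite3


def leaf_hash(payload: bytes) -> str:
    """SHA-256 of the raw response bytes, used as the Merkle leaf."""
    return hashlib.sha256(payload).hexdigest()


def merkle_root(leaves: list[str]) -> str:
    """Hash hex leaves pairwise up to a single root; an odd node is paired with itself."""
    if not leaves:
        raise ValueError("cannot build a Merkle root from zero leaves")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [hashlib.sha256((a + b).encode()).hexdigest()
                 for a, b in zip(level[::2], level[1::2])]
    return level[0]


class SnapshotIndex:
    """Hypothetical index: key -> (leaf hash, raw bytes)."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE snapshots (key TEXT PRIMARY KEY, leaf TEXT, payload BLOB)")

    def put(self, key: str, payload: bytes) -> str:
        leaf = leaf_hash(payload)
        self.db.execute("INSERT OR REPLACE INTO snapshots VALUES (?, ?, ?)",
                        (key, leaf, payload))
        return leaf

    def replay(self, key: str) -> bytes:
        row = self.db.execute(
            "SELECT leaf, payload FROM snapshots WHERE key = ?", (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        leaf, payload = row
        if leaf_hash(payload) != leaf:
            raise ValueError("stored bytes no longer match the recorded leaf")
        return payload

    def root(self) -> str:
        rows = self.db.execute("SELECT leaf FROM snapshots ORDER BY key")
        return merkle_root([r[0] for r in rows])


index = SnapshotIndex()
index.put("yfinance/AAPL/2024-01-02", b'{"close": 185.64}')  # made-up payload
index.put("dart/005930/2024-01-02", b'{"filing": "sample"}')  # made-up payload
print(index.replay("yfinance/AAPL/2024-01-02"))
print(index.root())
```

Because the root depends on every leaf, changing a single stored byte changes the root, which is what lets a later run confirm it is replaying exactly the data the original prediction saw.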

🔓

Open-Source Framework

MIT license. Fork-friendly. Designed so third-party thesis authors can plug in their own modules and contribute calibration data to the shared table via PR.

🛡️

Compliance Gate

Broadcast to Telegram or mass-email raises ComplianceError permanently and unconditionally (INV-GS-024). Every Prediction carries a non-removable disclaimer field (INV-GS-104).
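
A minimal sketch of how such a gate might look, assuming a frozen dataclass for the disclaimer and a simple channel blocklist. Only the name ComplianceError comes from the text above; SketchPrediction, broadcast, the channel strings, and the disclaimer wording are hypothetical stand-ins, not the GLOSTAT API.

```python
from dataclasses import FrozenInstanceError, dataclass, field


class ComplianceError(RuntimeError):
    """Raised when a prediction is sent to a blocked broadcast channel."""


# Assumed channel identifiers; the real names are not given in the source.
BLOCKED_CHANNELS = frozenset({"telegram", "mass_email"})
DISCLAIMER = "Information tool only. Not investment advice."


@dataclass(frozen=True)
class SketchPrediction:
    ticker: str
    up_probability: float
    # init=False: callers cannot pass their own value; frozen=True: nobody can clear it later.
    disclaimer: str = field(default=DISCLAIMER, init=False)


def broadcast(prediction: SketchPrediction, channel: str) -> str:
    """Refuse blocked channels outright; there is deliberately no override flag."""
    if channel.lower() in BLOCKED_CHANNELS:
        raise ComplianceError(f"broadcast to {channel!r} is not permitted")
    return f"[{channel}] {prediction.ticker}: {prediction.up_probability:.3f} | {prediction.disclaimer}"


p = SketchPrediction("AAPL", 0.547)
print(broadcast(p, "stdout"))

try:
    broadcast(p, "telegram")
except ComplianceError as exc:
    print("blocked:", exc)

try:
    p.disclaimer = ""  # type: ignore[misc]
except FrozenInstanceError:
    print("disclaimer cannot be removed")
```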

🔑

Prompt Registry

Every LLM call is pinned to a sha256 so the prompt graph is auditable across versions. The registry is backed by 71+ numbered invariants with a 1:1 unit-test mapping.
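
Pinning a prompt to a hash can be sketched in a few lines. The class and method names below are hypothetical and only illustrate the general technique of refusing to use a prompt whose text no longer matches its recorded SHA-256.

```python
import hashlib


def prompt_digest(text: str) -> str:
    """SHA-256 hex digest of the exact prompt text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class SketchPromptRegistry:
    """Hypothetical registry mapping a prompt name to its pinned digest."""

    def __init__(self) -> None:
        self._pinned: dict[str, str] = {}

    def pin(self, name: str, text: str) -> str:
        digest = prompt_digest(text)
        self._pinned[name] = digest
        return digest

    def verify(self, name: str, text: str) -> str:
        """Return the text only if it still matches the pinned digest."""
        if prompt_digest(text) != self._pinned[name]:
            raise ValueError(f"prompt {name!r} has drifted from its pinned sha256")
        return text


registry = SketchPromptRegistry()
registry.pin("summarize_filing", "Summarize the filing in three bullet points.")

# An unchanged prompt passes; an edit as small as one character is rejected.
registry.verify("summarize_filing", "Summarize the filing in three bullet points.")
try:
    registry.verify("summarize_filing", "Summarize the filing in four bullet points.")
except ValueError as exc:
    print(exc)
```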

US, Korea, FX, Commodities, Crypto

Free-stack data sources only in MVP — yfinance, SEC EDGAR, Naver Finance, DART, ECOS, KIS, CCXT. Paid sources (Bigdata MCP) are phase-gated behind explicit activation.

Exchange       Market               Status    Universe and data sources
XNAS / XNYS    US Large-cap         Active    S&P 500 Top 50 · yfinance + SEC EDGAR + FRED-ready
XKRX           Korea KOSPI          Active    KOSPI 200 · yfinance (.KS) + Naver + DART + ECOS + KIS + KRX
XKOS           Korea KOSDAQ         Active    KOSDAQ 150 Top 30 · yfinance (.KQ) + Naver + DART
BINANCE_PERP   Crypto Perp          Research  BTC / ETH · CCXT
NYSE / CBOE    FX & Commodity ETFs  Partial   yfinance + CFTC COT · commodity cycle client

8 theses measured, not failed

v0.6 called these "8 thesis FAIL" against a rigid Sharpe gate and shut down. v1.0 reframes the same data as the calibration baseline — weak signals carry near-zero Brier weight, strong signals carry proportional weight.

Thesis              Universe   AUC    OOS Sharpe  Brier Weight
E_PEAD              US 50      0.587  +0.63       0.18
E_FOREIGN_REVERSAL  KR 20      0.467  +0.58       0.14
E_INSIDER_CLUSTER   US 19      0.339  +0.78       0.05
E_COMMODITY_TS      Commodity  0.489  +0.14       0.06
E_SECTOR_ROTATION   US 11      0.470  −0.48       0.00
E_FOMC_DRIFT        US 12      0.357  −1.34       0.00
E_FX_CARRY          FX 8       0.400  −1.53       0.00
E_FUNDING_CARRY     Crypto 2   0.505  −0.23       0.02

Brier-derived weights are illustrative; actual values computed at run time from cache/calibration_table.parquet. The table above reflects the v0.6 baseline only — current main has 21 thesis modules. Full table and interpretation: docs/CALIBRATION.md
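
One plausible way to turn Brier scores into ensemble weights is shown below. This is an assumed scheme for illustration, not the formula GLOSTAT uses: each thesis gets credit for how far its Brier score beats 0.25 (the score of always forecasting 50%), theses at or above that baseline get zero, and the credits are normalized. The thesis names and scores are made up.

```python
def brier_weights(brier_by_thesis: dict[str, float], baseline: float = 0.25) -> dict[str, float]:
    """Weight each thesis by its Brier improvement over a coin-flip forecast."""
    skill = {name: max(0.0, baseline - score) for name, score in brier_by_thesis.items()}
    total = sum(skill.values())
    if total == 0:
        return {name: 0.0 for name in skill}
    return {name: s / total for name, s in skill.items()}


def ensemble_probability(probs: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-thesis probabilities; 0.5 if nothing carries weight."""
    total = sum(weights[name] for name in probs)
    if total == 0:
        return 0.5
    return sum(probs[name] * weights[name] for name in probs) / total


# Hypothetical Brier scores: lower is better, 0.25 is no better than a coin flip.
scores = {"thesis_a": 0.22, "thesis_b": 0.24, "thesis_c": 0.27}
weights = brier_weights(scores)
print(weights)

# thesis_c is loudly bullish but has zero weight, so it does not move the ensemble.
print(ensemble_probability({"thesis_a": 0.60, "thesis_b": 0.52, "thesis_c": 0.95}, weights))
```

Under a scheme like this, a weak thesis is never "failed" and removed; it simply contributes nothing until its measured calibration improves.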

⚠ KR megacap statistical honesty (INV-GS-114). On KOSPI 200 / KOSDAQ 150 megacap names, the Phase-KR M1 hindcast measured AUC ≤ 0.51 (n = 3,510 samples), so discrimination is at the edge of statistical noise for these tickers. Predictions for KR large-caps carry an explicit "*** AUC ≤ 0.51 — statistical noise territory" warning in the output (INV-GS-113 X3 / INV-GS-114). This is intentional honesty, not a defect. See docs/KR_SUPPORT.md.
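
A back-of-the-envelope check shows why AUC ≤ 0.51 at n = 3,510 sits near the noise floor. Under the null hypothesis of no discrimination, AUC is a rescaled Mann-Whitney U statistic with standard error sqrt((n_up + n_down + 1) / (12 · n_up · n_down)). The sketch below assumes an even up/down split and independent samples; neither is stated in the source, and correlated samples would make the real uncertainty larger.

```python
import math


def auc_null_standard_error(n_up: int, n_down: int) -> float:
    """Standard error of AUC when the scores carry no information (Mann-Whitney null)."""
    return math.sqrt((n_up + n_down + 1) / (12 * n_up * n_down))


def auc_z_score(auc: float, n_up: int, n_down: int) -> float:
    """How many null standard errors the measured AUC lies above 0.5."""
    return (auc - 0.5) / auc_null_standard_error(n_up, n_down)


n = 3510
n_up = n // 2          # assumption: balanced up/down outcomes
n_down = n - n_up

print(f"null SE = {auc_null_standard_error(n_up, n_down):.4f}")
print(f"z(0.51) = {auc_z_score(0.51, n_up, n_down):.2f}")
```

Under these assumptions an AUC of 0.51 is only about one standard error above 0.5, well short of the roughly two standard errors usually required to call a result significant.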

Up and running in minutes

Requires Python ≥ 3.11. No paid API keys needed in default MVP mode.

shell
# Install (PyPI default = v2.0.2+; v1.x are yanked)
pip install glostat

# Mock prediction (no network, bundled fixtures)
glostat predict AAPL --horizon swing_5d --mock

# Live KR prediction (no SEC_USER_AGENT needed — Naver/DART/KIS only)
glostat predict 005930 --horizon swing_5d   # 삼성전자
glostat predict 096770 --horizon swing_5d   # SK Innovation

# Live US prediction (SEC EDGAR requires User-Agent — INV-GS-038)
GLOSTAT_SEC_USER_AGENT="Your Name your.name@example.com" \
glostat predict AAPL --horizon swing_5d

# Refresh calibration table from cached hindcast reports
glostat calibrate --out cache/calibration_table.parquet

# Universe-wide scan (rank by composite edge, filter significant)
glostat scan --universe kr_kospi200 --significant
python
from glostat.predictor import predict, load_calibration
from glostat.predictor.types import Prediction

# `contributions` is not defined in this snippet: build it first with
# collect_contributions() (see docs/EXAMPLES.md)
cal_table = load_calibration()
prediction: Prediction = predict(
    ticker="AAPL",
    horizon="swing_5d",   # intraday | swing_5d | swing_30d | long_3y
    contributions=contributions,
    cal_table=cal_table,
)

print(f"up_probability = {prediction.up_probability:.3f}")
low_bps, high_bps = prediction.confidence_interval_bps
print(f"CI 1-sigma (~68%) bps = [{low_bps:+.1f}, {high_bps:+.1f}]")
if low_bps <= 0 <= high_bps:
    print("  *** includes 0 — no clear direction (INV-GS-113)")

for c in prediction.contributing_signals:
    if c.direction == "skip":
        continue
    print(f"  {c.name:24} dir={c.direction:4}  "
          f"AUC={c.calibration_auc:.3f}  n={c.n_samples}")

print(prediction.disclaimer)  # Always non-empty (INV-GS-104)

Add your own thesis

The infrastructure is independent of which thesis you screen. New thesis modules need calibration data (n ≥ 50, AUC, Sharpe, OOS) attached to the PR (INV-GS-026).

1️⃣

Write a thesis module

Subclass the Thesis protocol in src/glostat/experts/ and return a typed (direction, raw_score, sources) result. See docs/EXAMPLES.md for a working template.
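
The shape of such a module might look like the sketch below. The real Thesis protocol lives in the GLOSTAT source and may differ: the method name evaluate, its arguments, the ThesisResult type, and the "up"/"down" direction values are assumptions made for illustration (only "skip" appears in the usage example on this page). docs/EXAMPLES.md remains the authoritative template.

```python
from typing import Literal, NamedTuple, Protocol

Direction = Literal["up", "down", "skip"]  # "up"/"down" are assumed values


class ThesisResult(NamedTuple):
    direction: Direction
    raw_score: float
    sources: tuple[str, ...]


class Thesis(Protocol):
    """Hypothetical stand-in for the protocol in src/glostat/experts/."""

    name: str

    def evaluate(self, ticker: str, closes: list[float]) -> ThesisResult: ...


class SketchMomentumThesis:
    """Toy thesis: direction follows the sign of the return over the window."""

    name = "E_SKETCH_MOMENTUM"

    def evaluate(self, ticker: str, closes: list[float]) -> ThesisResult:
        if len(closes) < 2:
            return ThesisResult("skip", 0.0, ())
        change = closes[-1] / closes[0] - 1.0
        direction: Direction = "up" if change > 0 else "down" if change < 0 else "skip"
        # Every result names where its inputs came from, so it can be traced later.
        return ThesisResult(direction, change, (f"fixture:{ticker}",))


thesis: Thesis = SketchMomentumThesis()
print(thesis.evaluate("AAPL", [100.0, 101.5, 103.0]))
print(thesis.evaluate("AAPL", [100.0]))
```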

2️⃣

Register a data source

Add a routing entry in data_router.py. The DataRouter enforces phase gating so paid sources stay blocked until you explicitly opt in.
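
Phase gating can be pictured as a router that refuses paid routes unless they were switched on explicitly. The sketch below is not the contents of data_router.py; the class, the route dictionary layout, the PhaseGateError name, and the paid_enabled flag are all hypothetical.

```python
from typing import Callable


class PhaseGateError(PermissionError):
    """Raised when a paid source is requested without explicit activation."""


class SketchRouter:
    """Hypothetical router: source name -> fetch function, with a paid/free flag."""

    def __init__(self, paid_enabled: bool = False) -> None:
        self._routes: dict[str, tuple[Callable[[str], bytes], bool]] = {}
        self._paid_enabled = paid_enabled

    def register(self, name: str, fetch: Callable[[str], bytes], *, paid: bool = False) -> None:
        self._routes[name] = (fetch, paid)

    def fetch(self, name: str, ticker: str) -> bytes:
        fetch_fn, paid = self._routes[name]
        if paid and not self._paid_enabled:
            raise PhaseGateError(f"source {name!r} is paid and has not been activated")
        return fetch_fn(ticker)


router = SketchRouter()  # default: paid sources stay blocked
router.register("free_fixture", lambda t: f"free:{t}".encode())
router.register("paid_fixture", lambda t: f"paid:{t}".encode(), paid=True)

print(router.fetch("free_fixture", "AAPL"))
try:
    router.fetch("paid_fixture", "AAPL")
except PhaseGateError as exc:
    print("blocked:", exc)
```

Putting the check inside the router, rather than in each caller, means a new thesis cannot reach a paid source by accident.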

3️⃣

Run the hindcast

Configure Hindcast, point it at a universe, and get an IS/OOS report with AUC, Sharpe, and Brier score. A minimum of n = 50 samples is required (INV-GS-026).

4️⃣

Submit a calibration row

Append the result to calibration_table.parquet. The Brier-weighted ensemble picks the weight automatically at the next recalibration run.