THE FINANCE LAB
Research Stream · 002

Latent Space
for Market
Representation

Transforming complex, noisy, high-dimensional market data into compact latent representations that let reinforcement learning agents reason about regime, risk, and scenario.

01 · The Problem

Raw market data is a hostile learning environment.

Reinforcement learning agents fail when the raw state space is too noisy, sparse, or unstable. The same price pattern means different things under different volatility regimes, liquidity conditions, macro contexts, or historical setups.

Instead of feeding agents raw variables, we learn compressed representations of market structure — latent states that preserve regime, trend, volatility, risk asymmetry, market memory, and scenario probability.

Noise
Non-stationarity
Regime shifts
Delayed effects
Hidden dependencies
Multiple timescales
Sparse rewards
Partial observability
02 · The Pipeline

Data in. Latent structure out.

The market state is not a flat vector of indicators. It is encoded into a latent representation that summarizes the structure of the current market context — which the RL agent then uses to reason, act, and improve.

Market Data (input)
→ Representation Model (encode)
→ Latent Market State (compress)
→ RL Agent (reason)
→ Scenario / Policy (decide)
→ Market Outcome (observe)
→ Reward (signal)
→ Update Representation (improve)
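
In code, one cycle of this loop looks roughly like the sketch below: a self-contained toy with random placeholder data, where the encoder, policy, and update rule are illustrative stand-ins rather than the lab's actual components.

    import numpy as np

    rng = np.random.default_rng(0)

    def encode(obs, W):
        """Stand-in encoder: a fixed projection to a low-dimensional latent."""
        return np.tanh(W @ obs)

    n_features, n_latent, n_actions = 64, 8, 3
    W = rng.normal(size=(n_latent, n_features)) / np.sqrt(n_features)
    theta = np.zeros((n_actions, n_latent))      # linear policy weights

    for step in range(1000):
        obs = rng.normal(size=n_features)        # placeholder market features
        z = encode(obs, W)                       # compress: raw -> latent state
        a = int(np.argmax(theta @ z))            # reason/decide on the latent
        reward = float(rng.normal())             # placeholder market outcome
        theta[a] += 0.01 * reward * z            # crude reward-driven update
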
03 · Techniques

Six families of latent representation.

Different methods suit different tasks. A VAE is useful for probabilistic scenario generation. A SOM reveals regime topology. A transformer encoder captures long-range dependencies. A world model simulates future transitions.

AE / VAE
Autoencoders

Learn compressed market state by reconstructing input through a bottleneck. VAE adds a probabilistic latent space — ideal for uncertainty-aware encoding and scenario sampling.

Regime compress · Uncertainty · Sampling
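
A minimal PyTorch sketch of the bottleneck idea, with illustrative layer sizes and random placeholder data; a VAE would additionally output a mean and log-variance per latent dimension (see section 05).

    import torch
    import torch.nn as nn

    class MarketAE(nn.Module):
        def __init__(self, n_features=64, n_latent=8):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(n_features, 32), nn.ReLU(),
                nn.Linear(32, n_latent),         # the compressed market state
            )
            self.decoder = nn.Sequential(
                nn.Linear(n_latent, 32), nn.ReLU(),
                nn.Linear(32, n_features),
            )

        def forward(self, x):
            z = self.encoder(x)
            return self.decoder(z), z

    model = MarketAE()
    x = torch.randn(256, 64)                     # placeholder feature batch
    recon, z = model(x)
    loss = nn.functional.mse_loss(recon, x)      # reconstruction objective
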
SOM
Self-Organizing Maps

Project high-dimensional market states into a structured 2D topology preserving similarity. Each region maps to a distinct market condition: trending, ranging, volatile, compressing.

Regime map · Retrieval · Interpretability
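
A small sketch using the open-source minisom package; the grid size, feature count, and data here are illustrative.

    import numpy as np
    from minisom import MiniSom

    states = np.random.default_rng(1).normal(size=(5000, 16))  # placeholder states

    som = MiniSom(10, 10, input_len=16, sigma=1.0, learning_rate=0.5, random_seed=1)
    som.train_random(states, num_iteration=10_000)

    # The winning cell locates the current state on the regime map;
    # nearby cells correspond to structurally similar market conditions.
    cell = som.winner(states[-1])
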
Contrastive
Metric Learning

Pull similar market states together, push dissimilar ones apart. Similarity defined by forward return distribution, volatility regime, or reward consequence — not just visual resemblance.

Reward-aware · Similarity · Retrieval
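
A sketch of an InfoNCE-style objective where positives are pairs of states with similar forward returns; the tolerance band and temperature are illustrative hyperparameters, not tuned values.

    import torch
    import torch.nn.functional as F

    def infonce(z, fwd_returns, tau=0.1, band=0.005):
        """z: (N, d) latent batch; fwd_returns: (N,) forward return per state.
        Positive pairs are states whose forward returns differ by < band."""
        z = F.normalize(z, dim=1)
        sim = z @ z.t() / tau                              # pairwise similarity
        pos = (fwd_returns[:, None] - fwd_returns[None, :]).abs() < band
        pos.fill_diagonal_(False)                          # exclude self-pairs
        diag = torch.eye(len(z), dtype=torch.bool)
        log_prob = sim.masked_fill(diag, float('-inf'))
        log_prob = log_prob - log_prob.logsumexp(dim=1, keepdim=True)
        if not pos.any():
            return z.new_zeros(())                         # degenerate batch
        return -log_prob[pos].mean()                       # pull positives together

    loss = infonce(torch.randn(256, 8), torch.randn(256) * 0.01)
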
Transformer
Sequence Encoders

Capture long-range temporal dependencies, multi-timeframe context, and regime transitions. The meaning of a market state depends on the path that produced it.

Temporal · Multi-TF · Attention
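
A sketch using PyTorch's built-in TransformerEncoder over a window of per-bar feature vectors; all sizes are illustrative.

    import torch
    import torch.nn as nn

    d_model = 64
    layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
    encoder = nn.TransformerEncoder(layer, num_layers=2)

    window = torch.randn(32, 128, d_model)   # (batch, bars, features per bar)
    hidden = encoder(window)                 # attention mixes the whole path
    z = hidden[:, -1]                        # last position as the latent state
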
World Model
Latent Dynamics

Learn not just current state representation but how states evolve over time. Creates an internal market simulator for scenario rollout, risk estimation, and policy evaluation.

Simulation · Rollout · Policy eval
Auxiliary Tasks
Reward-Shaped Rep.

Train representations with auxiliary objectives tied directly to future reward. Similar market states should be close not because they look alike, but because they imply similar decision risk and return distributions.

Reward-shaped · Decision-aware · Robust
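
One way to sketch this idea: add a head that predicts multi-horizon forward returns from the latent, so reconstruction alone cannot dominate training. Heads, horizons, and loss weights here are illustrative assumptions.

    import torch
    import torch.nn as nn

    encoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 8))
    decoder = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 64))
    reward_head = nn.Linear(8, 3)        # e.g. 1-, 5-, 20-bar forward returns

    x = torch.randn(256, 64)             # placeholder features
    y = torch.randn(256, 3)              # placeholder forward-return labels

    z = encoder(x)
    loss = (nn.functional.mse_loss(decoder(z), x)                # reconstruction
            + 1.0 * nn.functional.mse_loss(reward_head(z), y))  # reward-shaped term
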
04 · Regime Map

The market as a navigable topology.

A self-organizing map trained on historical SPX states. Each cell is a learned region of market structure. Color indicates dominant regime classification. The agent moves through this map as conditions evolve — learning distinct policies for each region.

Trend-following
Low-vol compression
Breakout zone
High-vol / Risk-off
Mean-reversion
Transition
05 · Probabilistic Encoding

VAE — a distribution over possible states.

Instead of mapping each market state to a fixed point, a variational autoencoder learns a distribution over latent states. The same observed conditions can imply a range of possible market structures — which maps naturally to scenario uncertainty.

Research Question
Can probabilistic latent states from a VAE improve calibrated scenario reasoning — producing better-grounded probability estimates than deterministic encoding?
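
A minimal sketch of the probabilistic encoding step (reparameterization only; the KL regularizer and decoder are omitted). Shapes and names are illustrative, not the lab's model.

    import torch
    import torch.nn as nn

    class ProbEncoder(nn.Module):
        def __init__(self, n_features=64, n_latent=8):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU())
            self.mu = nn.Linear(32, n_latent)
            self.logvar = nn.Linear(32, n_latent)

        def forward(self, x):
            h = self.body(x)
            mu, logvar = self.mu(h), self.logvar(h)
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
            return z, mu, logvar

    enc = ProbEncoder()
    x = torch.randn(1, 64)                   # one observed market state
    samples = torch.stack([enc(x)[0] for _ in range(100)])  # 100 plausible readings

The dispersion across the sampled latents is one candidate signal for scenario uncertainty.
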
06 · World Model

A learned internal simulator of market transitions.

A world model learns latent dynamics: how states evolve, what transitions are likely, and what reward can be expected, without requiring live market interaction. The goal is not perfect forecasting. The goal is a structured model of possible transitions (sketched in code after the five steps below).

01

Latent State z_t

Compressed market context at time t: regime, momentum, vol structure.

02

Transition Model

Predicts a distribution over next latent states, p(z_{t+1} | z_t, a_t).

03

Rollout

Simulate multiple future trajectories in latent space — each branch is a scenario.

04

Reward Estimation

Estimate expected reward and risk along each trajectory without live market data.

05

Policy Improvement

Use simulated rollouts to update the agent's policy — safer than live exploration.
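
Steps 01 through 04 in a minimal sketch: a stochastic transition network and a reward head rolled forward in latent space. The architecture and sizes are illustrative assumptions, not the lab's model.

    import torch
    import torch.nn as nn

    n_latent, n_actions = 8, 3
    trans = nn.Sequential(nn.Linear(n_latent + n_actions, 64), nn.ReLU(),
                          nn.Linear(64, 2 * n_latent))   # mean and log-variance
    reward_head = nn.Sequential(nn.Linear(n_latent, 32), nn.ReLU(), nn.Linear(32, 1))

    def rollout(z0, actions):
        """Simulate one latent trajectory; each call is one scenario branch."""
        z, traj = z0, []
        for a in actions:                                # a: one-hot action tensor
            mu, logvar = trans(torch.cat([z, a], dim=-1)).chunk(2, dim=-1)
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # stochastic step
            traj.append((z, reward_head(z)))             # state and expected reward
        return traj

    z0 = torch.randn(n_latent)
    hold = torch.tensor([1.0, 0.0, 0.0])
    branches = [rollout(z0, [hold] * 5) for _ in range(20)]  # 20 scenario branches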

07 · Applications

Six areas where latent representations change the game.

App · 01

Regime Detection

Latent clusters automatically identify market regimes — trending, mean-reverting, high-volatility, compressing, transition — without manual labeling.

SOM · VAE · Clustering
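
A sketch of the clustering step with scikit-learn's k-means; the latents and cluster count are illustrative placeholders.

    import numpy as np
    from sklearn.cluster import KMeans

    latents = np.random.default_rng(2).normal(size=(5000, 8))  # placeholder latents

    km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(latents)
    current_regime = km.predict(latents[-1:])[0]   # cluster id of the current state
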
App · 02

Scenario Probability

Latent states estimate the probability of continuation, reversal, breakout, or volatility expansion — grounded in the structure of the encoded state.

VAE · Contrastive · RL policy
App · 03

Case-Based Retrieval

Given a current latent state, retrieve historically similar states and their forward outcomes — providing empirical priors for scenario reasoning.

Metric learning · Memory · KNN
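
A sketch of the retrieval step with scikit-learn; the latent history, forward returns, and neighbour count are illustrative.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(3)
    latents = rng.normal(size=(5000, 8))         # historical latent states
    fwd_returns = rng.normal(size=5000)          # their realized forward returns

    index = NearestNeighbors(n_neighbors=50).fit(latents)
    _, idx = index.kneighbors(latents[-1:])      # 50 closest historical analogues
    prior_up = (fwd_returns[idx[0]] > 0).mean()  # empirical P(positive outcome)
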
App · 04

Offline RL

Cleaner latent states improve offline reinforcement learning from historical episodes — reducing overfitting to surface-level patterns in raw data.

Decision transformer · IQL · CQL
App · 05

Policy Generalization

Agents trained on latent representations generalize better across regime changes because the representation abstracts away regime-specific surface features.

Actor-critic · Transfer · Robustness
App · 06

Multimodal Fusion

Combine numerical features with chart image embeddings in the same latent space — testing whether visual market structure adds decision-useful signal.

VLM · Chart embed · Multimodal
08 · Challenges

Five hard problems we are actively addressing.

01 · Regime overfitting
Challenge: representations trained in one regime fail when market structure changes.
Our approach: walk-forward validation, regime-separated evaluation, adaptive online encoding.

02 · Reconstruction ≠ decision quality
Challenge: a model may reconstruct price features while ignoring reward-relevant information.
Our approach: reward-shaped auxiliary objectives, contrastive learning with forward-return similarity.

03 · Delayed, noisy rewards
Challenge: financial feedback is path-dependent, sparse, and regime-conditioned.
Our approach: multi-horizon reward labeling, calibration audits, professor-style evaluation.

04 · Interpretability
Challenge: latent spaces must be inspectable, auditable, and explainable in financial terms.
Our approach: SOM visualization, cluster labeling, latent traversal probes, regime tagging.

05 · Temporal data leakage
Challenge: random train-test splits create misleading results in time series.
Our approach: strict walk-forward splits (see the sketch below), out-of-sample regime tests, no look-ahead in encoders.
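
For challenge 05, a minimal sketch of a leakage-safe split using scikit-learn's TimeSeriesSplit, where every fold trains strictly on the past and evaluates on the following block; the data is an illustrative placeholder.

    import numpy as np
    from sklearn.model_selection import TimeSeriesSplit

    X = np.arange(1000).reshape(-1, 1)           # time-ordered placeholder data

    for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
        assert train_idx.max() < test_idx.min()  # every test block lies in the future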