Perpetual Futures Grid Trading System

Overview

A proprietary algorithmic trading system for perpetual futures markets, built entirely from scratch. The system manages multiple isolated trading accounts simultaneously, each running independent dynamic grid strategies across multiple symbols — with shared WebSocket connections to price feeds to minimize exchange load.

This is a closed-source personal project. The description focuses on architecture and engineering decisions, not trading strategy or performance.

Architecture

The system is a modular monolith designed with SaaS-ready multi-tenancy from the start:

MultiAccountBot (orchestrator)
  → TradingAccount[] (per account)
      → GridStrategy[] (per symbol)
          → StateManager, OrderExecutor, EventQueue, TPCalculator
      → BybitClient (REST), BalanceManager, ProfitProtectionManager
  → PublicWebSocket[] (shared per symbol+environment)
  → PrivateWebSocket[] (per account)

Key principle: each account is fully isolated, with separate logs, state files, and emergency flags. Public WebSocket connections to price feeds are the one exception: accounts trading the same symbol in the same environment share a single connection, which reduces the number of connections opened to the exchange.
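The sharing rule can be sketched as a small registry keyed by (symbol, environment). This is only an illustration: FeedRegistry, SharedFeed, and their methods are hypothetical names, and a plain callback list stands in for a real WebSocket connection.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str]  # (symbol, environment), e.g. ("BTCUSDT", "testnet")


@dataclass
class SharedFeed:
    """Stand-in for one public WebSocket connection (hypothetical)."""
    key: Key
    subscribers: List[Callable[[float], None]] = field(default_factory=list)

    def publish(self, price: float) -> None:
        # Fan one price tick out to every subscribed account.
        for callback in list(self.subscribers):
            callback(price)


class FeedRegistry:
    """Hands out one SharedFeed per (symbol, environment) pair."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._feeds: Dict[Key, SharedFeed] = {}

    def subscribe(self, symbol: str, env: str,
                  callback: Callable[[float], None]) -> SharedFeed:
        with self._lock:
            key = (symbol, env)
            if key not in self._feeds:
                self._feeds[key] = SharedFeed(key)  # first subscriber opens the feed
            feed = self._feeds[key]
            feed.subscribers.append(callback)
            return feed

    def connection_count(self) -> int:
        with self._lock:
            return len(self._feeds)


registry = FeedRegistry()
seen_a, seen_b = [], []
feed_a = registry.subscribe("BTCUSDT", "mainnet", seen_a.append)
feed_b = registry.subscribe("BTCUSDT", "mainnet", seen_b.append)
registry.subscribe("BTCUSDT", "testnet", lambda price: None)
feed_a.publish(50_000.0)
```

Two accounts on the same symbol and environment end up with the same feed object, while the testnet subscription gets its own, so three subscriptions cost two connections.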

Event-Driven Threading

The system uses plain threads (not asyncio), a deliberate choice for financial systems where predictability matters more than throughput. WebSocket callbacks are non-blocking: they only push events onto a priority queue, and a dedicated EventWorker processes them sequentially.

# Priority queue: Execution events (P0) before Order events (P1)
# Prevents order state from being updated before execution is processed
heapq.heappush(self._queue, (priority, timestamp, event))
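A minimal sketch of such a queue follows, assuming a single consumer thread. The EventQueue class, the P_EXECUTION/P_ORDER constants, and the dict-shaped events are illustrative assumptions, not the project's actual code. The extra sequence number is a tie-breaker added here so that two events with equal priority and timestamp are never compared directly, which would raise a TypeError for dicts.

```python
import heapq
import itertools
import threading
import time

P_EXECUTION, P_ORDER = 0, 1  # lower number is processed first


class EventQueue:
    def __init__(self) -> None:
        self._heap = []
        self._seq = itertools.count()       # tie-breaker, so events are never compared
        self._cond = threading.Condition()  # guards the heap and wakes the worker

    def put(self, priority: int, event: dict) -> None:
        # Called from WebSocket callbacks: push and return immediately.
        with self._cond:
            entry = (priority, time.monotonic(), next(self._seq), event)
            heapq.heappush(self._heap, entry)
            self._cond.notify()

    def get(self, timeout: float = 1.0):
        # Called by the single worker thread; returns None on timeout.
        with self._cond:
            if not self._cond.wait_for(lambda: self._heap, timeout=timeout):
                return None
            return heapq.heappop(self._heap)[-1]


queue = EventQueue()
queue.put(P_ORDER, {"type": "order", "id": "A"})
queue.put(P_EXECUTION, {"type": "execution", "id": "A"})
processed = [queue.get()["type"], queue.get()["type"]]
```

Even though the order event was enqueued first, the execution event comes out first, which is the ordering guarantee the comments above describe.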

Eight explicit RLock/Lock instances guard shared resources across components. No shared mutable state without a lock.

State Persistence

No database — state is stored in JSON files with atomic writes:

import json, os, tempfile

# Atomic write: temp file in the target directory → rename
# (os.replace is atomic on POSIX when both paths are on the same filesystem)
with tempfile.NamedTemporaryFile(mode='w', dir=state_dir, delete=False) as f:
    json.dump(state, f, indent=2)
    temp_path = f.name
os.replace(temp_path, state_path)  # atomic

This means a crash mid-write never corrupts the state file. The exchange API is the single source of truth on restart.
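For readers who want to try the pattern, here is a self-contained sketch with a matching loader. The save_state and load_state helpers are hypothetical names. The fsync call is an addition not shown in the snippet above: it is the usual extra step when the file should also survive a power loss, not only a process crash.

```python
import json
import os
import tempfile


def save_state(state: dict, state_path: str) -> None:
    """Write state as JSON so readers see either the old file or the new one."""
    state_dir = os.path.dirname(os.path.abspath(state_path))
    fd, temp_path = tempfile.mkstemp(dir=state_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        os.replace(temp_path, state_path)  # atomic within one filesystem
    except BaseException:
        # Never leave a stray temp file behind if anything fails.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise


def load_state(state_path: str) -> dict:
    """Return the saved state, or an empty dict if no file exists yet."""
    try:
        with open(state_path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "state.json")
save_state({"long_cycle": 10, "short_cycle": 13}, demo_path)
restored = load_state(demo_path)
```

Creating the temp file in the same directory as the target matters: a rename across filesystems is not atomic.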

Dynamic Progressive Grid

Grid step size is not fixed — it's calculated from market volatility:

import math

# EMA of daily ATR% over 10 periods, scaled to a 10-day horizon
# (daily_atr_pct is a pandas Series of daily ATR values in percent)
atr_ema = daily_atr_pct.ewm(span=10).mean().iloc[-1]
base_step = atr_ema * math.sqrt(10)

# Steps increase with level: L1 is the minimum, L10 is ~3.16× larger
steps = {n: base_step * scale * math.sqrt(n) for n in range(1, max_levels + 1)}

The step sizes are computed once on a fresh start and are immutable afterwards, which prevents configuration drift across restarts.

Position sizing uses martingale doubling: qty[level] = base_qty × 2^level. Base quantity is also locked after initial calculation.
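A small worked example makes the two formulas concrete. The input values are made up for illustration, the function names are hypothetical, and the quantity formula is applied literally with levels numbered from 1, which is an assumption about the indexing.

```python
import math
from typing import List


def grid_steps(base_step: float, scale: float, max_levels: int) -> List[float]:
    """Step for level n is base_step * scale * sqrt(n), n = 1..max_levels."""
    return [base_step * scale * math.sqrt(n) for n in range(1, max_levels + 1)]


def grid_quantities(base_qty: float, max_levels: int) -> List[float]:
    """Quantity for level n is base_qty * 2**n (assumes levels start at 1)."""
    return [base_qty * 2 ** n for n in range(1, max_levels + 1)]


# Made-up inputs: a 2% EMA of daily ATR and a scale factor of 0.1.
atr_ema = 0.02
base_step = atr_ema * math.sqrt(10)
steps = grid_steps(base_step, scale=0.1, max_levels=10)
quantities = grid_quantities(base_qty=0.001, max_levels=10)

ratio = steps[-1] / steps[0]   # sqrt(10), about 3.16
total_qty = sum(quantities)    # base_qty * (2**11 - 2)
```

The step grows only with the square root of the level while the quantity doubles, so under this numbering ten levels commit 2,046 times the base quantity in total. That asymmetry is why the risk controls in the next section matter.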

Risk Management — 10 Levels

Level	Trigger	Response
1	IM Rate ≥ 90%	Block workflow, move balance orders to 0.01% from market
2	High IM Rate + profit	Close position
3	Trailing stop	Lock profit on drawdown from peak
4	Position value > 40% of risk limit	Block new orders
5	Position limit timeout	Force close stuck positions
6	Both sides profitable	Close both simultaneously
7	TP safety check (hysteresis 5%/10%)	Pause TP if insufficient margin for reopening
8	TP safety pre-check	Block protective orders if margin insufficient
9	Level rate limit	Budget: ceil(max_levels × 0.5) level-ups per 48h sliding window
10	Emergency stop	File-based flag requiring manual reset
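The Level 9 budget can be sketched as a sliding-window limiter. LevelUpBudget is a hypothetical name, and timestamps are passed in explicitly as seconds so the behaviour is easy to follow. Only the ceil(max_levels × 0.5) budget and the 48-hour window come from the table.

```python
import math
from collections import deque
from typing import Deque

WINDOW_SECONDS = 48 * 3600


class LevelUpBudget:
    def __init__(self, max_levels: int, window: float = WINDOW_SECONDS) -> None:
        self.budget = math.ceil(max_levels * 0.5)
        self.window = window
        self._stamps: Deque[float] = deque()

    def try_level_up(self, now: float) -> bool:
        # Drop level-ups that have slid out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) >= self.budget:
            return False  # budget spent, refuse this level-up
        self._stamps.append(now)
        return True


limiter = LevelUpBudget(max_levels=5)  # budget = ceil(2.5) = 3
hour = 3600
results = [limiter.try_level_up(t * hour) for t in (0, 1, 2, 3, 48, 49)]
```

With a budget of three, the fourth attempt at hour 3 is refused. Attempts at hours 48 and 49 succeed again because the level-ups from hours 0 and 1 have left the window by then.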

Levels 7–8 deserve an explanation: at high grid levels, taking profit would leave insufficient margin to reopen the position. Without this check, the system would keep averaging down (wasting capital) while TP remains unreachable. The solution: block protective orders so TP stays active and the position can actually close.
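The 5%/10% hysteresis in Level 7 can be illustrated with a two-threshold switch. What exactly is measured is not stated above, so this sketch assumes a generic margin buffer ratio: the pause trips below 5% and is lifted only above 10%. HysteresisGate and margin_buffer are hypothetical names.

```python
class HysteresisGate:
    """Two-threshold switch: trips below `low`, resets only above `high`."""

    def __init__(self, low: float = 0.05, high: float = 0.10) -> None:
        self.low = low
        self.high = high
        self.paused = False

    def update(self, margin_buffer: float) -> bool:
        if not self.paused and margin_buffer < self.low:
            self.paused = True    # buffer too thin, start pausing
        elif self.paused and margin_buffer > self.high:
            self.paused = False   # buffer clearly recovered, resume
        return self.paused


gate = HysteresisGate()
readings = [0.12, 0.04, 0.07, 0.09, 0.11, 0.07]
states = [gate.update(r) for r in readings]
```

The gap between the two thresholds prevents flapping: a reading of 0.07 keeps the pause in place while recovering, yet does not trigger a pause on its own once the gate has reset.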

Crash Recovery

On restart, a 6-step workflow reconstructs full state from the exchange:

1. Cancel all open orders
2. Clear local state
3. Fetch actual positions from exchange
4. Detect state: none / long_only / short_only / both
5. Place take-profit orders
6. Place balance + protective orders
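Step 4 above can be sketched as a pure function over the fetched positions. The record shape used here (a "side" of "Buy" or "Sell" and a "size" given as a string) is an assumption about what the exchange client returns, and detect_state is a hypothetical name.

```python
from decimal import Decimal
from typing import Iterable, Mapping


def detect_state(positions: Iterable[Mapping[str, str]]) -> str:
    """Classify open positions as none / long_only / short_only / both."""
    has_long = has_short = False
    for pos in positions:
        if Decimal(pos.get("size", "0") or "0") == 0:
            continue  # empty slot, nothing open on this side
        if pos.get("side") == "Buy":
            has_long = True
        elif pos.get("side") == "Sell":
            has_short = True
    if has_long and has_short:
        return "both"
    if has_long:
        return "long_only"
    if has_short:
        return "short_only"
    return "none"


state = detect_state([
    {"side": "Buy", "size": "0.015"},
    {"side": "Sell", "size": "0"},
])
```

Parsing sizes with Decimal instead of float avoids treating a tiny but real position as zero because of rounding.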

The position_level_analyzer.py module (2,354 lines) reverse-engineers grid levels from order history, with backward compatibility for more than five historical orderLinkId formats used across versions.

Hard Problems

Independent Cycle Tracking (v2.8.0)

LONG and SHORT sides take profit at different frequencies. A shared cycle counter caused a subtle bug: SHORT would take profit three times (cycle 10→13) while LONG was still at L5 with orders tagged cycle=10. On restart, the restoration logic searched for LONG positions using cycle=13, found nothing, and triggered Emergency Stop.

Fix: independent long_cycle/short_cycle counters with L/S prefixes in orderLinkId. Simple in retrospect, invisible until it happened in production.
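The fix can be sketched as follows. The identifier layout "<side><cycle>-lvl<level>" is invented for this example and is not the project's real orderLinkId format. Only the idea of per-side counters and an L/S prefix comes from the text.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# Hypothetical layout, e.g. "L10-lvl5" or "S13-lvl1".
LINK_ID = re.compile(r"^(?P<side>[LS])(?P<cycle>\d+)-lvl(?P<level>\d+)$")


@dataclass
class CycleCounters:
    long_cycle: int = 0
    short_cycle: int = 0

    def on_take_profit(self, side: str) -> None:
        # Only the side that took profit starts a new cycle.
        if side == "L":
            self.long_cycle += 1
        else:
            self.short_cycle += 1

    def link_id(self, side: str, level: int) -> str:
        cycle = self.long_cycle if side == "L" else self.short_cycle
        return f"{side}{cycle}-lvl{level}"


def parse_link_id(link_id: str) -> Tuple[str, int, int]:
    match = LINK_ID.match(link_id)
    if match is None:
        raise ValueError(f"unrecognized orderLinkId: {link_id}")
    return match["side"], int(match["cycle"]), int(match["level"])


counters = CycleCounters(long_cycle=10, short_cycle=10)
for _ in range(3):
    counters.on_take_profit("S")  # SHORT takes profit three times
long_id = counters.link_id("L", 5)
short_id = counters.link_id("S", 1)
```

After three SHORT take-profits, the LONG side still generates and searches for cycle 10 identifiers, so restoration finds its orders.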

AMEND Workflow Removal (v2.3.5)

Originally, order modification used Bybit's AMEND endpoint. This caused a systemic failure: Bybit's Order History API sorts by createdTime, and AMEND preserves the original timestamp. Amended orders couldn't be found by the restoration logic → Emergency Stop on every restart.

Fix: remove AMEND entirely. Cancel + replace instead. More API calls, zero ambiguity.
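The cancel-and-replace idea can be shown against an in-memory stand-in. FakeExchange and its methods are hypothetical and do not mirror Bybit's API. The counter-based "created" field only imitates the property that matters here: a replacement order gets a fresh creation time.

```python
import itertools
from typing import Dict


class FakeExchange:
    """In-memory stand-in for an exchange client (hypothetical, not Bybit's API)."""

    def __init__(self) -> None:
        self.open_orders: Dict[str, dict] = {}
        self._clock = itertools.count(1)

    def place(self, link_id: str, price: float, qty: float) -> dict:
        order = {"link_id": link_id, "price": price, "qty": qty,
                 "created": next(self._clock)}
        self.open_orders[link_id] = order
        return order

    def cancel(self, link_id: str) -> None:
        self.open_orders.pop(link_id, None)


def cancel_and_replace(exchange: FakeExchange, old_id: str, new_id: str,
                       price: float, qty: float) -> dict:
    # Cancel first so the old and new order can never both fill.
    exchange.cancel(old_id)
    return exchange.place(new_id, price, qty)


exchange = FakeExchange()
original = exchange.place("tp-1", price=100.0, qty=1.0)
replacement = cancel_and_replace(exchange, "tp-1", "tp-2", price=101.0, qty=1.0)
```

The trade-off is a short interval with no resting order between the cancel and the new placement, in exchange for every live order having an unambiguous creation time.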

Partial Fill Race Condition

Discovered after a real incident (2025-11-01): a partial fill triggered the workflow while the remaining quantity was still being filled. Two concurrent workflows → inconsistent state.

Fix: a _partially_filled_freeze flag suspends the workflow until the fill completes or a timeout expires.
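A minimal sketch of such a freeze follows, assuming cumulative fill quantities and an explicit clock value passed by the caller. PartialFillFreeze, its method names, and the 5-second timeout are illustrative. A real version shared between threads would also hold a lock around the flag.

```python
from typing import Optional


class PartialFillFreeze:
    def __init__(self, timeout: float = 5.0) -> None:
        self.timeout = timeout
        self._frozen_at: Optional[float] = None

    def on_execution(self, filled_qty: float, order_qty: float, now: float) -> None:
        if filled_qty < order_qty:
            if self._frozen_at is None:
                self._frozen_at = now  # first partial fill starts the freeze
        else:
            self._frozen_at = None     # fully filled, release

    def workflow_allowed(self, now: float) -> bool:
        if self._frozen_at is None:
            return True
        if now - self._frozen_at >= self.timeout:
            self._frozen_at = None     # timed out, release
            return True
        return False


freeze = PartialFillFreeze(timeout=5.0)
freeze.on_execution(filled_qty=0.4, order_qty=1.0, now=0.0)
during_fill = freeze.workflow_allowed(now=1.0)
freeze.on_execution(filled_qty=1.0, order_qty=1.0, now=2.0)
after_fill = freeze.workflow_allowed(now=2.0)

freeze.on_execution(filled_qty=0.4, order_qty=1.0, now=10.0)
before_timeout = freeze.workflow_allowed(now=14.0)
after_timeout = freeze.workflow_allowed(now=15.0)
```

The timeout keeps a fill that never completes from blocking the workflow indefinitely.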

Scale

Metric	Value
Source code	27,677 lines (41 files)
Tests	7,939 lines (19 files, 280 functions)
Scripts	8,154 lines (35 diagnostic tools)
Documentation	17 markdown files, 385KB
Total Python	~43,770 lines
Commits	248
Development	Oct 2025 → present
Version	v2.12.0
Status	Running in production (systemd service)