Perpetual Futures Grid Trading System
Production algorithmic trading system for perpetual futures — multi-account, event-driven architecture with dynamic progressive grid and 10-level risk management.
Key Results
- v2.12.0 — running in production since October 2025
- 43,770 lines of Python in total, including 27,677 lines of source across 41 files
- 10-level risk management with automatic emergency response
- 6-step crash recovery reconstructing full state from exchange
- 248 commits over 4.5 months of active development
Overview
A proprietary algorithmic trading system for perpetual futures markets, built entirely from scratch. The system manages multiple isolated trading accounts simultaneously, each running independent dynamic grid strategies across multiple symbols — with shared WebSocket connections to price feeds to minimize exchange load.
This is a closed-source personal project. The description focuses on architecture and engineering decisions, not trading strategy or performance.
Architecture
The system is a modular monolith designed with SaaS-ready multi-tenancy from the start:
MultiAccountBot (orchestrator)
→ TradingAccount[] (per account)
→ GridStrategy[] (per symbol)
→ StateManager, OrderExecutor, EventQueue, TPCalculator
→ BybitClient (REST), BalanceManager, ProfitProtectionManager
→ PublicWebSocket[] (shared per symbol+environment)
→ PrivateWebSocket[] (per account)
Key principle: each account is fully isolated, with its own logs, state files, and emergency flags. Public WebSocket connections to price feeds, however, are shared across accounts trading the same symbol in the same environment, which reduces the number of connections opened to the exchange.
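The sketch below shows one way to share connections per (symbol, environment) key while each account keeps its own callback. `PublicFeedRegistry` and `FakeFeed` are hypothetical names. The stub feed stands in for a real WebSocket so the example runs offline, and reconnection and subscription messages are omitted.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str]  # (symbol, environment), e.g. ("BTCUSDT", "mainnet")


@dataclass
class FakeFeed:
    """Stand-in for a public WebSocket connection (hypothetical, no networking)."""
    key: Key
    subscribers: List[Callable[[dict], None]] = field(default_factory=list)

    def dispatch(self, tick: dict) -> None:
        # Fan one incoming price tick out to every subscribed account.
        for callback in list(self.subscribers):
            callback(tick)


class PublicFeedRegistry:
    """Hands out one shared feed per (symbol, environment); hypothetical name."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._feeds: Dict[Key, FakeFeed] = {}

    def subscribe(self, symbol: str, env: str, callback: Callable[[dict], None]) -> FakeFeed:
        with self._lock:
            key = (symbol, env)
            feed = self._feeds.get(key)
            if feed is None:  # first subscriber "opens" the connection
                feed = FakeFeed(key)
                self._feeds[key] = feed
            feed.subscribers.append(callback)
            return feed

    def unsubscribe(self, symbol: str, env: str, callback: Callable[[dict], None]) -> None:
        with self._lock:
            key = (symbol, env)
            feed = self._feeds.get(key)
            if feed is None:
                return
            feed.subscribers.remove(callback)
            if not feed.subscribers:  # last subscriber "closes" it
                del self._feeds[key]

    def connection_count(self) -> int:
        with self._lock:
            return len(self._feeds)
```

Closing a feed when its last subscriber leaves is one reasonable policy. A long-running bot might instead keep feeds open for its whole lifetime.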
Event-Driven Threading
The system uses plain threading (not asyncio), a deliberate choice for financial systems where predictability matters more than throughput. WebSocket callbacks are non-blocking: they push events into a priority queue, and a dedicated EventWorker thread processes them sequentially.
# Priority queue: Execution events (P0) before Order events (P1)
# Prevents order state from being updated before execution is processed
heapq.heappush(self._queue, (priority, timestamp, event))
Eight explicit RLock/Lock instances guard shared resources across components. No shared mutable state without a lock.
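Here is a minimal sketch of the queue and worker described above. It assumes a `threading.Condition` guards the heap. It also adds a monotonic sequence number as a tie-breaker: without one, two events with equal priority and timestamp would be compared directly, which raises `TypeError` for event types such as dicts. The internals of `EventQueue` and `EventWorker` here are illustrative, not the project's code.

```python
import heapq
import itertools
import threading
import time
from typing import Any, Callable, List, Optional, Tuple

P_EXECUTION = 0  # P0: execution events are processed first
P_ORDER = 1      # P1: order-status events come after


class EventQueue:
    """Thread-safe priority queue: non-blocking put, blocking get (a sketch)."""

    def __init__(self) -> None:
        self._queue: List[Tuple[int, float, int, Any]] = []
        self._seq = itertools.count()  # tie-breaker so events are never compared
        self._cond = threading.Condition(threading.Lock())

    def put(self, priority: int, event: Any) -> None:
        # Called from WebSocket callbacks: push and return immediately.
        with self._cond:
            heapq.heappush(self._queue, (priority, time.monotonic(), next(self._seq), event))
            self._cond.notify()

    def get(self, timeout: Optional[float] = None) -> Optional[Any]:
        # Block until an event is available (or the timeout expires, returning None).
        with self._cond:
            if not self._cond.wait_for(lambda: self._queue, timeout=timeout):
                return None
            return heapq.heappop(self._queue)[-1]


class EventWorker(threading.Thread):
    """Single consumer thread: handles events one at a time, in priority order."""

    def __init__(self, queue: EventQueue, handler: Callable[[Any], None]) -> None:
        super().__init__(daemon=True)
        self._queue = queue
        self._handler = handler
        self._stop_event = threading.Event()

    def run(self) -> None:
        while not self._stop_event.is_set():
            event = self._queue.get(timeout=0.1)  # wake up periodically to check stop
            if event is not None:
                self._handler(event)

    def stop(self) -> None:
        self._stop_event.set()
```

A single worker drains the queue, so handlers never run concurrently with each other. That is what makes the P0-before-P1 ordering meaningful.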
State Persistence
No database — state is stored in JSON files with atomic writes:
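import json, os, tempfile
# Atomic write: temp file in the same directory, then os.replace (atomic rename on POSIX)
state_dir = os.path.dirname(os.path.abspath(state_path))
with tempfile.NamedTemporaryFile(mode='w', dir=state_dir, delete=False) as f:
    json.dump(state, f, indent=2)
    f.flush()
    os.fsync(f.fileno())  # ensure the data is on disk before the rename
    temp_path = f.name
os.replace(temp_path, state_path)  # atomic: readers see the old file or the new one, never a partial write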
This means a crash mid-write never corrupts the state file. The exchange API is the single source of truth on restart.
Dynamic Progressive Grid
Grid step size is not fixed — it's calculated from market volatility:
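import math
import pandas as pd
# EMA of daily ATR% over 10 periods (span=10), scaled to a 10-day horizon
atr_ema = pd.Series(daily_atr_pct).ewm(span=10).mean().iloc[-1]
base_step = atr_ema * math.sqrt(10)
# Steps increase with level: L1 is the minimum, L10 is sqrt(10) ≈ 3.16× larger
# (scale is a configured multiplier)
def step(n: int) -> float:
    return base_step * scale * math.sqrt(n)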
This is computed once on a fresh start and then frozen, which prevents configuration drift across restarts.
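One way to keep the computed grid immutable across restarts is sketched below, under assumptions. The parameters live in a frozen dataclass, are written into the persisted state on a fresh start, and are only ever read back afterwards. `GridParams`, `load_or_init_grid`, and the `grid_params` state key are hypothetical.

```python
import math
from dataclasses import asdict, dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class GridParams:
    """Grid parameters fixed at first start (field names are hypothetical)."""
    base_step: float  # ATR%-derived base step
    scale: float      # configured multiplier

    def step(self, level: int) -> float:
        # Progressive step: grows with sqrt(level), so L10 is sqrt(10) ≈ 3.16x L1.
        return self.base_step * self.scale * math.sqrt(level)


def load_or_init_grid(state: Dict, compute: Callable[[], GridParams]) -> GridParams:
    """Return frozen params from saved state; compute only on a fresh start."""
    saved = state.get("grid_params")
    if saved is not None:
        return GridParams(**saved)  # restart: reuse, never recompute
    params = compute()  # fresh start: derive from current volatility once
    state["grid_params"] = asdict(params)
    return params
```

On restart, a changed volatility estimate has no effect: the saved parameters win.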
Position sizing uses martingale doubling: qty[level] = base_qty × 2^level. Base quantity is also locked after initial calculation.
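The doubling rule compounds quickly. This sketch assumes levels are indexed from 0, so level 0 trades `base_qty`; the source does not say where indexing starts. Summing the geometric series, a position averaged through levels 0 to n totals `base_qty × (2^(n+1) − 1)`. For example, after ten levels (0 to 9) that is 1,023 × `base_qty`. The function names are hypothetical.

```python
from typing import List


def level_qty(base_qty: float, level: int) -> float:
    """Order size at a grid level: base_qty * 2**level (martingale doubling)."""
    return base_qty * 2 ** level


def cumulative_qty(base_qty: float, max_level: int) -> float:
    """Total position if every level from 0 to max_level has filled.

    Geometric series: sum(base_qty * 2**k for k in 0..n) == base_qty * (2**(n + 1) - 1).
    """
    return base_qty * (2 ** (max_level + 1) - 1)


def size_table(base_qty: float, max_level: int) -> List[float]:
    """Per-level order sizes for levels 0..max_level."""
    return [level_qty(base_qty, lvl) for lvl in range(max_level + 1)]
```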
Risk Management — 10 Levels
| Level | Trigger | Response |
|---|---|---|
| 1 | IM (initial margin) rate ≥ 90% | Block workflow, move balance orders to 0.01% from market |
| 2 | High IM Rate + profit | Close position |
| 3 | Trailing stop | Lock profit on drawdown from peak |
| 4 | Position value > 40% of risk limit | Block new orders |
| 5 | Position limit timeout | Force close stuck positions |
| 6 | Both sides profitable | Close both simultaneously |
| 7 | TP safety check (hysteresis 5%/10%) | Pause TP if insufficient margin for reopening |
| 8 | TP safety pre-check | Block protective orders if margin insufficient |
| 9 | Level rate limit | Budget: ceil(max_levels × 0.5) level-ups per 48h sliding window |
| 10 | Emergency stop | File-based flag requiring manual reset |
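Level 9's budget can be read as a sliding-window rate limiter. The minimal sketch below assumes level-ups are recorded as timestamps in memory (persisting them across restarts is omitted). It also assumes only the single EventWorker thread calls it, so no lock is taken. `LevelUpBudget` is a hypothetical name.

```python
import math
import time
from collections import deque
from typing import Deque, Optional

WINDOW_SECONDS = 48 * 3600  # 48h sliding window


class LevelUpBudget:
    """Sliding-window limiter for grid level-ups (class name is hypothetical)."""

    def __init__(self, max_levels: int, window: float = WINDOW_SECONDS) -> None:
        self.budget = math.ceil(max_levels * 0.5)
        self.window = window
        self._events: Deque[float] = deque()  # timestamps of accepted level-ups

    def _evict(self, now: float) -> None:
        # Drop level-ups that have slid out of the window.
        while self._events and now - self._events[0] >= self.window:
            self._events.popleft()

    def try_level_up(self, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        self._evict(now)
        if len(self._events) >= self.budget:
            return False  # budget exhausted: block this level-up
        self._events.append(now)
        return True

    def remaining(self, now: Optional[float] = None) -> int:
        now = time.time() if now is None else now
        self._evict(now)
        return self.budget - len(self._events)
```

With `max_levels = 10` the budget is ceil(10 × 0.5) = 5 level-ups. A sixth within the same 48 hours is refused until the oldest one ages out.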
Levels 7 and 8 need explanation: at high grid levels, taking profit would leave insufficient margin to reopen the position. Without this check, the system would keep averaging down (wasting capital) while TP stayed out of reach. The solution is to block protective orders so that TP stays active and the position can actually close.
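The 5%/10% hysteresis on Level 7 keeps TP from flapping on and off when margin hovers near a single threshold. The source gives the two thresholds but not the exact mapping, so the sketch below assumes one. The monitored quantity is taken to be the spare margin left after reopening, in percent. TP pauses once that falls below 5% and resumes only above 10%. `HysteresisLatch` and `reopen_headroom_pct` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HysteresisLatch:
    """Two-threshold switch; the 5%/10% mapping here is an assumption."""
    pause_below: float = 5.0    # enter "paused" when headroom drops under this
    resume_above: float = 10.0  # leave "paused" only once headroom recovers past this
    paused: bool = False

    def update(self, headroom_pct: float) -> bool:
        """Feed the latest margin headroom (%); return True while TP is paused."""
        if not self.paused and headroom_pct < self.pause_below:
            self.paused = True
        elif self.paused and headroom_pct > self.resume_above:
            self.paused = False
        return self.paused


def reopen_headroom_pct(available_margin: float, margin_to_reopen: float) -> float:
    """Hypothetical metric: spare margin after reopening, as % of the requirement."""
    if margin_to_reopen <= 0:
        return float("inf")
    return (available_margin - margin_to_reopen) / margin_to_reopen * 100.0
```

Between 5% and 10% the latch keeps its previous state. That gap is what prevents rapid toggling.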
Crash Recovery
On restart, a 6-step workflow reconstructs full state from the exchange:
1. Cancel all open orders
2. Clear local state
3. Fetch actual positions from exchange
4. Detect state: none / long_only / short_only / both
5. Place take-profit orders
6. Place balance + protective orders
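Step 4 and the overall ordering of the steps above can be sketched as follows. Position rows are assumed to look like Bybit's hedge-mode position list: a `side` of `Buy`/`Sell` (empty when flat) and a string `size`. `RecoveryClient` is a hypothetical interface rather than a real SDK, and each numbered step maps to one line of `recover`.

```python
from enum import Enum
from typing import Dict, Iterable, List, Mapping, Protocol


class PositionState(str, Enum):
    NONE = "none"
    LONG_ONLY = "long_only"
    SHORT_ONLY = "short_only"
    BOTH = "both"


def detect_state(positions: Iterable[Mapping]) -> PositionState:
    """Step 4: classify positions by which sides hold a non-zero size."""
    rows = list(positions)
    has_long = any(p["side"] == "Buy" and float(p["size"]) > 0 for p in rows)
    has_short = any(p["side"] == "Sell" and float(p["size"]) > 0 for p in rows)
    if has_long and has_short:
        return PositionState.BOTH
    if has_long:
        return PositionState.LONG_ONLY
    if has_short:
        return PositionState.SHORT_ONLY
    return PositionState.NONE


class RecoveryClient(Protocol):  # hypothetical interface, not a real SDK
    def cancel_all_orders(self, symbol: str) -> None: ...
    def get_positions(self, symbol: str) -> List[Mapping]: ...
    def place_take_profits(self, symbol: str, state: PositionState) -> None: ...
    def place_grid_orders(self, symbol: str, state: PositionState) -> None: ...


def recover(client: RecoveryClient, symbol: str, local_state: Dict) -> PositionState:
    client.cancel_all_orders(symbol)          # 1. cancel all open orders
    local_state.clear()                       # 2. clear local state
    positions = client.get_positions(symbol)  # 3. exchange is the source of truth
    state = detect_state(positions)           # 4. none / long_only / short_only / both
    client.place_take_profits(symbol, state)  # 5. TP orders (handler decides per state)
    client.place_grid_orders(symbol, state)   # 6. balance + protective orders
    return state
```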
The position_level_analyzer.py module (2,354 lines) reverse-engineers grid levels from order history and stays backward-compatible with 5+ historical orderLinkId formats across versions.
Hard Problems
Independent Cycle Tracking (v2.8.0)
LONG and SHORT sides take profit at different frequencies. A shared cycle counter caused a subtle bug: SHORT would TP 3× (cycle→13) while LONG was still at L5 (cycle=10). On restart, the restoration logic searched for LONG positions using cycle=13 — found nothing — and triggered Emergency Stop.
Fix: independent long_cycle/short_cycle counters with L/S prefixes in orderLinkId. Simple in retrospect, invisible until it happened in production.
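The sketch below shows per-side counters and prefixed IDs, replaying the incident above. The ID layout `<L|S>c<cycle>-n<level>-<nonce>` is invented for illustration; the source only states that L/S prefixes were added. `CycleCounters` and `parse_link_id` are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical ID layout, for illustration only: "<L|S>c<cycle>-n<level>-<nonce>"
LINK_ID_RE = re.compile(r"^(?P<side>[LS])c(?P<cycle>\d+)-n(?P<level>\d+)-(?P<nonce>\w+)$")


@dataclass
class CycleCounters:
    """Independent TP cycle counters per side (instead of one shared counter)."""
    counts: Dict[str, int] = field(default_factory=lambda: {"L": 0, "S": 0})

    def on_take_profit(self, side: str) -> int:
        self.counts[side] += 1  # only the side that took profit advances
        return self.counts[side]

    def make_link_id(self, side: str, level: int, nonce: str) -> str:
        return f"{side}c{self.counts[side]}-n{level}-{nonce}"


def parse_link_id(link_id: str) -> Optional[Tuple[str, int, int]]:
    """Return (side, cycle, level), or None for IDs in another format."""
    m = LINK_ID_RE.match(link_id)
    if m is None:
        return None
    return m["side"], int(m["cycle"]), int(m["level"])
```

After SHORT takes profit three times, the LONG order still carries cycle 10. Restoration, searching with LONG's own counter, therefore finds it.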
AMEND Workflow Removal (v2.3.5)
Originally, order modification used Bybit's AMEND endpoint. This caused a systemic failure: Bybit's Order History API sorts by createdTime, and AMEND preserves the original timestamp, so amended orders sorted as if they were old and the restoration logic couldn't find them. The result was an Emergency Stop on every restart.
Fix: remove AMEND entirely. Cancel + replace instead. More API calls, zero ambiguity.
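Here is a minimal sketch of cancel + replace against a hypothetical `OrderClient` interface (not the Bybit SDK). The key property is that a modification always produces a brand-new order, so it gets a fresh creation time and a new orderLinkId. If the cancel fails, for example because the order already filled, no replacement is placed.

```python
import dataclasses
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class OrderSpec:
    symbol: str
    side: str
    qty: str
    price: str
    link_id: str


class OrderClient(Protocol):  # hypothetical interface, not a real SDK
    def cancel(self, symbol: str, link_id: str) -> bool: ...
    def place(self, order: OrderSpec) -> None: ...


def cancel_and_replace(
    client: OrderClient, old: OrderSpec, new_price: str, new_link_id: str
) -> Optional[OrderSpec]:
    """Modify an order as cancel + new order instead of an in-place amend."""
    if new_link_id == old.link_id:
        raise ValueError("replacement needs its own link ID")
    if not client.cancel(old.symbol, old.link_id):
        return None  # cancel rejected (e.g. already filled): do not place a duplicate
    new = dataclasses.replace(old, price=new_price, link_id=new_link_id)
    client.place(new)  # a new order gets a new createdTime on the exchange
    return new
```

Cancel + replace costs two requests and briefly leaves the level without a resting order. That is the price paid for an order history with no ambiguity.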
Partial Fill Race Condition
Discovered after a real incident (2025-11-01): a partial fill triggered the workflow while the remaining quantity was still being filled. Two concurrent workflows → inconsistent state.
Fix: a _partially_filled_freeze flag suspends the workflow until the fill completes or a timeout expires.
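Below is a sketch of such a freeze with a timeout, under several assumptions. Status strings are modeled on Bybit's order statuses (`PartiallyFilled`, `Filled`, `Cancelled`), and the 30-second timeout is an arbitrary placeholder. `PartialFillFreeze` and its methods are hypothetical. The workflow checks `workflow_allowed()` before acting.

```python
import threading
import time
from typing import Optional


class PartialFillFreeze:
    """Suspends the grid workflow while an order is partially filled (a sketch)."""

    def __init__(self, timeout_s: float = 30.0) -> None:
        self._lock = threading.Lock()
        self._frozen_since: Optional[float] = None
        self.timeout_s = timeout_s

    def on_order_update(self, status: str, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        with self._lock:
            if status == "PartiallyFilled":
                if self._frozen_since is None:
                    self._frozen_since = now  # first partial fill starts the freeze
            elif status in ("Filled", "Cancelled"):
                self._frozen_since = None  # terminal status: release the freeze

    def workflow_allowed(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        with self._lock:
            if self._frozen_since is None:
                return True
            if now - self._frozen_since >= self.timeout_s:
                self._frozen_since = None  # timed out: stop waiting for the rest
                return True
            return False
```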
Scale
| Metric | Value |
|---|---|
| Source code | 27,677 lines (41 files) |
| Tests | 7,939 lines (19 files, 280 functions) |
| Scripts | 8,154 lines (35 diagnostic tools) |
| Documentation | 17 markdown files, 385KB |
| Total Python | ~43,770 lines |
| Commits | 248 |
| Development | Oct 2025 → present |
| Version | v2.12.0 |
| Status | Running in production (systemd service) |