Perpetual Futures Grid Trading System

October 1, 2025

Production algorithmic trading system for perpetual futures — multi-account, event-driven architecture with dynamic progressive grid and 10-level risk management.

Tech Stack

Core

Python 3.9+, pybit 5.7.0, websocket-client, pandas, threading

Data & Config

JSON (atomic writes), YAML config, python-dotenv, pyyaml

Infrastructure

Linux VPS, systemd (Type=notify), sdnotify, Telegram alerts

Testing

pytest, pytest-asyncio, pytest-cov, 280 test functions

Key Results

  • v2.12.0 — running in production since October 2025
  • 43,770 lines of Python in total (source, tests, and scripts), of which 27,677 lines are in 41 source files
  • 10-level risk management with automatic emergency response
  • 6-step crash recovery reconstructing full state from exchange
  • 248 commits over 4.5 months of active development

Overview

A proprietary algorithmic trading system for perpetual futures markets, built entirely from scratch. The system manages multiple isolated trading accounts simultaneously, each running independent dynamic grid strategies across multiple symbols — with shared WebSocket connections to price feeds to minimize exchange load.

This is a closed-source personal project. The description focuses on architecture and engineering decisions, not trading strategy or performance.

Architecture

The system is a modular monolith designed with SaaS-ready multi-tenancy from the start:

MultiAccountBot (orchestrator)
  → TradingAccount[] (per account)
      → GridStrategy[] (per symbol)
          → StateManager, OrderExecutor, EventQueue, TPCalculator
      → BybitClient (REST), BalanceManager, ProfitProtectionManager
  → PublicWebSocket[] (shared per symbol+environment)
  → PrivateWebSocket[] (per account)

Key principle: each account is fully isolated — separate logs, state files, and emergency flags. But public WebSocket connections to price feeds are shared across accounts trading the same symbol, reducing exchange connections.
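
The sharing rule above can be pictured as a small reference-counted registry keyed by (symbol, environment). This is a minimal sketch, not the project's code: `PublicFeedRegistry`, `acquire`, `release`, and the `fake_connect` factory are hypothetical names, and a real registry would also close the socket when the last account releases it.

```python
import threading


class PublicFeedRegistry:
    """Hands out one shared feed per (symbol, environment); hypothetical sketch."""

    def __init__(self, connect):
        self._connect = connect      # factory: (symbol, environment) -> feed object
        self._lock = threading.Lock()
        self._feeds = {}             # key -> [feed, reference count]

    def acquire(self, symbol, environment):
        key = (symbol, environment)
        with self._lock:
            if key not in self._feeds:
                self._feeds[key] = [self._connect(symbol, environment), 0]
            entry = self._feeds[key]
            entry[1] += 1
            return entry[0]

    def release(self, symbol, environment):
        key = (symbol, environment)
        with self._lock:
            entry = self._feeds[key]
            entry[1] -= 1
            if entry[1] == 0:
                del self._feeds[key]  # a real implementation would close the socket here


opened = []


def fake_connect(symbol, environment):
    # Stand-in for opening a real public WebSocket connection.
    opened.append((symbol, environment))
    return object()


registry = PublicFeedRegistry(fake_connect)
feed_a = registry.acquire("BTCUSDT", "mainnet")  # first account opens the feed
feed_b = registry.acquire("BTCUSDT", "mainnet")  # second account reuses it
```

Two accounts trading the same symbol in the same environment end up with the same feed object, so only one connection is opened.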

Event-Driven Threading

The system uses plain threading rather than asyncio, a deliberate choice for financial systems where predictability matters more than throughput. WebSocket callbacks are non-blocking: they put events into a priority queue, and a dedicated EventWorker processes them sequentially.

# Priority queue: Execution events (P0) before Order events (P1)
# Prevents order state from being updated before execution is processed
heapq.heappush(self._queue, (priority, timestamp, event))
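
The fragment above can be expanded into a runnable sketch. The class name `PriorityEventQueue`, the `put`/`get` methods, and the dict-shaped events are assumptions for illustration. The sequence counter is also an addition: it breaks ties so that two events with equal priority and timestamp never fall through to comparing the event payloads, which would raise for dicts.

```python
import heapq
import itertools
import threading
import time

P_EXECUTION = 0  # processed first
P_ORDER = 1      # processed after executions


class PriorityEventQueue:
    """Thread-safe priority queue; names are illustrative, not the project's."""

    def __init__(self):
        self._queue = []
        self._cond = threading.Condition()
        self._seq = itertools.count()  # tie-breaker so payloads are never compared

    def put(self, priority, event):
        # Called from WebSocket callbacks: push and return immediately.
        with self._cond:
            entry = (priority, time.monotonic(), next(self._seq), event)
            heapq.heappush(self._queue, entry)
            self._cond.notify()

    def get(self, timeout=None):
        # Called from the single worker thread: lowest priority number first.
        with self._cond:
            if not self._cond.wait_for(lambda: self._queue, timeout=timeout):
                return None  # timed out with nothing queued
            return heapq.heappop(self._queue)[-1]


q = PriorityEventQueue()
q.put(P_ORDER, {"type": "order", "id": "A"})
q.put(P_EXECUTION, {"type": "execution", "id": "A"})
first, second = q.get(), q.get()
```

Although the order event was queued first, the worker sees the execution event first, which is the ordering guarantee the comment in the original fragment describes.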

Eight explicit RLock/Lock instances guard shared resources across components. No shared mutable state without a lock.

State Persistence

No database — state is stored in JSON files with atomic writes:

import json, os, tempfile

# Atomic write: temp file in the target directory → rename
# (atomic on Linux when both paths are on the same filesystem)
with tempfile.NamedTemporaryFile(mode='w', dir=state_dir, delete=False) as f:
    json.dump(state, f, indent=2)
    temp_path = f.name
os.replace(temp_path, state_path)  # atomic

This means a crash mid-write never corrupts the state file. The exchange API is the single source of truth on restart.
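
As a self-contained illustration of the same pattern, here is a sketch wrapped in functions. The names `save_state_atomic` and `load_state` are hypothetical, and the `fsync` call is an added assumption (it targets durability after power loss, which the original snippet does not claim).

```python
import json
import os
import tempfile


def save_state_atomic(state, state_path):
    """Write JSON to a temp file next to the target, then rename over the target."""
    state_dir = os.path.dirname(os.path.abspath(state_path))
    fd, temp_path = tempfile.mkstemp(dir=state_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # assumption: flush to disk before the rename
        os.replace(temp_path, state_path)  # readers see the old or the new file, never a partial one
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise


def load_state(state_path):
    with open(state_path) as f:
        return json.load(f)
```

Creating the temp file in the same directory as the target matters: a rename is only atomic within one filesystem.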

Dynamic Progressive Grid

Grid step size is not fixed — it's calculated from market volatility:

import math

# EMA of daily ATR% over 10 periods, scaled to a 10-day horizon
# (daily_atr_pct is a pandas Series)
atr_ema = daily_atr_pct.ewm(span=10).mean().iloc[-1]
base_step = atr_ema * math.sqrt(10)

# Steps grow with level: L1 is the minimum, L10 is ~3.16× the L1 step
step[n] = base_step * scale * math.sqrt(n)

The grid is computed once on a fresh start and is immutable afterwards, which prevents configuration drift across restarts.
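
To make the step formula concrete, here is a dependency-free sketch. The function names, the default parameters, and the sample ATR values are all made up for illustration. The EMA is the plain recursive form with alpha = 2 / (span + 1), which matches pandas `ewm(span=..., adjust=False)`; the pandas default (`adjust=True`) gives slightly different numbers on short series.

```python
import math


def ema(values, span):
    """Recursive EMA with alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    result = values[0]
    for value in values[1:]:
        result = alpha * value + (1 - alpha) * result
    return result


def grid_steps(daily_atr_pct, max_levels=10, scale=1.0, span=10, horizon_days=10):
    """Return the step size for levels 1..max_levels."""
    base_step = ema(daily_atr_pct, span) * math.sqrt(horizon_days)
    return [base_step * scale * math.sqrt(n) for n in range(1, max_levels + 1)]


steps = grid_steps([1.8, 2.1, 2.4, 2.0, 1.9])  # made-up daily ATR% values
ratio = steps[-1] / steps[0]                   # sqrt(10), about 3.16
```

Because each step is proportional to sqrt(n), the ratio between level 10 and level 1 is sqrt(10) regardless of the volatility input.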

Position sizing uses martingale doubling: qty[level] = base_qty × 2^level. Base quantity is also locked after initial calculation.
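
The doubling rule implies that total exposure grows geometrically, which a short worked example makes visible. This sketch assumes levels are numbered from 0 and uses `Decimal` to avoid float rounding in quantities; both choices, and the function names, are assumptions rather than details from the system.

```python
from decimal import Decimal


def level_qty(base_qty, level):
    """qty[level] = base_qty * 2^level (level numbering from 0 is an assumption)."""
    return base_qty * (2 ** level)


def cumulative_qty(base_qty, max_level):
    """Total position size after filling levels 0..max_level."""
    return sum(level_qty(base_qty, lvl) for lvl in range(max_level + 1))


base = Decimal("0.001")
qty_l3 = level_qty(base, 3)         # 0.001 * 8
total_l3 = cumulative_qty(base, 3)  # 0.001 * (1 + 2 + 4 + 8)
```

The cumulative size through level n equals base_qty × (2^(n+1) − 1), so each new level roughly doubles the whole position.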

Risk Management — 10 Levels

| Level | Trigger | Response |
| --- | --- | --- |
| 1 | IM Rate ≥ 90% | Block workflow, move balance orders to 0.01% from market |
| 2 | High IM Rate + profit | Close position |
| 3 | Trailing stop | Lock profit on drawdown from peak |
| 4 | Position value > 40% of risk limit | Block new orders |
| 5 | Position limit timeout | Force close stuck positions |
| 6 | Both sides profitable | Close both simultaneously |
| 7 | TP safety check (hysteresis 5%/10%) | Pause TP if insufficient margin for reopening |
| 8 | TP safety pre-check | Block protective orders if margin insufficient |
| 9 | Level rate limit | Budget: ceil(max_levels × 0.5) level-ups per 48h sliding window |
| 10 | Emergency stop | File-based flag requiring manual reset |

Levels 7 and 8 deserve explanation: at high grid levels, taking profit would leave insufficient margin to reopen the position. Without this check, the system would keep averaging down (wasting capital) while TP remains unreachable. The solution: block protective orders so TP stays active and the position can actually close.
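
The level rate limit (level 9 in the table) is specified precisely enough to sketch: a budget of ceil(max_levels × 0.5) level-ups inside a 48-hour sliding window. The class name `LevelRateLimiter` and its `allow` method are hypothetical, and timestamps are passed in explicitly so the example is deterministic.

```python
import math
import time
from collections import deque


class LevelRateLimiter:
    """Sliding-window budget for grid level-ups; a sketch, not the project's code."""

    def __init__(self, max_levels, window_seconds=48 * 3600, fraction=0.5):
        self.budget = math.ceil(max_levels * fraction)
        self.window = window_seconds
        self._stamps = deque()  # timestamps of level-ups still inside the window

    def allow(self, now=None):
        now = time.time() if now is None else now
        # Drop level-ups that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) >= self.budget:
            return False
        self._stamps.append(now)
        return True


limiter = LevelRateLimiter(max_levels=10)                 # budget = ceil(10 * 0.5) = 5
granted = [limiter.allow(now=h * 3600) for h in range(6)]  # one attempt per hour
```

With ten levels, five level-ups are granted and the sixth is refused until the oldest one is more than 48 hours old.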

Crash Recovery

On restart, a 6-step workflow reconstructs full state from the exchange:

1. Cancel all open orders
2. Clear local state
3. Fetch actual positions from exchange
4. Detect state: none / long_only / short_only / both
5. Place take-profit orders
6. Place balance + protective orders
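
Step 4 of the list above can be sketched as a pure function. The input shape is an assumption: a list of dicts with a `side` field ("Buy" or "Sell") and a string `size` field, similar to how exchange position lists are commonly returned. The function name `detect_state` is hypothetical.

```python
from decimal import Decimal


def detect_state(positions):
    """Classify open positions as none / long_only / short_only / both."""
    has_long = any(
        p.get("side") == "Buy" and Decimal(p.get("size", "0")) > 0 for p in positions
    )
    has_short = any(
        p.get("side") == "Sell" and Decimal(p.get("size", "0")) > 0 for p in positions
    )
    if has_long and has_short:
        return "both"
    if has_long:
        return "long_only"
    if has_short:
        return "short_only"
    return "none"


state = detect_state([
    {"side": "Buy", "size": "0.015"},
    {"side": "Sell", "size": "0"},  # an empty short slot does not count
])
```

Keeping this step free of I/O makes it easy to unit-test the branch that decides which orders steps 5 and 6 will place.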

The position_level_analyzer.py (2,354 lines) reverse-engineers grid levels from order history, with backward compatibility for 5+ historical orderLinkId formats across versions.

Hard Problems

Independent Cycle Tracking (v2.8.0)

LONG and SHORT sides take profit at different frequencies. A shared cycle counter caused a subtle bug: SHORT would TP 3× (cycle→13) while LONG was still at L5 (cycle=10). On restart, the restoration logic searched for LONG positions using cycle=13 — found nothing — and triggered Emergency Stop.

Fix: independent long_cycle/short_cycle counters with L/S prefixes in orderLinkId. Simple in retrospect, invisible until it happened in production.
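
The incident can be replayed in a few lines. This sketch assumes a made-up orderLinkId layout (`L10-lv5`, `S13-lv1`); the real format is not documented here, and `CycleTracker` and its methods are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CycleTracker:
    """Independent cycle counters per side; illustrative only."""

    long_cycle: int = 0
    short_cycle: int = 0

    def on_take_profit(self, side):
        # Only the side that took profit advances its cycle.
        if side == "long":
            self.long_cycle += 1
        elif side == "short":
            self.short_cycle += 1
        else:
            raise ValueError(f"unknown side: {side}")

    def link_id(self, side, level):
        # Hypothetical format: side prefix + cycle + level.
        if side == "long":
            return f"L{self.long_cycle}-lv{level}"
        return f"S{self.short_cycle}-lv{level}"


tracker = CycleTracker(long_cycle=10, short_cycle=10)
for _ in range(3):
    tracker.on_take_profit("short")  # SHORT takes profit three times

long_id = tracker.link_id("long", 5)
short_id = tracker.link_id("short", 1)
```

After three SHORT take-profits the LONG side still resolves to cycle 10, so a restore that searches for LONG orders looks in the right cycle.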

AMEND Workflow Removal (v2.3.5)

Originally, order modification used Bybit's AMEND endpoint. This caused a systemic failure: Bybit's Order History API sorts by createdTime, and AMEND preserves the original timestamp. Amended orders couldn't be found by the restoration logic → Emergency Stop on every restart.

Fix: remove AMEND entirely. Cancel + replace instead. More API calls, zero ambiguity.

Partial Fill Race Condition

Discovered after a real incident (2025-11-01): a partial fill triggered the workflow while the remaining quantity was still being filled. Two concurrent workflows → inconsistent state.

Fix: a _partially_filled_freeze flag suspends the workflow until the fill completes or a timeout expires.
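
A freeze flag with a timeout can be sketched as a small lock-guarded object. The class name `PartialFillFreeze`, its method names, and the 30-second default are assumptions; the source only states that a flag suspends the workflow until the fill completes or times out.

```python
import threading
import time


class PartialFillFreeze:
    """Blocks the workflow between a partial fill and its completion or a timeout."""

    def __init__(self, timeout_seconds=30.0):  # 30 s is an assumed value
        self._lock = threading.Lock()
        self._timeout = timeout_seconds
        self._frozen_at = None

    def on_partial_fill(self, now=None):
        with self._lock:
            if self._frozen_at is None:  # keep the time of the first partial fill
                self._frozen_at = time.monotonic() if now is None else now

    def on_filled(self):
        with self._lock:
            self._frozen_at = None

    def workflow_allowed(self, now=None):
        now = time.monotonic() if now is None else now
        with self._lock:
            if self._frozen_at is None:
                return True
            if now - self._frozen_at >= self._timeout:
                self._frozen_at = None  # timed out: release the freeze
                return True
            return False


freeze = PartialFillFreeze(timeout_seconds=30.0)
freeze.on_partial_fill(now=100.0)
blocked = not freeze.workflow_allowed(now=110.0)  # still waiting for the rest of the fill
```

The timeout is what keeps a lost fill notification from freezing the workflow forever.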

Scale

| Metric | Value |
| --- | --- |
| Source code | 27,677 lines (41 files) |
| Tests | 7,939 lines (19 files, 280 functions) |
| Scripts | 8,154 lines (35 diagnostic tools) |
| Documentation | 17 markdown files, 385KB |
| Total Python | ~43,770 lines |
| Commits | 248 |
| Development | Oct 2025 → present |
| Version | v2.12.0 |
| Status | Running in production (systemd service) |

Iurii Rogulia (Available)
