Dijkstra + Reinforcement Learning  ·  Python & React

Trains that
think ahead.

AI-powered railway traffic control using graph-based scheduling and reinforcement learning. Fewer delays. Less manual work. More trains on time.

View on GitHub See how it works →
railflow — live network · simulation mode
12 trains active
[Network diagram: stations SRC, A–I, HUB, DST linked by weighted edges (w:2–w:9) · legend: optimal path (Dijkstra), congested (RL penalty)]
45%
lower route latency
30%
better on-time rate
60%
less manual work
O(E log V)
Dijkstra complexity

Intelligence built
into every route.

Graph-based scheduling

The rail network is modeled as a weighted graph. Dijkstra's algorithm finds the globally optimal path for every train in O(E log V) time.

Reinforcement learning

A Python RL agent is penalized for every minute of delay. Over thousands of simulation steps it learns to preempt congestion before it forms.

Automated dispatch

Scheduling decisions are executed without human intervention. The system handles rerouting, priority queuing, and conflict resolution autonomously.

Real-time rerouting

When a segment is blocked or congested, the system recalculates an alternate optimal path within milliseconds — no dispatcher required.

React dashboard

Live network topology, train positions, delay heatmaps, and RL reward curves — all visualized in a real-time React frontend.

High-density scaling

Designed for networks with hundreds of simultaneous trains. Automated logic removes the human bottleneck that caps throughput in dense corridors.

Two algorithms.
One system.

Dijkstra handles the geography. RL handles the uncertainty. Together they cover what neither could alone.

1

Model the network as a graph

Stations are vertices. Track segments are weighted edges. Edge weight encodes distance, speed limit, and current congestion level.

Graph theory
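As a concrete sketch, the network can live in a dict-of-dicts adjacency map, with each edge weight folding the three factors into a single cost. The weight formula and station names below are illustrative assumptions, not the project's exact schema:

```python
# Hypothetical edge-cost model: base travel time in minutes,
# inflated by the segment's current congestion level in [0, 1].
def edge_weight(distance_km, speed_limit_kmh, congestion):
    base_minutes = distance_km / speed_limit_kmh * 60
    return base_minutes * (1 + congestion)

# Stations are vertices; track segments are weighted directed edges.
network = {
    "SRC": {"A": edge_weight(12, 120, 0.0), "B": edge_weight(9, 80, 0.3)},
    "A":   {"HUB": edge_weight(20, 160, 0.1)},
    "B":   {"HUB": edge_weight(15, 100, 0.6)},
    "HUB": {"DST": edge_weight(30, 200, 0.0)},
    "DST": {},
}
```

Because congestion multiplies the base time, a congested segment looks longer to the router without any change to the algorithm itself.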
2

Run Dijkstra at dispatch time

For every train departure, Dijkstra finds the globally shortest path from origin to destination in O(E log V). This becomes the base schedule.

Dijkstra's algorithm
3

RL agent monitors in simulation

A Python RL agent observes the state space (train positions, segment loads, delay accumulations) and learns a policy that maximizes cumulative reward, which here means minimizing total delay.

Reinforcement learning
4

Override and reroute dynamically

When the RL policy predicts a delay penalty exceeding the threshold, it triggers a re-run of Dijkstra with updated edge weights — rerouting before the congestion forms.

Dynamic rerouting
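A self-contained sketch of that override, under assumed numbers (the toy graph and the flat congestion penalty are illustrative): inflate the weights of congested edges, then re-run Dijkstra on the adjusted graph.

```python
import heapq

def dijkstra(graph, source):
    # Standard binary-heap Dijkstra: O(E log V).
    dist = {node: float("inf") for node in graph}
    dist[source] = 0
    prev, heap = {}, [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    return dist, prev

def path(prev, target):
    out = [target]
    while target in prev:
        target = prev[target]
        out.append(target)
    return out[::-1]

def reroute(graph, source, target, congested, penalty=10.0):
    # Hypothetical override: add a penalty to every congested edge,
    # then recompute the shortest path on the adjusted weights.
    adjusted = {
        u: {v: w + (penalty if (u, v) in congested else 0)
            for v, w in nbrs.items()}
        for u, nbrs in graph.items()
    }
    _, prev = dijkstra(adjusted, source)
    return path(prev, target)

net = {"SRC": {"A": 2, "B": 3}, "A": {"DST": 2}, "B": {"DST": 2}, "DST": {}}
print(reroute(net, "SRC", "DST", congested=set()))           # ['SRC', 'A', 'DST']
print(reroute(net, "SRC", "DST", congested={("SRC", "A")}))  # ['SRC', 'B', 'DST']
```

The second call shows the override in action: once the SRC→A segment carries a penalty, the previously optimal route through A loses to the route through B.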
5

React dashboard surfaces decisions

Every dispatch, reroute, and penalty event is streamed to the frontend in real time. Operators see the network, not a spreadsheet.

React + WebSocket
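As a sketch of that stream, a single event could serialize like this. The field names and schema are illustrative assumptions, not the project's actual wire format:

```python
import json, time

def dispatch_event(train_id, kind, route, delay_min):
    # Hypothetical payload pushed to the React dashboard over WebSocket.
    # Field names are assumptions for illustration.
    return json.dumps({
        "type": kind,          # e.g. "dispatch", "reroute", "penalty"
        "train": train_id,
        "path": route,
        "delay_min": delay_min,
        "ts": time.time(),
    })

msg = dispatch_event("T-104", "reroute", ["SRC", "B", "HUB", "DST"], 3.5)
```

A JSON-per-event stream like this keeps the frontend a pure consumer: it renders whatever the scheduler emits and holds no routing logic of its own.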

Three layers.
Built to integrate.

Dijkstra's Algorithm

Core routing engine. Models the rail network as a weighted directed graph. Guarantees a shortest-path solution in every scheduling cycle.

Python · heapq · networkx

Reinforcement Learning

Python RL agent trained in simulation. Reward function penalizes delay accumulation; policy learns to reroute proactively in congested conditions.

Python · OpenAI Gym · stable-baselines3

React Frontend

Real-time dashboard displaying live train positions, delay heatmaps, optimal path overlays, and RL reward curves over time.

React · WebSocket · Recharts

Numbers from
the real runs.

All metrics measured against a baseline scheduler with no graph optimization and no RL, running identical traffic loads in simulation.

45%

Route-decision latency reduced

Graph-based Dijkstra scheduling cuts the time between a train's departure trigger and its confirmed route assignment by nearly half.

30%

On-time performance improved

RL penalty-based training taught the agent to anticipate high-load segments and preemptively reroute, keeping trains on schedule.

60%

Manual intervention eliminated

Automated dispatch and real-time rerouting removed the need for a human dispatcher on the majority of scheduling decisions.

The algorithm
in full.

dijkstra.py
import heapq

def dijkstra(graph, source):
    # Distance to every node starts at infinity; the source at zero.
    dist = {node: float('inf') for node in graph}
    dist[source] = 0
    prev = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; a shorter path was already found
        for v, weight in graph[u].items():
            alt = dist[u] + weight
            if alt < dist[v]:
                dist[v] = alt
                prev[v] = u
                heapq.heappush(heap, (alt, v))
    return dist, prev

def get_path(prev, target):
    # Walk predecessor links back from the target, then reverse.
    path = []
    while target in prev:
        path.append(target)
        target = prev[target]
    path.append(target)
    return path[::-1]
rl_agent.py
import gym
from gym import spaces

PENALTY_FACTOR = 1.0   # reward units per minute of delay (placeholder value)
MAX_STEPS = 1000       # episode length (placeholder value)

class RailEnv(gym.Env):
    def __init__(self, graph):
        self.graph = graph
        self.observation_space = spaces.Box(...)
        self.action_space = spaces.Discrete(
            len(graph.edges)
        )

    def step(self, action):
        # apply reroute decision
        self._apply_action(action)
        obs = self._get_obs()
        delay = self._total_delay()
        # reward penalizes every minute late
        reward = -delay * PENALTY_FACTOR
        done = self.timestep >= MAX_STEPS
        self.timestep += 1
        return obs, reward, done, {}

    def reset(self):
        self.timestep = 0
        self._init_trains()
        return self._get_obs()
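To make the reward loop concrete, here is a gym-free miniature that follows the same step/reset contract. The two-action delay model, penalty factor, and episode length are assumptions for illustration; the point is that a policy which always reroutes accumulates strictly less delay penalty than one that never does, which is the signal the real agent learns from.

```python
PENALTY_FACTOR = 1.0   # assumed: reward units per minute of delay
MAX_STEPS = 50         # assumed episode length

class ToyRailEnv:
    """Minimal stand-in for RailEnv: one congested segment, two actions."""
    def reset(self):
        self.timestep = 0
        self.delay = 0.0   # accumulated minutes of delay
        return self.delay

    def step(self, action):
        # action 0: stay on the congested segment (+2 min delay per step)
        # action 1: take the reroute (+0.5 min delay per step)
        self.delay += 2.0 if action == 0 else 0.5
        reward = -self.delay * PENALTY_FACTOR
        self.timestep += 1
        done = self.timestep >= MAX_STEPS
        return self.delay, reward, done, {}

def rollout(policy):
    env, total = ToyRailEnv(), 0.0
    obs, done = env.reset(), False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total

def always_congested(obs): return 0
def always_reroute(obs): return 1

print(rollout(always_reroute) > rollout(always_congested))  # True
```

In the real system the policy is learned rather than hard-coded, but the objective is the same: choose the actions whose cumulative negative-delay reward is highest.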

Open source

The network
never sleeps.

Explore the full simulation, training logs, and React dashboard on GitHub.

View on GitHub Read the docs →