Building a Smart Trajectory System

I wanted to build a system that could automatically account for wind, distance, and environmental conditions in real time. So I put together a pipeline that uses computer vision to detect targets and estimate distance, then runs ballistics calculations to predict where shots will land.

System Architecture

The system runs on a Jetson device and processes video frames through multiple stages:

Camera Feed
    ↓
[1] Object Detection (YOLOv7)
    ↓
    Detections + Distance Estimation
    ↓
[2] Pose Estimation (3D Pose)
    ↓
    Keypoint Locations
    ↓
[3] Trajectory Calculation
    ↓
    Windage + Elevation Adjustments
    ↓
[4] Display Overlay
    ↓
    Annotated Video Output

Stage 1: Object Detection and Distance Estimation

I use YOLOv7 served by Triton Inference Server to detect people and objects in each frame. The detector runs at 416x416 input resolution for speed, and I discard detections below a confidence threshold of 0.55.

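import tritonclient.grpc as grpcclient

class Detector:
    def __init__(self, dim=416, triton_url="localhost:8001", threshold=0.55):
        self.dim = dim              # square input resolution fed to the model
        self.threshold = threshold  # minimum confidence for a detection to be kept
        self.triton = grpcclient.InferenceServerClient(url=triton_url)
        # ... setup inputs/outputs

    def detect(self, frame):
        batch = self._preprocess(frame)      # prepare the frame as a model input batch
        results = self._inference(batch)     # run the model on Triton
        return self._postprocess(*results)   # turn raw outputs into detections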
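
The confidence filter itself is not shown above. As a rough sketch of what that step could look like (the function name and the array layout are assumptions for illustration, not the actual `_postprocess` code), detections can be masked with NumPy:

```python
import numpy as np

def filter_by_confidence(boxes, scores, class_ids, threshold=0.55):
    """Keep only detections whose score is at or above the threshold.

    Assumed layout: boxes is (N, 4), scores and class_ids are (N,).
    """
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    scores = np.asarray(scores, dtype=np.float64)
    class_ids = np.asarray(class_ids, dtype=np.int64)
    keep = scores >= threshold  # boolean mask, one entry per detection
    return boxes[keep], scores[keep], class_ids[keep]

# Three made-up detections; the middle one falls below the threshold.
boxes = [[10, 20, 110, 220], [5, 5, 50, 50], [200, 40, 260, 180]]
scores = [0.91, 0.30, 0.60]
class_ids = [0, 2, 0]
kept_boxes, kept_scores, kept_ids = filter_by_confidence(boxes, scores, class_ids)
```

Filtering before any later stage keeps the more expensive steps from running on low-confidence boxes.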

Distance estimation uses the bounding box size and a calibration model. I trained XGBoost regressors on real-world data to predict distance from bounding box dimensions, accounting for the camera's field of view.
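
For intuition about why bounding box size and field of view carry distance information, here is the textbook pinhole-camera relation. This is a simpler baseline than the XGBoost regressors described above, not a reproduction of them. It assumes square pixels, no lens distortion, and an upright, fully visible object of known height; the function names are made up for this sketch.

```python
import math

def focal_length_px(image_width_px, horizontal_fov_deg):
    """Focal length in pixels from image width and horizontal field of view."""
    half_fov = math.radians(horizontal_fov_deg) / 2.0
    return (image_width_px / 2.0) / math.tan(half_fov)

def pinhole_distance_m(real_height_m, bbox_height_px, focal_px):
    """Distance to an object of known height from its height in pixels."""
    if bbox_height_px <= 0:
        raise ValueError("bounding box height must be positive")
    return focal_px * real_height_m / bbox_height_px

# Example values chosen only to make the arithmetic easy to follow.
focal = focal_length_px(image_width_px=416, horizontal_fov_deg=90.0)
distance = pinhole_distance_m(real_height_m=1.0, bbox_height_px=104, focal_px=focal)
```

A learned regressor can absorb effects this formula ignores, such as lens distortion and partially cropped boxes, which is one plausible reason to prefer it in practice.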

Stage 2: Pose Estimation

For more precise targeting, I run a 3D pose estimation model on detected people. The network outputs feature maps, heatmaps, and part affinity fields, which post-processing turns into 18 keypoints (neck, shoulders, elbows, wrists, hips, knees, ankles, eyes, ears).

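class PoseDetector:
    def detect(self, frame):
        batch = self._preprocess(frame)
        # The network returns feature maps, keypoint heatmaps, and part affinity fields (PAFs).
        features, heatmaps, pafs = self._inference(batch)
        # self.eR and self.et are presumably the camera's extrinsic rotation and translation.
        keypoints = pose3d_postprocess(heatmaps, pafs, self.eR, self.et)
        return keypoints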

I also use the keypoints to refine the distance estimate: the pixel spacing between the shoulders or hips gives a better estimate than the bounding box alone. The pose model also helps with orientation detection, which matters for lead calculations.

Stage 3: Trajectory Calculation

This is where the physics comes in. I use XGBoost regressors trained on output from Hornady's 4DOF ballistic calculator to predict windage and elevation adjustments. The models take the following inputs into account:

Distance to target
Bullet velocity (muzzle velocity)
Bullet weight
Wind speed and direction
Temperature, pressure, humidity
Altitude
Barrel twist rate
Coriolis effect (latitude-dependent)

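import xgboost as xgb

def worker():
    # Load the trained elevation (cu) and windage (wd) regressors once at startup.
    cu_regressor = xgb.XGBRegressor()
    cu_regressor.load_model("xgb_cu_regressor-v2.json")
    wd_regressor = xgb.XGBRegressor()
    wd_regressor.load_model("xgb_wd_regressor-v2.json")

    last_distance = None  # nothing has been computed yet
    while True:
        distance = float(rv("distance"))  # rv() is the Redis read helper (not shown)
        if distance != last_distance:
            # ... build `features` from the distance and the sensor values in Redis
            # Calculate windage and elevation
            windage = wd_regressor.predict([features])
            elevation = cu_regressor.predict([features])
            # Update Redis with adjustments
            last_distance = distance  # skip recomputation until the distance changes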

I read sensor data from Redis, which is updated by separate processes that handle:

IMU (BNO08x) for device orientation
Barometric pressure sensor (BMP3xx) for altitude
Temperature/humidity sensors (SHT31D, SHTC3)
GPS for location (latitude affects Coriolis)

The trajectory calculation runs in a loop, checking for distance changes and recalculating adjustments when needed.
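
One practical detail with a recompute-on-change loop is that a noisy floating-point reading almost never repeats exactly, so a strict inequality check fires on nearly every pass. A small tolerance gate is one way to handle that. The class below is a hypothetical sketch, and the tolerance value is an assumption rather than a number from this system.

```python
class ChangeGate:
    """Report a change only when a reading moves by more than `tolerance`
    from the last accepted value."""

    def __init__(self, tolerance):
        self.tolerance = tolerance
        self.last = None  # last value that triggered a recomputation

    def changed(self, value):
        if self.last is None or abs(value - self.last) > self.tolerance:
            self.last = value
            return True
        return False

gate = ChangeGate(tolerance=0.5)
readings = [100.0, 100.2, 100.4, 101.0, 101.1]
# Only the readings that pass the gate would trigger a recomputation.
recomputed = [r for r in readings if gate.changed(r)]
```

Comparing against the last accepted value, rather than the previous reading, prevents slow drift from slipping through unnoticed.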

Stage 4: Video Processing Pipeline

The whole thing runs on GStreamer for efficient video processing. I use shared memory between processes to pass frames around without copying:

scan_shm = shared_memory.SharedMemory(name="scan_frame")
scan_shm_frame = np.ndarray((SCAN_DIM, SCAN_DIM, 3), dtype=np.uint8, buffer=scan_shm.buf)
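
The two lines above only show the consumer side attaching to an existing block. A self-contained sketch of both sides, using the standard library's `multiprocessing.shared_memory`, might look like the following. For simplicity it runs producer and consumer in one process, lets Python generate the block name, and assumes `SCAN_DIM = 416`; all three are assumptions for the demo.

```python
from multiprocessing import shared_memory

import numpy as np

SCAN_DIM = 416  # assumed value for this demo

def create_frame_buffer(dim):
    """Producer side: allocate a shared block sized for one RGB frame."""
    shm = shared_memory.SharedMemory(create=True, size=dim * dim * 3)
    frame = np.ndarray((dim, dim, 3), dtype=np.uint8, buffer=shm.buf)
    frame[:] = 0
    return shm, frame

def attach_frame_buffer(name, dim):
    """Consumer side: attach to an existing block by name, without copying."""
    shm = shared_memory.SharedMemory(name=name)
    frame = np.ndarray((dim, dim, 3), dtype=np.uint8, buffer=shm.buf)
    return shm, frame

def demo():
    writer_shm, writer_frame = create_frame_buffer(SCAN_DIM)
    reader_shm, reader_frame = attach_frame_buffer(writer_shm.name, SCAN_DIM)
    try:
        writer_frame[10, 20] = (1, 2, 3)  # producer writes a pixel
        seen = tuple(int(v) for v in reader_frame[10, 20])  # consumer sees it
    finally:
        # Arrays must be released before close(), or close() raises BufferError.
        del writer_frame, reader_frame
        reader_shm.close()
        writer_shm.close()
        writer_shm.unlink()  # only the owner should unlink
    return seen

pixel = demo()
```

Nothing here synchronizes readers and writers, so a real pipeline still needs some signal (a sequence counter, a lock, or a message) to tell the consumer when a frame is complete.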

The pipeline has two video streams:

Scan stream: Lower resolution for object detection (wider field of view)
Zoom stream: Higher resolution cropped region for precise targeting

I use Redis to coordinate crop regions: when an object is detected, the system updates the zoom crop to center on it.
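
Centering a crop on a detection needs some care near the frame edges, where a naive crop would run outside the image. The helper below is an illustrative sketch of one way to clamp it. The function name and frame sizes are assumptions, and the Redis write is left out because the key layout is not described here.

```python
def centered_crop(cx, cy, crop_w, crop_h, frame_w, frame_h):
    """Return (x0, y0, x1, y1) for a crop centered on (cx, cy),
    shifted as needed so it stays fully inside the frame."""
    if crop_w > frame_w or crop_h > frame_h:
        raise ValueError("crop is larger than the frame")
    x0 = int(round(cx - crop_w / 2))
    y0 = int(round(cy - crop_h / 2))
    # Clamp the top-left corner so the whole crop fits in the frame.
    x0 = max(0, min(x0, frame_w - crop_w))
    y0 = max(0, min(y0, frame_h - crop_h))
    return x0, y0, x0 + crop_w, y0 + crop_h

# A detection in the middle of the frame, and one near the bottom-left corner.
middle = centered_crop(960, 540, 640, 480, 1920, 1080)
corner = centered_crop(50, 1070, 640, 480, 1920, 1080)
```

Shifting the crop instead of shrinking it keeps the zoom stream at a constant resolution, which avoids renegotiating buffer sizes downstream.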

The Sensor Stack

Environmental data comes from multiple sensors:

BNO08x IMU: Device orientation (pitch, yaw, roll) for compensating device tilt
BMP3xx: Barometric pressure for altitude calculation
SHT31D/SHTC3: Temperature and humidity for air density calculations
GPS: Location for Coriolis effect calculations
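
To make the pressure, temperature, and humidity roles concrete, here is a sketch of the standard textbook formulas: the international barometric formula for altitude, and moist-air density from the partial pressures of dry air and water vapor (with the Tetens approximation for saturation vapor pressure). These are general-purpose approximations, not necessarily what this system's code computes.

```python
R_DRY = 287.058    # specific gas constant of dry air, J/(kg*K)
R_VAPOR = 461.495  # specific gas constant of water vapor, J/(kg*K)

def saturation_vapor_pressure_pa(temp_c):
    """Tetens approximation over liquid water, in pascals."""
    return 610.78 * 10 ** (7.5 * temp_c / (temp_c + 237.3))

def air_density(temp_c, pressure_pa, rel_humidity):
    """Moist-air density in kg/m^3; rel_humidity is a fraction from 0 to 1."""
    temp_k = temp_c + 273.15
    p_vapor = rel_humidity * saturation_vapor_pressure_pa(temp_c)
    p_dry = pressure_pa - p_vapor
    return p_dry / (R_DRY * temp_k) + p_vapor / (R_VAPOR * temp_k)

def pressure_altitude_m(pressure_pa, sea_level_pa=101325.0):
    """Altitude from pressure using the international barometric formula."""
    return 44330.0 * (1.0 - (pressure_pa / sea_level_pa) ** (1.0 / 5.255))

dry = air_density(15.0, 101325.0, 0.0)    # standard sea-level conditions
humid = air_density(15.0, 101325.0, 1.0)  # same, but fully saturated
altitude = pressure_altitude_m(89875.0)   # roughly the standard pressure at 1000 m
```

Humid air comes out slightly less dense than dry air at the same temperature and pressure, which is why the humidity sensor contributes to the density estimate at all.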

All sensor data gets written to Redis, where the trajectory calculation service reads it. This decouples sensor polling from the main video processing pipeline.
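
Because writers and readers are decoupled, a reader cannot tell on its own whether a value is fresh. One common pattern is to store a timestamp next to each value and reject readings that are too old. The sketch below uses a plain dict as a stand-in for Redis so it runs anywhere; with redis-py, the dict accesses would become `set` and `get` calls. The key name and the 0.5 s limit are assumptions.

```python
import json
import time

def encode_reading(value, now=None):
    """Serialize a sensor value together with the time it was taken."""
    ts = time.time() if now is None else now
    return json.dumps({"value": value, "ts": ts})

def decode_if_fresh(payload, max_age_s, now=None):
    """Return the value, or None if it is missing or older than max_age_s."""
    if payload is None:
        return None
    now = time.time() if now is None else now
    record = json.loads(payload)
    if now - record["ts"] > max_age_s:
        return None
    return record["value"]

store = {}  # stand-in for Redis in this sketch
store["pressure_pa"] = encode_reading(100900.0, now=1000.0)

# Explicit `now` values keep the example deterministic.
fresh = decode_if_fresh(store.get("pressure_pa"), max_age_s=0.5, now=1000.2)
stale = decode_if_fresh(store.get("pressure_pa"), max_age_s=0.5, now=1002.0)
missing = decode_if_fresh(store.get("humidity"), max_age_s=0.5, now=1000.2)
```

Returning `None` for stale data forces the caller to decide explicitly whether to reuse an old value or skip the update.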

Real-Time Performance

On a Jetson Xavier NX:

Object detection: ~20ms per frame at 416x416
Pose estimation: ~30ms per frame
Trajectory calculation: <1ms (just model inference)
Total pipeline latency: ~50-60ms end to end, which is low enough for real-time use

The bottleneck is neural network inference. I use TensorRT-optimized models where possible, and rely on Triton Inference Server for efficient batching and GPU utilization.
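
It helps to separate latency from throughput when reading these numbers. The arithmetic below uses the per-stage figures listed above, taking 1 ms as an upper bound for the trajectory step. Whether the stages actually overlap is an assumption of this sketch, not something measured here.

```python
# Per-stage times in milliseconds, taken from the list above.
stage_ms = {"detection": 20.0, "pose": 30.0, "trajectory": 1.0}

frame_period_ms = 1000.0 / 30.0  # time budget per frame at 30 fps

# If every stage runs back to back for each frame:
serial_latency_ms = sum(stage_ms.values())
serial_fps = 1000.0 / serial_latency_ms

# If stages overlap (frame N in pose while frame N+1 is in detection),
# throughput is bounded by the slowest stage instead of the sum.
slowest_ms = max(stage_ms.values())
pipelined_fps = 1000.0 / slowest_ms

print(f"serial: {serial_latency_ms:.0f} ms per frame, {serial_fps:.1f} fps")
print(f"pipelined: bounded by {slowest_ms:.0f} ms stage, {pipelined_fps:.1f} fps")
```

Run strictly in sequence, the stages land just under 20 fps, so sustaining 30 fps implies either overlapping stages or skipping the pose step on some frames.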

Challenges

Sensor synchronization: Different sensors update at different rates. I use Redis with timestamps to handle this, but there's still some jitter in the measurements.

Distance estimation accuracy: The XGBoost models work well within their training range, but performance degrades outside of it. I had to collect a lot of calibration data at different distances to get good coverage.

Frame rate consistency: GStreamer pipelines can drop frames under load. I use frame queues and drop policies to maintain real-time performance, but this means some frames get skipped during heavy processing.

Coordinate system alignment: Converting between camera coordinates, world coordinates, and device orientation is tricky. I use extrinsic calibration matrices to transform between coordinate systems, but small errors compound over distance.
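
The drop policy mentioned under frame rate consistency can be illustrated with a small bounded queue that discards the oldest frame when full, so a slow consumer always works on recent data. This is a plain-Python sketch, not the GStreamer configuration used here; in GStreamer, the comparable behavior comes from the `queue` element's `leaky` and `max-size-buffers` properties.

```python
from collections import deque

class LatestFrameQueue:
    """Bounded queue that drops the oldest frame when a new one arrives
    and the queue is already full."""

    def __init__(self, maxlen=1):
        self._frames = deque(maxlen=maxlen)
        self.dropped = 0  # approximate if used from multiple threads

    def put(self, frame):
        if len(self._frames) == self._frames.maxlen:
            self.dropped += 1  # the append below evicts the oldest frame
        self._frames.append(frame)

    def get(self):
        """Return the oldest queued frame, or None if the queue is empty."""
        return self._frames.popleft() if self._frames else None

# The producer pushes five frames before the consumer reads any.
q = LatestFrameQueue(maxlen=2)
for frame_id in range(1, 6):
    q.put(frame_id)
received = [q.get(), q.get(), q.get()]
```

Tracking the drop count makes skipped frames visible in profiling instead of silently lowering the effective frame rate.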

What I Learned

This project taught me a lot about building real-time computer vision systems:

Shared memory is fast: Passing frames via shared memory between processes is much faster than copying or using Redis. I use numpy arrays backed by shared memory for zero-copy frame passing.
Triton Inference Server is great: Running models on Triton gives you automatic batching, model versioning, and easy scaling. For this pipeline it was a much better fit than running PyTorch models directly.
XGBoost for physics: Training XGBoost models on ballistic calculator output gives you fast predictions that are accurate within the training range, without needing to implement the full physics equations. The models learn the mapping from inputs to outputs directly.
Sensor fusion matters: No single sensor gives you everything you need. Combining IMU, pressure, temperature, and GPS data gives you a much more complete picture of the environment.
Real-time is hard: Keeping everything synchronized and running at 30fps requires careful attention to latency at every stage. Profiling is essential.

The system works well for its intended use case: automatically adjusting for environmental conditions in real time. It's also a good example of how to combine computer vision, sensor data, and machine learning in an embedded system.