Semantic Video Search with DINOv3 and CLIP
Building a video search system that uses DINOv3 embeddings for scene understanding and CLIP for text queries, with a learned projector to bridge the two.
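The bridge between the two models can be sketched in a few lines. The snippet below is a minimal illustration, not the post's actual implementation: the embedding dimensions are typical values assumed here, and a random linear map stands in for the learned projector so the example is self-contained and runnable.

```python
import numpy as np

DINO_DIM, CLIP_DIM = 768, 512  # assumed embedding sizes for illustration

rng = np.random.default_rng(0)

# Learned projector: in practice a small network trained to map DINOv3
# frame embeddings into CLIP's text-embedding space. A random linear
# map stands in here so the sketch runs without any model weights.
W = rng.standard_normal((DINO_DIM, CLIP_DIM)) / np.sqrt(DINO_DIM)

def project(frame_embs: np.ndarray) -> np.ndarray:
    """Map DINOv3 frame embeddings into CLIP space and L2-normalize."""
    z = frame_embs @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def search(query_emb: np.ndarray, index: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the top-k frames by cosine similarity to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    scores = index @ q  # cosine similarity, since both sides are unit-norm
    return np.argsort(-scores)[:k]

# Placeholder data standing in for real model outputs.
frames = rng.standard_normal((100, DINO_DIM))  # one DINOv3 embedding per frame
index = project(frames)                        # projected, normalized search index
query = rng.standard_normal(CLIP_DIM)          # stand-in for a CLIP text embedding
top = search(query, index)                     # indices of the best-matching frames
```

With a trained projector in place of `W`, `query` would come from CLIP's text encoder and `frames` from DINOv3 run over sampled video frames; the search step itself is unchanged.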