# VisionFlow: Advanced Real-Time Object Detection & Action Recognition System

## Features
- Real-time multi-object tracking with YOLOv8
- 3D pose estimation using MediaPipe
- Action recognition (SlowFast + ST-GCN)
- TensorRT optimization for edge deployment
- Docker support with GPU acceleration
- CI/CD pipeline with automated testing
## Requirements

- Python 3.8+ (3.10 recommended)
- CUDA 11.8+ (for GPU acceleration)
- OpenCV 4.7+
- At least 4 GB RAM
- Webcam or video files for testing
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/superuser303/VisionFlow
  cd VisionFlow
  ```

- Create a virtual environment:

  ```bash
  python3 -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install --upgrade pip
  pip install -r requirements.txt
  ```

- Set up the Python path:

  ```bash
  export PYTHONPATH="${PYTHONPATH}:$(pwd)"  # On Windows: set PYTHONPATH=%PYTHONPATH%;%cd%
  ```
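After installing, a quick sanity check can confirm that the core dependencies resolve. This sketch is not a repo script; it uses only the standard library, and the module names are the usual import names of the packages above:

```python
# Optional post-install sanity check (not part of VisionFlow): report which
# core dependencies are importable without actually importing them.
import importlib.util

def installed(module_name):
    """True if the module can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None

for name in ("cv2", "torch", "ultralytics", "mediapipe"):
    print(f"{name}: {'OK' if installed(name) else 'MISSING'}")
```

Any `MISSING` entry usually means `requirements.txt` was not installed into the active virtual environment.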
## Docker

- Build the container:

  ```bash
  docker build -t visionflow:latest .
  ```

- Run with GPU support:

  ```bash
  docker run --gpus all -it --rm \
    -v /dev/video0:/dev/video0 \
    -v $(pwd)/data:/app/data \
    visionflow:latest
  ```

### VS Code Dev Containers

- Open the project in VS Code
- Install the "Dev Containers" extension
- Press `Ctrl+Shift+P` → "Dev Containers: Reopen in Container"
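The repository's actual devcontainer configuration may differ; for orientation, a minimal `.devcontainer/devcontainer.json` that builds from the project Dockerfile and enables GPU access could look like this (all values illustrative):

```json
{
  "name": "VisionFlow",
  "build": { "dockerfile": "../Dockerfile" },
  "runArgs": ["--gpus", "all"],
  "customizations": {
    "vscode": { "extensions": ["ms-python.python"] }
  }
}
```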
## Quick Start

Run on a webcam:

```bash
python scripts/run_webcam.py
```

Run on a video file:

```bash
python scripts/run_webcam.py --source path/to/video.mp4
```

Train a model:

```bash
python scripts/train.py --config configs/model_config.yaml --data data/dataset.yaml
```

Explore the demo notebook:

```bash
jupyter notebook notebooks/VisionFlow_Demo.ipynb
```

## Configuration

Edit `configs/model_config.yaml` to customize:
- Detection: Model type, confidence thresholds, input size
- Pose Estimation: MediaPipe complexity settings
- Tracking: DeepSORT parameters
- Training: Batch size, learning rate, epochs
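The exact schema of `configs/model_config.yaml` is not reproduced here; a hypothetical sketch covering the four areas above (key names are illustrative, not necessarily the repo's actual schema) might look like:

```yaml
# Illustrative sketch of configs/model_config.yaml; actual keys may differ.
detection:
  model: yolov8n.pt
  confidence_threshold: 0.5
  input_size: 640
pose:
  model_complexity: 1        # MediaPipe: 0 (lite) to 2 (heavy)
tracking:
  max_age: 30                # DeepSORT: frames to keep a lost track alive
  n_init: 3                  # DeepSORT: hits before a track is confirmed
training:
  batch_size: 16
  learning_rate: 0.001
  epochs: 100
```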
## Custom Dataset

- Organize your dataset:

  ```
  data/custom_dataset/
  ├── train/
  │   ├── images/
  │   └── labels/
  ├── val/
  │   ├── images/
  │   └── labels/
  └── dataset.yaml
  ```

- Update `data/dataset.yaml`:

  ```yaml
  path: ../data/custom_dataset
  train: train/images
  val: val/images
  names:
    0: person
    1: car
    2: custom_class
  ```

The system automatically downloads COCO sample images for testing, or creates synthetic test data if the download fails.
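A common source of training failures is an image with no matching label file. This small stdlib-only helper is hypothetical (not part of VisionFlow) and assumes the YOLO-style layout shown above:

```python
# Hypothetical helper (not part of VisionFlow): verify that every image in a
# YOLO-style dataset has a matching .txt file in the sibling labels/ folder.
import tempfile
from pathlib import Path

def find_unlabeled_images(dataset_root):
    """Return image paths under train/ and val/ that lack a label file."""
    root = Path(dataset_root)
    missing = []
    for split in ("train", "val"):
        images_dir = root / split / "images"
        labels_dir = root / split / "labels"
        for img in sorted(images_dir.glob("*")):
            if img.is_file() and not (labels_dir / f"{img.stem}.txt").exists():
                missing.append(img)
    return missing

# Demo on a throwaway directory tree:
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for split in ("train", "val"):
        (root / split / "images").mkdir(parents=True)
        (root / split / "labels").mkdir(parents=True)
    (root / "train" / "images" / "a.jpg").touch()
    (root / "train" / "labels" / "a.txt").touch()
    (root / "val" / "images" / "b.jpg").touch()  # deliberately unlabeled
    print([p.name for p in find_unlabeled_images(root)])  # ['b.jpg']
```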
## Python API

### Object Detection

```python
import cv2

from src.detection.yolo_wrapper import YOLODetector

detector = YOLODetector("yolov8n.pt")
frame = cv2.imread("test_image.jpg")
detections = detector.detect(frame)
```

### Pose Estimation

```python
import cv2

from src.pose.mediapipe_wrapper import PoseEstimator

estimator = PoseEstimator()
frame = cv2.imread("person_image.jpg")
landmarks = estimator.estimate(frame)
```

### Video Processing

```python
from src.utils.video_processor import VideoHandler

video = VideoHandler("input_video.mp4")
while True:
    ret, frame = video.read()
    if not ret:
        break
    # Process frame here
```
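Raw detector output typically contains overlapping boxes for the same object, which are deduplicated with non-max suppression. `YOLODetector.detect()` presumably handles this internally; for intuition, here is a minimal pure-Python sketch of IoU and greedy NMS (not the repo's implementation):

```python
# Minimal sketch of IoU and greedy non-max suppression (illustrative only;
# not VisionFlow's implementation). Boxes are (x1, y1, x2, y2, score).

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop heavy overlaps."""
    boxes = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    for box in boxes:
        if all(iou(box[:4], k[:4]) <= iou_threshold for k in kept):
            kept.append(box)
    return kept

detections = [(0, 0, 10, 10, 0.9), (1, 1, 11, 11, 0.8), (50, 50, 60, 60, 0.7)]
print(nms(detections))  # the 0.8 box overlaps the 0.9 box and is suppressed
```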
## Testing

```bash
# Install test dependencies
pip install pytest pytest-cov flake8

# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=src --cov-report=html
```
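For orientation, a unit test in the style pytest discovers might look like the following. The function and assertions are illustrative, not an actual file from `tests/`:

```python
# Illustrative pytest-style test; names are hypothetical, not from tests/.
import numpy as np

def make_blank_frame(width=640, height=480):
    """Build a black BGR frame shaped like those produced by OpenCV."""
    return np.zeros((height, width, 3), dtype=np.uint8)

def test_blank_frame_shape():
    frame = make_blank_frame()
    assert frame.shape == (480, 640, 3)  # OpenCV convention: (H, W, channels)
    assert frame.dtype == np.uint8
```

Save such functions in files matching `tests/test_*.py` and pytest will collect them automatically.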
## Code Quality

```bash
# Linting
flake8 src/ --count --select=E9,F63,F7,F82 --show-source --statistics

# Code formatting
black src/ tests/ scripts/

# Pre-commit hooks
pip install pre-commit
pre-commit install
```

## Edge Deployment

```bash
python scripts/deploy_edge.py  # Converts to ONNX/TensorRT
```

Manual export:

```bash
# Export to ONNX
python -c "from ultralytics import YOLO; YOLO('yolov8n.pt').export(format='onnx')"

# Export to TensorRT (requires a TensorRT installation)
python -c "from ultralytics import YOLO; YOLO('yolov8n.pt').export(format='engine')"
```

## Troubleshooting
- **Import errors**:

  ```bash
  export PYTHONPATH="${PYTHONPATH}:$(pwd)"
  ```

- **CUDA not found**:
  - Install CUDA 11.8+
  - Verify the driver with `nvidia-smi`
  - Install PyTorch with CUDA support:

    ```bash
    pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
    ```

- **Camera not working**:

  ```bash
  # Check available cameras
  python -c "import cv2; print([i for i in range(10) if cv2.VideoCapture(i).isOpened()])"
  ```

- **Memory issues**:
  - Reduce the batch size in `configs/model_config.yaml`
  - Use a smaller model: `yolov8n.pt` instead of `yolov8x.pt`

- **Missing test data**:

  ```bash
  python -c "from src.utils.coco_utils import setup_test_dataset; setup_test_dataset('data/samples')"
  ```
## Performance Tuning

For maximum speed:

- Use smaller models (`yolov8n.pt`)
- Reduce the input resolution (320×320)
- Enable TensorRT optimization
- Use GPU acceleration

For maximum accuracy:

- Use larger models (`yolov8x.pt`)
- Use a higher input resolution (640×640)
- Lower the confidence thresholds
- Train on domain-specific data
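The resolution advice above follows from a rough back-of-envelope rule: per-frame compute in a convolutional detector scales approximately with the number of input pixels, so halving the side length cuts the work by about a factor of four.

```python
# Back-of-envelope only: convolutional inference cost scales roughly with
# pixel count, so 640x640 costs about 4x more per frame than 320x320.
def relative_cost(side_a, side_b):
    """Ratio of pixel counts between two square input resolutions."""
    return (side_a * side_a) / (side_b * side_b)

print(relative_cost(640, 320))  # 4.0
```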
## API Reference

### YOLODetector

- `YOLODetector.detect(frame)` → returns bounding boxes
- `YOLODetector.class_names` → class name mapping

### PoseEstimator

- `PoseEstimator.estimate(frame)` → returns pose landmarks
- MediaPipe pose connections are available

### VideoHandler

- `VideoHandler(source)` → initializes video capture
- `VideoHandler.read()` → returns the next frame
- `VideoHandler.get_properties()` → returns video metadata
## Contributing

- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Run the tests: `pytest tests/`
- Submit a pull request
## License

MIT License - see the LICENSE file.
## Changelog

- Initial release with YOLOv8 + MediaPipe
- Docker support
- CI/CD pipeline
- Test dataset generation
## Need Help?

- Check the Issues page
- Run the Jupyter notebook demo for examples
- Review the test files for usage patterns