Update example for converting DLC project to benchmark (v2) by niksirbi · Pull Request #49 · neuroinformatics-unit/poseinterface

niksirbi · 2026-05-12T23:06:12Z

Summary

Clean replacement for #40 (Frankenstein branch). Starts fresh from main, builds on top of the now-merged #45 (predictions_to_poseinterface), and adds the gallery example plus supporting utilities and a lightweight test fixture.

The example walks through the following workflow:

Warning

This example specifically documents the DLC-to-poseinterface conversion process. Trying it on other pose estimation software is encouraged, but adaptations will be needed, and the underlying functions are not yet tested on non-DLC inputs. Other source software will be tackled in future PRs.

What's included

frames_to_poseinterface: copies and renames frame images to match the filenames in a framelabels.json (unit tests included).
Lightweight DLC project fixture under tests/data/dlc/MouseTopDown-Loukia-2022-09-13/ : 2 sessions, short MP4 videos, placeholder PNG frames, annotation + prediction CSVs. Used by tests and the gallery example.
convert_dlc_to_benchmark sphinx-gallery example: replaces the old SWC-plusmaze_to_benchmark example. Runs end-to-end against the bundled fixture and demonstrates the full DLC → benchmark conversion (video, framelabels + frame copy, videolabels) followed by clip extraction.
tree utility: for displaying directory structures, used in the example (unit tests included).
movement pin bumped to >=0.16.0 (the version introducing the automatic source-software inference, which predictions_to_poseinterface now relies on).
Docs updates:
- sphinx-gallery execution enabled in conf.py
- API index entries for tree, frames_to_poseinterface, and predictions_to_poseinterface.
- Docs dependencies: jupyter, matplotlib.

Supersedes

Update example for converting DLC project to benchmark #40

How was this tested

I used a variant of this example locally to convert 2 sessions from a 'real' dataset (not just the fixture) and inspect the resulting .json files. The real dataset was structured similarly to the fixture included here.

Checklist

All unit tests pass locally
Pre-commit run
Sphinx-gallery example runs end-to-end (python examples/convert_dlc_to_benchmark.py)
Docs build successfully (make clean html from docs/)
CI passes

How to review

I recommend primarily reviewing the built example end-to-end, and then diving into the newly introduced tree/frames_to_poseinterface functions if/when necessary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add frames_to_poseinterface utility to copy and rename frame images according to filenames in a COCO JSON file. Also fix the output filename of predictions_to_poseinterface to use _cliplabels.json suffix matching the naming convention. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Adds a minimal DLC project under tests/data/dlc/ with 2 sessions, each containing a small (100 frames) video, placeholder PNGs, and annotation/prediction CSVs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace SWC-plusmaze_to_benchmark example with a new end-to-end example showing how to convert a DLC project to the poseinterface benchmark dataset format. Update sphinx-gallery config to execute examples, add API entries for new functions, and add jupyter and matplotlib dependencies. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

lochhh

Thanks @niksirbi !

I tried the example with a sample DLC project (one that has annotations only) and it broke when trying to find the predictions file. Otherwise the example works as expected. My suggestion is to skip the conversion(s) in the example if the expected files are not found.
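To illustrate the "skip if not found" idea, here is a minimal sketch. The helper names (`find_optional_file`, `plan_conversions`) and both glob patterns are hypothetical and are not part of poseinterface; the real example may locate its inputs differently.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_optional_file(session_dir: Path, pattern: str) -> Optional[Path]:
    """Return the first file matching ``pattern`` in ``session_dir``, or None."""
    matches = sorted(session_dir.glob(pattern))
    return matches[0] if matches else None


def plan_conversions(session_dir: Path) -> dict:
    """Map each conversion step to its input file (None means: skip it)."""
    # Both patterns are assumptions about a DLC-like session layout.
    return {
        "framelabels": find_optional_file(session_dir, "CollectedData_*.csv"),
        "videolabels": find_optional_file(session_dir, "*predictions*.csv"),
    }


with tempfile.TemporaryDirectory() as tmp:
    session = Path(tmp)
    # Simulate a project that has annotations but no predictions
    (session / "CollectedData_Loukia.csv").touch()
    plan = plan_conversions(session)
    steps_to_run = [step for step, path in plan.items() if path is not None]
    skipped_steps = [step for step, path in plan.items() if path is None]
    for step in skipped_steps:
        print(f"Skipping {step}: no matching input file found")
```

With this shape, an annotations-only project would still produce its frame labels, and the example would report which steps were skipped instead of raising.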

It's very nice that we now have a sample DLC project in test data. I'm guessing the broken frame files are intentional, but it would be better to have actual frames, since this would allow us to have images with non-zero width and height in the COCO JSON files. As a follow-up PR, we should consider adding the generated benchmark dataset(s) as part of our test data. These could replace the current "reference dataset" Train/SWC-plusmaze/sub-M708149_ses-20200317/, giving us a more cohesive and internally consistent set of test fixtures (using the sample DLC project as test inputs and the reference dataset as expected outputs) - our current test data comes from a mix of sample inputs/outputs across different projects, along with multiple ad‑hoc fixtures.
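As a rough illustration of why real frames matter here: a placeholder file carries no image header, so no width or height can be read from it, whereas even a tiny valid PNG does. The sketch below uses only the standard library and follows the PNG file layout (signature, then an IHDR chunk holding width and height). `make_png` and `png_size` are hypothetical helpers, and how poseinterface actually reads image dimensions is not shown in this PR.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_png(width: int, height: int) -> bytes:
    """Build a minimal valid 8-bit grayscale PNG filled with black pixels."""

    def chunk(kind: bytes, data: bytes) -> bytes:
        # Chunk layout: length, type, data, CRC over type + data
        body = kind + data
        crc = struct.pack(">I", zlib.crc32(body))
        return struct.pack(">I", len(data)) + body + crc

    # IHDR: width, height, bit depth 8, colour type 0 (grayscale),
    # compression 0, filter 0, interlace 0
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is a filter byte (0) followed by `width` pixel bytes
    raw = (b"\x00" + b"\x00" * width) * height
    return (
        PNG_SIGNATURE
        + chunk(b"IHDR", ihdr)
        + chunk(b"IDAT", zlib.compress(raw))
        + chunk(b"IEND", b"")
    )


def png_size(data: bytes) -> tuple:
    """Read (width, height) from the IHDR chunk of PNG bytes."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a valid PNG")
    return struct.unpack(">II", data[16:24])


real_frame = make_png(64, 48)
size = png_size(real_frame)

try:
    png_size(b"")  # an empty placeholder file has no header to read
    placeholder_ok = True
except ValueError:
    placeholder_ok = False
```

Frames generated this way stay very small on disk, so the fixture could remain lightweight while still yielding non-zero dimensions.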

lochhh · 2026-05-22T15:51:54Z

+    output_dir: Path,
+    framelabels_path: Path,
+) -> None:
+    """Copy frame images, renaming them per the COCO JSON filenames.


Suggested change

-    """Copy frame images, renaming them per the COCO JSON filenames.
+    """Copy and rename frame images to match filenames in COCO JSON.

lochhh · 2026-05-22T16:06:28Z

+    for img in coco_data["images"]:
+        target_filename = img["file_name"]
+        frame_number = _extract_frame_number(target_filename)
+        if frame_number not in source_frame_map:
+            raise FileNotFoundError(
+                f"No source frame found for frame {frame_number} "
+                f"in {input_dir}"
+            )
+        target_path = output_dir / target_filename
+        if not target_path.exists():
+            shutil.copy2(source_frame_map[frame_number], target_path)


Instead of breaking at the first missing frame, wdyt about proceeding as usual but warning at the end, so that users can fix the missing frames and then rerun the function?

Suggested change

-    for img in coco_data["images"]:
-        target_filename = img["file_name"]
-        frame_number = _extract_frame_number(target_filename)
-        if frame_number not in source_frame_map:
-            raise FileNotFoundError(
-                f"No source frame found for frame {frame_number} "
-                f"in {input_dir}"
-            )
-        target_path = output_dir / target_filename
-        if not target_path.exists():
-            shutil.copy2(source_frame_map[frame_number], target_path)
+    missing_frames = []
+    for img in coco_data["images"]:
+        target_filename = img["file_name"]
+        frame_number = _extract_frame_number(target_filename)
+        if frame_number not in source_frame_map:
+            missing_frames.append(target_filename)
+            continue
+        target_path = output_dir / target_filename
+        if not target_path.exists():
+            shutil.copy2(source_frame_map[frame_number], target_path)
+    if missing_frames:
+        missing = "\n".join(f" {f}" for f in missing_frames)
+        warnings.warn(
+            f"{len(missing_frames)} frame(s) not found in {input_dir} "
+            f"and were skipped:\n{missing}",
+            UserWarning,
+            stacklevel=2,
+        )
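For reference, a self-contained sketch of the collect-then-warn behaviour proposed above, runnable against a temporary directory. `frame_number_of` stands in for the private `_extract_frame_number` helper, whose actual logic is not shown in this thread; taking the last run of digits is an assumption, as are the file names and the choice to return the list of missing frames.

```python
import re
import shutil
import tempfile
import warnings
from pathlib import Path


def frame_number_of(filename: str) -> int:
    """Hypothetical helper: the last run of digits in the file stem."""
    return int(re.findall(r"\d+", Path(filename).stem)[-1])


def copy_frames_lenient(input_dir, output_dir, target_filenames):
    """Copy the frames that exist; warn once about those that do not."""
    source_frame_map = {}
    for img_path in input_dir.glob("*.png"):
        if re.search(r"\d", img_path.stem):
            source_frame_map[frame_number_of(img_path.name)] = img_path

    missing_frames = []
    for target_filename in target_filenames:
        frame_number = frame_number_of(target_filename)
        if frame_number not in source_frame_map:
            missing_frames.append(target_filename)
            continue
        target_path = output_dir / target_filename
        if not target_path.exists():
            shutil.copy2(source_frame_map[frame_number], target_path)

    if missing_frames:
        missing = "\n".join(f"  {f}" for f in missing_frames)
        warnings.warn(
            f"{len(missing_frames)} frame(s) not found in {input_dir} "
            f"and were skipped:\n{missing}",
            UserWarning,
            stacklevel=2,
        )
    return missing_frames


with tempfile.TemporaryDirectory() as tmp:
    input_dir = Path(tmp) / "in"
    output_dir = Path(tmp) / "out"
    input_dir.mkdir()
    output_dir.mkdir()
    (input_dir / "img0003.png").touch()  # frame 7 is deliberately absent
    targets = ["sub-01_frame-0003.png", "sub-01_frame-0007.png"]
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        missing = copy_frames_lenient(input_dir, output_dir, targets)
    copied = sorted(p.name for p in output_dir.iterdir())
```

Because existing targets are not overwritten, rerunning after adding the missing source frames would only copy what was skipped the first time.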

lochhh · 2026-05-22T16:21:07Z

+    for img_path in input_dir.glob("*.png"):
+        match = re.search(r"(\d+)", img_path.stem)
+        if match:
+            source_frame_map[int(match.group(1))] = img_path


We also accept JPEG frames. We should also think about how we want to handle cases where the same frame is available in multiple formats. The suggestion here implicitly prioritises PNG over JPEG and JPG, because later matches overwrite existing entries for the same frame number. We may also want to fail fast (or exit early) if the input dir doesn't contain any matching frame files.

Suggested change

-    for img_path in input_dir.glob("*.png"):
-        match = re.search(r"(\d+)", img_path.stem)
-        if match:
-            source_frame_map[int(match.group(1))] = img_path
+    for ext in ("*.jpg", "*.jpeg", "*.png"):
+        for img_path in input_dir.glob(ext):
+            match = re.search(r"(\d+)", img_path.stem)
+            if match:
+                source_frame_map[int(match.group(1))] = img_path
+    if not source_frame_map:
+        raise FileNotFoundError(
+            f"No image files found in {input_dir}"
+        )
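If we prefer the format priority to be explicit rather than a side effect of iteration order, one option is to list extensions from most to least preferred and keep the first match per frame number. This is only a sketch: `build_source_frame_map` and `EXTENSIONS_BY_PRIORITY` are hypothetical names, and the PNG-first ordering simply mirrors the suggestion above.

```python
import re
import tempfile
from pathlib import Path

# Most preferred format first (an assumption mirroring the suggestion above)
EXTENSIONS_BY_PRIORITY = ("png", "jpg", "jpeg")


def build_source_frame_map(input_dir: Path) -> dict:
    """Map frame numbers to image paths, preferring earlier extensions."""
    source_frame_map = {}
    for ext in EXTENSIONS_BY_PRIORITY:
        for img_path in sorted(input_dir.glob(f"*.{ext}")):
            match = re.search(r"(\d+)", img_path.stem)
            if match:
                # setdefault keeps the entry from the higher-priority format
                source_frame_map.setdefault(int(match.group(1)), img_path)
    if not source_frame_map:
        # Fail fast when the directory holds no usable frame files
        raise FileNotFoundError(f"No image files found in {input_dir}")
    return source_frame_map


with tempfile.TemporaryDirectory() as tmp:
    frames_dir = Path(tmp)
    for name in ("img001.png", "img001.jpg", "img002.jpeg"):
        (frames_dir / name).touch()
    frame_map = build_source_frame_map(frames_dir)
    suffixes = {number: path.suffix for number, path in frame_map.items()}

with tempfile.TemporaryDirectory() as tmp:
    try:
        build_source_frame_map(Path(tmp))
        empty_dir_raised = False
    except FileNotFoundError:
        empty_dir_raised = True
```

Frame 1 exists as both PNG and JPG and resolves to the PNG, while frame 2 is only available as JPEG and is still picked up.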

lochhh · 2026-05-22T16:24:29Z

 ]
 docs = [
  "linkify-it-py",
+  "matplotlib",


I don't see where this is imported?

lochhh · 2026-05-22T16:27:12Z

@@ -0,0 +1,6 @@
+scorer,,,Loukia,Loukia,Loukia,Loukia,Loukia,Loukia,Loukia,Loukia
+bodyparts,,,snout,snout,left_ear,left_ear,right_ear,right_ear,tailbase,tailbase


Since we have two CollectedData.csv files, shall we use the single-index DLC format for one and the multi-index format for the other? We could potentially remove CollectedData_Pranav.csv and CollectedData_shailaja.csv from our test data.
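For context, the two layouts can be told apart from the `scorer` header row alone: the fixture row quoted above has two empty cells after `scorer` (three index columns in total), whereas a single-index file would have none. The sketch below assumes that reading of the DLC CSV layout; `count_index_columns` is a hypothetical helper and the single-index sample text is made up for illustration.

```python
import csv
import io


def count_index_columns(csv_text: str) -> int:
    """Count the leading index columns using the ``scorer`` header row."""
    first_row = next(csv.reader(io.StringIO(csv_text)))
    if first_row[0] != "scorer":
        raise ValueError("expected the first header row to start with 'scorer'")
    # The first cell holds the row label; each empty cell after it pads out
    # one additional index column.
    n_empty = 0
    for cell in first_row[1:]:
        if cell != "":
            break
        n_empty += 1
    return 1 + n_empty


# Header rows in the style of the fixture added in this PR
multi_index_csv = (
    "scorer,,,Loukia,Loukia\n"
    "bodyparts,,,snout,snout\n"
)
# Illustrative single-index equivalent (not taken from the fixture)
single_index_csv = (
    "scorer,Loukia,Loukia\n"
    "bodyparts,snout,snout\n"
)

n_multi = count_index_columns(multi_index_csv)
n_single = count_index_columns(single_index_csv)
```

A parametrised test over both fixture files could use a check like this to confirm that each format is actually exercised.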

lochhh · 2026-05-22T16:50:46Z

+# Let's extract short clips from the converted session videos. The resulting
+# clip label files (``cliplabels.json``) can be proof-read and corrected by
+# experts before being shared as part of the benchmark dataset.
+#
+# First, we specify the clip parameters. This step can be run multiple times
+# with different parameters to grow the clip set incrementally.


Suggested change

-# Let's extract short clips from the converted session videos. The resulting
-# clip label files (``cliplabels.json``) can be proof-read and corrected by
-# experts before being shared as part of the benchmark dataset.
-#
-# First, we specify the clip parameters. This step can be run multiple times
-# with different parameters to grow the clip set incrementally.
+# Clips (short video segments) can be extracted from the converted session
+# videos. When the ``videolabels.json`` files are present, the corresponding
+# clip label files (``cliplabels.json``) are generated automatically during
+# clip extraction.
+# These clip label files should then be proof-read and corrected by
+# experts before being included in the benchmark dataset.
+#
+# First, we specify the clip-extraction parameters. This step can be repeated
+# with different parameters to incrementally expand the clip set.

lochhh · 2026-05-22T16:51:54Z

+#    In the published dataset, the ``Train`` split includes all extracted clip
+#    labels (``cliplabels.json``). The ``Test`` split withholds full clip
+#    labels; only clip start labels (``startlabels.json``), derived from each
+#    clip's first frame, are included to support point-tracker evaluation.
+#    The ``videolabels.json`` files generated in the previous section are
+#    intermediate artifacts used for clip extraction, and are never shared.
+#    See :ref:`benchmark dataset <target-benchmark-dataset>` for details.


Also pointing to the folder structure section

Suggested change

-#    In the published dataset, the ``Train`` split includes all extracted clip
-#    labels (``cliplabels.json``). The ``Test`` split withholds full clip
-#    labels; only clip start labels (``startlabels.json``), derived from each
-#    clip's first frame, are included to support point-tracker evaluation.
-#    The ``videolabels.json`` files generated in the previous section are
-#    intermediate artifacts used for clip extraction, and are never shared.
-#    See :ref:`benchmark dataset <target-benchmark-dataset>` for details.
+#    In the published dataset, the ``Train`` split includes all
+#    ``cliplabels.json`` files. The ``Test`` split omits all
+#    ``cliplabels.json`` files and instead provides only clip start labels
+#    (``startlabels.json``), derived from each clip's first frame,
+#    to support point-tracker evaluation.
+#    The ``videolabels.json`` files generated in the previous section are
+#    intermediate artifacts used for clip extraction, and are never shared.
+#    See the :ref:`folder structure specification<target-dataset-folder-\
+#    structure>` for details.

lochhh · 2026-05-22T17:12:44Z

+# ---------- Frames to poseinterface ----------------
+
+
+@pytest.fixture


We should replace tests/data/Train/SWC-plusmaze/sub-M708149_ses-20200317/ with the outputs we generate in the example for M708154 and/or M727755 (the sample DLC data added in this PR). This would allow us to test conversions end-to-end with actual inputs, and the "converted benchmark dataset" would serve both as the reference/sample and as the expected outputs. That said, the current unit tests sufficiently cover the logic. I'm happy for this to be done in a follow-up PR.

lochhh · 2026-05-22T17:17:06Z

+
+        result = tree(tmp_path)
+        assert "a_dir/" in result
+        assert "b_file.txt" in result


Ensures file names do not end with trailing slash

Suggested change

-        assert "b_file.txt" in result
+        assert "b_file.txt" in result
+        assert "b_file.txt/" not in result
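To make the trailing-slash convention concrete, here is a small stand-in for a tree-style renderer together with the assertions discussed above. `tree_sketch` is hypothetical and is not the `tree` utility added in this PR; its ordering (directories first) and connector characters are assumptions.

```python
import tempfile
from pathlib import Path


def tree_sketch(root: Path) -> str:
    """Render a directory tree; directories get a trailing slash."""
    lines = [f"{root.name}/"]

    def walk(directory: Path, prefix: str) -> None:
        # Directories first, then files, each group sorted by name
        entries = sorted(
            directory.iterdir(), key=lambda p: (p.is_file(), p.name)
        )
        for i, entry in enumerate(entries):
            is_last = i == len(entries) - 1
            connector = "└── " if is_last else "├── "
            suffix = "/" if entry.is_dir() else ""
            lines.append(f"{prefix}{connector}{entry.name}{suffix}")
            if entry.is_dir():
                walk(entry, prefix + ("    " if is_last else "│   "))

    walk(root, "")
    return "\n".join(lines)


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    (tmp_path / "a_dir").mkdir()
    (tmp_path / "b_file.txt").touch()
    result = tree_sketch(tmp_path)
    print(result)
```

The negative assertion (`"b_file.txt/" not in result`) is what pins down the file-name half of the convention; the positive one alone would also pass if files were rendered with a slash.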

lochhh · 2026-05-22T17:17:47Z

+    def test_directories_have_trailing_slash(self, tmp_path):
+        """Test that directory names end with / and file names do not."""
+        (tmp_path / "subdir").mkdir()
+        (tmp_path / "file.txt").touch()
+
+        result = tree(tmp_path)
+        lines = result.split("\n")
+        # Root line
+        assert lines[0].endswith("/")
+        # Subdirectory line
+        subdir_line = [line for line in lines if "subdir" in line][0]
+        assert subdir_line.endswith("subdir/")
+        # File line should NOT end with /
+        file_line = [line for line in lines if "file.txt" in line][0]
+        assert not file_line.endswith("/")


This test becomes redundant if we assert "b_file.txt/" not in result in test_files_and_directories.

Suggested change

-    def test_directories_have_trailing_slash(self, tmp_path):
-        """Test that directory names end with / and file names do not."""
-        (tmp_path / "subdir").mkdir()
-        (tmp_path / "file.txt").touch()
-
-        result = tree(tmp_path)
-        lines = result.split("\n")
-        # Root line
-        assert lines[0].endswith("/")
-        # Subdirectory line
-        subdir_line = [line for line in lines if "subdir" in line][0]
-        assert subdir_line.endswith("subdir/")
-        # File line should NOT end with /
-        file_line = [line for line in lines if "file.txt" in line][0]
-        assert not file_line.endswith("/")

niksirbi marked this pull request as draft May 12, 2026 23:12

This was referenced May 13, 2026

Convert predictions to cliplabels.json using movement #45

Merged

Update example for converting DLC project to benchmark #40

Closed

Proposal: Simplify the project-conversion workflow with a Session-based API #50

Open

niksirbi and others added 10 commits May 18, 2026 16:19

Add tree utility for displaying directory structures

c46b3ab

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add lightweight DLC project fixture for testing and examples

2ebcd68

Adds a minimal DLC project under tests/data/dlc/ with 2 sessions, each containing a small (100 frames) video, placeholder PNGs, and annotation/prediction CSVs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Pin movement>=0.16.0 and update supported Python versions to 3.12-3.14

271af11

updated conversion example

860d3fe

add workflow image to the conversion example

2eaff04

add API reference entry for predictions_to_poseinterface

a004aa4

Fix predictions_to_poseinterface kwargs in example after rebase

ada238d

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

fix typo

6cfef6b

niksirbi force-pushed the update-dlc-to-coco-example-v2 branch from 99eed8a to 6cfef6b Compare May 18, 2026 15:32

niksirbi and others added 4 commits May 18, 2026 16:52

Rename frames_to_poseinterface params to input_dir/output_dir

533cdd0

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

temporarily try on real data and add provenance

73e0693

switch back to using small fixture in example

f63c209

remove duplicate prediction_to_interface entry in API index

729a09f

niksirbi marked this pull request as ready for review May 19, 2026 10:27

niksirbi requested a review from a team May 19, 2026 10:46

lochhh requested review from lochhh and removed request for a team May 19, 2026 10:52

lochhh requested changes May 22, 2026

Update example for converting DLC project to benchmark (v2) #49: niksirbi wants to merge 14 commits into main from update-dlc-to-coco-example-v2

2 participants