feat: persist course sync field diffs by aryans-15 · Pull Request #407 · coursetable/ferry

aryans-15 · 2026-04-11T05:21:48Z

Summary by CodeRabbit

New Features
- Course sync now captures and stores the previous and current values for modified course fields, enabling detailed tracking of what changed during each synchronization.

coderabbitai · 2026-04-11T05:22:01Z

Warning

Rate limit exceeded

@aryans-15 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 21 minutes and 35 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 21 minutes and 35 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d73d0a91-b799-4ecb-8e54-0066e3aae484

📥 Commits

Reviewing files that changed from the base of the PR and between daa53cf and aa32ae4.

📒 Files selected for processing (2)

ferry/database/generate_changelog.py
ferry/database/sync_db_courses.py

📝 Walkthrough

Walkthrough

This change adds course synchronization difference tracking to the system. A new nullable JSONB last_sync_diff column is added to the Course model to store old and new values for non-computed fields during sync operations. The sync logic computes these diffs, persists them to the database, and marks unchanged courses as NULL.

Changes

Cohort / File(s)	Summary
Course Model Enhancement `ferry/database/models.py`	Added `last_sync_diff` JSONB column to Course model to track prior and new values of non-computed fields from recent sync operations.
Sync Diff Computation and Persistence `ferry/database/sync_db_courses.py`	Introduced `_jsonable_scalar` for type normalization, added `course_scalar_last_sync_diffs` to compute per-course diff payloads, and enhanced `sync_db_courses` to ensure column existence, compute diffs, set NULL for unchanged courses, and upsert changes via parameterized SQL updates.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 Hop, hop, the sync now tracks its tale,
Old values dance with new in JSON's sail,
Each course remembers what has changed,
While stable ones stay NULL, arranged,
Diffs preserved for all to see! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat: persist course sync field diffs' directly and accurately describes the main changes: adding a new column to store sync diffs and implementing logic to compute and persist these diffs.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch store-course-diffs

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Made-with: Cursor

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ferry/database/generate_changelog.py`:
- Around line 340-349: The current loop only processes
diff["courses"]["changed_rows"], so brand-new courses
(diff["courses"]["added_rows"]) never get field-level (old, new) entries and are
dropped downstream; extend the logic to also iterate
diff["courses"]["added_rows"] and populate course_id_to_changes for each new
course_id (respecting computed_columns["courses"]), using old_val = None (or an
appropriate sentinel) and new_val from courses_new_indexed.loc[course_id,
column]; ensure you initialize course_id_to_changes[course_id] as with changed
rows and reuse the same column filtering as in the existing loop.

In `@ferry/database/sync_db_courses.py`:
- Around line 833-840: The UPDATE that builds ids_sql from changed_course_ids
should use a bound expanding parameter instead of string-concatenating IDs;
update the conn.execute call that references changed_course_ids/ids_sql to use
an expanding bind (e.g., bindparam('ids', expanding=True) or SQLAlchemy's ":ids"
expanding param) and pass {"ids": changed_course_ids} so the query becomes
"WHERE last_sync_diff IS NOT NULL AND course_id NOT IN :ids" (or equivalent with
bindparam) to avoid huge SQL text and enable parameter binding for the course_id
list.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 44730d87-aad1-4c1c-9199-4a6ce5180b58

📥 Commits

Reviewing files that changed from the base of the PR and between a3ea142 and 3d0803c.

📒 Files selected for processing (3)

ferry/database/generate_changelog.py
ferry/database/models.py
ferry/database/sync_db_courses.py

coderabbitai · 2026-04-11T05:26:27Z

+    register_junction_changes(
+        course_id_to_changes,
+        diff,
+        tables_old,
+        tables,
+        "listings",
+        ["course_id"],
+    )


⚠️ Potential issue | 🟠 Major

Handle listing course_id moves on both courses.

For a listing that keeps the same listing_id but changes course_id, this path only registers the new course via changed_rows. The old course never gets the corresponding listing removal, so cross-listing merges/splits leave one side's changelog and last_sync_diff incomplete.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

ferry/database/sync_db_courses.py (1)

874-882: ⚠️ Potential issue | 🟡 Minor

Keep the last_sync_diff clear query parameterized.

This still string-builds the NOT IN (...) list, which bypasses bound parameters and can generate very large SQL on big syncs. Please switch it to a bound array / expanding parameter.

Proposed fix

         changed_course_ids = list(course_sync_diff_by_id.keys())
         if changed_course_ids:
-            ids_sql = ",".join(str(int(cid)) for cid in changed_course_ids)
             conn.execute(
                 text(
-                    f"UPDATE courses SET last_sync_diff = NULL "
-                    f"WHERE last_sync_diff IS NOT NULL AND course_id NOT IN ({ids_sql})"
-                )
+                    "UPDATE courses "
+                    "SET last_sync_diff = NULL "
+                    "WHERE last_sync_diff IS NOT NULL "
+                    "AND NOT (course_id = ANY(CAST(:changed_course_ids AS int[])))"
+                ),
+                {"changed_course_ids": [int(cid) for cid in changed_course_ids]},
             )
         else:
             conn.execute(

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ferry/database/sync_db_courses.py` around lines 874 - 882, The current UPDATE
in sync_db_courses.py builds a NOT IN(...) list with string concatenation;
change it to use a bound expanding parameter so the DB driver can handle large
lists and binding. In the block that builds ids_sql and calls
conn.execute(text(...)), replace that approach by using a bound parameter (e.g.,
bindparam("ids", expanding=True) or an :ids expanding param) and pass the
changed_course_ids list as the parameter to conn.execute for the UPDATE
statement that sets last_sync_diff = NULL where course_id NOT IN :ids; update
the text/sqlalchemy call around conn.execute and the surrounding logic in the
same function so it uses the expanding bound parameter instead of inlining
ids_sql.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ferry/database/sync_db_courses.py`:
- Around line 71-91: The helper course_scalar_last_sync_diffs currently only
processes diff["courses"]["changed_rows"], so new courses are omitted; update it
to also iterate diff["courses"]["inserted_rows"] and add entries for each
inserted course id (use int(row["course_id"])) building a fields dict for
non-computed columns where "old" is None and "new" comes from cn.loc[cid, col]
via _jsonable_scalar (skip columns in computed_columns["courses"]); merge these
into the existing out mapping the same way as changed rows so inserts produce a
last_sync_diff payload.

---

Duplicate comments:
In `@ferry/database/sync_db_courses.py`:
- Around line 874-882: The current UPDATE in sync_db_courses.py builds a NOT
IN(...) list with string concatenation; change it to use a bound expanding
parameter so the DB driver can handle large lists and binding. In the block that
builds ids_sql and calls conn.execute(text(...)), replace that approach by using
a bound parameter (e.g., bindparam("ids", expanding=True) or an :ids expanding
param) and pass the changed_course_ids list as the parameter to conn.execute for
the UPDATE statement that sets last_sync_diff = NULL where course_id NOT IN
:ids; update the text/sqlalchemy call around conn.execute and the surrounding
logic in the same function so it uses the expanding bound parameter instead of
inlining ids_sql.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9b81392d-f98c-4acf-af48-2da719c9b05d

📥 Commits

Reviewing files that changed from the base of the PR and between 3d0803c and daa53cf.

📒 Files selected for processing (2)

ferry/database/models.py
ferry/database/sync_db_courses.py

🚧 Files skipped from review as they are similar to previous changes (1)

ferry/database/models.py

coderabbitai · 2026-04-11T05:29:05Z

+def course_scalar_last_sync_diffs(
+    diff: dict[str, DiffRecord],
+    tables_old: dict[str, pd.DataFrame],
+    tables: dict[str, pd.DataFrame],
+) -> dict[int, dict[str, dict[str, Any]]]:
+    co = tables_old["courses"].set_index("course_id")
+    cn = tables["courses"].set_index("course_id")
+    out: dict[int, dict[str, dict[str, Any]]] = {}
+    for _, row in diff["courses"]["changed_rows"].iterrows():
+        cid = int(row["course_id"])
+        fields: dict[str, dict[str, Any]] = {}
+        for col in row["columns_changed"]:
+            if col in computed_columns["courses"]:
+                continue
+            fields[col] = {
+                "old": _jsonable_scalar(co.loc[cid, col]),
+                "new": _jsonable_scalar(cn.loc[cid, col]),
+            }
+        if fields:
+            out[cid] = fields
+    return out


⚠️ Potential issue | 🟠 Major

Include inserted courses in last_sync_diff.

This helper only walks diff["courses"]["changed_rows"], so newly inserted courses never get a payload and end the sync with last_sync_diff = NULL. That leaves course creations invisible to this feature.

Proposed fix

def course_scalar_last_sync_diffs( diff: dict[str, DiffRecord], tables_old: dict[str, pd.DataFrame], tables: dict[str, pd.DataFrame], ) -> dict[int, dict[str, dict[str, Any]]]: co = tables_old["courses"].set_index("course_id") cn = tables["courses"].set_index("course_id") out: dict[int, dict[str, dict[str, Any]]] = {} for _, row in diff["courses"]["changed_rows"].iterrows(): cid = int(row["course_id"]) fields: dict[str, dict[str, Any]] = {} for col in row["columns_changed"]: if col in computed_columns["courses"]: continue fields[col] = { "old": _jsonable_scalar(co.loc[cid, col]), "new": _jsonable_scalar(cn.loc[cid, col]), } if fields: out[cid] = fields + + for _, row in diff["courses"]["added_rows"].iterrows(): + cid = int(row["course_id"]) + fields: dict[str, dict[str, Any]] = {} + for col in cn.columns: + if col == "course_id" or col in computed_columns["courses"]: + continue + fields[col] = { + "old": None, + "new": _jsonable_scalar(cn.loc[cid, col]), + } + if fields: + out[cid] = fields + return out

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ferry/database/sync_db_courses.py` around lines 71 - 91, The helper course_scalar_last_sync_diffs currently only processes diff["courses"]["changed_rows"], so new courses are omitted; update it to also iterate diff["courses"]["inserted_rows"] and add entries for each inserted course id (use int(row["course_id"])) building a fields dict for non-computed columns where "old" is None and "new" comes from cn.loc[cid, col] via _jsonable_scalar (skip columns in computed_columns["courses"]); merge these into the existing out mapping the same way as changed rows so inserts produce a last_sync_diff payload.

aryans-15 linked an issue Apr 11, 2026 that may be closed by this pull request

store course diffs #406

Open

aryans-15 requested a review from reybahl April 11, 2026 05:21

feat: store scalar course column diffs in last_sync_diff

daa53cf

Made-with: Cursor

aryans-15 force-pushed the store-course-diffs branch from 3d0803c to daa53cf Compare April 11, 2026 05:24

coderabbitai Bot reviewed Apr 11, 2026

View reviewed changes

fix: course diffs for adds + SQL bind

aa32ae4

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: persist course sync field diffs#407

feat: persist course sync field diffs#407
aryans-15 wants to merge 2 commits intomasterfrom
store-course-diffs

aryans-15 commented Apr 11, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 11, 2026 •

edited

Loading

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot Apr 11, 2026

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Apr 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

aryans-15 commented Apr 11, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

aryans-15 commented Apr 11, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 11, 2026 •

edited

Loading