Skip to content

feat: persist course sync field diffs#407

Open
aryans-15 wants to merge 2 commits intomasterfrom
store-course-diffs
Open

feat: persist course sync field diffs#407
aryans-15 wants to merge 2 commits intomasterfrom
store-course-diffs

Conversation

@aryans-15
Copy link
Copy Markdown

@aryans-15 aryans-15 commented Apr 11, 2026

Summary by CodeRabbit

  • New Features
    • Course sync now captures and stores the previous and current values for modified course fields, enabling detailed tracking of what changed during each synchronization.

@aryans-15 aryans-15 linked an issue Apr 11, 2026 that may be closed by this pull request
@aryans-15 aryans-15 requested a review from reybahl April 11, 2026 05:21
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 11, 2026

Warning

Rate limit exceeded

@aryans-15 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 21 minutes and 35 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 21 minutes and 35 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d73d0a91-b799-4ecb-8e54-0066e3aae484

📥 Commits

Reviewing files that changed from the base of the PR and between daa53cf and aa32ae4.

📒 Files selected for processing (2)
  • ferry/database/generate_changelog.py
  • ferry/database/sync_db_courses.py
📝 Walkthrough

Walkthrough

This change adds course synchronization difference tracking to the system. A new nullable JSONB last_sync_diff column is added to the Course model to store old and new values for non-computed fields during sync operations. The sync logic computes these diffs, persists them to the database, and marks unchanged courses as NULL.

Changes

Cohort / File(s) Summary
Course Model Enhancement
ferry/database/models.py
Added last_sync_diff JSONB column to Course model to track prior and new values of non-computed fields from recent sync operations.
Sync Diff Computation and Persistence
ferry/database/sync_db_courses.py
Introduced _jsonable_scalar for type normalization, added course_scalar_last_sync_diffs to compute per-course diff payloads, and enhanced sync_db_courses to ensure column existence, compute diffs, set NULL for unchanged courses, and upsert changes via parameterized SQL updates.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 Hop, hop, the sync now tracks its tale,
Old values dance with new in JSON's sail,
Each course remembers what has changed,
While stable ones stay NULL, arranged,
Diffs preserved for all to see!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: persist course sync field diffs' directly and accurately describes the main changes: adding a new column to store sync diffs and implementing logic to compute and persist these diffs.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch store-course-diffs

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@aryans-15 aryans-15 force-pushed the store-course-diffs branch from 3d0803c to daa53cf Compare April 11, 2026 05:24
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ferry/database/generate_changelog.py`:
- Around line 340-349: The current loop only processes
diff["courses"]["changed_rows"], so brand-new courses
(diff["courses"]["added_rows"]) never get field-level (old, new) entries and are
dropped downstream; extend the logic to also iterate
diff["courses"]["added_rows"] and populate course_id_to_changes for each new
course_id (respecting computed_columns["courses"]), using old_val = None (or an
appropriate sentinel) and new_val from courses_new_indexed.loc[course_id,
column]; ensure you initialize course_id_to_changes[course_id] as with changed
rows and reuse the same column filtering as in the existing loop.

In `@ferry/database/sync_db_courses.py`:
- Around line 833-840: The UPDATE that builds ids_sql from changed_course_ids
should use a bound expanding parameter instead of string-concatenating IDs;
update the conn.execute call that references changed_course_ids/ids_sql to use
an expanding bind (e.g., bindparam('ids', expanding=True) or SQLAlchemy's ":ids"
expanding param) and pass {"ids": changed_course_ids} so the query becomes
"WHERE last_sync_diff IS NOT NULL AND course_id NOT IN :ids" (or equivalent with
bindparam) to avoid huge SQL text and enable parameter binding for the course_id
list.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 44730d87-aad1-4c1c-9199-4a6ce5180b58

📥 Commits

Reviewing files that changed from the base of the PR and between a3ea142 and 3d0803c.

📒 Files selected for processing (3)
  • ferry/database/generate_changelog.py
  • ferry/database/models.py
  • ferry/database/sync_db_courses.py

Comment thread ferry/database/generate_changelog.py Outdated
Comment thread ferry/database/generate_changelog.py Outdated
Comment on lines +364 to +371
register_junction_changes(
course_id_to_changes,
diff,
tables_old,
tables,
"listings",
["course_id"],
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Handle listing course_id moves on both courses.

For a listing that keeps the same listing_id but changes course_id, this path only registers the new course via changed_rows. The old course never gets the corresponding listing removal, so cross-listing merges/splits leave one side's changelog and last_sync_diff incomplete.

Comment thread ferry/database/sync_db_courses.py Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
ferry/database/sync_db_courses.py (1)

874-882: ⚠️ Potential issue | 🟡 Minor

Keep the last_sync_diff clear query parameterized.

This still string-builds the NOT IN (...) list, which bypasses bound parameters and can generate very large SQL on big syncs. Please switch it to a bound array / expanding parameter.

Proposed fix
         changed_course_ids = list(course_sync_diff_by_id.keys())
         if changed_course_ids:
-            ids_sql = ",".join(str(int(cid)) for cid in changed_course_ids)
             conn.execute(
                 text(
-                    f"UPDATE courses SET last_sync_diff = NULL "
-                    f"WHERE last_sync_diff IS NOT NULL AND course_id NOT IN ({ids_sql})"
-                )
+                    "UPDATE courses "
+                    "SET last_sync_diff = NULL "
+                    "WHERE last_sync_diff IS NOT NULL "
+                    "AND NOT (course_id = ANY(CAST(:changed_course_ids AS int[])))"
+                ),
+                {"changed_course_ids": [int(cid) for cid in changed_course_ids]},
             )
         else:
             conn.execute(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ferry/database/sync_db_courses.py` around lines 874 - 882, The current UPDATE
in sync_db_courses.py builds a NOT IN(...) list with string concatenation;
change it to use a bound expanding parameter so the DB driver can handle large
lists and binding. In the block that builds ids_sql and calls
conn.execute(text(...)), replace that approach by using a bound parameter (e.g.,
bindparam("ids", expanding=True) or an :ids expanding param) and pass the
changed_course_ids list as the parameter to conn.execute for the UPDATE
statement that sets last_sync_diff = NULL where course_id NOT IN :ids; update
the text/sqlalchemy call around conn.execute and the surrounding logic in the
same function so it uses the expanding bound parameter instead of inlining
ids_sql.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ferry/database/sync_db_courses.py`:
- Around line 71-91: The helper course_scalar_last_sync_diffs currently only
processes diff["courses"]["changed_rows"], so new courses are omitted; update it
to also iterate diff["courses"]["inserted_rows"] and add entries for each
inserted course id (use int(row["course_id"])) building a fields dict for
non-computed columns where "old" is None and "new" comes from cn.loc[cid, col]
via _jsonable_scalar (skip columns in computed_columns["courses"]); merge these
into the existing out mapping the same way as changed rows so inserts produce a
last_sync_diff payload.

---

Duplicate comments:
In `@ferry/database/sync_db_courses.py`:
- Around line 874-882: The current UPDATE in sync_db_courses.py builds a NOT
IN(...) list with string concatenation; change it to use a bound expanding
parameter so the DB driver can handle large lists and binding. In the block that
builds ids_sql and calls conn.execute(text(...)), replace that approach by using
a bound parameter (e.g., bindparam("ids", expanding=True) or an :ids expanding
param) and pass the changed_course_ids list as the parameter to conn.execute for
the UPDATE statement that sets last_sync_diff = NULL where course_id NOT IN
:ids; update the text/sqlalchemy call around conn.execute and the surrounding
logic in the same function so it uses the expanding bound parameter instead of
inlining ids_sql.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9b81392d-f98c-4acf-af48-2da719c9b05d

📥 Commits

Reviewing files that changed from the base of the PR and between 3d0803c and daa53cf.

📒 Files selected for processing (2)
  • ferry/database/models.py
  • ferry/database/sync_db_courses.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • ferry/database/models.py

Comment on lines +71 to +91
def course_scalar_last_sync_diffs(
diff: dict[str, DiffRecord],
tables_old: dict[str, pd.DataFrame],
tables: dict[str, pd.DataFrame],
) -> dict[int, dict[str, dict[str, Any]]]:
co = tables_old["courses"].set_index("course_id")
cn = tables["courses"].set_index("course_id")
out: dict[int, dict[str, dict[str, Any]]] = {}
for _, row in diff["courses"]["changed_rows"].iterrows():
cid = int(row["course_id"])
fields: dict[str, dict[str, Any]] = {}
for col in row["columns_changed"]:
if col in computed_columns["courses"]:
continue
fields[col] = {
"old": _jsonable_scalar(co.loc[cid, col]),
"new": _jsonable_scalar(cn.loc[cid, col]),
}
if fields:
out[cid] = fields
return out
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Include inserted courses in last_sync_diff.

This helper only walks diff["courses"]["changed_rows"], so newly inserted courses never get a payload and end the sync with last_sync_diff = NULL. That leaves course creations invisible to this feature.

Proposed fix
 def course_scalar_last_sync_diffs(
     diff: dict[str, DiffRecord],
     tables_old: dict[str, pd.DataFrame],
     tables: dict[str, pd.DataFrame],
 ) -> dict[int, dict[str, dict[str, Any]]]:
     co = tables_old["courses"].set_index("course_id")
     cn = tables["courses"].set_index("course_id")
     out: dict[int, dict[str, dict[str, Any]]] = {}
     for _, row in diff["courses"]["changed_rows"].iterrows():
         cid = int(row["course_id"])
         fields: dict[str, dict[str, Any]] = {}
         for col in row["columns_changed"]:
             if col in computed_columns["courses"]:
                 continue
             fields[col] = {
                 "old": _jsonable_scalar(co.loc[cid, col]),
                 "new": _jsonable_scalar(cn.loc[cid, col]),
             }
         if fields:
             out[cid] = fields
+
+    for _, row in diff["courses"]["added_rows"].iterrows():
+        cid = int(row["course_id"])
+        fields: dict[str, dict[str, Any]] = {}
+        for col in cn.columns:
+            if col == "course_id" or col in computed_columns["courses"]:
+                continue
+            fields[col] = {
+                "old": None,
+                "new": _jsonable_scalar(cn.loc[cid, col]),
+            }
+        if fields:
+            out[cid] = fields
+
     return out
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ferry/database/sync_db_courses.py` around lines 71 - 91, The helper
course_scalar_last_sync_diffs currently only processes
diff["courses"]["changed_rows"], so new courses are omitted; update it to also
iterate diff["courses"]["inserted_rows"] and add entries for each inserted
course id (use int(row["course_id"])) building a fields dict for non-computed
columns where "old" is None and "new" comes from cn.loc[cid, col] via
_jsonable_scalar (skip columns in computed_columns["courses"]); merge these into
the existing out mapping the same way as changed rows so inserts produce a
last_sync_diff payload.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

store course diffs

1 participant