docs: OEP-68 Learning Content Identifiers#773
Conversation
7e19a86 to
aaa05fc
Compare
|
@ormsbee , could I ask you to give a very high-level review of what this OEP's trying to achieve? Don't worry about the exact prose and examples, I'd like to work on those a bit to make them more reader-friendly. Just want to make sure the scope and intent feels appropriate before I dig in to making a final draft. |
|
@kdmccormick: I think the high level scope and framing are good. I was a little surprised to see UUIDs in there, but it makes sense. As part of this PR, you might want to amend OEP-38 so that people see a mention of this new OEP if they're looking at Data Modeling more generally. |
bmtcril
left a comment
There was a problem hiding this comment.
Looks good, just one comment
| Summary | ||
| ======= | ||
|
|
||
| .. list-table:: |
There was a problem hiding this comment.
It would be nice if this table had guidance on database / model types for each identifier type as well. Some places where this comes up:
OpequeKeyFieldvsCharFieldUUIDFieldvsCharFieldvsBinaryField- How long to make the
CharFieldfor a code? Not that there's a single answer, but guidance might be useful
There was a problem hiding this comment.
Great idea, I will do that. My proposal is:
OpaqueKeyFieldfor keys, always.UUIDFieldfor UUIDs, always- For codes, default to 255 chars,
[A-Za-z0-9_\-\.], case sensitive. We could define this as a utility class somewhere.
Let me know if you disagree, or if you know of exceptions to (1) and (2). cc @ormsbee
There was a problem hiding this comment.
That proposal looks good to me. I have a suspicion that 1 & 2 are somewhat inconsistent currently, but nothing to back that up aside from a memory of being irritated by it at some point. It could have been in an archived IDA or something.
|
Looks great! Thanks so much for taking this on.
|
agreed
That's a great question. The immediate example that comes to mind is an event, but that's not really a piece of learning content. Do we have any examples which are closer to "content"? Maybe a certificate UUID? |
A certificate is a good example. Another could be the LTI configuration, where we use UUID to avoid leaking the primary ID. |
ormsbee
left a comment
There was a problem hiding this comment.
This is great work—thank you so much for doing it. Minor questions and suggestions, but nothing that I would consider blocking.
The draft and publish logs are another example. |
|
@bmtcril @ormsbee @bradenmacdonald @Agrendalath Thanks for your reviews! I've incorporated all feedback. Here's the diff w.r.t. the initial OEP: 31aaed6...110dc06 . I wouldn't mind another set of eyes on the "OpaueKeys" section which I've amended to explain that OpaqueKeys can sometimes have meaning outside of the instance. |
rodmgwgu
left a comment
There was a problem hiding this comment.
Thanks for this proposal, this is great, I've been confused in the past with course_id's that are actually course keys, so this is really needed.
| **How to name:** On Django models, the primary key is ``id`` and foreign keys automatically | ||
| get the ``_id`` suffix (e.g. ``collection_id``). Outside of model definitions—in variable | ||
| names, REST APIs, event schemas, and so on—use the suffix ``_pk`` instead (e.g. | ||
| ``collection_pk``). When accessing the primary key on a django model, prefer ``.pk``, e.g. | ||
| ``collection.pk``. "id" is an overloaded term that means many things; ``_pk`` leaves no | ||
| doubt that the value is a Django model integer primary key. |
There was a problem hiding this comment.
Unfortunately Braden's recent research shows that the name pk is in conflict with our desire to use typed model keys: openedx/openedx-core#534 (comment) . He suggested changing this guidance to recommend .id.
I'm happy switching back to .id for accessing the PK on the model object and for queries. Anyway, that's what 95% of our code does, so we'd have way less code to update.
What I'll miss about _pk is that it disambiguates non-model fields very well:
some_data = {
"course_id": ..., # ambiguous :(
"course_key": ..., # clearly an opaque key
"course_pk": ..., # clearly a PK
}Maybe we could offer a new convention for ambiguous situations, while still respecting the _id suffix? Like, _db_id:
some_data = {
"course_id": ..., # ambiguous :(
"course_key": ..., # clearly an opaque key
"course_db_id": ..., # clearly a PK
}Lmk your thoughts @ormsbee , @bradenmacdonald , et al
There was a problem hiding this comment.
I'm okay with having vars named course_id an dealing with a bit of ambiguity, but I do strongly wish that we could keep model_instance.pk working somehow. It's a standard Django practice, and it's useful for 1:1 models like Component. I suppose we could modify PublishableEntityMixin so that the OneToOneField is named id, which would probably help. Right now, my_component.id would just error because the actual pk is at my_component.publishable_entity_id.
There was a problem hiding this comment.
In my PR, I added a typed accessor so that you can use .id and it returns self.publishable_entity_id with the correctly-typed PK type.
I do strongly wish that we could keep model_instance.pk working somehow
Yeah. It works, but it has Any type, so it kind of bypasses all the advantages of typed primary keys if you use it.
If we want the best of both worlds (fully typed + using pk terminology), I think the only feasible option (working around django-stubs limitations) is to use modelname_pk on each model, e.g. container.container_pk, entity.entity_pk, other_entity.entity_pk. This lets us be fairly consistent across the board (except if we need to access the ID field in django ORM queries/joins, e.g. obj__entity_id derived from obj.entity.id). But it's a lot more verbose.
Or we go with id which is simple, concise, can be made fully-typed, but is ambiguous in old code.
Or something novel like db_id, but I'd rather avoid that.
I think I'm personally leaning toward just id/..._id everywhere as it's by far the most consistent in new code, and we can try to work on bringing the old code in line over time.
There was a problem hiding this comment.
I think I'm personally leaning toward just id/..._id everywhere as it's by far the most consistent in new code, and we can try to work on bringing the old code in line over time.
I'm supportive of this as well. At the very least, we'll be able to make openedx-core fully consistent, and most platform code will naturally be consistent. course_id (a CourseKey) will be the major offender, but we can work that out over time.
@ormsbee @bradenmacdonald , I can make this OEP update, any objections?
Btw @bradenmacdonald , I'd love if we could add guidance to the OEP about typed primary keys, but I'm fine merging this OEP first and then updating it after your openedx-core PR merges.
There was a problem hiding this comment.
Sounds good to me.
Btw @bradenmacdonald , I'd love if we could add guidance to the OEP about typed primary keys, but I'm fine merging this OEP first and then updating it after your openedx-core PR merges.
I'd like to see if we can find any ways to improve this, as adding two id_fields.py[i] files to each app is slightly more overhead than I'd like. But if we only do it for "major" models, I think it's still a good approach.
There was a problem hiding this comment.
Btw @bradenmacdonald , I'd love if we could add guidance to the OEP about typed primary keys, but I'm fine merging this OEP first and then updating it after your openedx-core PR merges.
I do have a recommendation on this now. Let me know what you think: openedx/openedx-core#534 (comment)
Co-authored-by: Claude <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
110dc06 to
f7d1c52
Compare
|
@bmtcril @Agrendalath @rodmgwgu @ormsbee @bradenmacdonald , based on the latest discussion, I've pushed a commit to change it so that we'll use Please let me know before Wednesday morning (15 April) if you have concerns or if you'd like more time to re-review! cc @sarina . |
Abstract
Open edX code has four kinds of identifiers for learning content: Integer Primary Keys,
Codes, OpaqueKeys, and UUIDs. Choosing the wrong kind—or naming it ambiguously—can
cause subtle bugs, broken exports, and painful code reviews. This OEP explains what each kind
is, when to reach for it, and how to name it consistently.
The primary audience is developers working on openedx-core, openedx-platform, and other
repositories that manage learning content. Plugin developers, site administrators, and
nontechnical Open edX Core Contributors who work with learning content will also find these
concepts useful.
🌐 This is part of a broader "Open edX Core v1.0" effort.
Review 👀
Open: Monday, March 23rd, 2026
Closed:
Monday, April 13th, 2026Wednesday, April 15th, 2026 (beginning of day)You can preview the rendered OEP here: https://open-edx-proposals--773.org.readthedocs.build/en/773/best-practices/oep-0068-bp-content-identifiers.html
AI Note 🤖
I taught Claude about the concepts in this OEP and then had it write a first draft. I then rewrote most of the content myself and corrected Claude's misunderstandings. I then told Claude "make it easier to read" and then reviewed and tweaked that.