Skip to content

docs: OEP-68 Learning Content Identifiers#773

Merged
kdmccormick merged 29 commits intomasterfrom
kdmccormick/keys
Apr 16, 2026
Merged

docs: OEP-68 Learning Content Identifiers#773
kdmccormick merged 29 commits intomasterfrom
kdmccormick/keys

Conversation

@kdmccormick
Copy link
Copy Markdown
Member

@kdmccormick kdmccormick commented Mar 18, 2026

Abstract

Open edX code has four kinds of identifiers for learning content: Integer Primary Keys,
Codes, OpaqueKeys, and UUIDs. Choosing the wrong kind—or naming it ambiguously—can
cause subtle bugs, broken exports, and painful code reviews. This OEP explains what each kind
is, when to reach for it, and how to name it consistently.

The primary audience is developers working on openedx-core, openedx-platform, and other
repositories that manage learning content
. Plugin developers, site administrators, and
nontechnical Open edX Core Contributors who work with learning content will also find these
concepts useful.

🌐 This is part of a broader "Open edX Core v1.0" effort.

Review 👀

Open: Monday, March 23rd, 2026
Closed: Monday, April 13th, 2026 Wednesday, April 15th, 2026 (beginning of day)

You can preview the rendered OEP here: https://open-edx-proposals--773.org.readthedocs.build/en/773/best-practices/oep-0068-bp-content-identifiers.html

AI Note 🤖

I taught Claude about the concepts in this OEP and then had it write a first draft. I then rewrote most of the content myself and corrected Claude's misunderstandings. I then told Claude "make it easier to read" and then reviewed and tweaked that.

@kdmccormick kdmccormick changed the title docs: Naming Identifiers docs: OEP-68 Naming Identifiers Mar 18, 2026
Comment thread oeps/best-practices/oep-0068-bp-naming-identifiers.rst Outdated
@kdmccormick kdmccormick changed the title docs: OEP-68 Naming Identifiers docs: OEP-68 Identifiers Mar 20, 2026
@kdmccormick
Copy link
Copy Markdown
Member Author

@ormsbee , could I ask you to give a very high-level review of what this OEP's trying to achieve? Don't worry about the exact prose and examples, I'd like to work on those a bit to make them more reader-friendly. Just want to make sure the scope and intent feels appropriate before I dig in to making a final draft.

@kdmccormick kdmccormick changed the title docs: OEP-68 Identifiers docs: OEP-68 Learning Content Identifiers Mar 20, 2026
@ormsbee
Copy link
Copy Markdown
Contributor

ormsbee commented Mar 20, 2026

@kdmccormick: I think the high level scope and framing are good. I was a little surprised to see UUIDs in there, but it makes sense. As part of this PR, you might want to amend OEP-38 so that people see a mention of this new OEP if they're looking at Data Modeling more generally.

Comment thread oeps/best-practices/oep-0068-bp-content-identifiers.rst Outdated
Comment thread oeps/best-practices/oep-0068-bp-content-identifiers.rst Outdated
@kdmccormick kdmccormick marked this pull request as ready for review March 23, 2026 16:27
Comment thread oeps/best-practices/oep-0068-bp-content-identifiers.rst Outdated
Copy link
Copy Markdown
Contributor

@bmtcril bmtcril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just one comment

Summary
=======

.. list-table::
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if this table had guidance on database / model types for each identifier type as well. Some places where this comes up:

  • OpequeKeyField vs CharField
  • UUIDField vs CharField vs BinaryField
  • How long to make the CharField for a code? Not that there's a single answer, but guidance might be useful

Copy link
Copy Markdown
Member Author

@kdmccormick kdmccormick Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea, I will do that. My proposal is:

  1. OpaqueKeyField for keys, always.
  2. UUIDField for UUIDs, always
  3. For codes, default to 255 chars,[A-Za-z0-9_\-\.], case sensitive. We could define this as a utility class somewhere.

Let me know if you disagree, or if you know of exceptions to (1) and (2). cc @ormsbee

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That proposal looks good to me. I have a suspicion that 1 & 2 are somewhat inconsistent currently, but nothing to back that up aside from a memory of being irritated by it at some point. It could have been in an archived IDA or something.

@bradenmacdonald
Copy link
Copy Markdown
Contributor

Looks great! Thanks so much for taking this on.

  1. Do we have a clear example of something that "needs" to be a UUID, to illustrate it?
  2. I think it's worth mentioning that UUIDs are much less common[ly needed] that the other identifier types.

@kdmccormick
Copy link
Copy Markdown
Member Author

I think it's worth mentioning that UUIDs are much less common[ly needed] that the other identifier types.

agreed

Do we have a clear example of something that "needs" to be a UUID, to illustrate it?

That's a great question. The immediate example that comes to mind is an event, but that's not really a piece of learning content. Do we have any examples which are closer to "content"? Maybe a certificate UUID?

@Agrendalath
Copy link
Copy Markdown
Member

@kdmccormick

That's a great question. The immediate example that comes to mind is an event, but that's not really a piece of learning content. Do we have any examples which are closer to "content"? Maybe a certificate UUID?

A certificate is a good example. Another could be the LTI configuration, where we use UUID to avoid leaking the primary ID.

Comment thread oeps/best-practices/oep-0068-bp-content-identifiers.rst Outdated
Comment thread oeps/best-practices/oep-0068-bp-content-identifiers.rst Outdated
Copy link
Copy Markdown
Contributor

@ormsbee ormsbee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great work—thank you so much for doing it. Minor questions and suggestions, but nothing that I would consider blocking.

Comment thread oeps/best-practices/oep-0038-Data-Modeling.rst Outdated
Comment thread oeps/best-practices/oep-0068-bp-content-identifiers.rst Outdated
Comment thread oeps/best-practices/oep-0068-bp-content-identifiers.rst Outdated
Comment thread oeps/best-practices/oep-0068-bp-content-identifiers.rst Outdated
Comment thread oeps/best-practices/oep-0068-bp-content-identifiers.rst Outdated
Comment thread oeps/best-practices/oep-0068-bp-content-identifiers.rst Outdated
@ormsbee
Copy link
Copy Markdown
Contributor

ormsbee commented Mar 28, 2026

That's a great question. The immediate example that comes to mind is an event, but that's not really a piece of learning content. Do we have any examples which are closer to "content"? Maybe a certificate UUID?

The draft and publish logs are another example.

@sarina sarina requested a review from rodmgwgu March 30, 2026 16:58
@kdmccormick
Copy link
Copy Markdown
Member Author

@bmtcril @ormsbee @bradenmacdonald @Agrendalath Thanks for your reviews!

I've incorporated all feedback. Here's the diff w.r.t. the initial OEP: 31aaed6...110dc06 . I wouldn't mind another set of eyes on the "OpaueKeys" section which I've amended to explain that OpaqueKeys can sometimes have meaning outside of the instance.

Copy link
Copy Markdown

@rodmgwgu rodmgwgu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this proposal, this is great, I've been confused in the past with course_id's that are actually course keys, so this is really needed.

Comment on lines +143 to +148
**How to name:** On Django models, the primary key is ``id`` and foreign keys automatically
get the ``_id`` suffix (e.g. ``collection_id``). Outside of model definitions—in variable
names, REST APIs, event schemas, and so on—use the suffix ``_pk`` instead (e.g.
``collection_pk``). When accessing the primary key on a django model, prefer ``.pk``, e.g.
``collection.pk``. "id" is an overloaded term that means many things; ``_pk`` leaves no
doubt that the value is a Django model integer primary key.
Copy link
Copy Markdown
Member Author

@kdmccormick kdmccormick Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately Braden's recent research shows that the name pk is in conflict with our desire to use typed model keys: openedx/openedx-core#534 (comment) . He suggested changing this guidance to recommend .id.

I'm happy switching back to .id for accessing the PK on the model object and for queries. Anyway, that's what 95% of our code does, so we'd have way less code to update.

What I'll miss about _pk is that it disambiguates non-model fields very well:

some_data = {
    "course_id": ...,  # ambiguous :(
    "course_key": ..., # clearly an opaque key
    "course_pk": ...,  # clearly a PK
}

Maybe we could offer a new convention for ambiguous situations, while still respecting the _id suffix? Like, _db_id:

some_data = {
    "course_id": ...,  # ambiguous :(
    "course_key": ..., # clearly an opaque key
    "course_db_id": ...,  # clearly a PK
}

Lmk your thoughts @ormsbee , @bradenmacdonald , et al

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with having vars named course_id an dealing with a bit of ambiguity, but I do strongly wish that we could keep model_instance.pk working somehow. It's a standard Django practice, and it's useful for 1:1 models like Component. I suppose we could modify PublishableEntityMixin so that the OneToOneField is named id, which would probably help. Right now, my_component.id would just error because the actual pk is at my_component.publishable_entity_id.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my PR, I added a typed accessor so that you can use .id and it returns self.publishable_entity_id with the correctly-typed PK type.

I do strongly wish that we could keep model_instance.pk working somehow

Yeah. It works, but it has Any type, so it kind of bypasses all the advantages of typed primary keys if you use it.


If we want the best of both worlds (fully typed + using pk terminology), I think the only feasible option (working around django-stubs limitations) is to use modelname_pk on each model, e.g. container.container_pk, entity.entity_pk, other_entity.entity_pk. This lets us be fairly consistent across the board (except if we need to access the ID field in django ORM queries/joins, e.g. obj__entity_id derived from obj.entity.id). But it's a lot more verbose.

Or we go with id which is simple, concise, can be made fully-typed, but is ambiguous in old code.

Or something novel like db_id, but I'd rather avoid that.

I think I'm personally leaning toward just id/..._id everywhere as it's by far the most consistent in new code, and we can try to work on bringing the old code in line over time.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm personally leaning toward just id/..._id everywhere as it's by far the most consistent in new code, and we can try to work on bringing the old code in line over time.

I'm supportive of this as well. At the very least, we'll be able to make openedx-core fully consistent, and most platform code will naturally be consistent. course_id (a CourseKey) will be the major offender, but we can work that out over time.

@ormsbee @bradenmacdonald , I can make this OEP update, any objections?

Btw @bradenmacdonald , I'd love if we could add guidance to the OEP about typed primary keys, but I'm fine merging this OEP first and then updating it after your openedx-core PR merges.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me.

Btw @bradenmacdonald , I'd love if we could add guidance to the OEP about typed primary keys, but I'm fine merging this OEP first and then updating it after your openedx-core PR merges.

I'd like to see if we can find any ways to improve this, as adding two id_fields.py[i] files to each app is slightly more overhead than I'd like. But if we only do it for "major" models, I think it's still a good approach.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw @bradenmacdonald , I'd love if we could add guidance to the OEP about typed primary keys, but I'm fine merging this OEP first and then updating it after your openedx-core PR merges.

I do have a recommendation on this now. Let me know what you think: openedx/openedx-core#534 (comment)

kdmccormick and others added 2 commits April 13, 2026 10:34
@kdmccormick
Copy link
Copy Markdown
Member Author

kdmccormick commented Apr 13, 2026

@bmtcril @Agrendalath @rodmgwgu @ormsbee @bradenmacdonald , based on the latest discussion, I've pushed a commit to change it so that we'll use _id instead of _pk for database primary keys.

Please let me know before Wednesday morning (15 April) if you have concerns or if you'd like more time to re-review! cc @sarina .

Comment thread oeps/best-practices/oep-0068-bp-content-identifiers.rst Outdated
Comment thread oeps/best-practices/oep-0068-bp-content-identifiers.rst Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants