@dcollie2
Collaborator

See FILE_RENAMING_SPECIFICATION.md.

This specs out a possible approach to resolving our challenges with filenames. The goal is to have parity between SkillRx filenames and the names in Azure. We already have this with the imported files; the file names in SkillRx and Azure are the same for those files. The files uploaded via SkillRx, however, have different names.

### Core Principles

1. **Rename once, when files are first attached to a topic**
2. **Never rename on topic updates** - maintains parity with Azure
Collaborator

@dmitrytrager dmitrytrager Oct 27, 2025

Let's assume that S3 has the same document filenames as Azure from the beginning.
S3 remains our provider for active storage uploads, so Azure content is not managed automatically.

Now let's imagine that we update document_prefix (or the file_name_prefix part that comes from the provider) for a topic and then archive it.
We need to move the documents from one Azure folder to another. How do we find the documents that belong to the updated topic?

Collaborator Author

I don't think we want to move the files in Azure.

Renaming the document prefix or file_name_prefix should only affect uploads after that change. Files we have already uploaded will maintain their old names.

So if we have a topic that we imported, along with its one document, and then later we added another document, we would have:

  • this_is_an_imported_name.jpg (in Azure: this_is_an_imported_name.jpg)
  • [skillrx_internal_upload]_ep34_06_a_later_file.mp3 (in Azure: 206206_myprovider_2025_08_ep34_06_a_later_file.mp3)

After we add the renaming to our lifecycle and also run a one-time fix of the filenames we are storing, we will have:

  • this_is_an_imported_name.jpg (in Azure: the same thing)
  • 206206_myprovider_2025_08_ep34_06_a_later_file.mp3 (in Azure: the same thing)

If we then update the provider prefix, then upload a new file, we will have:

  • this_is_an_imported_name.jpg (in Azure: the same thing)
  • 206206_myprovider_2025_08_ep34_06_a_later_file.mp3 (in Azure: the same thing)
  • 206206_mynewproviderprefix_2025_10_06_yet_another_file (in Azure: the same thing)
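To make the three states above concrete, here is a minimal plain-Ruby sketch of "rename once, at attachment time". Everything in it is an assumption for illustration: the `Provider`, `Topic`, and `Document` names, the `attach_to` method, and in particular the name layout (topic id, provider prefix, year and month, original name), which is only inferred from the example filenames in this thread and not taken from the spec.

```ruby
require "date"

# Hypothetical sketch: class, method, and field names are illustrative,
# and the name layout is inferred from the examples in this thread.
Provider = Struct.new(:file_name_prefix)
Topic = Struct.new(:id, :provider)

class Document
  attr_reader :stored_name

  # Imported files keep their original name, which already matches Azure.
  def initialize(original_name, imported: false)
    @original_name = original_name
    @stored_name = imported ? original_name : nil
  end

  # Rename once, at first attachment. Any later call is a no-op, so
  # topic or provider updates never change an existing name.
  def attach_to(topic, uploaded_on:)
    @stored_name ||= [
      topic.id,
      topic.provider.file_name_prefix,
      uploaded_on.strftime("%Y_%m"),
      @original_name
    ].join("_")
  end
end

provider = Provider.new("myprovider")
topic = Topic.new(206206, provider)

imported = Document.new("this_is_an_imported_name.jpg", imported: true)
later = Document.new("ep34_06_a_later_file.mp3")
imported.attach_to(topic, uploaded_on: Date.new(2025, 8, 1))
later.attach_to(topic, uploaded_on: Date.new(2025, 8, 1))

# The provider prefix changes; only files attached afterwards pick it up.
provider.file_name_prefix = "mynewproviderprefix"
newest = Document.new("06_yet_another_file")
newest.attach_to(topic, uploaded_on: Date.new(2025, 10, 1))
later.attach_to(topic, uploaded_on: Date.new(2025, 10, 1)) # no-op

puts imported.stored_name, later.stored_name, newest.stored_name
```

The point of the `||=` guard is that the stored name is derived exactly once; after that it is treated as data, not as something recomputed from the current prefixes.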

Collaborator Author

If they decide they want to do any sort of bulk renaming of files after this is all in place, including moving those around in Azure, that will be a separate effort.

Collaborator

@dmitrytrager dmitrytrager Oct 28, 2025

We don't move files because of a file_name_prefix change, for example. We move them because of deletion or archiving, right?
When a topic is archived, we should move its files. Is it possible to find them by something other than the name?

Collaborator

Actually, we do not allow editing document_prefix for topics, and we just disabled editing file_name_prefix for providers. If we enable the latter again, we may get into trouble.

Collaborator Author

The stakeholders want to update file_name_prefix for all providers. And they want to use that to name the provider-specific files. That's what's motivating me to want to get our file names in line with what we upload to Azure.

Collaborator

Yes, I remember. I just don't understand how this sync will solve our problem.
I will try to go through this doc carefully.

Collaborator Author

Let me know if you want to talk it through. Basically, it solves the problem because we will always be managing our file sync with Azure using the names stored in our app. If they change one of the fields involved in generating those names, those changes will only apply to future files. If at any point they want to rename the files in Azure, that will be a separate issue with a separate solution.
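As a rough illustration of "managing the sync using the names stored in our app", here is a sketch of how an archive step could build its list of Azure moves from stored names alone, without re-deriving anything from the current document_prefix or file_name_prefix. The `archive_moves` helper and the `active` / `archive` folder names are hypothetical; the real Azure layout is not described in this thread.

```ruby
# Hypothetical sketch: the helper name and folder names are assumptions.
# The move list is built only from the names stored in the app, so a
# later prefix change cannot cause a lookup miss in Azure.
def archive_moves(stored_names, active_folder: "active", archive_folder: "archive")
  stored_names.map do |name|
    { from: "#{active_folder}/#{name}", to: "#{archive_folder}/#{name}" }
  end
end

stored = [
  "this_is_an_imported_name.jpg",
  "206206_myprovider_2025_08_ep34_06_a_later_file.mp3"
]

moves = archive_moves(stored)
moves.each { |move| puts "#{move[:from]} -> #{move[:to]}" }
```

Because both the imported file and the uploaded file are looked up by their stored names, the archive step works the same way regardless of which prefix was in effect when each file was first attached.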

Collaborator

@dmitrytrager dmitrytrager left a comment

masterpiece

@dmitrytrager
Collaborator

I think I understand the benefits now.
Please let me know what you think about the problem described here
