Renaming skillrx-uploaded files #460
base: main
### Core Principles

1. **Rename once, when files are first attached to a topic**
2. **Never rename on topic updates** - maintains parity with Azure
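A minimal sketch of the two principles, not SkillRx code. It assumes a hypothetical name format of `<document_prefix>_<file_name_prefix>_<YYYY>_<MM>_<original name>`, inferred only from the example names later in this thread. `Document`, `build_name`, and `attach_to_topic` are illustrative names. The name is assigned on the first attach, and any later call leaves it alone.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Document:
    original_name: str
    stored_name: Optional[str] = None  # the one name shared by SkillRx and Azure


def build_name(document_prefix: str, file_name_prefix: str, on: date, original_name: str) -> str:
    # Hypothetical format inferred from the example names in this PR.
    return f"{document_prefix}_{file_name_prefix}_{on.year}_{on.month:02d}_{original_name}"


def attach_to_topic(doc: Document, document_prefix: str, file_name_prefix: str, on: date) -> None:
    # Principle 1: name the file once, on first attach.
    # Principle 2: any later call (e.g. triggered by a topic update) is a no-op,
    # so the stored name keeps matching what was uploaded to Azure.
    if doc.stored_name is None:
        doc.stored_name = build_name(document_prefix, file_name_prefix, on, doc.original_name)


later = Document("ep34_06_a_later_file.mp3")
attach_to_topic(later, "206206", "myprovider", date(2025, 8, 1))
# The provider prefix changes; re-running the hook leaves the existing name alone.
attach_to_topic(later, "206206", "mynewproviderprefix", date(2025, 10, 6))
print(later.stored_name)  # 206206_myprovider_2025_08_ep34_06_a_later_file.mp3

# Only files attached after the change pick up the new prefix.
newer = Document("06_yet_another_file")
attach_to_topic(newer, "206206", "mynewproviderprefix", date(2025, 10, 6))
print(newer.stored_name)  # 206206_mynewproviderprefix_2025_10_06_yet_another_file
```

Because the guard is on the stored name rather than on the current prefixes, editing a topic or provider can never silently change the name of a file that already exists in Azure.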
Let's assume that S3 has the same document filenames as Azure from the beginning.
S3 remains our provider for Active Storage uploads, so Azure content is not managed automatically.
Now let's imagine that we update the document_prefix of a topic (or the file_name_prefix of its provider) and then archive the topic.
We need to move its documents from one Azure folder to another. How do we find the documents of a topic whose prefix has changed?
I don't think we want to move the files in Azure.
Renaming the document prefix or file_name_prefix should only affect uploads after that change. Files we have already uploaded will maintain their old names.
So if we imported a topic along with its one document, and later added another document, we would have:
- this_is_an_imported_name.jpg (in Azure: this_is_an_imported_name.jpg)
- [skillrx_internal_upload]_ep34_06_a_later_file.mp3 (in Azure: 206206_myprovider_2025_08_ep34_06_a_later_file.mp3)
After we add the renaming to our lifecycle and also run a one-time fix of the filenames we are storing, we will have:
- this_is_an_imported_name.jpg (in Azure: the same thing)
- 206206_myprovider_2025_08_ep34_06_a_later_file.mp3 (in Azure: the same thing)
If we then update the provider prefix and upload a new file, we will have:
- this_is_an_imported_name.jpg (in Azure: the same thing)
- 206206_myprovider_2025_08_ep34_06_a_later_file.mp3 (in Azure: the same thing)
- 206206_mynewproviderprefix_2025_10_06_yet_another_file (in Azure: the same thing)
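The one-time fix of stored filenames mentioned above could look roughly like this sketch. It is one possible approach, not the one in the spec: it treats `[skillrx_internal_upload]_` as a literal marker for illustration (the real stored names may differ) and matches each such name against a listing of the topic's Azure files by the original filename. `backfill` is a hypothetical helper; only unique matches are applied.

```python
PLACEHOLDER = "[skillrx_internal_upload]_"  # assumed marker, for illustration only


def backfill(stored_names: list[str], azure_names: list[str]) -> dict[str, str]:
    """Map each placeholder name to the single Azure name ending in the same original name."""
    fixes = {}
    for name in stored_names:
        if not name.startswith(PLACEHOLDER):
            continue  # imported files already match Azure
        original = name[len(PLACEHOLDER):]
        matches = [a for a in azure_names if a.endswith("_" + original)]
        if len(matches) == 1:
            fixes[name] = matches[0]
        # Zero or several matches: leave the name for manual review.
    return fixes


stored = [
    "this_is_an_imported_name.jpg",
    "[skillrx_internal_upload]_ep34_06_a_later_file.mp3",
]
azure = [
    "this_is_an_imported_name.jpg",
    "206206_myprovider_2025_08_ep34_06_a_later_file.mp3",
]
print(backfill(stored, azure))
# {'[skillrx_internal_upload]_ep34_06_a_later_file.mp3': '206206_myprovider_2025_08_ep34_06_a_later_file.mp3'}
```

Matching against the actual Azure listing, rather than recomputing names from today's prefixes, avoids producing wrong names for providers whose file_name_prefix was edited before editing was disabled. Suffix matching can be ambiguous, which is why only unique matches are applied.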
If they decide they want to do any sort of bulk renaming of files after this is all in place, including moving those around in Azure, that will be a separate effort.
We don't move files because of file_name_prefix, for example. We move them because of deletion or archiving, right?
When a topic is archived, we should move its files. Is it possible to find them without relying on their names?
Actually, we do not allow editing document_prefix for topics, and we just disabled editing file_name_prefix for providers. If we enable the latter again, we may get into trouble.
The stakeholders want to update file_name_prefix for all providers, and they want to use it to name the provider-specific files. That's what's motivating me to get our file names in line with what we upload to Azure.
Yes, I remember. I just don't understand how this sync will solve our problem.
I will try to go through this doc carefully.
Let me know if you want to talk it through. Basically, it solves the problem because we will always be managing our file sync with Azure using the names stored in our app. If they change one of the fields involved in generating those names, those changes will only apply to future files. If at any point they want to rename the files in Azure, that will be a separate issue with a separate solution.
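The archiving case raised earlier can be sketched under that rule. This is an illustration, not SkillRx code: a dict stands in for the Azure container, and the `active/` and `archive/` folder names and the `archive_topic` helper are hypothetical. Blobs are located only by the names SkillRx stored, so later prefix edits do not matter.

```python
def archive_topic(stored_names: list[str], container: dict[str, bytes],
                  active: str = "active/", archive: str = "archive/") -> list[str]:
    """Move a topic's blobs from the active folder to the archive folder.

    Blobs are located by the names SkillRx stored, never by recomputing a name
    from the topic's or provider's current prefixes. Returns names not found.
    """
    missing = []
    for name in stored_names:
        source = active + name
        if source in container:
            # In a real blob store this is typically a copy followed by a delete.
            container[archive + name] = container.pop(source)
        else:
            missing.append(name)
    return missing


container = {
    "active/this_is_an_imported_name.jpg": b"...",
    "active/206206_myprovider_2025_08_ep34_06_a_later_file.mp3": b"...",
}
stored = [
    "this_is_an_imported_name.jpg",
    "206206_myprovider_2025_08_ep34_06_a_later_file.mp3",
]
# Even if the provider's file_name_prefix has since changed to "mynewproviderprefix",
# the stored names still point at the right blobs.
print(archive_topic(stored, container))  # []
print(sorted(container))
# ['archive/206206_myprovider_2025_08_ep34_06_a_later_file.mp3', 'archive/this_is_an_imported_name.jpg']
```

Returning the missing names, instead of failing silently, gives a cheap check that the stored names and Azure have not drifted apart.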
dmitrytrager left a comment
masterpiece
I think I understand the benefits now.
See FILE_RENAMING_SPECIFICATION.md.
This specs out a possible approach to resolving our challenges with filenames. The goal is to have parity between SkillRx filenames and the names in Azure. We already have this with the imported files; the file names in SkillRx and Azure are the same for those files. The files uploaded via SkillRx, however, have different names.