feat: Add Damdami taksal bani variations to nitnem dasam granth banis#1920
feat: Add Damdami taksal bani variations to nitnem dasam granth banis#1920BalkiratS wants to merge 9 commits intoshabados:mainfrom
Conversation
|
Thank you for the PR! We'll review the contents shortly - if you'd be able to rewrite the commits to specify |
There was a problem hiding this comment.
@BalkiratS Thank you for your first PR! Please use the correct page/line as shown in the PDF. If you don't want to do that, please remove those fields. I don't personally care what you name your commits in this PR, as we will ultimately squish it into one commit later.
@harjot1singh we need to discuss the vastly same data sources (aside from minor spelling disparities) it's mostly just vishram placements. could we have a "variants" section that primary sources reference? something like the following:
[[content]]
source = "SDGR"
type = "meta"
page = 7
line = 4
data = [
"ਕਿ ਸਰਬੱਤ੍ਰ ਜਾਪਿਯੈ ॥",
"ਕਿ, ਸਰਬਤ੍ਰ ਜਾਪਯੈ ॥",
]
[[content]]
asset = "DGDG"
type = "primary"
data = 1
[[content]]
asset = "DDTK"
type = "primary"
data = 2
[[content]]
asset = "DSKO"
type = "translation"
language = "en"
data = "That Thou art remembered everywhere!"
[[content]]
asset = "RSJD"
type = "translation"
language = "pa"
data = "ਤੂੰ ਸਭ ਥਾਂਵਾਂ ਤੇ ਜਪਿਆ ਜਾਂਦਾ ਹੈਂ,"
TOML has ordered arrays and doesn't care about indents or whitespace and encourages trailing commas. If we keep adding sources and they all look largely the same, this will save us a lot of space. Can also do something like ignoring vishraman in the primary (but that would be more annoying to dictate on each primary type imo).
Edit: There should be some bot that checks PRs to make sure no one enters a variant in the middle (and that they're always appended to the last of a list), as well as to make sure that all variants are unique.
Edit 2: It also makes sense to have a bot make sure that all primary types are at the top (below the first block for variants)
Edit 3: It could be named "meta", as the page/line info is also supposed to be per asset/PDF. So if the meta section existed, it would place the "page/line" in whatever the chosen "standard" is (for example to reference SGGS ji saroops). I've updated my toml exaple above to match that. This way a primary type can use page/line of the actual PDF/files that it's showing in physical format. And we can use it for bound box to highlight files exactly where the line is.
|
@harjot1singh another question, here bani for chaupai sahib is added as a new bookmark, should bookmarks have variants similar to lines? |
|
@bhajneet re the variants, we'll have to discuss this in more detail. I think we originally decided that vishraams should just be tied to assets for now, as published. Hmm. And yes, we need to add checks to the PR (at least a build) so that there's general feedback (think there are some tests too) |
Add variants for primary type #1921 |
|
My only "blocker" is that we remove the repeated "page"/"line" from the new primary type added |
|
@bhajneet I have added the commit to remove the page/line from the new primary lines. |
|
@BalkiratS I'm on my final review, and I cannot merge this for the following reason: the data doesn't seem to match the PDF you've linked. The pauri numbers are all different. See here for the data you've added and compare to screenshot:
It also looks like you didn't do the following lines? VFFW ("ਸੁਨੈ ਗੁੰਗ ਜੋ ਯਾਹਿ; ਸੁ ਰਸਨਾ ਪਾਵਈ ॥") and HUK1 ("ਸੁਨੈ ਮੂੜ ਚਿਤ ਲਾਇ; ਚਤੁਰਤਾ ਆਵਈ ॥"). Because they are the exact same? @harjot1singh we would want this added even if it's the same value, correct?
Lastly it looks like there is a unicode issue with the following line: collections/lines/4/4D/4DNZ.toml This should be typed ਸਿੰਮ੍ਰਿਤਿ I think. We have to decide where the pair-rara goes. It cannot go on the sihari itself. My suggestion is to go with ਮ੍ਰਿ vs ਤ੍ਰਿ How were these added? Via OCR or hand typed? Do you know where else this kind of typing issues might have shown up? |
|
@bhajneet Fixed the pauri numbers and the unicode issue. I did not added a new source entry to the lines where the data field is exactly the same. I thought the idea was to only add new content for the variations and not add duplicate data. But if we want to have duplicate lines from different sources I can update my changes. Let me know. |
|
Thank you @BalkiratS and if you can also comment on how the data is being added/modified? @harjot1singh thoughts on duplicate data? I don't think we can assume which sources have missing data vs equal/duplicate data, so I think we have to add it as is (even if it matches) |
|
@bhajneet The data is added by a script, modified by hand. If i see any other unicode issues, ill add the fix, so far I haven't seen any. |
Okay, that was what I was wondering. The python script in gurmukhiutils repo has been personally vetted to work in a myriad of cases. It may still not have worked on this string, but was curious what that script would've outputted. I may run this personally and check later to compare outputs. |


Summary
This PR updates the following Banis by adding Damdami Taksal Variations.
A new asset source has been added to represent Damdami Taksal Sundar Gutka. Only the lines with variations have been updated. Additionally, a new Taksal Version of Chaupai Sahib have been added to Banis list.
Damdami Taksal Sundar Gutka Source used: https://archive.org/details/sundar-gutka-damdami-taksal-complete_202308/mode/2up