Skip to content

Update 'utils' from PHP to TypeScript#280

Open
javagl wants to merge 36 commits into
KhronosGroup:mainfrom
javagl:new-utils
Open

Update 'utils' from PHP to TypeScript#280
javagl wants to merge 36 commits into
KhronosGroup:mainfrom
javagl:new-utils

Conversation

@javagl
Copy link
Copy Markdown
Contributor

@javagl javagl commented May 21, 2026

Previous state

The 'utils' directory contained two PHP scripts, model.php and modelmetadata.php. The purpose of these scripts was to create and update several files in the repository. For example:

  • The model-index.json that lists all models and their tags and variants
  • The "listing" files like Model-showcase.md that shows all models with the 'showcase' tag
  • The 'dep5' file that contains the repo-wide license information
  • For each model:
    • The README.md file
    • The LICENSE.md file
    • The metadata.json file

The metadata.json file was also the input for all of this, and there have been a few functionalities that appeared to be related to various forms of "updating" the license information (that have been a bit hard to follow, admittedly).

New state

The (relevant?) functionality of these scripts is rewritten in TypeScript here. This includes the introduction of structures/types for ... "various things", extensive documentation, and (what I hope to be) clearer processes.

There is a main.ts that only parses the command line arguments.

Some of the command line arguments had been marked as obsolete, or they have been "smeared" through the code in a form that made it hard to parse (~40 appearances of dry-run or dryRun...). Once the command lines are parsed, they are passed to the only "public function", namely ModelMetadata.process

The ModelMetadata class summarizes the main, actual tasks.

It collects the models that should be processed (usually, all models from the Models subdirectory). These are collected as Model objects (details below). Once the models are collected, several functions are called:

  • createListings, which creates the list files that summarize models by their tags, e.g. Models-showcase.md
  • createModelIndex, which created the model-index.json file
  • createReuseLicense which creates the dep5-file with license information about all models
  • Optional: updateAllModelsFiles: This updates the README.md, LICENSE.md, and metadata.json of all models

The Model class summarizes information about "one model"

This class contains, typed, documented information. The main goal was to avoid lines like
$this->metadata['createReadme'] = (isset($this->metadata['AutoGenerateREADME'])) ? $this->metadata['AutoGenerateREADME'] && $this->metadata['createReadme'] : $this->metadata['createReadme'];
and to have a more clearly encapsulated, consistent functionality. This does require a certain level of validation. Therefore...:

The Models class is used for creating Model objects.

This class takes the data that was read from the metadata.json of the models, and performs several levels of validation:

  • It checks that all required fields in the metadata JSON are present
  • It checks that all required fields in the legal array of the metadata JSON are present
  • It checks that the metadata is valid
  • It checks that all 'legal' objects are valid

Only when all this succeeds, it creates an actual, valid Model. (Otherwise, it collects any errors/warnings, so that the caller can deal with that).

Changes to licenses

No, this is not what it sounds like. The point is that the previous state defined an array of licenses, and there have been quite a few inconsistencies in that.

For example, the so-called spdx identifiers have not been valid SPDX identifiers. Some of the license links have been plainly wrong. I don't know where this CC0-1.0 link is coming from, but it redirects to some random blog post. Also, there is no estabished license that is called "Public Domain". And there is no difference between "CC-BY International 4.0" and "CC-BY-4.0". Long story short: There are now exactly (EDIT:) four licenses left. When a metadata.json contains a license where the license is not a valid, known SPDX identifier, then it is assumed to be a "custom" license, meaning that it must have a licenseUrl that points to one of the licenses in this repository.

(The current state here contains a few quirks to "canonicalize" and "clean up" the license information in the existing metadata.json files...)

Review

I know, this is a big change, coming out of the blue (or not so, because we talked about some of that earlier). So I assume that not everybody will be willing or able to review this just so. Nevertheless, I'll tag a few people as potential reviewers here:

@DRx3D @emackey @lexaknyazev

The process for running this state locally should be:

  • cd utils
  • npm install
  • npm run check
  • npm run update

This will check/update all files locally. And the result may look faaar worse than it actually is: It will mark (nearly) all files as "modified", but nearly all these modifications are only the result of canonicalizing the license names to their proper SPDX names. For the other files, I tried to keep the structure and layout the same as before.

Comment thread util/data/listings.json Outdated
Comment thread util/src/main.ts Outdated
Comment thread util/src/main.ts Outdated
Comment thread util/src/main.ts Outdated
Comment thread util/src/main.ts
Comment thread util/src/ModelMetadata.ts Outdated
Comment thread util/src/ModelMetadata.ts Outdated
Comment thread util/src/ModelMetadata.ts Outdated
Comment thread util/src/ModelMetadata.ts Outdated
Comment thread util/src/Models.ts
@javagl
Copy link
Copy Markdown
Contributor Author

javagl commented May 22, 2026

An inlined comment pointed out that the ./reuse/dep5 file is deprecated, and this should be a REUSE.toml in the repository root. After a bit of back and forth, this is updated here accordingly: The process now writes out a new REUSE.toml file during the update. I also cleaned up the LICENSES directory accordingly, and now, running
reuse lint
locally after the update prints:

SUMMARY

  • Bad licenses: 0
  • Deprecated licenses: 0
  • Licenses without file extension: 0
  • Missing licenses: 0
  • Unused licenses: 0
  • Used licenses: CC-BY-4.0, CC-BY-NC-4.0, CC-BY-NC-SA-4.0, CC0-1.0, LicenseRef-3DRT-Testing, LicenseRef-Adobe-Stock, LicenseRef-CRYENGINE-Agreement, LicenseRef-LegalMark-Cesium, LicenseRef-LegalMark-DGG, LicenseRef-LegalMark-Khronos, LicenseRef-LegalMark-UX3D, LicenseRef-Poser-EULA, LicenseRef-Stanford-Graphics, SCEA
  • Read errors: 0
  • Invalid SPDX License Expressions: 0
  • Files with copyright information: 2317 / 2317
  • Files with license information: 2317 / 2317

Congratulations! Your project is compliant with version 3.3 of the REUSE Specification :-)

So that seems to be fine.


The careful observer might have noticed that this PR now removes 19.160 lines. I have no clue what reuse.spdx was, but am 100% sure that it should either not have been there at all, or at least, not in the given form. You know, with these random parts of .gltf files and such.


The validation via reuse lint could (and probably should) be part of the CI one day, but I think that this could be addressed in a dedicated PR.


Another inlined comment pointed out the updateLegacyLicenseStructure function that is part of the current state. This step essentially just updates all metadata.json files to use proper, canonical SPDX identifiers.

This should not be part of the state that is eventually merged. I think that one way of dealing with this would be:

  • Run the update locally
  • Commit the updated state (basically updating all files in the repo, including the metadata.json files)
  • Remove that updateLegacyLicenseStructure function call
  • Commit the state without that function

Any thoughts on that...?

@javagl
Copy link
Copy Markdown
Contributor Author

javagl commented May 22, 2026

Small addendum to the last point from the previous comment: Such a "one-time-cleanup pass" could also fix #281 . The current state contains some "debug" function to print these cases. Of course, that will be removed before marking this PR as ready, but only when the intended workflow for that one-time-cleanup is sorted out.

@lexaknyazev
Copy link
Copy Markdown
Member

My personal preference would be to split changes into more targeted PRs, e.g., scripts vs data as appropriate, stacked on top of each other or opened sequentially. At the very least, obvious and trivial one-time cleanups (removal of reuse.spdx in particular) could be merged separately in advance to not inflate PRs that contain changes for reviews.

That said, other maintainers may have different preferences and, ultimately, the goal is to eventually land all changes, so an all-in-one PR that brings the repo to a new state is great too. Up to you how to organize PR's commit history (it can be rewritten if desired).

@javagl
Copy link
Copy Markdown
Contributor Author

javagl commented May 22, 2026

It's not clear know how that file ended up here, but the step of removing it is just part of the update for the dep5-recation. It could probably have been done at any time, as a 1-commit-PR. There could also be a dedicated PR for #281. And a dedicated PR for that unused test tag. And one for the newly added pbrtest. We could set up a project board, and have planning meetings, and set up a task force, and define a roadmap, but I'll leave that to the managers. I'm just trying to (spend my spare time to) get some overdue cleanup work done here.

By the way, I just saw #277. Did anybody care about that? No. Should it have been merged by now? Yes. However, I'll try to align it with the changes here.

@lexaknyazev
Copy link
Copy Markdown
Member

By the way, I just saw #277. Did anybody care about that? No. Should it have been merged by now? Yes. However, I'll try to align it with the changes here.

I suppose there are only two ways of handling this: either merging #277 and resyncing this PR afterwards or closing #277 and incorporating its changes here. As this PR's name suggests that it's about rewriting scripts, merging #277 first would be a bit more semantically accurate.

Anyway, please feel free to structure this work as you see fit. Getting it merged is more important than planning details.

@javagl
Copy link
Copy Markdown
Contributor Author

javagl commented May 23, 2026

merging #277 first would be a bit more semantically accurate.

Merging that other PR does not make sense: As soon as it is merged and the PHP script runs, it will build a new dep5 file, leading to a completely inconsistent state.

But that other PR is pretty much a strict subset of what is done here: Removing dep5, adding .toml, adding/fixing some licenses. So in the spirit of not trying to cross a chasm with two jumps, I'd add the other (minor) changes from that PR here (renaming the MD to CONTRIBUTING.md and such), and then, this PR should supersede the other one.

@javagl
Copy link
Copy Markdown
Contributor Author

javagl commented May 23, 2026

Addressing the points from #277

  • Migrate REUSE to current version
    This was done here as well. In contrast to the other PR, this PR does not only remove the dep5 file and add the REUSE.toml file, but also updates the util-functionality to actually generate updated REUSE.toml files in CI
  • Renamed SubmittingModels.md -> CONTRIBUTING.md to fall in line with other repositories in KhronosGroup
    This was done here as well. And while I'm at it, and because I have nothing else to do, I essentially re-wrote the CONTRIBUTING.md, trying to give it a structure that makes it easier to follow the contribution process:
    • The document now lists the required files immediately: metadata.json, README.body.md, screenshot.
    • It explains all properties in the metadata.json and its legal array, in a way that is hopefully clearer and easier to fill out.
    • I also added some information about the 'variants' (even though I noticed that the variant names have turned a bit arbitrary...).
    • I added an appendix, listing the valid 'tags' and licenses (this wasn't really made clear before that)
    • Direct preview for reviewers: CONTRIBUTING.md
  • Added AI Contribution text to CONTRIBUTING.md
    Yeah, added an appendix for that as well 🤷

The remaining points from that PR...

  • Fixed typos breaking REUSE lint (CC BY-SA-NC 4.0 -> CC-BT-SA-NC-4.0)
  • Added CC-BT-SA-NC-4.0 license
  • Add missing DGG license, using standard used from other 3rd party model licenses

had been done here as well, as part of the overall cleanup of the license namings.

So I think that this PR will make the PR #277 obsolete now.


Previews:

@javagl
Copy link
Copy Markdown
Contributor Author

javagl commented May 23, 2026

I'll mark this as "Ready For Review" now (even though I expect some minor cleanups and comments to come in during the review).

EDIT: This is obsolete now:

The only "larger" task that I'd consider to be a remaining TODO would be to add the
reuse lint
step to the CI, to ensure that newly submitted models have proper (custom or SPDX) licensing. Some validation could be done by the utils here (e.g. checking for the presence of a .txt file in the LICENSES folder), but maybe that can be left to the 'reuse' linting.

The CI integration of the REUSE check is drafted in #282

Note Before this is actually merged, the updateLegacyLicenseStructure function should be removed. (It wouldn't do any harm, but should no longer be necessary in the future)

@javagl javagl marked this pull request as ready for review May 23, 2026 16:50
@javagl
Copy link
Copy Markdown
Contributor Author

javagl commented May 23, 2026

A very specific technical aside:

The CONTRIBUTING.md lists the properties that have to go into the metadata.legal entries. For SPDX licenses, this does not require the properties

"licenseUrl": "https://creativecommons.org/licenses/by/4.0/legalcode",
"spdx": "CC-BY-4.0",
"icon": "https://licensebuttons.net/l/by/4.0/88x31.png"

These are inserted automatically. I don't know what the intention was behind that legal structure, and why it includes that unused icon, for example. I think that there could/should be a structure of that legal that does not include that icon, but that would be a 'breaking change'. If people think that this should/could be omitted, I'm all in favor of making that metadata.json as lean and clean as possible.

EDIT: When I said that this would be a 'breaking change', then this referred to things like https://github.khronos.org/glTF-Assets/model/AnimatedTriangle , which already does display this 'icon' at the bottom, and one has to assume that the URL for that is pulled from the metadata. So it can not simply be removed...

@javagl
Copy link
Copy Markdown
Contributor Author

javagl commented May 23, 2026

Recreational saturday evening coding: The new structure would allow some additional functionalities more easily. For example, it took only a few minutes to extend the functionality of creating 'listings' to other categories - specifically, the extensions that the models are using. A preview: A listing of all models using EXT_materials_clearcoat (with links to all other extension listings).

But of course, at some point, it does not make sense to maintain these as MD files here. The https://github.khronos.org/glTF-Assets/ website (or https://asset-explorer.needle.tools/ ) may be more suitable for all this.

Comment thread util/src/Model.ts Outdated
Comment on lines +114 to +134
if (fileLowercase.endsWith(".glb")) {
if (!file.endsWith(".glb")) {
warnings.push(
`File does not have an all-lowercase extension: ${file}`
);
}
variants[name] = file;
foundFile = true;
break;
}
if (fileLowercase.endsWith(".gltf")) {
if (!file.endsWith(".gltf")) {
warnings.push(
`File does not have an all-lowercase extension: ${file}`
);
}
variants[name] = file;
foundFile = true;
break;
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider deduplicating this section as .endsWith(".glb") and .endsWith(".gltf") can't be true simultaneously.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. That else was actually there in the original code.

Also a bit more generally:

I'd like that "variants search" to be "better" in some waysl. But we currently don't have a strict enough definition of what 'variants' are. I tried to summarize a bit of that in the updated CONTRIBUTING.md. But there are oddly specific ones like glTF-Binary-KTX-ETC1S-Draco that are not mentioned there explicitly.

Nearly a detail: The current check does not complain about a glTF-Binary subdirectory containing a .gltf file and such.

We should probably consider to define an accepted set of 'variants' somewhere, maybe even including the expected extension per variant (e.g. glTF-Binary: .glb). That would likely mean to rename some of the existing ones, and be more strict to disallow things like glTF-ReallyObscureName.

Maybe also worth escalating into an issue...?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That else was actually there in the original code.

I meant deduplicating code blocks themselves as they are identical except for the conditions. That surely could be done later.

We should probably consider to define an accepted set of 'variants' somewhere

For starters, we should require standard names for standard variants:

  • glTF and glTF-Embedded: .gltf
  • glTF-Binary: .glb

All others could be ignored for now.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added via eb8672e (only a slight improvement, but maybe OK for now)

Comment thread README.md Outdated
Comment thread CONTRIBUTING.md Outdated
Comment thread CONTRIBUTING.md Outdated
Comment thread CONTRIBUTING.md Outdated
Comment thread CONTRIBUTING.md
- Each model must pass the [glTF-Validator](https://github.khronos.org/glTF-Validator/).
- Each model must have an associated `README.md` markdown file that describes the model and the features of the model that make it appropriate of this repo
- Each model must have an associated `metadata.json` file that includes legal information (ownership, copyright, and license)
- Each model must have a properly formatted screenshot
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe explain "properly formatted" here or elsewhere; the dedicated section below does not contain any specifics either.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'll close the fixed typo ones and push the update soon. Some of the text (e.g. 'appropritate of') was taken from the original state, but a cleanup pass should clean up this as well)

The "properly formatted" probably referred to the size of the screenshot. The current state mentions that the screenshot should be at most 150 pixels wide, but I think that's a bit too small. I considered to add some details about the screenshots having ~"an aspect ratio of roughly 1.0". But there are some broader questions around all of that in #49 . If there is any preference (e.g. allowing widths of 150...500, and using a fixed width of 250px in the table or so), then I could add this here as well.

Comment thread CONTRIBUTING.md Outdated
- `summary`: A short summary of the model, to be displayed in tables and overviews. It should usually be a single sentence of short paragraph, and _not_ a full description of the model. For example, a summary might be `"A simple triangle with a rotation animation"`.
- `screenshot`: The path to a screenshot that should be displayed for the model, suitable for being displayed in an overview table.
- `tags`: An array of tags that are used for classifying the model. See the [Tags](#tags) section for the set of tags that are currently supported.
- `createReadme`: Whether a `README.md` file should be generated automatically for the model. This should usually be `true`.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would this be false?

Copy link
Copy Markdown
Contributor Author

@javagl javagl May 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is taken from the PHP state. The only reason for this being false is probably when someone wants to create the whole README.md manually, and not let it be auto-generated from the screenshot and the README.body.md.

I'd actually lean towards removing this createReadme flag completely, and require the README.body.md, to ensure some consistency between the READMEs, and avoid that "None provided" that is/was inserted by default (even though this model does have a body file - there are some inconsistencies here...)

Anyone opposed to ...

  • removing the createReadme flag
  • requiring the README.body.md

?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing the createReadme flag

+1; the main README.md should always be auto-generated.

requiring the README.body.md

+1; if there's absolutely nothing to say about the asset, then it should not be added.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was addressed in 8743f37

Comment thread CONTRIBUTING.md Outdated
```

- `license`: An identifier for the license, e.g. `"LicenseRef-LegalMark-Khronos"`
- `licenseUrl`: A link to the license text in the `LICENSES/` directory of the repository. The license text must be a file with the naming pattern _`<license>`_`.txt`. For example, such a license URL could be `"../../LICENSES/LicenseRef-LegalMark-Khronos.txt"`.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this field redundant? If the directory is fixed and the file name must match the license identifier, it would be better to omit it and enforce correct names implicitly (by checking that the file exists).

Copy link
Copy Markdown
Contributor Author

@javagl javagl May 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, there are some legacy aspects inherited from the PHP here.

The current state contains
"spdx": "LicenseRef-LegalMark-Khronos"
in metadata files, which I'd consider to be undesirable - it's simply not an SPDX identifier. My intention to clean this up somehow became obsolete: Apparently, the REUSE.toml contains a SPDX-License-Identifier that has to be a string that is not an SPDX identifier - ouch 🤕

The current intention is that:

  • the license can either be a known SPDX identifier, or that LicenseRef... string
  • if it is a known SPDX identifier, then the associated fields (licenseUrl, spdx, icon) are filled out automatically
  • if it is a custom license name, then there must be the LicenseUrl. Apparently, the new REUSE policy is that there must be that 1-to-1 correspondence between that license name, and the licenseUrl pointing to the LICENSES/....txt file. So yes, this could probably be omitted.

Checking that this file exists is a TODO in the current state. This existence check is also done by the reuse lint step. But it could be better to catch this earlier in the process (i.e. at the place marked with that TODO).

I'll try to remove the licenseUrl and check the validity earlier. (There currently are some places with a wrong ../-nesting, so I have to review that anyhow)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was addressed in b7a00dc

Comment thread CONTRIBUTING.md Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants