This repository was archived by the owner on Apr 1, 2026. It is now read-only.
I am investigating the generalization capabilities of the TRM architecture presented in "Less is More." While the paper claims the model learns to "tease out underlying task rules" through recursive refinement, recent independent analyses (and my own replication attempts) suggest the model is heavily reliant on the learned Task_ID embeddings rather than inferring logic from the input grid itself.
The Technical Issue
When the specific Task_ID is removed or randomized, the model's reasoning capabilities appear to collapse completely, suggesting it is performing conditional retrieval (lookup) rather than fluid intelligence.
Observed Behavior (Ablation Results):
Standard Input (Grid + Correct ID): ~45% Accuracy (Matches Paper)
Ablation Input (Grid + Blank/Random ID): 0.0% Accuracy
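To make the ablation protocol concrete, here is a toy harness illustrating why a model whose task knowledge lives entirely in a per-ID table collapses when the ID is randomized. This is an illustrative sketch, not the TRM codebase: the class, task construction, and numbers are all hypothetical.

```python
import random

# Toy "model" whose only task knowledge is a per-task lookup table.
# If the correct task_id is supplied, it retrieves the memorized rule;
# with a random id it retrieves the wrong rule and fails.
class EmbeddingLookupModel:
    def __init__(self):
        self.table = {}  # task_id -> memorized transformation

    def train(self, task_id, rule):
        self.table[task_id] = rule  # "logic" stored per task, not inferred

    def predict(self, task_id, grid):
        rule = self.table.get(task_id)
        if rule is None:
            return grid  # no usable prior: degenerate output
        return rule(grid)

def evaluate(model, tasks, randomize_ids=False):
    ids = list(tasks)
    correct = 0
    for task_id, (rule, grid) in tasks.items():
        query_id = random.choice(ids) if randomize_ids else task_id
        if model.predict(query_id, grid) == rule(grid):
            correct += 1
    return correct / len(tasks)

# 100 toy tasks, each applying a distinct offset to a small "grid".
tasks = {f"task_{k}": ((lambda g, k=k: [x + k for x in g]), [1, 2, 3])
         for k in range(1, 101)}
model = EmbeddingLookupModel()
for task_id, (rule, _) in tasks.items():
    model.train(task_id, rule)

print(evaluate(model, tasks))                      # correct IDs: 1.0
print(evaluate(model, tasks, randomize_ids=True))  # random IDs: near 0
```

Under correct IDs the toy model scores perfectly; under randomized IDs it scores at roughly chance level, which mirrors the 45% → 0% collapse reported above.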
The Discrepancy
The paper asserts that the 7M parameter network solves ARC tasks via recursive refinement. However, if the model requires a unique, pre-learned embedding vector for every single task to score above zero, this indicates the "logic" is encoded in the embedding table (memory), not in the recursive weights (reasoning).
Impact:
Parameter Count: The claim of "7M parameters" excludes the massive embedding table required to store these task-specific priors.
Generalization: A model that fails completely without a task-specific tag cannot be claimed to solve "unseen" tasks in a general sense, as it requires a learned index for that specific problem distribution.
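A back-of-envelope calculation shows why the embedding table matters for the parameter-count claim. All three counts below are assumptions for illustration, not figures taken from the paper; the real identifier count and embedding width may differ.

```python
# Hypothetical back-of-envelope for the embedding-table overhead.
num_tasks = 1000               # assumed distinct training tasks
augmentations_per_task = 1000  # assumed per-task augmented variants, each with its own ID
hidden_dim = 512               # assumed embedding width

table_params = num_tasks * augmentations_per_task * hidden_dim
core_params = 7_000_000        # the "7M" headline figure

print(f"embedding table: {table_params:,} params")
print(f"recursive core:  {core_params:,} params")
print(f"table / core ratio: {table_params / core_params:.1f}x")
```

Under these assumed numbers the identifier table alone would dwarf the 7M recursive core by more than an order of magnitude, which is why excluding it from the headline count is misleading.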
Can you provide a checkpoint or a script where the model successfully solves any unseen puzzle without accessing the specific Task_ID embedding for that puzzle?
If not, how does this architecture differ from a learned lookup table?
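The distinction I am drawing can be stated concretely. In this toy contrast (hypothetical code, not the TRM implementation), a solver keyed on task identity returns nothing for an unseen key, while a solver that induces the rule from the demonstration pairs needs no task ID at all:

```python
# 1) Lookup: answers keyed by task ID; any unseen ID fails outright.
memorized = {"task_A": [2, 4, 6]}  # stored output for one known task

def solve_by_lookup(task_id):
    return memorized.get(task_id)  # None for any unseen task

# 2) Rule induction: infer the transformation from demonstration pairs,
#    then apply it to a novel input -- no task ID involved.
def solve_by_induction(demos, test_input):
    # Infer a constant multiplier from the first demo pair (toy "rule").
    inp, out = demos[0]
    factor = out[0] // inp[0]
    assert all(o == i * factor for i, o in zip(inp, out))
    return [x * factor for x in test_input]

print(solve_by_lookup("task_unseen"))        # None: lookup has no answer
demos = [([1, 2, 3], [2, 4, 6])]
print(solve_by_induction(demos, [5, 6, 7]))  # [10, 12, 14]
```

If TRM behaves like case 1 rather than case 2 whenever the Task_ID embedding is withheld, then its "reasoning" is indistinguishable from conditional retrieval.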