Cleanup stdlib Component implementations after merging conceptual spans. #264

@nrfulton

The current stdlib implementation ignores the parts() semantics that were present in early span-based implementations of primordial Mellea. We are now re-introducing spans, which means we need to define parts().

Many of our stdlib component implementations eschew the use of CBlocks and instead insist on strings at initialization time. In the past, CBlocks and strings were semantically equivalent. Now they are not. So we need to go through those components and reimplement both their format_for_llm function and their internal representation. This will probably result in breaking changes to initializers as well.
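
To make the intended shape concrete, here is a minimal sketch of a component that accepts either strings or CBlocks, stores CBlocks internally, and reports them from parts(). Everything here is an assumption for illustration: `CBlock` is a stand-in dataclass rather than Mellea's class, `ExampleInstruction` and `as_cblock` are hypothetical names, and `format_for_llm` is simplified to return a plain string.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(eq=False)
class CBlock:
    """Stand-in for Mellea's CBlock (assumption: the real class carries more state)."""

    value: str


def as_cblock(x: str | CBlock) -> CBlock:
    """Wrap plain strings; pass existing CBlocks through untouched."""
    return x if isinstance(x, CBlock) else CBlock(x)


class ExampleInstruction:
    """Hypothetical component, not an actual stdlib class."""

    def __init__(
        self,
        description: str | CBlock,
        examples: list[str | CBlock] | None = None,
    ):
        # Internal representation is CBlocks, whatever the caller passed in.
        self._description = as_cblock(description)
        self._examples = [as_cblock(e) for e in (examples or [])]

    def parts(self) -> list[CBlock]:
        # Expose the stored blocks themselves, not copies or re-rendered strings.
        return [self._description, *self._examples]

    def format_for_llm(self) -> str:
        # Render from the same blocks that parts() reports.
        lines = [self._description.value]
        lines += [f"Example: {e.value}" for e in self._examples]
        return "\n".join(lines)


shared = CBlock("Summarize the report.")
inst = ExampleInstruction(shared, examples=["Keep it under 100 words."])
```

In this sketch a caller-supplied CBlock survives unchanged inside the component, which is the property that string-only initializers lose.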

This issue tracks all of the required work.

After we merge #249, we need to spend some time cleaning up the richdocument.py interface:

richdocument.py:

  • DoclingDocuments are naturally chunked. We should reuse these chunks as CBlocks and incorporate those CBlocks into the parts() methods.
  • We should choose a canonical representation for tables and include the table itself in parts().
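
A rough sketch of the two items above, under stated assumptions: chunks arrive as plain strings and tables as a header plus rows (no Docling API is used here), Markdown is picked as the canonical table form purely for illustration, and `CBlock`, `ExampleRichDocument`, and `table_to_markdown` are hypothetical stand-ins.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(eq=False)
class CBlock:
    """Stand-in for Mellea's CBlock (assumption, simplified)."""

    value: str


def table_to_markdown(header: list[str], rows: list[list[str]]) -> str:
    """Render a table as Markdown; one possible canonical form."""
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)


class ExampleRichDocument:
    """Hypothetical wrapper, not the richdocument.py implementation."""

    def __init__(
        self,
        chunks: list[str],
        tables: list[tuple[list[str], list[list[str]]]],
    ):
        # One CBlock per document chunk, created once and then reused.
        self._chunk_blocks = [CBlock(c) for c in chunks]
        # Each table becomes its own CBlock in the canonical form.
        self._table_blocks = [CBlock(table_to_markdown(h, r)) for h, r in tables]

    def parts(self) -> list[CBlock]:
        return [*self._chunk_blocks, *self._table_blocks]


doc = ExampleRichDocument(
    chunks=["Intro paragraph.", "Results paragraph."],
    tables=[(["name", "count"], [["apples", "3"]])],
)
```

Because the blocks are built once in the initializer, repeated calls to parts() hand back the same objects rather than fresh strings.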

mify.py:

  • Allow re-use instead of passing back a huge string constructed at format time.
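
One possible reading of this item, sketched with hypothetical names (`ExampleMifiedView` is not the mify decorator, and `CBlock` is a stand-in): keep one block per exposed field and hand the same block back on later calls, rebuilding it only when the field's rendered text changes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(eq=False)
class CBlock:
    """Stand-in for Mellea's CBlock (assumption, simplified)."""

    value: str


class ExampleMifiedView:
    """Hypothetical wrapper around an arbitrary object; not mify itself."""

    def __init__(self, obj: object, fields: list[str]):
        self._obj = obj
        self._fields = fields
        self._blocks: dict[str, CBlock] = {}

    def _block_for(self, name: str) -> CBlock:
        text = f"{name}: {getattr(self._obj, name)}"
        cached = self._blocks.get(name)
        # Reuse the cached block unless the field's rendered text changed.
        if cached is None or cached.value != text:
            cached = CBlock(text)
            self._blocks[name] = cached
        return cached

    def parts(self) -> list[CBlock]:
        return [self._block_for(n) for n in self._fields]

    def format_for_llm(self) -> str:
        # The full string is derived from the parts, not built independently.
        return "\n".join(b.value for b in self.parts())


@dataclass
class Customer:
    name: str
    tier: str


customer = Customer("Ada", "gold")
view = ExampleMifiedView(customer, ["name", "tier"])
first = view.parts()
second = view.parts()
```

Here `first[0]` and `second[0]` are the same object, so downstream code can recognize unchanged parts across calls instead of comparing large strings.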
