
Conversation


@jrray jrray commented Oct 3, 2025

Use a moka Cache to funnel multiple requests for the same missing payload into a single task to sync the payload locally. Any failure to sync the payload will be cached for 5 minutes, after which another request for that digest will attempt to sync it again.

This was inspired by observing that syncing a single spfs tag would report having "repaired" several different digests multiple times, often over a dozen times. It suggests that Syncer may be unexpectedly trying to process the same digest multiple times. Each of these digests was for a large payload (500MB+), and the sync was happening over a high-latency link. Without this change, it can end up performing excessive duplicate copies over a limited-bandwidth link.
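
A minimal sketch of the coalescing idea, assuming a cache keyed by payload digest whose value records the outcome of the single sync task. The names here (`PayloadSyncer`, `CachedSyncError`, `sync_payload_from_remote`) are illustrative, not the crate's actual API:

```rust
// Sketch only: not the PR's real code. Requires the moka crate with its
// "future" feature enabled.
use std::{sync::Arc, time::Duration};

use moka::future::Cache;

/// A cloneable error so a failed outcome can be stored in the cache.
#[derive(Clone, Debug)]
struct CachedSyncError(Arc<String>);

struct PayloadSyncer {
    // Keyed by payload digest; the value records the outcome of the one
    // sync task that ran for that digest.
    outcomes: Cache<String, Result<(), CachedSyncError>>,
}

impl PayloadSyncer {
    fn new() -> Self {
        Self {
            outcomes: Cache::builder()
                // Cached outcomes expire after 5 minutes, after which the
                // next request for the same digest retries the sync.
                .time_to_live(Duration::from_secs(5 * 60))
                .build(),
        }
    }

    async fn ensure_local(&self, digest: String) -> Result<(), CachedSyncError> {
        // `get_with` coalesces concurrent callers for the same key: only one
        // of them runs the init future, the rest await and share its result.
        self.outcomes
            .get_with(digest.clone(), async move {
                sync_payload_from_remote(&digest)
                    .await
                    .map_err(|e| CachedSyncError(Arc::new(e)))
            })
            .await
    }
}

async fn sync_payload_from_remote(_digest: &str) -> Result<(), String> {
    // Placeholder for the real copy from a remote repository.
    Ok(())
}
```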

@jrray jrray self-assigned this Oct 3, 2025
@jrray jrray added the enhancement (New feature or request), SPI AOI (Area of interest for SPI), and pr-chain (This PR doesn't target the main branch, don't merge!) labels on Oct 3, 2025
@jrray jrray force-pushed the push-ozqxvnxmwlov branch 2 times, most recently from 7ff7fc2 to b937392 on October 3, 2025 02:28
self.code().unwrap_or_else(|| Box::new("spfs::generic"))
)
)]
pub enum SyncError {
@jrray (Collaborator, Author) commented:

I did all this Error rework and then realized I didn't need it. But I think this change is for the better anyway.

Err(SyncError::PayloadReadError(digest, Error::String("no repositories could successfully read the payload".into()).into()))
})
.await
.map_err(|err| Error::String(format!("failed to open payload: {err}")))?;
@jrray (Collaborator, Author) commented on Oct 3, 2025:

I would rather return a more specific error here, but this function has to return spfs::Error and the error type in the cache is now SyncError. Most of the SyncError variants have a nested Arc<spfs::Error> that in theory could be returned, but spfs::Error is not Clone.

I worked on making a new error type that is Clone but oops, this function would have to change to return that error type. Or it could change to return yet another new Error type that is Clone and more specific to the storage/repository domain instead of the catch-all spfs::Error type.
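
For illustration, a sketch of the shape being discussed: an error type that stays Clone by holding the underlying non-Clone error behind an Arc. The names here (`PayloadOpenError`, the stand-in `SpfsError`) are hypothetical, not the actual types in this PR:

```rust
// Sketch only: variant and type names are illustrative.
use std::sync::Arc;

use thiserror::Error;

/// Stand-in for the non-Clone spfs::Error.
#[derive(Debug, Error)]
#[error("{0}")]
struct SpfsError(String);

/// Wrapping the underlying error in an Arc keeps this outer type Clone,
/// which is what a cache that hands the same error to many waiters needs.
#[derive(Clone, Debug, Error)]
enum PayloadOpenError {
    #[error("failed to read payload {digest}: {cause}")]
    Read { digest: String, cause: Arc<SpfsError> },
    #[error("no repositories could successfully read payload {0}")]
    NoRepositories(String),
}
```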

@jrray jrray force-pushed the push-volovwxpnxry branch 2 times, most recently from c69b0d5 to bca6bae on October 17, 2025 00:11
@jrray jrray force-pushed the push-ozqxvnxmwlov branch from b937392 to 88019a9 on October 20, 2025 05:59
@jrray jrray force-pushed the push-volovwxpnxry branch from a288a39 to bc02db1 on October 20, 2025 06:06
@jrray jrray force-pushed the push-ozqxvnxmwlov branch from 88019a9 to 18b497f on October 20, 2025 06:06
@jrray jrray force-pushed the push-volovwxpnxry branch from bc02db1 to b165c1a on October 20, 2025 18:20
@jrray jrray force-pushed the push-ozqxvnxmwlov branch 2 times, most recently from 41fb372 to cc58d46 on October 20, 2025 20:45
@jrray jrray force-pushed the push-volovwxpnxry branch from b165c1a to b507fd1 on October 20, 2025 20:45
jrray added 8 commits October 20, 2025 13:49
This should continue to work after removing blob objects.

Add a "test-fixtures" feature to spfs in order to use the fixtures in
external crates.

Signed-off-by: J Robert Ray <[email protected]>
This is also expected to still succeed after removing blob objects.

Signed-off-by: J Robert Ray <[email protected]>
Blob files contain the digest and size of the payload. The digest is
redundant because it is the same as the file name of the blob. The size
is also available in the Entry type in Manifests. Reading a blob to
learn the digest of the payload requires already knowing the payload
digest in order to read the blob.

The invariant of a blob not existing unless the payload also exists has
proven to be difficult to maintain. By eliminating blobs, we eliminate
this invariant.

Signed-off-by: J Robert Ray <[email protected]>
Since we don't write a blob object anymore, it may be confusing to
continue calling this "commit_blob".

Signed-off-by: J Robert Ray <[email protected]>
With blob files gone there is no longer an invariant to maintain when
writing payloads.

Signed-off-by: J Robert Ray <[email protected]>
Without a blob object to read, this makes it possible to learn the size
of a payload without having to read the payload data.

Signed-off-by: J Robert Ray <[email protected]>
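
Hypothetically, the capability described above could look like the sketch below; the function and its layout assumptions are made up for illustration, not the actual spfs API:

```rust
// Sketch only: assumes payloads are stored as files named after their digest.
use std::path::Path;

/// Report the size of a stored payload without reading its data, using
/// filesystem metadata for the file named after the digest.
fn payload_size(payloads_root: &Path, digest: &str) -> std::io::Result<u64> {
    Ok(std::fs::metadata(payloads_root.join(digest))?.len())
}
```
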
The blob existence invariant is gone and these operations are now safe.

Signed-off-by: J Robert Ray <[email protected]>
Following up on #1052, there are still spurious test failures in other
tests, e.g., sync::sync_test::test_sync_ref_including_annotation_blob,
that fail because the spfs config is set to the legacy encoding format.

The tests that change the config were not putting it back to defaults,
which could explain the spurious failures depending on the order in which
tests run. That said, it is suspicious that the tests weren't failing
more often if this is the cause.

Signed-off-by: J Robert Ray <[email protected]>
@jrray jrray force-pushed the push-volovwxpnxry branch from 505d504 to d1bf725 on October 21, 2025 00:08
@jrray jrray force-pushed the push-ozqxvnxmwlov branch from c9b4e7c to 8d55227 on October 21, 2025 00:08
jrray added 3 commits October 20, 2025 17:40
The previous commit that removed `write_blob` didn't actually accomplish
much because blobs were mostly being written with `write_object`. There
is now a debug assertion in `write_object` to prevent writing blobs.
Once this was in place, the tests started revealing all the places that
were still writing blobs, and then all the places that didn't work
anymore when blobs don't exist.

This commit redefines some terminology used in the API. Previously an
"object" could include blob objects and their digests could be used
interchangeably with payload digests. Now payloads are not considered to
be an "object" and methods that operate on object digests do not work
for payloads. Some method names have been changed to make this more
apparent.

The term "item" is used for things (digests) that can be either an
object of a payload.

A "ref" can be a digest and it is expected that it is still possible to
interact with payloads via a ref. A tag can refer to a payload as well.

The `find_digests` operation used to return digests for blob objects;
now that blobs are gone, it has been changed to return both object
digests and payload digests.

Signed-off-by: J Robert Ray <[email protected]>
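
As a rough illustration of the debug assertion mentioned above (the `Object` enum and its variants here are stand-ins, not the crate's real types):

```rust
// Sketch only: a debug-build guard against writing blobs as objects.
#[derive(Debug)]
enum Object {
    Blob { size: u64 },
    Manifest { entries: Vec<String> },
}

fn write_object(obj: &Object) {
    // Blobs are no longer objects; payloads go through the payload-writing
    // path instead. Catch accidental blob writes in debug builds.
    debug_assert!(
        !matches!(obj, Object::Blob { .. }),
        "blobs must be written as payloads, not as objects"
    );
    // ... serialize and store the object ...
}
```
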
This new error type can give better context to the kinds of errors that
can happen during a sync and also limits the enum variants compared to
spfs::Error so errors that can't happen aren't represented.

This work was originally motivated by wanting an Error type that is
Clone, which is why the wrapped errors are all in Arc. However, it turned
out that FallbackProxy::open_payload still has to return spfs::Error, so
having a Clone SyncError didn't help. The actual need is for an Error
type that open_payload can return that is Clone.

A Clone error type is needed because the moka Cache stores errors in an
Arc, since it needs to be able to clone them. To get an error back out of
the cache and return it, the error type has to be Clone.

Signed-off-by: J Robert Ray <[email protected]>
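
For context, a sketch of the Clone requirement described here, assuming moka's `try_get_with`, which hands waiting callers the initialization error wrapped in an Arc; `open_payload_uncached` and `OpenError` are hypothetical names:

```rust
// Sketch only: illustrates why the cached error type must be Clone.
use std::sync::Arc;

use moka::future::Cache;

#[derive(Clone, Debug)]
struct OpenError(String);

async fn open_payload(
    cache: &Cache<String, u64>,
    digest: String,
) -> Result<u64, OpenError> {
    cache
        .try_get_with(digest.clone(), open_payload_uncached(digest))
        .await
        // try_get_with yields Err(Arc<OpenError>); cloning out of the Arc is
        // what lets this function hand back a plain OpenError.
        .map_err(|err: Arc<OpenError>| (*err).clone())
}

async fn open_payload_uncached(_digest: String) -> Result<u64, OpenError> {
    // Placeholder for actually opening or syncing the payload.
    Ok(0)
}
```
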
@jrray jrray force-pushed the push-ozqxvnxmwlov branch from 8d55227 to 3c7788b on October 21, 2025 00:40
@jrray jrray force-pushed the push-volovwxpnxry branch from d1bf725 to 8df9867 on October 21, 2025 00:40
@jrray jrray requested a review from rydrman October 21, 2025 00:55
jrray added 3 commits October 20, 2025 21:21
This error type is Clone and can be used in the upcoming change to
open_payload that can return a cached result.

Signed-off-by: J Robert Ray <[email protected]>
As well as not checking the test code.

Signed-off-by: J Robert Ray <[email protected]>
Use a moka Cache to funnel multiple requests for the same missing
payload into a single task to sync the payload locally. Any failure to
sync the payload will be cached for 5 minutes, after which another
request for that digest will attempt to sync it again.

This was inspired by observing that syncing a single spfs tag would
report having "repaired" several different digests multiple times, often
over a dozen times. It suggests that Syncer may be unexpectedly trying to
process the same digest multiple times. Each of these digests was for a
large payload (500MB+). The sync was happening over a high-latency link.
Without this change, it can end up performing excessive duplicate copies
over a limited bandwidth link.

Signed-off-by: J Robert Ray <[email protected]>
@jrray jrray force-pushed the push-ozqxvnxmwlov branch from 3c7788b to ef4563c on October 21, 2025 04:39
@jrray jrray force-pushed the push-volovwxpnxry branch 3 times, most recently from 9ef4438 to 1c49e90 on October 25, 2025 18:31