We want to isolate our main store instance from being overloaded by external RPC requests. To facilitate this we need to replicate the primary store to secondary follower stores.
We can re-use most of the existing interfaces to do this, making the initial implementation fairly non-invasive.
Our store contains several separate storage media, so it is not possible to use an off-the-shelf solution to sync them; we instead use an in-protocol approach.
This acts as both a backup solution and an IO scaling solution.
No gRPC changes
A secondary store can be spun up by:
- fetching blocks from another store by polling the `getBlockByNumber` endpoint
- applying each block to itself using `apply_block` as per normal
This means a secondary store is no longer simply passive, but also contains an active task which drives the fetch-and-apply loop.
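As a rough illustration, a minimal sketch of that fetch-and-apply loop could look like the following. All types and signatures here are stand-ins for illustration; the real store interfaces and gRPC client are not shown.

```rust
use std::time::Duration;

/// Stand-ins for the real store types; names and shapes are assumptions.
struct Block {
    number: u64,
}

struct Store {
    latest: u64,
}

impl Store {
    /// Stand-in for the existing apply_block path.
    fn apply_block(&mut self, block: Block) {
        self.latest = block.number;
    }
}

/// Stand-in for a gRPC client of the primary's getBlockByNumber endpoint;
/// returns None once we ask past the primary's tip.
async fn get_block_by_number(number: u64) -> Option<Block> {
    if number <= 100 {
        Some(Block { number })
    } else {
        None
    }
}

/// The active task a secondary store runs: fetch the next block from the
/// primary and apply it locally, backing off once we are caught up.
async fn follow_primary(store: &mut Store) {
    loop {
        let next = store.latest + 1;
        match get_block_by_number(next).await {
            Some(block) => store.apply_block(block),
            None => tokio::time::sleep(Duration::from_secs(1)).await,
        }
    }
}

#[tokio::main]
async fn main() {
    let mut store = Store { latest: 0 };
    follow_primary(&mut store).await;
}
```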
Shortcomings
The above approach falls somewhat short, because we also want to fetch block proofs. The endpoint does support including block proofs, but since proofs lag behind a committed block, we have to fetch the same block data twice: once for the committed block, and a second time to fetch the proof (which returns the block data redundantly).
Additionally, since we use polling and cannot know what the latest proven block is, we have to continually poll block `proven+1`, which will exist as a committed block (but perhaps not yet proven) and will therefore return redundant block data on each poll.
This means we are continuously fetching a fair amount of redundant data, but in the short term this is unlikely to be a large problem. In part this is because we don't actually have block proofs yet, so the proven and committed tips will be close or identical for a large part of the time.
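To make the redundancy concrete, here is a hedged sketch of what the proof-polling side could look like, assuming a hypothetical response shape where the block data always comes back and the proof only appears once the block is proven; the real request and response shapes may differ.

```rust
use std::time::Duration;

/// Hypothetical response shape: block data is always returned, the proof
/// only once the block has been proven. Both are assumptions.
struct Block;
struct Proof;

async fn get_block_with_proof(_number: u64) -> (Block, Option<Proof>) {
    // In the real implementation this is a gRPC call with proofs included.
    (Block, None)
}

/// Poll block proven+1 until its proof lands. The block data is fetched
/// (and discarded) on every poll, which is the redundancy described above.
async fn follow_proofs(mut latest_proven: u64) {
    loop {
        match get_block_with_proof(latest_proven + 1).await {
            (_block, Some(_proof)) => {
                // Store the proof; the block itself was already applied by
                // the committed-block loop.
                latest_proven += 1;
            }
            // Committed but not proven yet: we paid for the block data anyway.
            (_block, None) => tokio::time::sleep(Duration::from_secs(1)).await,
        }
    }
}

#[tokio::main]
async fn main() {
    follow_proofs(0).await;
}
```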
The approach also falls short as a robust backup solution, since the primary store has no way of knowing that a block backup has completed.
There are also alternatives available, but they require quite a bit more work, and this will suffice for now.
Infrastructure
A trickier part is defining health at the infrastructure level. We want more than one of these secondary stores for load-balancing and redundancy, which also means we should be able to identify a lagging or unhealthy store node.
This can be done by defining the chain tip as the maximum across all stores, and then marking any store that lags behind by e.g. `N=2` blocks as unhealthy or out-of-sync. This could presumably be done by the load-balancer, though that means it needs to perform non-trivial work.
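A minimal sketch of that health rule, assuming whoever performs the check can already collect each store's tip height; the threshold and input shape are assumptions for illustration.

```rust
/// A store lagging the chain tip (the maximum height across all stores)
/// by N or more blocks is flagged out-of-sync. N=2 is the example threshold.
const N: u64 = 2;

fn out_of_sync(tips: &[u64]) -> Vec<bool> {
    let chain_tip = tips.iter().copied().max().unwrap_or(0);
    tips.iter().map(|&tip| chain_tip - tip >= N).collect()
}

fn main() {
    // Three stores at heights 12, 11 and 7: the last lags the tip by 5
    // blocks and is marked out-of-sync.
    assert_eq!(out_of_sync(&[12, 11, 7]), vec![false, false, true]);
}
```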