Skip to content

Conversation

@mateeullahmalik
Copy link
Collaborator

PR Description

This PR fixes the long-standing incorrect account sequence errors when multiple cascade finalizations occur close together.

Key changes:

  • TxHelper now owns and tracks the local account sequence (accountNumber, nextSequence), instead of re-querying committed state before every tx.

  • Deterministic sequence handling:

    • Initialize sequence once from chain.
    • Build/sign using in-memory sequence.
    • Increment on success.
    • On sequence mismatch, re-sync from chain and retry once.
  • Added strict serialization in action_msg (already in place) to ensure only one tx flow per signer.

  • This resolves the race where mempool txs hadn’t been committed yet, causing “expected X, got Y” failures.

Outcome:

  • Finalize and RequestAction no longer collide on sequence.
  • Deterministic tx flow for the supernode key.
  • No other protocol or operational changes needed.

@roomote
Copy link

roomote bot commented Nov 25, 2025

Rooviewer Clock   See task on Roo Cloud

I've reviewed the changes and found a few issues related to thread safety and sequence handling that should be addressed.

  • Ensure TxHelper is thread-safe by adding a mutex.
  • Verify transaction success code (not just error) before incrementing sequence.

Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues.

txmod Module
config *TxConfig

accountNumber uint64
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TxHelper now maintains internal state (accountNumber, nextSequence), which makes it not thread-safe. While action_msg currently handles synchronization externally, this makes TxHelper dangerous to use in other contexts.

Consider adding a sync.Mutex to TxHelper and locking in ExecuteTransaction (and other state-modifying methods) to ensure thread safety.

Fix it with Roo Code or mention @roomote and request a fix.


// Run full tx flow
resp, err := h.ExecuteTransactionWithMsgs(ctx, []types.Msg{msg}, localAcc)
if err == nil {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking only err == nil is insufficient because a transaction might fail CheckTx (e.g. insufficient fees) and return a successful response object but with a non-zero Code. In such cases, the sequence is not incremented on-chain.

Incrementing nextSequence here will cause a drift, forcing the next transaction to fail with a sequence mismatch and trigger a retry. You should also check resp.TxResponse.Code == 0.

		if err == nil {
			if resp != nil && resp.TxResponse != nil && resp.TxResponse.Code == 0 {
				// SUCCESS → bump local sequence
				h.nextSequence++
			}
			return resp, nil
		}

Fix it with Roo Code or mention @roomote and request a fix.

Copy link

@jawad-rafique jawad-rafique left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed pkg/lumera/modules/tx/helper.go changes.

Feedback:

  • TxHelper now stores accountNumber/nextSequence/seqInit without any synchronization. If a single TxHelper instance can be used concurrently (multiple goroutines), this introduces data races and can still produce duplicate/out-of-order sequences. Consider guarding sequence state with a sync.Mutex (or making the helper explicitly single-threaded in docs and enforcing it at call sites).
  • The retry path resyncs from committed chain state via AccountInfoByAddress. In the common mismatch scenario where another tx is already in mempool but not committed, that query will still return the old sequence and the retry may fail again. Consider parsing the “expected N, got M” value from the error and setting nextSequence = expected before retrying (or a dedicated RPC for mempool-aware account sequence if available).
  • isSequenceMismatch is very broad (expected + got) and may classify unrelated errors as sequence mismatch. If possible, narrow matching (e.g. codespace/auth + ErrWrongSequence, or a more specific regex).

CI:

  • cascade-e2e-tests is currently failing on this PR (Finalize simulation failed due to invalid metadata: “Kademlia ID doesn’t match expected format”). Might be unrelated to sequence tracking, but it will need to be green before merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants