Member

@jeet1995 jeet1995 commented Nov 12, 2025

Description

This pull request fails a document operation when its enclosed barrier requests hit a 410 Lease Not Found. Failing early enables quicker cross-region retries for reads and curbs client-side resource utilization by bailing out of aggressive barrier retries.

Approach

For reads:

  • A quorum of Head requests is performed. If any single response within the quorum is a 410 Lease Not Found, one last Head request is performed on the primary replica after a forced address refresh. If the Head request on the primary also returns a 410 Lease Not Found, the enclosing request (a non-write request) is failed with a 503 (Service Unavailable) that carries the Lease Not Found sub-status. If the primary responds without a 410 Lease Not Found but the barrier is not yet satisfied, the barrier attainment loop is re-entered.
sequenceDiagram
    participant Caller
    participant readStrongAsync
    participant readQuorumAsync
    participant StoreReader
    participant waitForReadBarrier
    participant isBarrierMeetPossible
    participant performBarrierOnPrimary

    Caller->>readStrongAsync: entity, readQuorumValue, readMode
    activate readStrongAsync
    
    loop Retry up to maxNumberOfReadQuorumRetries
        readStrongAsync->>readQuorumAsync: entity, readQuorum, includePrimary, readMode
        activate readQuorumAsync
        
        readQuorumAsync->>StoreReader: readMultipleReplicaAsync(includePrimary, readQuorum)
        activate StoreReader
        StoreReader-->>readQuorumAsync: List<StoreResult>
        deactivate StoreReader
        
        alt Any StoreResult.isAvoidQuorumSelectionException == TRUE
            rect rgb(255, 230, 200)
            Note over readQuorumAsync,performBarrierOnPrimary: 🔴 AVOID QUORUM SELECTION EXCEPTION FLOW 🔴
            Note over readQuorumAsync: ⚠️ AvoidQuorumSelection detected!<br/>Cannot select quorum in current region
            
            readQuorumAsync-->>readStrongAsync: QuorumNotPossibleInCurrentRegion
            
            rect rgb(255, 180, 180)
            Note over readStrongAsync: 🚫 FAIL FAST!<br/>Return failFastException immediately
            readStrongAsync-->>Caller: Mono.error(failFastException)
            end
            end
            
        else QuorumMet (LSN aligned across replicas)
            rect rgb(200, 255, 200)
            Note over readQuorumAsync: ✅ QUORUM MET!<br/>Replicas aligned on LSN<br/>No barrier needed
            readQuorumAsync-->>readStrongAsync: QuorumMet + response
            readStrongAsync-->>Caller: Mono.just(response)
            end
            
        else QuorumSelected (Need barrier wait)
            Note over readQuorumAsync: ⚠️ QUORUM SELECTED<br/>LSN selected, need barrier
            readQuorumAsync-->>readStrongAsync: QuorumSelected + selectedLsn
            
            readStrongAsync->>waitForReadBarrier: barrierRequest, selectedLsn,<br/>globalCommittedLsn, readMode
            activate waitForReadBarrier
            Note over waitForReadBarrier: Initialize:<br/>readBarrierRetryCount (max retries)<br/>readBarrierRetryCountMultiRegion<br/>maxGlobalCommittedLsn = 0
            
            loop Retry barrier checks
                waitForReadBarrier->>StoreReader: readMultipleReplicaAsync(allowPrimary, readQuorum)
                activate StoreReader
                StoreReader-->>waitForReadBarrier: List<StoreResult>
                deactivate StoreReader
                
                alt Any StoreResult.isAvoidQuorumSelectionException == TRUE
                    rect rgb(255, 230, 200)
                    Note over waitForReadBarrier,performBarrierOnPrimary: 🔴 AVOID QUORUM SELECTION IN BARRIER WAIT 🔴
                    Note over waitForReadBarrier: ⚠️ AvoidQuorumSelection during barrier!<br/>Attempt optimistic primary read
                    
                    waitForReadBarrier->>isBarrierMeetPossible: barrierRequest, readBarrierLsn,<br/>targetGlobalCommittedLSN
                    activate isBarrierMeetPossible
                    
                    isBarrierMeetPossible->>performBarrierOnPrimary: Check primary with forceRefresh
                    activate performBarrierOnPrimary
                    Note over performBarrierOnPrimary: forceRefreshAddressCache = TRUE
                    
                    performBarrierOnPrimary->>StoreReader: readPrimaryAsync(requiresValidLsn)
                    activate StoreReader
                    StoreReader-->>performBarrierOnPrimary: StoreResult or Exception
                    deactivate StoreReader
                    
                    alt Primary: LSN >= readBarrierLsn AND globalCommittedLSN >= target
                        rect rgb(200, 255, 200)
                        Note over performBarrierOnPrimary: ✅ PRIMARY CAUGHT UP!<br/>Barrier satisfied
                        performBarrierOnPrimary-->>isBarrierMeetPossible: TRUE
                        Note over isBarrierMeetPossible: bailFlag = true<br/>exceptionHolder = null
                        isBarrierMeetPossible-->>waitForReadBarrier: TRUE
                        waitForReadBarrier-->>readStrongAsync: Boolean.TRUE
                        readStrongAsync-->>Caller: Mono.just(selectedResponse)
                        end
                        
                    else Primary: AvoidQuorumSelection AGAIN
                        rect rgb(255, 180, 180)
                        Note over performBarrierOnPrimary: 🚫 BAIL OUT!<br/>Primary ALSO throws AvoidQuorumSelection<br/>Cannot satisfy barrier
                        performBarrierOnPrimary-->>isBarrierMeetPossible: FALSE
                        Note over isBarrierMeetPossible: bailFlag = TRUE<br/>exceptionHolder = SERVICE_UNAVAILABLE<br/>(with original sub-status)
                        isBarrierMeetPossible-->>waitForReadBarrier: FALSE (BAIL OUT)
                        waitForReadBarrier-->>readStrongAsync: Boolean.FALSE
                        Note over readStrongAsync: bailOnBarrier = true<br/>cosmosException set
                        readStrongAsync->>readQuorumAsync: Retry with QuorumNotPossibleInCurrentRegion
                        readQuorumAsync-->>readStrongAsync: QuorumNotPossibleInCurrentRegion
                        readStrongAsync-->>Caller: Mono.error(cosmosException)
                        end
                        
                    else Primary: Not ready OR invalid
                        rect rgb(255, 255, 200)
                        Note over performBarrierOnPrimary: ⏳ PRIMARY NOT READY<br/>but no exception - can retry
                        performBarrierOnPrimary-->>isBarrierMeetPossible: FALSE
                        Note over isBarrierMeetPossible: bailFlag = false<br/>exceptionHolder = null
                        isBarrierMeetPossible-->>waitForReadBarrier: EMPTY (retry allowed)
                        Note over waitForReadBarrier: Continue loop
                        end
                    end
                    end
                    
                else Replicas have LSN >= readBarrierLsn (quorum met)
                    rect rgb(200, 255, 200)
                    Note over waitForReadBarrier: ✅ BARRIER MET!<br/>Replicas caught up to LSN
                    waitForReadBarrier-->>readStrongAsync: Boolean.TRUE
                    readStrongAsync-->>Caller: Mono.just(selectedResponse)
                    end
                    
                else Retry count exhausted (single-region)
                    rect rgb(220, 220, 220)
                    Note over waitForReadBarrier: ❌ Regular barriers have been exhausted (Includes either barrier on GCLSN or barrier on GLSN)!<br/>Move to multi-region global strong barrier (Only includes barrier on GCLSN - applies only to multi-region Strong Consistency account)
                    
                    alt targetGlobalCommittedLSN > 0 (Global Strong)
                        Note over waitForReadBarrier: Enter multi-region barrier loop<br/>with different retry counts/delays
                        
                        loop Multi-region barrier retries
                            waitForReadBarrier->>StoreReader: readMultipleReplicaAsync()
                            activate StoreReader
                            StoreReader-->>waitForReadBarrier: List<StoreResult>
                            deactivate StoreReader
                            
                            alt AvoidQuorumSelection in multi-region
                                rect rgb(255, 230, 200)
                                Note over waitForReadBarrier: Handle same as single-region<br/>with bailout logic
                                end
                            else Barrier met
                                waitForReadBarrier-->>readStrongAsync: Boolean.TRUE
                            else Multi-region retries exhausted
                                Note over waitForReadBarrier: All retries exhausted
                                waitForReadBarrier-->>readStrongAsync: Boolean.FALSE
                            end
                            
                            alt Retry > maxShortBarrierRetries
                                Note over waitForReadBarrier: Delay: barrierRetryIntervalInMs
                            else Short retry
                                Note over waitForReadBarrier: Delay: shortBarrierRetryIntervalInMs
                            end
                        end
                    else Not global strong
                        waitForReadBarrier-->>readStrongAsync: Boolean.FALSE
                    end
                    end
                    
                else Need retry (LSN not ready)
                    Note over waitForReadBarrier: Track maxGlobalCommittedLsn<br/>retryCount--
                    Note over waitForReadBarrier: Delay: delayBetweenReadBarrierCallsInMs
                    Note over waitForReadBarrier: Continue loop
                end
            end
            
        else QuorumNotSelected (Need primary read)
            Note over readQuorumAsync: ⚠️ QUORUM NOT SELECTED<br/>Read from primary
            readQuorumAsync-->>readStrongAsync: QuorumNotSelected
            
            alt hasPerformedReadFromPrimary
                rect rgb(220, 220, 220)
                Note over readStrongAsync: Already tried primary<br/>Cannot meet quorum
                readStrongAsync-->>Caller: Mono.error(GoneException)<br/>READ_QUORUM_NOT_MET
                end
            else Perform primary read
                Note over readStrongAsync: Read from primary replica
                readStrongAsync->>readStrongAsync: readPrimaryAsync()
                
                alt Primary read successful
                    rect rgb(200, 255, 200)
                    readStrongAsync-->>Caller: Mono.just(primaryResponse)
                    end
                else shouldRetryOnSecondary
                    Note over readStrongAsync: Set: shouldRetryOnSecondary = true<br/>hasPerformedReadFromPrimary = true<br/>includePrimary = true<br/>REPEAT LOOP
                else Primary read failed
                    rect rgb(255, 180, 180)
                    readStrongAsync-->>Caller: Mono.error(GoneException)<br/>READ_QUORUM_NOT_MET
                    end
                end
            end
        end
    end
    deactivate readStrongAsync
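
The three outcomes of the last-chance primary check in the read flow above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the SDK's implementation: `ReadBarrierPrimaryCheck`, `PrimaryHead`, `Outcome`, `evaluate` and `failureStatus` are hypothetical names introduced here, and the 410 Lease Not Found response is reduced to a boolean flag.

```java
import java.util.OptionalInt;

/** Illustrative model only; these types are hypothetical and are not the SDK's classes. */
final class ReadBarrierPrimaryCheck {

    /** Possible results of the last-chance Head request on the primary. */
    enum Outcome { BARRIER_MET, BAIL_OUT, RETRY }

    /** Simplified view of what the primary returned for the Head request. */
    static final class PrimaryHead {
        final boolean leaseNotFound;   // true when the primary answered 410 Lease Not Found
        final long lsn;
        final long globalCommittedLsn;

        PrimaryHead(boolean leaseNotFound, long lsn, long globalCommittedLsn) {
            this.leaseNotFound = leaseNotFound;
            this.lsn = lsn;
            this.globalCommittedLsn = globalCommittedLsn;
        }
    }

    /** Maps the primary's response to one of the three branches in the diagram. */
    static Outcome evaluate(PrimaryHead primary, long readBarrierLsn, long targetGlobalCommittedLsn) {
        if (primary.leaseNotFound) {
            // Primary also reports Lease Not Found: stop retrying barriers.
            return Outcome.BAIL_OUT;
        }
        boolean caughtUp = primary.lsn >= readBarrierLsn
            && primary.globalCommittedLsn >= targetGlobalCommittedLsn;
        // Not caught up but no exception: the barrier loop may be re-entered.
        return caughtUp ? Outcome.BARRIER_MET : Outcome.RETRY;
    }

    /** Status surfaced to the caller of a read; only a bail-out produces one (503). */
    static OptionalInt failureStatus(Outcome outcome) {
        return outcome == Outcome.BAIL_OUT ? OptionalInt.of(503) : OptionalInt.empty();
    }
}
```

For example, `evaluate(new PrimaryHead(true, 0L, 0L), 100L, 105L)` yields `BAIL_OUT`, which this sketch maps to a 503 for the enclosing read.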

For writes:

  • A cycle of a single Head request is performed. If the response is a 410 Lease Not Found, one last Head request is performed on the primary replica after a forced address refresh. If the Head request on the primary also returns a 410 Lease Not Found, the enclosing request (a write request) is failed with a 408 (Request Timeout) that carries the Lease Not Found sub-status. If the primary responds without a 410 Lease Not Found but the barrier is not yet satisfied, the barrier attainment loop is re-entered.
sequenceDiagram
    participant Caller
    participant barrierForGlobalStrong
    participant waitForWriteBarrier
    participant StoreReader
    participant isBarrierMeetPossible
    participant performOptimisticBarrier

    Caller->>barrierForGlobalStrong: request, response, cosmosExceptionHolder
    activate barrierForGlobalStrong
    
    Note over barrierForGlobalStrong: Check if Global Strong enabled<br/>and numberOfReadRegions > 0
    barrierForGlobalStrong->>barrierForGlobalStrong: getLsnAndGlobalCommittedLsn(response)
    
    alt LSN headers missing
        barrierForGlobalStrong-->>Caller: Mono.error(GoneException)<br/>SERVER_GENERATED_410
    else globalCommittedLsn >= lsn
        Note over barrierForGlobalStrong: Barrier already met!
        barrierForGlobalStrong-->>Caller: Mono.just(response)
    else globalCommittedLsn < lsn - NEED BARRIER WAIT
        Note over barrierForGlobalStrong: Store: globalStrongWriteResponse = response<br/>globalCommittedSelectedLSN = lsn.v<br/>forceRefreshAddressCache = false
        
        barrierForGlobalStrong->>waitForWriteBarrier: barrierRequest, selectedLSN, exceptionHolder
        Note over waitForWriteBarrier: Initialize:<br/>retryCount = 30<br/>maxGlobalCommittedLsnReceived = 0
        
        loop Retry up to 30 times
            waitForWriteBarrier->>StoreReader: readMultipleReplicaAsync(allowPrimary=true, requiredCount=1)
            StoreReader-->>waitForWriteBarrier: List<StoreResult>
            
            alt Any StoreResult.isAvoidQuorumSelectionException == TRUE
                rect rgb(255, 230, 200)
                Note over waitForWriteBarrier,performOptimisticBarrier: 🔴 AVOID QUORUM SELECTION EXCEPTION FLOW 🔴
                Note over waitForWriteBarrier: ⚠️ AvoidQuorumSelection detected!<br/>retryCount++ (undo decrement)
                
                waitForWriteBarrier->>isBarrierMeetPossible: Call with cosmosException
                Note over isBarrierMeetPossible: Attempt optimistic barrier check on primary
                
                isBarrierMeetPossible->>performOptimisticBarrier: Check primary with forceRefresh
                Note over performOptimisticBarrier: forceRefreshAddressCache = TRUE
                
                performOptimisticBarrier->>StoreReader: readPrimaryAsync(forceRefresh=TRUE)
                StoreReader-->>performOptimisticBarrier: StoreResult or Exception
                
                alt Primary: globalCommittedLSN >= selectedLSN
                    rect rgb(200, 255, 200)
                    Note over performOptimisticBarrier: ✅ PRIMARY CAUGHT UP!<br/>Barrier can be satisfied
                    performOptimisticBarrier-->>isBarrierMeetPossible: TRUE
                    Note over isBarrierMeetPossible: bailFlag = true<br/>exceptionHolder = null
                    isBarrierMeetPossible-->>waitForWriteBarrier: TRUE
                    waitForWriteBarrier-->>barrierForGlobalStrong: Boolean.TRUE
                    barrierForGlobalStrong-->>Caller: Mono.just(globalStrongWriteResponse)
                    end
                else Primary: AvoidQuorumSelection AGAIN
                    rect rgb(255, 180, 180)
                    Note over performOptimisticBarrier: 🚫 BAIL OUT!<br/>Primary ALSO throws AvoidQuorumSelection<br/>Cannot satisfy barrier
                    performOptimisticBarrier-->>isBarrierMeetPossible: FALSE
                    Note over isBarrierMeetPossible: bailFlag = TRUE<br/>exceptionHolder = REQUEST_TIMEOUT<br/>(with original sub-status)
                    isBarrierMeetPossible-->>waitForWriteBarrier: FALSE (BAIL OUT)
                    waitForWriteBarrier-->>barrierForGlobalStrong: Boolean.FALSE
                    Note over barrierForGlobalStrong: Throw stored cosmosException
                    barrierForGlobalStrong-->>Caller: Mono.error(cosmosException)
                    end
                else Primary: Not ready OR invalid
                    rect rgb(255, 255, 200)
                    Note over performOptimisticBarrier: ⏳ PRIMARY NOT READY<br/>but no exception - can retry
                    performOptimisticBarrier-->>isBarrierMeetPossible: FALSE
                    Note over isBarrierMeetPossible: bailFlag = false<br/>exceptionHolder = null
                    isBarrierMeetPossible-->>waitForWriteBarrier: EMPTY (retry allowed)
                    Note over waitForWriteBarrier: Continue loop with delay
                    end
                end
                end
                
            else Any StoreResult.globalCommittedLSN >= selectedLSN
                rect rgb(200, 255, 200)
                Note over waitForWriteBarrier: ✅ SUCCESS! Barrier met
                waitForWriteBarrier-->>barrierForGlobalStrong: Boolean.TRUE
                barrierForGlobalStrong-->>Caller: Mono.just(globalStrongWriteResponse)
                end
                
            else retryCount == 0 (exhausted)
                rect rgb(220, 220, 220)
                Note over waitForWriteBarrier: ❌ Max retries (30) exhausted<br/>Log: maxGlobalCommittedLsnReceived
                waitForWriteBarrier-->>barrierForGlobalStrong: Boolean.FALSE
                alt cosmosException exists
                    barrierForGlobalStrong-->>Caller: Mono.error(cosmosException)
                else No exception
                    barrierForGlobalStrong-->>Caller: Mono.error(GoneException)<br/>GLOBAL_STRONG_WRITE_BARRIER_NOT_MET
                end
                end
                
            else Retry needed (LSN not ready)
                Note over waitForWriteBarrier: Track maxGlobalCommittedLsnReceived<br/>retryCount--<br/>forceRefreshAddressCache = false
                alt Retry > 4
                    Note over waitForWriteBarrier: Delay 30ms
                else Retry <= 4
                    Note over waitForWriteBarrier: Delay 10ms (short retry)
                end
                Note over waitForWriteBarrier: Mono.empty() → REPEAT LOOP
            end
        end
    end
    deactivate barrierForGlobalStrong
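
The write barrier loop above can be approximated with the following sketch. It is a simplified model rather than the SDK's code, and it rests on several assumptions: all class and method names are hypothetical; delays are accumulated as a number instead of being slept; "Retry > 4" is read as "more than four attempts already made"; and, unlike the diagram (which undoes the retry decrement on the Lease Not Found path), every attempt here consumes retry budget so that the loop is trivially bounded.

```java
import java.util.function.Supplier;

/** Illustrative model only; these types are hypothetical and are not the SDK's classes. */
final class WriteBarrierSketch {

    enum Result { BARRIER_MET, BAILED_OUT, EXHAUSTED }

    /** Simplified Head response: either 410 Lease Not Found or a globalCommittedLsn. */
    static final class Head {
        final boolean leaseNotFound;
        final long globalCommittedLsn;

        Head(boolean leaseNotFound, long globalCommittedLsn) {
            this.leaseNotFound = leaseNotFound;
            this.globalCommittedLsn = globalCommittedLsn;
        }
    }

    /** Final state of the loop; failureStatus is 0 when the barrier was met. */
    static final class Outcome {
        final Result result;
        final int failureStatus;
        final int attempts;
        final long simulatedDelayMs;

        Outcome(Result result, int failureStatus, int attempts, long simulatedDelayMs) {
            this.result = result;
            this.failureStatus = failureStatus;
            this.attempts = attempts;
            this.simulatedDelayMs = simulatedDelayMs;
        }
    }

    static final int MAX_RETRIES = 30;
    static final int MAX_SHORT_RETRIES = 4;

    /** 10 ms for the first four attempts, 30 ms afterwards (values from the diagram). */
    static int delayAfterAttempt(int attemptsMade) {
        return attemptsMade > MAX_SHORT_RETRIES ? 30 : 10;
    }

    static Outcome waitForWriteBarrier(Supplier<Head> anyReplica, Supplier<Head> primary, long selectedLsn) {
        long delayMs = 0;
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            Head head = anyReplica.get();
            if (head.leaseNotFound) {
                // One last Head request on the primary (after a forced address refresh in the real flow).
                Head fromPrimary = primary.get();
                if (fromPrimary.leaseNotFound) {
                    return new Outcome(Result.BAILED_OUT, 408, attempt, delayMs);
                }
                if (fromPrimary.globalCommittedLsn >= selectedLsn) {
                    return new Outcome(Result.BARRIER_MET, 0, attempt, delayMs);
                }
            } else if (head.globalCommittedLsn >= selectedLsn) {
                return new Outcome(Result.BARRIER_MET, 0, attempt, delayMs);
            }
            delayMs += delayAfterAttempt(attempt); // simulated wait before the next attempt
        }
        // Retries exhausted: 410 stands in for the GoneException in the diagram.
        return new Outcome(Result.EXHAUSTED, 410, MAX_RETRIES, delayMs);
    }
}
```

With both suppliers returning a Lease Not Found response, the sketch bails out on the first attempt with a 408 instead of spending the full retry budget, which is the resource saving the description refers to.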

NOTE

This PR does not bail out of barrier requests when a quorum of document requests could not be achieved and the read is performed purely on the primary replica (the QuorumNotSelected flow).
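
To make the scope of this note concrete, the read flows from the read diagram can be listed next to whether the barrier bail-out applies to them. The enum and predicate below are hypothetical names used only for illustration; they summarize the description and are not taken from the SDK.

```java
/** Illustrative summary only; names are hypothetical and are not the SDK's types. */
final class BarrierBailOutScope {

    /** The four read flows shown in the read diagram. */
    enum ReadQuorumFlow {
        QUORUM_MET,                              // replicas aligned on LSN, no barrier needed
        QUORUM_SELECTED,                         // LSN selected, barrier wait follows
        QUORUM_NOT_SELECTED,                     // read falls back to the primary replica
        QUORUM_NOT_POSSIBLE_IN_CURRENT_REGION    // document request itself fails fast
    }

    /** True only for the flow whose barrier requests are covered by the bail-out. */
    static boolean barrierBailOutApplies(ReadQuorumFlow flow) {
        return flow == ReadQuorumFlow.QUORUM_SELECTED;
    }
}
```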

Fixes

#46135

All SDK Contribution checklist:

  • The pull request does not introduce breaking changes.
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

@jeet1995
Member Author

/azp run java - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@jeet1995 jeet1995 changed the title from "Fail fast barrier lease not found" to "Bail out from barriers when barriers hit 410 Lease Not Found." on Nov 14, 2025
@jeet1995 jeet1995 marked this pull request as ready for review November 18, 2025 01:24
@jeet1995
Member Author

/azp run java - cosmos - ci

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Member

@FabianMeiswinkel FabianMeiswinkel left a comment

LGTM - Thanks!

Member

@xinlian12 xinlian12 left a comment

LGTM, thanks
