High latencies observed at the SDK entry point during Cloud operations #6817

@NaveenSai1605

Ask: We are investigating high latencies observed at the SDK entry point during Cloud operations. Our measurements indicate that these latencies occur when performing Put() operations via the Azure SDK. One suggested cause is CPU utilization within the I/O stack, which appears to be actively busy during Cloud Put() activity.

This observation seems counterintuitive because our understanding is that Put() calls should primarily wait for server-side acknowledgment of successful completion and therefore should not consume significant CPU resources on the client side. However, profiling consistently shows that I/O threads are on CPU, executing within the SDK, specifically in Put() calls, with the top of the stack being:

Azure::Core::Http::CurlConnection::SendBuffer()

Integration Details

SDK Version: azure-identity-1.5.1
SDK is using the C++ interfaces, as described in https://learn.microsoft.com/en-us/azure/storage/blobs/quickstart-blobs-c-plus-plus?tabs=managed-ide…

  • The main entry point from Rubrik code into the SDK for putting a BlockBlob into a container, and the one we see occupying the CPU, is:

        - Azure::Response<Models::UploadBlockBlobResult> Upload(Azure::Core::IO::BodyStream& content, const UploadBlockBlobOptions& options = UploadBlockBlobOptions(), const Azure::Core::Context& context = Azure::Core::Context()) const
    
  • We have attached a detailed thread stack covering the path from Upload() all the way to Azure::Core::Http::CurlConnection::SendBuffer(), in case it is helpful. Please see the attached file:

thread_stack_bt_upload_sendbuffer.txt

  • For completeness, the corresponding Get() interface we use is:
    - Azure::Response<Models::DownloadBlobToResult> DownloadTo(uint8_t* buffer, size_t bufferSize, const DownloadBlobToOptions& options = DownloadBlobToOptions(), const Azure::Core::Context& context = Azure::Core::Context()) const

Key Observations

  • Threads executing SendBuffer() are consistently in the ‘R’ state (as seen in top -H output), indicating active CPU usage.
  • This behavior raises the question of why these threads are spinning on CPU rather than being idle while awaiting server acknowledgment.

Request to SDK Team

We seek clarity on the following points:

  • Root Cause Analysis: Why are threads in SendBuffer() consuming CPU cycles during Put() operations? Is this expected behavior or indicative of an underlying issue?
  • Latency Attribution: How can we measure and differentiate latencies within the SDK layer versus lower-level factors (e.g., network delays) that may contribute to the overall high latencies observed at the SDK entry point?

The goal is to disambiguate SDK-level delays from external factors so that we can better understand and optimize performance.
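
As a starting point for the latency attribution question, here is a minimal sketch (not part of the Azure SDK) of how a caller might compare the wall-clock time of a call against the CPU time consumed by the calling thread. The names `CallTiming`, `ThreadCpuNs`, `MeasureCall`, and `DemoBlockedCall` are hypothetical helpers introduced for illustration. It assumes Linux/POSIX and relies on `clock_gettime(CLOCK_THREAD_CPUTIME_ID)`, which only accounts for the calling thread, so any work done on other threads would not be counted.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#include <time.h>
#include <utility>

// Hypothetical result type: both values are in milliseconds.
struct CallTiming {
  double wallMs;  // elapsed wall-clock time for the call
  double cpuMs;   // CPU time consumed by the calling thread during the call
};

// CPU time of the calling thread in nanoseconds (POSIX; Linux assumed).
inline std::int64_t ThreadCpuNs() {
  timespec ts{};
  const int rc = clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
  assert(rc == 0);
  (void)rc;
  return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Runs fn() once on the calling thread and reports wall time and thread CPU time.
template <typename F>
CallTiming MeasureCall(F&& fn) {
  const auto wallStart = std::chrono::steady_clock::now();
  const std::int64_t cpuStart = ThreadCpuNs();
  std::forward<F>(fn)();
  const std::int64_t cpuEnd = ThreadCpuNs();
  const auto wallEnd = std::chrono::steady_clock::now();
  return CallTiming{
      std::chrono::duration<double, std::milli>(wallEnd - wallStart).count(),
      static_cast<double>(cpuEnd - cpuStart) / 1e6};
}

// Stand-in for a call that mostly waits (for example, on a remote acknowledgment).
inline CallTiming DemoBlockedCall() {
  return MeasureCall([] {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
  });
}
```

In our setting the callable would wrap the Upload() call itself, for example `MeasureCall([&] { blockBlobClient.Upload(content, options, context); })`, where `blockBlobClient` and `content` are placeholder variable names. If `cpuMs` is close to `wallMs`, the thread spent most of the call on CPU. If `cpuMs` is much smaller than `wallMs`, the thread was mostly blocked, which would point toward network or server-side time rather than client-side work.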

Labels: customer-reported, needs-triage, question
