Description
ask: We are currently investigating high latencies observed at the SDK entry point during Cloud operations. Our measurements indicate that these latencies occur when performing Put() operations via the Azure SDK. There have been suggestions that the elevated latencies may be related to CPU utilization within the I/O stack, which appears to be actively engaged in Cloud Put() activities.
This observation seems counterintuitive because our understanding is that Put() calls should primarily wait for server-side acknowledgment of successful completion and therefore should not consume significant CPU resources on the client side. However, profiling consistently shows that I/O threads are on CPU, executing within the SDK, specifically in Put() calls, with the top of the stack being:
Azure::Core::Http::CurlConnection::SendBuffer()
Integration Details
SDK Version: azure-identity-1.5.1
SDK is using the C++ interfaces, as described in https://learn.microsoft.com/en-us/azure/storage/blobs/quickstart-blobs-c-plus-plus?tabs=managed-ide…
The main entry point from Rubrik code into the SDK for putting a BlockBlob into a container, and the call we see occupying the CPU, is:
- Azure::Response<Models::UploadBlockBlobResult> Upload(Azure::Core::IO::BodyStream& content, const UploadBlockBlobOptions& options = UploadBlockBlobOptions(), const Azure::Core::Context& context = Azure::Core::Context()) const
We have attached a detailed thread stack showing the path from Upload() all the way to Azure::Core::Http::CurlConnection::SendBuffer(), in case it is helpful. Please see the attached file:
thread_stack_bt_upload_sendbuffer.txt
- For completeness, the corresponding Get() interface we use is:
- Azure::Response<Models::DownloadBlobToResult> DownloadTo(uint8_t* buffer, size_t bufferSize, const DownloadBlobToOptions& options = DownloadBlobToOptions(), const Azure::Core::Context& context = Azure::Core::Context()) const
Key Observations
Threads executing SendBuffer() are consistently in the ‘R’ state (as seen in top -H output), indicating active CPU usage.
This behavior raises questions about why these threads are spinning on CPU rather than being idle while awaiting server acknowledgment.
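For reference, the per-thread state shown by top -H can also be sampled programmatically on Linux. The following is a minimal sketch, assuming a Linux /proc filesystem; ParseThreadState and ReadThreadState are hypothetical helper names and are not part of the Azure SDK. Note that 'R' means running or runnable, so a thread waiting on the run queue also reports 'R'.

```cpp
#include <fstream>
#include <optional>
#include <string>

// Hypothetical helper (not an Azure SDK API).
// Extracts the state character from one line of /proc/<pid>/task/<tid>/stat.
// The line has the form "tid (comm) S ...". The comm field may itself contain
// spaces or parentheses, so we search for the LAST ')' rather than the first.
inline std::optional<char> ParseThreadState(const std::string& statLine) {
  const auto close = statLine.rfind(')');
  if (close == std::string::npos || close + 2 >= statLine.size()) {
    return std::nullopt;  // malformed or truncated line
  }
  return statLine[close + 2];  // skip ')' and the following space
}

// Hypothetical helper: reads the current state of a thread of this process.
// tid is the kernel thread id (the value shown by top -H), not a pthread_t.
inline std::optional<char> ReadThreadState(long tid) {
  std::ifstream in("/proc/self/task/" + std::to_string(tid) + "/stat");
  std::string line;
  if (!std::getline(in, line)) {
    return std::nullopt;  // thread does not exist or /proc is unavailable
  }
  return ParseThreadState(line);
}
```

Sampling this repeatedly for an I/O thread while it is inside Upload() would show how often the thread is in 'R' versus 'S' (sleeping), which complements the one-off view from top -H.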
Request to SDK Team:
We seek clarity on the following points:
Root Cause Analysis:
Why are threads in SendBuffer() consuming CPU cycles during Put() operations? Is this expected behavior or indicative of an underlying issue?
Latency Attribution:
How can we measure and differentiate latencies within the SDK layer versus lower-level factors (e.g., network delays) that may contribute to overall high latencies observed at the SDK entry point?
The goal is to disambiguate SDK-level delays from external factors so that we can better understand and optimize performance.