Sync fork #1

Open
kislaykishore wants to merge 1330 commits into kislaykishore:master from GoogleCloudPlatform:master

@kislaykishore
Owner

Sync fork

abhishek10004 and others added 30 commits January 13, 2026 20:06
Adding metrics support in mrd_simple_reader. Also added a UT to verify it.
…inode is destroyed (#4262)

- Destroy the MRD instance & remove it from the cache when the inode is destroyed
- Refactored the cache logic for better encapsulation & clarity
- UTs for relevant functionality
* Make the bucket-names for csi builds small.

If the project-id is long enough, then when the BUILD_ID is appended to create the bucket name, the total bucket length can become too long. This can result in the bucket creation failing. Fix this by reducing the bucket names to use only a few characters of the build id.
* Ensure that the script exits if the bucket creation fails but doesn't fail if bucket-cleanup fails.
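
The bucket-name shortening described above can be sketched as follows. This is only an illustration in Go, not the build script itself: the function name `bucketName`, the `-` separator and the `buildIDChars` parameter are assumptions. The 63-character limit is the GCS limit for bucket names that contain no dots.

```go
package main

import "fmt"

// maxBucketNameLen is the GCS length limit for bucket names without dots.
const maxBucketNameLen = 63

// bucketName joins a project ID with a short prefix of the build ID.
// buildIDChars is a hypothetical knob, not the value used by the real script.
func bucketName(projectID, buildID string, buildIDChars int) (string, error) {
	if buildIDChars < 0 {
		buildIDChars = 0
	}
	if len(buildID) > buildIDChars {
		buildID = buildID[:buildIDChars]
	}
	name := projectID + "-" + buildID
	if len(name) > maxBucketNameLen {
		return "", fmt.Errorf("bucket name is %d characters, limit is %d", len(name), maxBucketNameLen)
	}
	return name, nil
}

func main() {
	// Illustrative inputs only.
	name, err := bucketName("my-long-project-id", "0f8fad5b-d9cb-469f-a165-70867728950e", 8)
	fmt.Println(name, err)
}
```

Returning an error when the name is still too long matches the intent that the script should stop if bucket creation cannot succeed.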
* update logging

* update message

* Update internal/storage/storageutil/retry.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* lint and fmt issue

* review comment

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…ch (#4260)

* implementation for metadata prefetch

* add unit tests

* start prefetching at start offset instead of caching everything

* lint fix

* review comments

* review comments

* some fixes

* changed to prefetch from start offset only if directory is large

* review comment

* review comment

* if prefetch completes fast, prefetch in progress state is not seen
…meter settings utils to a separate internal package (#4270)

* create separate package for managing kernel optimisations and contract for GKE

* update logs

* remove indentation as the file is not read by humans and logs are already json formatted in GKE

* review fixes for gemini comments

* add more tests and review fixes

* fix review comments

* kernelparam -> kernelparams
…files (#4271)

* Replace pFlag with Viper to check even config files for IsSet

* remove TestGetMachineType_InputPrecedenceOrder

* Address comments

* Add extra assertions

* Add test constant for maxSupportedTTLInSeconds

* Remove bucket type optimization tests

* update test name
### Description
The issue: tests passed on the first run but failed on subsequent runs (-count > 1) with 0 spans detected. The cause was stale global state: the tracer was captured once at startup and became "orphaned" when the test suite moved to the next iteration, so it sent spans to a dead provider.

The fix: GCSFuseTracer() now fetches the tracer from the current global provider instead of a one-time package variable, and it has moved to the OtelTracer traceHandle struct.
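
The stale-state failure mode can be reproduced with a small standard-library model. This is a sketch under stated assumptions: `provider`, `setGlobal`, `currentGlobal` and `newCachedLookup` are hypothetical stand-ins and are not the OpenTelemetry or GCSFuse APIs. It only shows why a tracer captured once keeps writing to a provider that later iterations no longer observe.

```go
package main

import (
	"fmt"
	"sync"
)

// provider is a hypothetical stand-in for a tracer provider: it only counts spans.
type provider struct {
	mu    sync.Mutex
	spans int
}

func (p *provider) record() {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.spans++
}

func (p *provider) count() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.spans
}

var (
	globalMu sync.RWMutex
	global   = &provider{}
)

func setGlobal(p *provider) {
	globalMu.Lock()
	defer globalMu.Unlock()
	global = p
}

// currentGlobal looks the provider up on every call (the fixed behaviour).
func currentGlobal() *provider {
	globalMu.RLock()
	defer globalMu.RUnlock()
	return global
}

// newCachedLookup captures the provider on first use and never refreshes it
// (the buggy behaviour).
func newCachedLookup() func() *provider {
	var once sync.Once
	var cached *provider
	return func() *provider {
		once.Do(func() { cached = currentGlobal() })
		return cached
	}
}

// runIteration mimics one test iteration: install a fresh provider, emit one
// span through lookup, and report how many spans the fresh provider received.
func runIteration(lookup func() *provider) int {
	fresh := &provider{}
	setGlobal(fresh)
	lookup().record()
	return fresh.count()
}

func main() {
	cached := newCachedLookup()
	fmt.Println("cached lookup, iteration 1:", runIteration(cached))
	fmt.Println("cached lookup, iteration 2:", runIteration(cached))
	fmt.Println("fresh lookup, iteration 3:", runIteration(currentGlobal))
}
```

In this model the cached lookup delivers a span in the first iteration and none in the second, while the per-call lookup always reaches the provider that is currently installed.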

### Link to the issue in case of a bug fix.
b/475710717
b/476297291

### Testing details
1. Manual - Yes
 Ran the command below several times; each invocation runs the tests 5 times (-count 5), and all runs passed.
 
 /usr/local/go/bin/go test -v -fullpath -timeout 30s -count 5 -run "^TestTracingTestSuite$" github.com/googlecloudplatform/gcsfuse/v3/internal/fs -testify.m "^(TestTraceLookupInode)$"

Also verified that the tracing is working fine by exporting the traces to google cloud trace exporter.


2. Unit tests - existing tests need to pass.

3. Integration tests - tests need to pass.

### Any backward incompatible change? If so, please explain.
NA
…n inode (#4259)

- Adding changes to ensure minObject is updated in MrdInstance as well when it's updated in inode. Also, recreating the MRDPool if the object generation changes.

- Ensuring that we decrement the refCount from simple reader only when it was incremented from that reader as well.
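
The second point, decrementing only what the same reader incremented, might look like the sketch below. The types `mrdInstance` and `simpleReader` and the `holdsRef` flag are hypothetical simplifications for illustration and do not mirror the real GCSFuse structs.

```go
package main

import (
	"fmt"
	"sync"
)

// mrdInstance is a hypothetical stand-in that holds only a reference count.
type mrdInstance struct {
	mu       sync.Mutex
	refCount int
}

func (m *mrdInstance) incRef() {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.refCount++
}

func (m *mrdInstance) decRef() {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.refCount--
}

func (m *mrdInstance) refs() int {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.refCount
}

// simpleReader remembers whether it took a reference itself.
type simpleReader struct {
	inst     *mrdInstance
	holdsRef bool
}

// acquire takes at most one reference per reader.
func (r *simpleReader) acquire() {
	if r.holdsRef {
		return
	}
	r.inst.incRef()
	r.holdsRef = true
}

// release gives back a reference only if this reader took one.
func (r *simpleReader) release() {
	if !r.holdsRef {
		return
	}
	r.inst.decRef()
	r.holdsRef = false
}

func main() {
	inst := &mrdInstance{refCount: 1} // one reference held by someone else
	r := &simpleReader{inst: inst}
	r.release() // no-op: this reader never acquired
	r.acquire()
	r.release()
	r.release() // second release is also a no-op
	fmt.Println("refCount:", inst.refs())
}
```

The flag keeps a reader that never acquired a reference, or that is released twice, from driving the shared count below what other holders expect.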
Bumps [pyasn1](https://github.com/pyasn1/pyasn1) from 0.6.1 to 0.6.2.
- [Release notes](https://github.com/pyasn1/pyasn1/releases)
- [Changelog](https://github.com/pyasn1/pyasn1/blob/main/CHANGES.rst)
- [Commits](pyasn1/pyasn1@v0.6.1...v0.6.2)

---
updated-dependencies:
- dependency-name: pyasn1
  dependency-version: 0.6.2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
* chore(deps): bump the go-dependencies group across 1 directory with 8 updates (#4268)

Bumps the go-dependencies group with 6 updates in the / directory:

| Package | From | To |
| --- | --- | --- |
| [cloud.google.com/go/storage](https://github.com/googleapis/google-cloud-go) | `1.58.0` | `1.59.0` |
| [github.com/fsouza/fake-gcs-server](https://github.com/fsouza/fake-gcs-server) | `1.52.2` | `1.52.3` |
| [github.com/go-viper/mapstructure/v2](https://github.com/go-viper/mapstructure) | `2.4.0` | `2.5.0` |
| [github.com/prometheus/common](https://github.com/prometheus/common) | `0.67.4` | `0.67.5` |
| [golang.org/x/net](https://github.com/golang/net) | `0.48.0` | `0.49.0` |
| [google.golang.org/api](https://github.com/googleapis/google-api-go-client) | `0.257.0` | `0.259.0` |



Updates `cloud.google.com/go/storage` from 1.58.0 to 1.59.0
- [Release notes](https://github.com/googleapis/google-cloud-go/releases)
- [Changelog](https://github.com/googleapis/google-cloud-go/blob/main/CHANGES.md)
- [Commits](googleapis/google-cloud-go@spanner/v1.58.0...spanner/v1.59.0)

Updates `github.com/fsouza/fake-gcs-server` from 1.52.2 to 1.52.3
- [Release notes](https://github.com/fsouza/fake-gcs-server/releases)
- [Commits](fsouza/fake-gcs-server@v1.52.2...v1.52.3)

Updates `github.com/go-viper/mapstructure/v2` from 2.4.0 to 2.5.0
- [Release notes](https://github.com/go-viper/mapstructure/releases)
- [Changelog](https://github.com/go-viper/mapstructure/blob/main/CHANGELOG.md)
- [Commits](go-viper/mapstructure@v2.4.0...v2.5.0)

Updates `github.com/prometheus/common` from 0.67.4 to 0.67.5
- [Release notes](https://github.com/prometheus/common/releases)
- [Changelog](https://github.com/prometheus/common/blob/main/CHANGELOG.md)
- [Commits](prometheus/common@v0.67.4...v0.67.5)

Updates `golang.org/x/net` from 0.48.0 to 0.49.0
- [Commits](golang/net@v0.48.0...v0.49.0)

Updates `golang.org/x/sys` from 0.39.0 to 0.40.0
- [Commits](golang/sys@v0.39.0...v0.40.0)

Updates `golang.org/x/text` from 0.32.0 to 0.33.0
- [Release notes](https://github.com/golang/text/releases)
- [Commits](golang/text@v0.32.0...v0.33.0)

Updates `google.golang.org/api` from 0.257.0 to 0.259.0
- [Release notes](https://github.com/googleapis/google-api-go-client/releases)
- [Changelog](https://github.com/googleapis/google-api-go-client/blob/main/CHANGES.md)
- [Commits](googleapis/google-api-go-client@v0.257.0...v0.259.0)

---
updated-dependencies:
- dependency-name: cloud.google.com/go/storage
  dependency-version: 1.59.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: github.com/fsouza/fake-gcs-server
  dependency-version: 1.52.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
- dependency-name: github.com/go-viper/mapstructure/v2
  dependency-version: 2.5.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: github.com/prometheus/common
  dependency-version: 0.67.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
- dependency-name: golang.org/x/net
  dependency-version: 0.49.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: golang.org/x/sys
  dependency-version: 0.40.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: golang.org/x/text
  dependency-version: 0.33.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: google.golang.org/api
  dependency-version: 0.259.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump urllib3 in /perfmetrics/scripts/micro_benchmarks (#4254)

Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.5.0 to 2.6.3.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](urllib3/urllib3@2.5.0...2.6.3)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.6.3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump urllib3 (#4242)

Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.5.0 to 2.6.0.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](urllib3/urllib3@2.5.0...2.6.0)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.6.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* revert fake-gcs-server dependency upgrade due to linux test failure

* review comments

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…4283)

* feat(simple_reader): changing the default value for max-background

* minor change in unit test

* Updating the value based on the latest performance
…n GKE and nonGKE environments. (#4277)

* create separate package for managing kernel optimisations and contract for GKE

* update logs

* remove indentation as the file is not read by humans and logs are already json formatted in GKE

* review fixes for gemini comments

* add more tests and review fixes

* fix review comments

* kernelparam -> kernelparams

* local changes

* update comment and usage

* fix usage

* do nothing when params are empty and fix log structure

* fix param name

* fix review comments
…ownloader (#4281)

* Invalidate metadata Cache for NewMultiRangeDownloader

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Fix formatting

---------

Co-authored-by: Ashmeen Kaur <57195160+ashmeenkaur@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* delete implicit dir with type cache deprecation

* rebase

* delete implicit dir

* some cleanup

* small fix

* review comments by gemini

* remove erasing from type cache twice

* Small fix
* add stall timeout for direct path connectivity

* Gemini suggestions

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Gemini suggestions

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* lint and linux test fix

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…ze file (#4288)

* feat(simple_reader): optimize mrd pool size based on object-size

* Updating the heuristic value

* pool-size 3 regresses for 500 MB to 900 MB files, hence keeping 4 only

* fixing unit test

* Fixing unit test

* minor comment update

* Adding minor review comments
…ing (#4292)

* add configs for metadata prefetching

* rename flags/configs
* Handle clobbered error in MRD Pool

* Handle Clobbered error in MRDWrapper

* Address comments
…se Test migration] (#4274)

* remove spaces in test_config and replace with commas

* gemini reviews resolve

* add parsing logic
* fix network unreachable error by adding retries

* add unit tests

* increase retry

* gemini review comments

* keeping changes for storage client only

* renaming function

* review comment

* review comment

* case-insensitive check
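
A retry on "network unreachable" errors with a case-insensitive match might be sketched as below. This is an assumption-heavy illustration: the commit list does not show how the error is detected, so the substring match, the helper names `isNetworkUnreachable` and `retry`, and the sample error values are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// Sample errors for the demonstration; the messages are illustrative.
var (
	errUnreachable = errors.New("dial tcp: connect: Network is unreachable")
	errOther       = errors.New("permission denied")
)

// isNetworkUnreachable matches the message regardless of letter case.
func isNetworkUnreachable(err error) bool {
	if err == nil {
		return false
	}
	return strings.Contains(strings.ToLower(err.Error()), "network is unreachable")
}

// retry runs op up to attempts times, retrying only on unreachable-network errors.
func retry(attempts int, delay time.Duration, op func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		err = op()
		if err == nil || !isNetworkUnreachable(err) {
			return err
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return err
}

func main() {
	calls := 0
	err := retry(3, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errUnreachable
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

Errors that do not match are returned immediately, so only the transient network condition pays the retry cost.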
… to directory metadata prefetcher (#4301)

* avoid prefetch storms (edge case)

* review comments
…egration tests. (#4266)

### Description
Disable **grpc** metrics to avoid the **otel** plugin error log in integration tests.
**ERROR: [otel-plugin] ctx passed into client side stats handler metrics event handling has no client attempt data present**
is logged when running any integration test with the grpc protocol. This is a known issue in the otel grpc plugin, and grpc metrics are already disabled in production. This change does the same for storage client creation in tests, so the noise from these logs is removed there as well.

### Link to the issue in case of a bug fix.
b/475117270

### Testing details
1. Manual
Go to the directory:
cd ~/gcsfuse/tools/integration_tests/buffered_read
Run the buffered read package integration tests:
go test . -test.timeout 5400s -test.parallel 1 --integrationTest -test.v --testbucket=thrivikramks_test_runs_zb --zonal=true -test.count 1

Without the changes in this PR, logs like the following appear:
ERROR: [otel-plugin] ctx passed into client side stats handler metrics event handling has no client attempt data present

With the changes in this PR, this error is no longer observed.

2. Unit tests - Existing tests need to pass.
3. Integration tests - Existing tests need to pass.

### Any backward incompatible change? If so, please explain.
NA
…on] (#4131)

* monitoring test migration

* Correction of HNS flags for TestPromGrpcMetricsSuite

* lint correction

* Add cache dir flag

* BidiReadObject grpc metric error resolve for HNS

* Skip grpc metrics monitoring tests on GCE VMs
kislaykishore and others added 30 commits April 13, 2026 13:11
* Fix known-issues entry for premature-EOF

* Indicate that the premature-EOF/incorrect data reads can happen for versions older than 3.0.
* Remove disabling-streaming-writes from the list of workarounds.
…symlink representation (#4616)

* make gcsfuse compatible with older version

* update semantics doc

* update unit test
### Description
Adding traces in write flows to help quicker debugging.
Traces added are as below.

| Trace Name | Identifier | Description |
| :--- | :--- | :--- |
| **Write File Staged** | `write.staged` | Traces a write operation using the legacy staged writes. |
| **Write File Streaming** | `write.streaming` | Traces a write operation using the buffered writes handler. |
| **Sync File Staged** | `write.staged.sync` | Traces the synchronization of the staged temp file to GCS. |
| **Sync File Streaming** | `write.streaming.sync` | Traces the synchronization of buffered writes to GCS. |
| **Streaming Upload Block** | `write.streaming.upload.block` | Traces the upload of a single streaming block to GCS. |
| **Streaming Upload Finalize** | `write.streaming.upload.finalize` | Traces the finalization of the streaming upload. |
| **Streaming Upload Flush** | `write.streaming.upload.flush` | Traces the flushing of pending streaming writes. |
| **Streaming Uploader** | `write.streaming.uploader` | Traces the goroutine that uploads blocks to GCS as they are received. |

Perf results run on the PR:

<img width="660" height="296" alt="Screenshot 2026-04-10 at 6 45 39 PM" src="https://github.com/user-attachments/assets/01d648c3-2bd0-415b-82f5-768d023830f2" />

No degradation in write performance was observed.

Attaching golang benchmark test results for the traceHandle interface. Added a method Trace to the handle for better code reuse.

goos: linux
goarch: amd64
pkg: github.com/googlecloudplatform/gcsfuse/v3/tracing
cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
BenchmarkTrace/BenchmarkStartSpan_Otel-64         	 6412791	       188.3 ns/op	     240 B/op	       3 allocs/op
BenchmarkTrace/BenchmarkStartServerSpan_Otel-64   	 5128497	       234.1 ns/op	     272 B/op	       5 allocs/op
BenchmarkTrace/BenchmarkRecordError_Otel-64       	265906203	         4.510 ns/op	       0 B/op	       0 allocs/op
BenchmarkTrace/BenchmarkTraceUploadWithErrorNoBytes_Otel-64         	 4450281	       269.1 ns/op	     305 B/op	       4 allocs/op
BenchmarkTrace/BenchmarkTraceUploadWithErrorWithBytes_Otel-64       	 4457209	       267.2 ns/op	     305 B/op	       4 allocs/op
BenchmarkTrace/BenchmarkTraceUploadWithoutErrorNoBytes_Otel-64      	 4598284	       260.4 ns/op	     305 B/op	       4 allocs/op
BenchmarkTrace/BenchmarkTraceUploadWithoutErrorWithBytes_Otel-64    	 4535978	       262.5 ns/op	     305 B/op	       4 allocs/op
BenchmarkTrace/BenchmarkSetCacheReadAttributes_Otel-64              	56318166	        21.30 ns/op	       0 B/op	       0 allocs/op
BenchmarkTrace/BenchmarkSetUploadAttributes_Otel-64                 	40176628	        29.71 ns/op	       0 B/op	       0 allocs/op
BenchmarkTrace/BenchmarkPropagateTraceContext_Otel-64               	25519111	        45.39 ns/op	      48 B/op	       1 allocs/op
BenchmarkTrace/BenchmarkStartSpan_Noop-64                           	504248462	         2.380 ns/op	       0 B/op	       0 allocs/op
BenchmarkTrace/BenchmarkStartServerSpan_Noop-64                     	448551762	         2.675 ns/op	       0 B/op	       0 allocs/op
BenchmarkTrace/BenchmarkRecordError_Noop-64                         	644497599	         1.861 ns/op	       0 B/op	       0 allocs/op
BenchmarkTrace/BenchmarkTraceUploadWithErrorNoBytes_Noop-64         	447284846	         2.682 ns/op	       0 B/op	       0 allocs/op
BenchmarkTrace/BenchmarkTraceUploadWithErrorWithBytes_Noop-64       	504384649	         2.386 ns/op	       0 B/op	       0 allocs/op
BenchmarkTrace/BenchmarkTraceUploadWithoutErrorNoBytes_Noop-64      	446925662	         2.684 ns/op	       0 B/op	       0 allocs/op
BenchmarkTrace/BenchmarkTraceUploadWithoutErrorWithBytes_Noop-64    	504111261	         2.379 ns/op	       0 B/op	       0 allocs/op
BenchmarkTrace/BenchmarkSetCacheReadAttributes_Noop-64              	745226696	         1.572 ns/op	       0 B/op	       0 allocs/op
BenchmarkTrace/BenchmarkSetUploadAttributes_Noop-64                 	799512030	         1.501 ns/op	       0 B/op	       0 allocs/op
BenchmarkTrace/BenchmarkPropagateTraceContext_Noop-64               	534620713	         2.233 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	github.com/googlecloudplatform/gcsfuse/v3/tracing	23.932s

The no-op benchmarks confirm that the no-op handle is very cheap, so tracing has minimal, if any, impact on performance when it is disabled.

### Link to the issue in case of a bug fix.
b/454482833

### Testing details
1. Manual 

Triggered a single file upload with sequential writes for a new file and observed the following trace, which includes the object name and upload size details. (Note: the GCS writer doesn't take a context, so only one single clear trace is seen.)

<img width="1601" height="837" alt="Screenshot 2026-04-10 at 5 08 20 PM" src="https://github.com/user-attachments/assets/6713524e-8666-4b80-b37e-3e6a26f609b1" />


Triggered a single file upload with staged writes, below shows WriteFile making a read call to download the complete object from GCS 

<img width="1612" height="826" alt="Screenshot 2026-04-09 at 4 17 43 PM" src="https://github.com/user-attachments/assets/4e7b74f7-330a-47b6-9f08-5ef208aefaf8" />

Flush fs op shows the file content being uploaded to GCS from the tempFile.

<img width="1593" height="837" alt="Screenshot 2026-04-10 at 5 02 23 PM" src="https://github.com/user-attachments/assets/c7196ea1-a325-4d40-a2cb-6f058c1ccb55" />


2. Unit tests - NA
3. Integration tests - NA

### Any backward incompatible change? If so, please explain.
N/A
* chore: update lint target to run against new changes and enforce linting before build

* fix gemini comments
…ration tests with gRPC with explicit SA. (#4611)

* refactor: integrate BillingProject support into storage client and integration test utilities

* review fixes.

* fix lint and project

* feat: add support for key file authentication and improve requester-pays handling in integration tests

* refactor: update storage client initialization to use auth credentials from key file

* fix: handle comma-separated values when parsing billing project flag in integration tests

* refactor: delegate credential creation to CreateCredentialsForSA in CreateCredentials
… 1.26.1 (#4624)

Upgrading golang version to 1.26.2 due to multiple CVEs in 1.26.1
#4503)

* fixing lint and better formatting

* swapping the file content - incorrectly swapped during rebase

* review comments

* fixing formatting
…s becomes reliable (#4628)

* feat: add skipDirectPathEnforcement parameter to createGRPCClientHandle to allow conditional DirectPath enforcement

* chore: increase directPathDetectionTimeout from 10 seconds to 5 minutes

* simplifying a bit

* making direct-path verification non-fatal

* removing the timeout from the client creation

* minor change

* removing empty line
* pr review

* refactor: update runtime table visualization to support timeline segments and detailed status icons

* simplify retry logic

* nested lock issue resolve

* increase lock timeout

* remove lock based status tracking

* fix the attempt numbers

---------

Co-authored-by: Mohit Yadav <mohitkyadav@google.com>
Fixes a memory leak in the buffered reader code during multi-block reads. Previously, if an initial block downloaded successfully but a later block failed (triggering a fallback), the reference counts on the successfully downloaded blocks were never decremented. This PR addresses the leak by catching gcsx.FallbackToAnotherReader errors and calling a new releaseInflightBlocks helper to immediately invoke callbacks and release references for any in-flight blocks.
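
The shape of this fix can be sketched in a few lines. This is a simplified model, not the buffered reader code: `block`, `readBlocks` and `errFallback` are hypothetical, with `errFallback` standing in for `gcsx.FallbackToAnotherReader`. Only the helper name `releaseInflightBlocks` comes from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// errFallback stands in for gcsx.FallbackToAnotherReader in this sketch.
var errFallback = errors.New("fallback to another reader")

// block is a hypothetical buffer with a reference count.
type block struct {
	refCount int
}

// releaseInflightBlocks drops the reference taken for every in-flight block.
func releaseInflightBlocks(inflight []*block) {
	for _, b := range inflight {
		b.refCount--
	}
}

// readBlocks takes a reference on each block before downloading it. If a
// later block triggers a fallback, references on all in-flight blocks,
// including earlier successful ones, are released before returning.
// Other error kinds are outside the scope of this sketch.
func readBlocks(blocks []*block, download func(i int) error) error {
	var inflight []*block
	for i, b := range blocks {
		b.refCount++
		inflight = append(inflight, b)
		if err := download(i); err != nil {
			if errors.Is(err, errFallback) {
				releaseInflightBlocks(inflight)
			}
			return err
		}
	}
	releaseInflightBlocks(inflight)
	return nil
}

func main() {
	blocks := []*block{{}, {}, {}}
	err := readBlocks(blocks, func(i int) error {
		if i == 1 {
			return fmt.Errorf("block %d: %w", i, errFallback)
		}
		return nil
	})
	fmt.Println("err:", err)
	for i, b := range blocks {
		fmt.Println("block", i, "refCount", b.refCount)
	}
}
```

Using `errors.Is` lets the check see through wrapped errors, which matters if the fallback error is annotated on its way up.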
…egy (#4635)

Updating the direct path enforcement strategy:

- For zonal buckets, log direct path verification status but do not block client creation on it
- For non-zonal buckets, block grpc client creation on the direct path verification status. If the direct path status cannot be detected, fallback to the http client happens based on the configured grpc-strategy

Also, updated the timeout to 15 seconds for direct path verification.
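
The decision described above could be expressed roughly as follows. This is a sketch only: `enforceDirectPath`, `verifyFunc` and `check` are hypothetical names, the probe is a placeholder, and the caller's fallback to HTTP is represented simply by the returned error. The 15-second value is the timeout stated above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// directPathVerifyTimeout is the verification timeout mentioned in the change.
const directPathVerifyTimeout = 15 * time.Second

// verifyFunc is a hypothetical probe that reports whether direct path is usable.
type verifyFunc func(ctx context.Context) error

// errProbe is a sample probe failure for the demonstration.
var errProbe = errors.New("direct path not detected")

// enforceDirectPath returns an error only when client creation should be blocked.
func enforceDirectPath(ctx context.Context, zonal bool, verify verifyFunc) error {
	ctx, cancel := context.WithTimeout(ctx, directPathVerifyTimeout)
	defer cancel()

	err := verify(ctx)
	if err == nil {
		return nil
	}
	if zonal {
		// Zonal buckets: record the status but let client creation proceed.
		fmt.Printf("direct path verification failed (non-fatal): %v\n", err)
		return nil
	}
	// Non-zonal buckets: the caller decides on a fallback from this error.
	return fmt.Errorf("direct path verification failed: %w", err)
}

// check runs enforceDirectPath with a probe that returns probeErr.
func check(zonal bool, probeErr error) error {
	probe := func(context.Context) error { return probeErr }
	return enforceDirectPath(context.Background(), zonal, probe)
}

func main() {
	fmt.Println("zonal:", check(true, errProbe))
	fmt.Println("non-zonal:", check(false, errProbe))
}
```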
* migration buffered read

* mountdir and env changes

* lint correction

* gemini code reviews

* use standard setup and teardown functions

* cleanup and format test
* updating otel library from 1.42.0 to 1.43.0

* updating otel library from 1.42.0 to 1.43.0

* updating remaining otel dependencies

* updating remaining otel dependencies
### Description
Adding the metric gen code for read/block_sizes metric.
This is a histogram metric and the details for the metric are available in metrics.yaml .

### Link to the issue in case of a bug fix.
b/504892196

### Testing details
1. Manual - NA
2. Unit tests - Auto generated unittests have been added and need to pass.
3. Integration tests - NA

### Any backward incompatible change? If so, please explain.
N/A
### Description
Adding metric-gen code for a metric that tracks streaming write fallbacks along with their reasons.

### Link to the issue in case of a bug fix.
b/504892393

### Testing details
1. Manual - NA (given it is auto generated code, existing tests passing and new tests added getting passed is good enough)
2. Unit tests - otel_metrics_test.go updated.
3. Integration tests - NA

### Any backward incompatible change? If so, please explain.
N/A
* fix: prevent symlink attack in cmd/ by using O_NOFOLLOW

Updated `cmd/mount.go` to open the wire log file with `syscall.O_NOFOLLOW`
instead of `os.Create`, which prevents following symlinks.

Updated `cmd/legacy_main.go` to open the stderr log file with `syscall.O_NOFOLLOW`
when running in daemon mode, which also prevents following symlinks.

This fixes the Container-to-Host escape vulnerability reported where an
unprivileged user could overwrite arbitrary host files (like /etc/shadow)
when gcsfuse runs as root by creating symlinks in places where gcsfuse
writes logs.

Co-authored-by: vadlakondaswetha <101323867+vadlakondaswetha@users.noreply.github.com>

* fix: prevent symlink attack in log file creation across the project

Updated `internal/logger/logger.go`, `cmd/mount.go`, and `cmd/legacy_main.go`
to open log files with the `unix.O_NOFOLLOW` flag.

This prevents the Container-to-Host escape vulnerability reported where an
unprivileged user could overwrite arbitrary host files when gcsfuse runs
as root by creating symlinks in places where gcsfuse writes logs.

Also updated to use `unix.O_NOFOLLOW` consistently instead of `syscall.O_NOFOLLOW`.

Co-authored-by: vadlakondaswetha <101323867+vadlakondaswetha@users.noreply.github.com>

* fix: prevent symlink attack in log file creation across the project

Updated `internal/logger/logger.go`, `cmd/mount.go`, and `cmd/legacy_main.go`
to open log files with the `unix.O_NOFOLLOW` flag.

This prevents the Container-to-Host escape vulnerability reported where an
unprivileged user could overwrite arbitrary host files when gcsfuse runs
as root by creating symlinks in places where gcsfuse writes logs.

Also updated to use `unix.O_NOFOLLOW` consistently instead of `syscall.O_NOFOLLOW`.

Added robust unit tests to ensure that these log files cannot be
overwritten via symlinks.

Co-authored-by: vadlakondaswetha <101323867+vadlakondaswetha@users.noreply.github.com>

* testcase review comments

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
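
The effect of `O_NOFOLLOW` on log file creation can be checked with a small program. This is a sketch, not the GCSFuse code: it uses `syscall.O_NOFOLLOW` from the standard library (available on Unix-like systems) so that it needs no extra module, whereas the commits above settle on `unix.O_NOFOLLOW`. The other open flags, the file mode and the function names are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// openLogFileNoFollow opens or creates a log file but refuses to follow a
// symlink at the final path component. Flags and mode are illustrative.
func openLogFileNoFollow(path string) (*os.File, error) {
	flags := os.O_WRONLY | os.O_CREATE | os.O_APPEND | syscall.O_NOFOLLOW
	return os.OpenFile(path, flags, 0o644)
}

// symlinkRejected plants a symlink where a log file is expected and reports
// whether opening it fails while the link target stays untouched.
func symlinkRejected() (bool, error) {
	dir, err := os.MkdirTemp("", "nofollow-demo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "victim.txt")
	if err := os.WriteFile(target, []byte("secret"), 0o600); err != nil {
		return false, err
	}
	link := filepath.Join(dir, "gcsfuse.log")
	if err := os.Symlink(target, link); err != nil {
		return false, err
	}

	f, openErr := openLogFileNoFollow(link)
	if openErr == nil {
		f.Close()
	}
	data, err := os.ReadFile(target)
	if err != nil {
		return false, err
	}
	return openErr != nil && string(data) == "secret", nil
}

func main() {
	rejected, err := symlinkRejected()
	fmt.Println("symlink rejected:", rejected, "setup error:", err)
}
```

Note that `O_NOFOLLOW` only covers the last path component; symlinks in parent directories are still resolved.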
…arsing (#4620)

* enforce log-file strictness and centralize log filename parsing

* remove .log addition

* revert an if block changes
When takeover offset doesn't match the requested offset, logs are more
useful if we know what each value was.
…not empty (#4641)

* fix: release lock in renameHierarchicalDir

Fixes issue #4636 where `newDirInode` lock was never released in `renameHierarchicalDir` when `checkDirNotEmpty` returned an error because the directory was not empty.

The fix adds `newDirInode` to `pendingInodes` immediately after successfully getting it, ensuring it's released by the deferred function regardless of subsequent errors.

Co-authored-by: vadlakondaswetha <101323867+vadlakondaswetha@users.noreply.github.com>

* fix: release lock in renameHierarchicalDir on error

Fixes issue #4636 where `newDirInode` lock was never released in `renameHierarchicalDir` when `checkDirNotEmpty` returned an error because the directory was not empty.

Added an integration test to cover renaming a directory to an existing non-empty directory.

Co-authored-by: vadlakondaswetha <101323867+vadlakondaswetha@users.noreply.github.com>

* code review comments

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
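
The pattern behind this fix, registering a locked inode for deferred release before any early return, can be sketched as below. The code is a hypothetical reduction: `dirInode`, `renameOnto` and `isUnlocked` are invented for illustration, and only `pendingInodes` and `checkDirNotEmpty` echo names from the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// dirInode is a hypothetical directory inode with a lock and a child count.
type dirInode struct {
	mu       sync.Mutex
	children int
}

var errNotEmpty = errors.New("directory not empty")

func checkDirNotEmpty(d *dirInode) error {
	if d.children > 0 {
		return errNotEmpty
	}
	return nil
}

// renameOnto locks the destination and registers it in pendingInodes right
// away, so the deferred function unlocks it on every return path.
func renameOnto(newDir *dirInode) error {
	var pendingInodes []*dirInode
	defer func() {
		for _, in := range pendingInodes {
			in.mu.Unlock()
		}
	}()

	newDir.mu.Lock()
	pendingInodes = append(pendingInodes, newDir) // register before any early return

	if err := checkDirNotEmpty(newDir); err != nil {
		return err
	}
	// The actual rename work would happen here.
	return nil
}

// isUnlocked reports whether the inode lock is currently free.
func isUnlocked(d *dirInode) bool {
	if d.mu.TryLock() {
		d.mu.Unlock()
		return true
	}
	return false
}

func main() {
	dst := &dirInode{children: 2}
	err := renameOnto(dst)
	fmt.Println("err:", err, "lock released:", isUnlocked(dst))
}
```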
### Description
Adding support for read block sizes metrics.
This metric tracks the read block size requests coming from kernel to GCSFuse into a histogram metric with the following bucket sizes.
{8192, 16384, 32768, 65536, 131072, 262144, 524288, 1048576, 2097152, 4194304, 8388608, 16777216, 33554432, 67108864, 134217728}
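
The listed boundaries are the powers of two from 8 KiB (2^13) to 128 MiB (2^27). A minimal sketch of generating them and locating the bucket for a given read size follows. The function names are hypothetical, and the upper-inclusive bucket rule is assumed from the `le` labels in the Prometheus output, not taken from the GCSFuse code.

```go
package main

import (
	"fmt"
	"sort"
)

// blockSizeBoundaries returns the powers of two from 2^13 to 2^27.
func blockSizeBoundaries() []int64 {
	var b []int64
	for shift := 13; shift <= 27; shift++ {
		b = append(b, int64(1)<<shift)
	}
	return b
}

// bucketIndex returns the index of the first boundary that is >= size.
// A result of len(boundaries) means the overflow (+Inf) bucket.
func bucketIndex(boundaries []int64, size int64) int {
	return sort.Search(len(boundaries), func(i int) bool {
		return boundaries[i] >= size
	})
}

func main() {
	b := blockSizeBoundaries()
	fmt.Println(len(b), b[0], b[len(b)-1])
	fmt.Println(bucketIndex(b, 65536), bucketIndex(b, 65537))
}
```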

### Link to the issue in case of a bug fix.
b/499889366

### Testing details
1. Manual

Set **cloud-metrics-export-interval-secs** to 60 seconds and ran the fio command below. The **bssplit** option issues random reads with a mix of block sizes, so the kernel sends differently sized read requests and the metric shows up as a heat map or histogram.

fio --name=gcsfuse-precise-test --rw=randread --size=512M --bssplit=8k/20:64k/30:1m/40:128m/10 --numjobs=1

Below are heat map and bar chart representations of the metric custom.googleapis.com/gcsfuse/read/block_sizes.

<img width="1115" height="813" alt="Screenshot 2026-04-20 at 5 58 03 AM" src="https://github.com/user-attachments/assets/9c93fe36-7038-432e-9735-3ab679054a80" />

<img width="1118" height="816" alt="Screenshot 2026-04-20 at 5 59 06 AM" src="https://github.com/user-attachments/assets/e7c40631-eaa9-46b2-829c-9ae134810ae5" />

Also scraped the metrics from the Prometheus export to check that the values are as expected for a histogram metric.
With the Prometheus port set to 8080, querying the metrics endpoint for the read/block_sizes metric gives the sample result below.

curl -s http://localhost:8080/metrics | grep "read_block_sizes"

read_block_sizes_bucket{le="8192"} 218
read_block_sizes_bucket{le="16384"} 232
read_block_sizes_bucket{le="32768"} 269
read_block_sizes_bucket{le="65536"} 558
read_block_sizes_bucket{le="131072"} 4225
read_block_sizes_bucket{le="262144"} 4225
read_block_sizes_bucket{le="524288"} 4225
read_block_sizes_bucket{le="1.048576e+06"} 4225
read_block_sizes_bucket{le="2.097152e+06"} 4225
read_block_sizes_bucket{le="4.194304e+06"} 4225
read_block_sizes_bucket{le="8.388608e+06"} 4225
read_block_sizes_bucket{le="1.6777216e+07"} 4225
read_block_sizes_bucket{le="3.3554432e+07"} 4225
read_block_sizes_bucket{le="6.7108864e+07"} 4225
read_block_sizes_bucket{le="1.34217728e+08"} 4225
read_block_sizes_bucket{le="+Inf"} 4225
read_block_sizes_sum 4.98950144e+08
read_block_sizes_count 4225
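
Prometheus histogram buckets are cumulative, so the number of reads in each individual bucket is the difference between neighbouring `le` values. The sketch below does that subtraction for the first five buckets of the sample above and derives the mean read size from the `_sum` and `_count` lines. The function name `perBucket` is an assumption, and the input is expected to be non-decreasing.

```go
package main

import "fmt"

// perBucket converts cumulative bucket counts into per-bucket counts.
// The input must be non-decreasing, as Prometheus histograms are.
func perBucket(cumulative []uint64) []uint64 {
	out := make([]uint64, len(cumulative))
	var prev uint64
	for i, c := range cumulative {
		out[i] = c - prev
		prev = c
	}
	return out
}

func main() {
	// Cumulative counts for le = 8192, 16384, 32768, 65536, 131072.
	cum := []uint64{218, 232, 269, 558, 4225}
	fmt.Println(perBucket(cum))
	// Mean read size from read_block_sizes_sum / read_block_sizes_count.
	fmt.Printf("mean read size: %.1f bytes\n", 4.98950144e+08/4225)
}
```

For this sample the per-bucket counts are 218, 14, 37, 289 and 3667, and every bucket above 131072 adds nothing further.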

2. Unit tests - added
3. Integration tests - added

### Any backward incompatible change? If so, please explain.
N/A