Skip to content

Conversation

@supervacuus
Copy link
Collaborator

@supervacuus supervacuus commented Nov 25, 2025

📜 Description

As discussed last week with @markushi, the process will be to make minimal atomic changes in each PR and merge directly to main rather than accumulating on an uber feature branch. This allows for easier review/feedback/corrections, and we can already test a subset of the entire feature "in the field".

The first minimal change is a basic tombstone integration:

  • exposes internal option to enable/disable (disabled by default)
  • the integration will only ever be enabled if the runtime system is at least Android 12
  • the current setup suggests an operation where users will have to disable NDK support (or only depend on sentry-android-core) or get two reports for the same crash
  • only retrieves the most current tombstone (the current implementation is not entirely correct: we should either remove any remaining ApplicationExitInfo entries with REASON_CRASH_NATIVE or report them too, including the option for the latter; I left this out for minimal interface exposure in the first step, but either variant is easy to add to this PR or later)
  • adds a basic tombstone parser/decoder and accompanying snapshot test
  • introduces the following dependencies:
    • runtime: protobuf-javalite (the entire features adds ca. 75KiB to the Android sample release APK)
    • build: the protobuf plugin and the protoc compiler to automate protocol updates

Open Issues:

  • While the protobuf runtime is relatively small, there is still the possibility of conflicting with client-side protobuf versions (major versions often introduce quite severe breakage, but I haven't tested this yet, only reviewed change logs).
  • finding clarity in how to proceed with older tombstones (ignore or handling similarly to ANRv2)
  • decision if this minimal setup already makes sense to release as an internal API
  • add ManifestMetadataReader to configure conveniently? (or not since the corresponding options are only internal?)
  • No changes yet to the UI (there are quite a few aspects that would make these stack-traces more readable, from my pov this is currently out of scope, but i am writing stuff down)

💡 Motivation and Context

First sensible release step for #3295

Part of https://linear.app/getsentry/project/tombstone-support-android-0024cb6e3ffd/

💚 How did you test it?

  • Added a basic parser snapshot test for a serialized tombstone protobuf.
  • Manual testing.

📝 Checklist

  • I added GH Issue ID & Linear ID
  • I added tests to verify the changes.
  • No new PII added or SDK only sends newly added PII if sendDefaultPII is enabled.
  • I updated the docs if needed.
  • I updated the wizard if needed.
  • Review from the native team if needed.
  • No breaking change or entry added to the changelog.
  • No breaking change for hybrid SDKs or communicated to hybrid SDKs.

🔮 Next steps

  • Decide if we want to go forward with client-side tombstone processing. This decision should happen with this PR.
    • If no, consider sending a tombstone as an attachment, similar to how we attach minidumps and let ingestion/processing deal with decoding (I can prototype this in parallel, since I think we can do most of it in relay, as a first step)
    • If yes, decide on adding the protobuf runtime library as a dependency to sentry-android-core (or shade/relocate, or implement our own decoder, given that this is a stable format which only requires a subset of protobuf).
      • Depending on feedback, adaptation to this minimal change (primarily handling older tombstones).
      • Otherwise, integrate an EventProcessor that merges crash events from sentry-native with tombstones.

#skip-changelog (for now)

@supervacuus
Copy link
Collaborator Author

supervacuus commented Nov 26, 2025

@markushi, I just realized I cannot omit having a separate "tombstone" marker, even if I report all events, without repeating them. I mean, this was clear to me, but, in addition to it being a must for this PR already, unlike the ANR marker, I must also align it with crashedLastRun.

I wonder if it would make the most sense to do it similarly to ANR by using a TombstoneHint to ensure writing the marker timestamp at the correct life-cycle stage of the event, but introduce a new interface analog to AbnormalExit called CrashExit, so that the hint can take a different route (since it isn't truly an abnormal exit and should affect similar paths to the Native SDK crash marker).

The biggest issue with that approach is that the crashedLastRun is handled entirely in EnvelopeCache rather than AndroidEnvelopeCache, whereas ApplicationExitInfo hint/marker handling happens in the latter.

The PR still makes sense for a first review from you (since, if the general direction makes sense and you have todos not in my list, I can also add a test for the integration itself). Still, I would appreciate a short sync on how to align these execution paths (maybe I don't need to align tombstones with crashedLastRun and can abuse AbnormalExit for the same outcome, though it feels like a significant shortcut, even for minimal internal exposure, which is not at all what I aimed for).

I can convert the PR back to a draft if you prefer.

Copy link
Member

@markushi markushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments - great work so far!

otel-semconv = { module = "io.opentelemetry.semconv:opentelemetry-semconv", version.ref = "otelSemanticConventions" }
otel-semconv-incubating = { module = "io.opentelemetry.semconv:opentelemetry-semconv-incubating", version.ref = "otelSemanticConventionsAlpha" }
p6spy = { module = "p6spy:p6spy", version = "3.9.1" }
protobuf-javalite = { module = "com.google.protobuf:protobuf-javalite", version.ref = "protobuf"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romtsn Not sure if you followed the conversations, but as you can see here, protobuf requires a runtime dependency. It will have an impact of around 10kb. IMHO fine for now, we should still check how stable this library is to avoid and consumer version mismatch issues.

… S, since that is the earliest version where `REASON_CRASH_NATIVE` provides tombstones via the `traceInputStream`)
…rker

This currently does not work:

While we now can optionally enable reporting of "historical" tombstones, by making the `TombstoneHint` `Backfillable` it will automatically be enriched by the `ANRv2EventProcessor` which is currently the only `BackfillingEventProcessor`.

The `ANRv2EventProcessor` is partially written in way that is potentially generic for events with `Backfillable` hints, but other parts are enriching as if those are always were ANRs, which up to now was true, but with Tombstones that assumption now breaks.

Next Steps:

* There is considerable duplication between the ANRv2Integration and TombstoneIntegration
Comment on lines +16 to +26
@NotNull private final BuildInfoProvider buildInfoProvider;

public TombstoneEventProcessor(
@NotNull Context context,
@NotNull SentryAndroidOptions options,
@NotNull BuildInfoProvider buildInfoProvider) {
this.context = context;
this.options = options;
this.buildInfoProvider = buildInfoProvider;
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: AnrV2EventProcessor incorrectly processes tombstone events, overwriting signal data and misclassifying them as foreground ANRs due to TombstoneHint interface mismatch.
Severity: CRITICAL | Confidence: High

🔍 Detailed Analysis

The AnrV2EventProcessor incorrectly processes tombstone events, leading to loss of critical signal information and incorrect event classification. Because TombstoneHint does not implement the AbnormalExit interface, the isBackgroundAnr() method in AnrV2EventProcessor always returns false for tombstone events. This causes all tombstone events to be fingerprinted as 'foreground-anr' and to have their inForeground status set to true, regardless of their actual background status. Additionally, the AnrV2EventProcessor overwrites the signal-specific exception type and mechanism data carefully parsed by TombstoneParser with generic ANR data, discarding crucial native crash details.

💡 Suggested Fix

Modify AnrV2EventProcessor to check the type of the hint object before applying ANR-specific logic, or ensure TombstoneHint implements AbnormalExit and provides the necessary mechanism() method to correctly distinguish between ANR and tombstone events and their background status.

🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location:
sentry-android-core/src/main/java/io/sentry/android/core/TombstoneEventProcessor.java#L1-L26

Potential issue: The `AnrV2EventProcessor` incorrectly processes tombstone events,
leading to loss of critical signal information and incorrect event classification.
Because `TombstoneHint` does not implement the `AbnormalExit` interface, the
`isBackgroundAnr()` method in `AnrV2EventProcessor` always returns `false` for tombstone
events. This causes all tombstone events to be fingerprinted as 'foreground-anr' and to
have their `inForeground` status set to `true`, regardless of their actual background
status. Additionally, the `AnrV2EventProcessor` overwrites the signal-specific exception
type and mechanism data carefully parsed by `TombstoneParser` with generic ANR data,
discarding crucial native crash details.

Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 4922118

String.join(" ", tombstone.getCommandLineList())));

return message;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Missing signal info check causes inconsistent event data

The constructMessage() method unconditionally accesses tombstone.getSignalInfo() at line 165 without checking hasSignalInfo() first. Meanwhile, createException() at line 121 does check hasSignalInfo() before accessing signal properties. If a tombstone lacks signal info, getSignalInfo() returns a default proto instance with empty strings and zeros, producing a confusing message like "Fatal signal (0), (0), pid = X ()" while the exception correctly omits signal-related fields. This inconsistency results in malformed event data.

Additional Locations (1)

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants