Skip to content

An alternative TLS implementation to fix a crash on musl/aarch64#600

Open
zhengyu123 wants to merge 25 commits into
mainfrom
zgu/tls_replacement
Open

An alternative TLS implementation to fix a crash on musl/aarch64#600
zhengyu123 wants to merge 25 commits into
mainfrom
zgu/tls_replacement

Conversation

@zhengyu123

@zhengyu123 zhengyu123 commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?:
This PR is intended to replace thread_local variables (except Otel ones) with an alternative implementation, which is based on pthread_get/setspecific() API.

Motivation:
We have encountered a quite few defects related to TLS - whose implementation is compiler specific. In some circumstances, the implementation is very restrictive that does not fit our usage pattern - it manifests in SCP-1238

Additional Notes:
To limit the scope, I only replaced 3 thread locals in livenessTracker which led to the crash reported in SCP-1238](https://datadoghq.atlassian.net/browse/SCP-1238)

Converting ProfiledThread to use newly introduced ThreadLocal will be done in separate PR.

How to test the change?:

  • CI tests
  • New test case.

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a security review (run the dd:platform-security-review
    skill, or file a request via the PSEC review form).
    bewaire also runs automatically on every PR.
  • This PR doesn't touch any of that.
  • JIRA: PROF-15114

Unsure? Have a question? Request a review!

@dd-octo-sts

dd-octo-sts Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

CI Test Results

Run: #28033179820 | Commit: f0d86ba | Duration: 13m 4s (longest job)

5 of 32 test jobs failed

Status Overview

JDK glibc-aarch64/debug glibc-amd64/debug musl-aarch64/debug musl-amd64/debug
8 - - -
8-ibm - - -
8-j9 - -
8-librca - -
8-orcl - - -
11 - - -
11-j9 - -
11-librca - -
17 - -
17-graal - -
17-j9 - -
17-librca - -
21 - -
21-graal - -
21-librca - -
25 - -
25-graal - -
25-librca - -

Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled

Failed Tests

musl-aarch64/debug / 8-librca

Job: View logs

No detailed failure information available. Check the job logs.

musl-aarch64/debug / 21-librca

Job: View logs

No detailed failure information available. Check the job logs.

musl-aarch64/debug / 17-librca

Job: View logs

No detailed failure information available. Check the job logs.

musl-aarch64/debug / 11-librca

Job: View logs

No detailed failure information available. Check the job logs.

musl-aarch64/debug / 25-librca

Job: View logs

No detailed failure information available. Check the job logs.

Summary: Total: 32 | Passed: 27 | Failed: 5


Updated: 2026-06-23 15:10:20 UTC

@datadog-prod-us1-4

datadog-prod-us1-4 Bot commented Jun 17, 2026

Copy link
Copy Markdown

Pipelines

Fix all issues with BitsAI

⚠️ Warnings

🚦 20 Pipeline jobs failed

DataDog/java-profiler | integration-test-x64-glibc: [hotspot, 11]   View in Datadog   GitLab

CI Run | test-matrix / test-linux-musl-aarch64 (11-librca, debug)   View in Datadog   GitHub Actions

CI Run | test-matrix / test-linux-musl-aarch64 (17-librca, debug)   View in Datadog   GitHub Actions

View all 20 failed jobs.

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: e6e0dc5 | Docs | Datadog PR Page | Give us feedback!

@zhengyu123 zhengyu123 changed the title Zgu/tls replacement WIP: An alternative TLS implementation to fix a crash on musl/aarch64 Jun 18, 2026
@dd-octo-sts

dd-octo-sts Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Reliability & Chaos Results

All reliability & chaos checks passed Pipeline: https://gitlab.ddbuild.io/DataDog/java-profiler/-/pipelines/120524978

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a pthread-key-based ThreadLocal helper as an alternative to C++ thread_local, and migrates LivenessTracker’s per-thread random/subsampling state to use it (motivated by musl/aarch64 TLS issues). It also adds C++ unit tests validating the new TLS wrapper behavior.

Changes:

  • Added ddprof-lib/src/main/cpp/threadLocal.h implementing a ThreadLocal<T> wrapper on top of pthread_key_*.
  • Updated LivenessTracker::track() to replace thread_local PRNG/distribution/skipped-state with ThreadLocal instances.
  • Added threadLocal_ut.cpp gtest coverage for ThreadLocal set/get semantics, per-thread isolation, and per-thread cleanup.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 12 comments.

File Description
ddprof-lib/src/main/cpp/threadLocal.h New pthread-key TLS abstraction (generic + double specialization).
ddprof-lib/src/main/cpp/livenessTracker.cpp Switches liveness subsampling TLS state from thread_local to the new ThreadLocal wrapper.
ddprof-lib/src/test/cpp/threadLocal_ut.cpp Adds gtests validating correctness and cleanup behavior of the new TLS wrapper.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ddprof-lib/src/main/cpp/threadLocal.h Outdated
Comment thread ddprof-lib/src/main/cpp/threadLocal.h Outdated
Comment thread ddprof-lib/src/main/cpp/threadLocal.h
Comment thread ddprof-lib/src/main/cpp/livenessTracker.cpp Outdated
Comment thread ddprof-lib/src/main/cpp/livenessTracker.cpp Outdated
Comment thread ddprof-lib/src/main/cpp/threadLocal.h
Comment thread ddprof-lib/src/main/cpp/threadLocal.h
Comment thread ddprof-lib/src/main/cpp/threadLocal.h
Comment thread ddprof-lib/src/main/cpp/threadLocal.h
Comment thread ddprof-lib/src/main/cpp/threadLocal.h
zhengyu123 and others added 13 commits June 18, 2026 09:57
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@zhengyu123 zhengyu123 changed the title WIP: An alternative TLS implementation to fix a crash on musl/aarch64 An alternative TLS implementation to fix a crash on musl/aarch64 Jun 18, 2026
@zhengyu123 zhengyu123 marked this pull request as ready for review June 18, 2026 18:09
@zhengyu123 zhengyu123 requested a review from a team as a code owner June 18, 2026 18:09

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 13c88912f3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread ddprof-lib/src/main/cpp/threadLocal.h Outdated
Comment thread ddprof-lib/src/main/cpp/threadLocal.h Outdated
Comment thread ddprof-lib/src/main/cpp/threadLocal.h
Comment thread ddprof-lib/src/main/cpp/threadLocal.h
Comment thread ddprof-lib/src/main/cpp/threadLocal.h
Comment thread ddprof-lib/src/main/cpp/threadLocal.h
Comment thread ddprof-lib/src/main/cpp/threadLocal.h
Comment thread ddprof-lib/src/main/cpp/livenessTracker.cpp Outdated
Comment thread ddprof-lib/src/test/cpp/threadLocal_ut.cpp
Comment thread ddprof-lib/src/test/cpp/threadLocal_ut.cpp
Comment thread ddprof-lib/src/test/cpp/threadLocal_ut.cpp

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1de7f18db7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread ddprof-lib/src/main/cpp/livenessTracker.cpp
Comment thread ddprof-lib/src/main/cpp/threadLocal.h
Comment thread ddprof-lib/src/main/cpp/threadLocal.h
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants