[CI] Add new script + cronjob to amend pull request data #809
Conversation
LLVM_REVIEWS_TABLE = "llvm_reviews"

...

def fetch_open_pull_requests_from_github(
We're only querying open PRs. That means we are going to miss PRs that are short-lived (under two hours from open to merge, which is actually somewhat common), assuming they do not straddle a two-hour boundary.
That also means we are going to miss any post-commit review information. Is that considered out of scope for now?
In cases like those, we'd need to rely on process_llvm_commits.py to record merged commits first before we can re-query them for post-commit reviews. They'd be subject to the two-day delay that script runs with, but post-commit reviews should be captured at some point.
It might be fine for the time being, but it's not ideal. Ideally we'd either increase the frequency at which we poll for open pull requests, or reduce/remove the delay on process_llvm_commits.py. I could follow up on this in another PR.
So post-commit review information is just out of scope for this PR?
We could also query for closed PRs that were opened and closed in the last two hours, which I think should capture that without duplicating data.
There are probably other race conditions, though (like PRs that were opened after the two-hour mark but before the script starts running).
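The closed-PR query suggested above could be sketched as a search qualifier with a configurable lookback window. This is a hypothetical helper, assuming the script queries GitHub via the search API; the function name and repo qualifier are illustrative, not taken from the PR:

```python
from datetime import datetime, timedelta, timezone

def closed_pr_search_qualifier(lookback_hours: int = 2) -> str:
    """Build a GitHub search qualifier matching PRs closed within the
    lookback window, so short-lived PRs that opened and merged between
    polls are still captured.

    Hypothetical helper; the real script may structure its queries
    differently.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(hours=lookback_hours)
    # GitHub search range qualifiers accept ISO 8601 timestamps.
    return (
        "repo:llvm/llvm-project is:pr "
        f"closed:>={cutoff.strftime('%Y-%m-%dT%H:%M:%SZ')}"
    )
```

Widening the lookback (e.g. three hours for a two-hour cron cadence) would also cover the boundary race conditions, since the upsert keeps re-queried rows from duplicating.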
It's not entirely out of scope: once this script is running, we'll start collecting post-commit data for commits submitted without review within the last two weeks (using whatever data we already have stored, since these tables are shared).
I agree that we could query for PRs that had a quick turnaround between open and close. Duplicate data shouldn't be an issue, since the SQL only inserts new records and updates existing ones. We could also avoid missing PRs that fall outside the two-hour window due to race conditions by widening the window we query. Querying the past three hours while running the script every two hours shouldn't be a problem, since duplicates can't be recorded.
I think for this pull request, missing some post-commit information is fine. I'll experiment with different solutions for the edge cases (whether that means changing the queries, increasing the frequency, or both) and open a follow-up with whatever yields the best results.
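The insert-or-update behavior described in this thread could look like a BigQuery MERGE, built here as a string for illustration. The table default, staging source name, and column names (pr_number, state, updated_at) are assumptions, not the script's actual schema:

```python
def build_pr_upsert_query(table: str = "llvm_reviews") -> str:
    """Return an idempotent BigQuery MERGE statement (sketch).

    Re-querying a widened time window then becomes safe: rows for
    already-seen PRs are updated in place, new PRs are inserted, and
    duplicates never occur. Column names are illustrative.
    """
    return f"""
    MERGE `{table}` AS target
    USING pending_pull_requests AS source
    ON target.pr_number = source.pr_number
    WHEN MATCHED THEN
      UPDATE SET state = source.state, updated_at = source.updated_at
    WHEN NOT MATCHED THEN
      INSERT (pr_number, state, updated_at)
      VALUES (source.pr_number, source.state, source.updated_at)
    """
```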
Implement a new script for amending pull request data stored in BigQuery. This script also fetches open pull requests, as that data is not already recorded by process_llvm_commits.py. Data on new pull requests and reviews is requeried and amended every two hours, separate from the process_llvm_commits cronjob. Scheduling of this script reuses the operational metrics container, as both process_llvm_commits.py and amend_pull_request_data.py depend on operational_metrics_lib.py. The image commands for each cronjob have been explicitly added to each cronjob manifest.