[feature](workload) Support remote scan bytes breaker in workload policy #64649

Open
wenzhenghu wants to merge 3 commits into apache:master from wenzhenghu:feature/workload-policy-remote-read-bytes

@wenzhenghu

Contributor

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary:

This PR adds a new workload policy condition, be_scan_bytes_from_remote_storage, which lets Doris cancel a query based on the number of bytes that BE scan tasks read from remote storage. This is useful for limiting external table queries that read too much data from remote HDFS or object storage.

Implementation summary:

  • Add a new BE-side workload metric type in Thrift for remote storage scan bytes.
  • Add FE workload policy parsing, validation, metadata mapping, and replay support for be_scan_bytes_from_remote_storage.
  • Add BE workload condition evaluation based on io_context()->scan_bytes_from_remote_storage().
  • Add regression coverage using an existing Hive external lineitem table.
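
The BE-side evaluation described above can be pictured with a minimal sketch. It is not the code in this PR: `IoContext`, `add_remote_scan_bytes`, `RemoteScanBytesCondition`, and the strict greater-than comparison are all hypothetical names and assumptions chosen for illustration. Only the accessor name `scan_bytes_from_remote_storage()` and the condition name `be_scan_bytes_from_remote_storage` come from the description.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the per-query I/O statistics kept by a BE.
// Only the accessor name scan_bytes_from_remote_storage() is taken from the PR text.
class IoContext {
public:
    // Hypothetical helper: scan tasks would call this after each remote read.
    void add_remote_scan_bytes(int64_t bytes) { _remote_bytes += bytes; }
    int64_t scan_bytes_from_remote_storage() const { return _remote_bytes; }

private:
    int64_t _remote_bytes = 0;
};

// Hypothetical condition: matches once the remote scan bytes exceed the limit.
// The strict ">" comparison is an assumption, not confirmed by the PR text.
class RemoteScanBytesCondition {
public:
    explicit RemoteScanBytesCondition(int64_t limit_bytes) : _limit_bytes(limit_bytes) {
        assert(limit_bytes >= 0);
    }

    bool eval(const IoContext& io) const {
        return io.scan_bytes_from_remote_storage() > _limit_bytes;
    }

    // Builds a cancel reason that includes the observed counter value.
    std::string cancel_reason(const IoContext& io) const {
        return "be_scan_bytes_from_remote_storage " +
               std::to_string(io.scan_bytes_from_remote_storage()) + " > " +
               std::to_string(_limit_bytes);
    }

private:
    int64_t _limit_bytes;
};
```

In this sketch a periodic checker would call `eval` for each running query and cancel the query with `cancel_reason` when it returns true. Putting the counter value in the reason mirrors what the regression test checks in the error message.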

### Release note

Support workload policy cancellation by BE remote storage scan bytes.

### Check List (For Author)

  • Test:
    • FE UT: passed
    • BE UT: passed
    • Regression test: passed, test_workload_policy_remote_scan_bytes
    • Manual test: verified existing workload policy behavior and new remote scan bytes cancellation on a deployed Doris instance
  • Behavior changed: Yes. Add a new workload policy condition be_scan_bytes_from_remote_storage.
  • Does this need documentation: Yes. The workload policy condition list should be updated.

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Add a Hive external table regression test for workload policy cancellation by BE-side remote storage scan bytes. The test creates an HMS catalog, picks an existing lineitem table from tpch1_parquet or tpch1, creates a workload policy with a be_scan_bytes_from_remote_storage condition, and verifies that the query is cancelled and that the error message includes the remote scan bytes counter.

### Release note

None

### Check List (For Author)

- Test: Regression test / syntax check
    - Ran test_workload_policy_remote_scan_bytes against 172.16.0.90:9036
    - Ran Groovy FileSystemCompiler syntax check
    - Ran git diff --cached --check
- Behavior changed: No
- Does this need documentation: No

@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@wenzhenghu

Contributor Author

run buildall

@hello-stephen

Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 77.32% (1889/2443)
Line Coverage 64.43% (33969/52726)
Region Coverage 64.78% (17457/26948)
Branch Coverage 53.95% (9342/17316)

@hello-stephen

Contributor
TPC-H: Total hot run time: 29595 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 009d2be54404ed6cb4468a53e6dd79272156c87c, data reload: false

------ Round 1 ----------------------------------
============================================
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17629	4205	4199	4199
q2	2029	330	223	223
q3	10284	1455	876	876
q4	4746	478	345	345
q5	7952	887	589	589
q6	262	178	145	145
q7	817	872	646	646
q8	10740	1726	1594	1594
q9	6251	4556	4520	4520
q10	6851	1831	1530	1530
q11	463	295	253	253
q12	668	431	301	301
q13	18175	3609	2803	2803
q14	281	266	248	248
q15	q16	789	779	721	721
q17	1019	1005	958	958
q18	7124	5865	5514	5514
q19	1198	1217	1046	1046
q20	495	412	270	270
q21	5782	2650	2507	2507
q22	428	363	307	307
Total cold run time: 103983 ms
Total hot run time: 29595 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4534	4423	4429	4423
q2	340	360	232	232
q3	4590	4973	4497	4497
q4	2116	2223	1435	1435
q5	4482	4347	4723	4347
q6	275	240	176	176
q7	2180	1900	1656	1656
q8	2660	2327	2286	2286
q9	8426	8180	8037	8037
q10	4879	4767	4318	4318
q11	617	458	383	383
q12	797	757	546	546
q13	3353	3638	2925	2925
q14	290	301	276	276
q15	q16	711	726	640	640
q17	1551	1412	1379	1379
q18	8043	7296	7259	7259
q19	1110	1127	1131	1127
q20	2235	2245	1948	1948
q21	5431	4740	4655	4655
q22	535	483	408	408
Total cold run time: 59155 ms
Total hot run time: 52953 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 175919 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 009d2be54404ed6cb4468a53e6dd79272156c87c, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query5	4319	652	494	494
query6	444	198	174	174
query7	4884	580	313	313
query8	362	212	199	199
query9	8793	4117	4114	4114
query10	462	302	254	254
query11	5877	2365	2142	2142
query12	163	104	97	97
query13	1299	638	439	439
query14	6426	5427	5094	5094
query14_1	4435	4441	4403	4403
query15	206	197	180	180
query16	1026	472	432	432
query17	1167	729	595	595
query18	2717	472	353	353
query19	205	193	148	148
query20	117	121	105	105
query21	219	143	117	117
query22	13880	13631	13377	13377
query23	17336	16650	16194	16194
query23_1	16373	16278	16389	16278
query24	7545	1810	1328	1328
query24_1	1332	1359	1369	1359
query25	580	478	410	410
query26	1308	321	174	174
query27	2629	571	350	350
query28	4451	2062	2056	2056
query29	1083	657	501	501
query30	310	238	197	197
query31	1110	1095	970	970
query32	101	63	61	61
query33	533	322	266	266
query34	1188	1186	651	651
query35	771	793	686	686
query36	1344	1352	1182	1182
query37	157	104	90	90
query38	3202	3188	3069	3069
query39	925	927	902	902
query39_1	890	873	887	873
query40	212	122	113	113
query41	70	63	60	60
query42	98	95	96	95
query43	320	326	279	279
query44	1501	772	791	772
query45	191	185	183	183
query46	1060	1203	754	754
query47	2326	2321	2255	2255
query48	402	422	303	303
query49	616	480	347	347
query50	1005	358	275	275
query51	4348	4287	4213	4213
query52	90	96	78	78
query53	260	274	195	195
query54	269	223	198	198
query55	81	74	70	70
query56	233	213	209	209
query57	1412	1417	1331	1331
query58	242	206	204	204
query59	1598	1702	1467	1467
query60	283	249	225	225
query61	153	149	151	149
query62	703	645	583	583
query63	232	198	192	192
query64	2456	787	618	618
query65	4876	4794	4800	4794
query66	1737	471	333	333
query67	29770	29778	29628	29628
query68	3328	1586	1005	1005
query69	425	308	274	274
query70	1095	978	949	949
query71	289	231	210	210
query72	2960	2742	2323	2323
query73	857	802	451	451
query74	5097	4946	4748	4748
query75	2625	2598	2252	2252
query76	2307	1214	797	797
query77	366	377	281	281
query78	12409	12366	11852	11852
query79	1414	1188	744	744
query80	1284	463	392	392
query81	524	277	241	241
query82	604	161	123	123
query83	347	279	249	249
query84	262	145	114	114
query85	897	499	439	439
query86	429	307	300	300
query87	3415	3352	3185	3185
query88	3767	2817	2820	2817
query89	422	373	339	339
query90	1928	194	189	189
query91	176	162	135	135
query92	64	64	55	55
query93	1593	1445	930	930
query94	719	339	306	306
query95	688	388	438	388
query96	1028	799	391	391
query97	2691	2700	2546	2546
query98	217	203	197	197
query99	1172	1182	1029	1029
Total cold run time: 262579 ms
Total hot run time: 175919 ms

@hello-stephen

Contributor
ClickBench: Total hot run time: 25.26 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 009d2be54404ed6cb4468a53e6dd79272156c87c, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.01	0.01	0.01
query2	0.09	0.05	0.05
query3	0.26	0.14	0.13
query4	1.61	0.14	0.14
query5	0.23	0.22	0.23
query6	1.19	1.05	1.08
query7	0.04	0.01	0.01
query8	0.05	0.04	0.04
query9	0.37	0.35	0.31
query10	0.56	0.56	0.60
query11	0.20	0.14	0.14
query12	0.18	0.15	0.15
query13	0.47	0.49	0.48
query14	1.00	1.01	1.02
query15	0.61	0.60	0.60
query16	0.33	0.31	0.31
query17	1.13	1.11	1.10
query18	0.22	0.20	0.21
query19	2.02	1.92	1.94
query20	0.02	0.01	0.02
query21	15.41	0.24	0.13
query22	4.76	0.05	0.06
query23	16.14	0.31	0.11
query24	3.08	0.45	0.30
query25	0.11	0.05	0.04
query26	0.72	0.21	0.16
query27	0.04	0.03	0.03
query28	3.52	0.92	0.54
query29	12.54	4.23	3.46
query30	0.28	0.15	0.15
query31	2.78	0.59	0.31
query32	3.23	0.60	0.50
query33	3.29	3.27	3.19
query34	15.59	4.24	3.56
query35	3.53	3.53	3.50
query36	0.54	0.43	0.41
query37	0.09	0.06	0.06
query38	0.04	0.04	0.04
query39	0.03	0.03	0.03
query40	0.19	0.16	0.16
query41	0.08	0.03	0.03
query42	0.03	0.03	0.03
query43	0.05	0.03	0.03
Total cold run time: 96.66 s
Total hot run time: 25.26 s

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 95.65% (22/23) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 74.14% (28419/38331)
Line Coverage 58.04% (309761/533691)
Region Coverage 54.90% (259619/472879)
Branch Coverage 56.21% (112668/200456)

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 70.00% (14/20) 🎉
Increment coverage report
Complete coverage report

2 participants