translate : recipes_source/distributed_device_mesh.rst #1132

Open
ehdtjr wants to merge 4 commits into PyTorchKorea:master from ehdtjr:translate/distributed_device_mesh

@ehdtjr ehdtjr commented May 16, 2026

License Agreement

You must agree that the BSD 3-Clause License applies to the changes you contribute.

For more details, please see the contributing guide.

If you agree, change the [ ] below to [x].

  • I have read the contributing guide and agree that the BSD 3-Clause License applies to the contents of this PR.

Related Issue Number

Please enter the number of the issue related to this Pull Request.

If you put # in front of an issue or PR number, its title is shown directly. (e.g. #999 )

PR Type

Change the [ ] in front of the type that applies to this PR to [x].

  • A contribution that fixes typos or improves a translation
  • A contribution that translates an untranslated tutorial
  • A contribution that reflects the content of the official tutorial
  • A contribution not covered by the types above

PR Description

Translated the recipes_source/distributed_device_mesh.rst document.

@testofschool
Contributor

Hello Dongseok, overall this looks like a clean translation. LGTM!

@ptesogno

On line 177 the title of the referenced document has been translated; I think it would be better to leave it in the original language, as on line 178.
Thanks for all the hard work!

Member

@hyoyoung hyoyoung left a comment

Thank you for the effort of translating such a long document.
Please take a look at a few minor suggestions.

=====================================================

**Author**: `Iris Zhang <https://github.com/wz337>`__, `Wanchao Liang <https://github.com/wanchaol>`__
**저자**: `Iris Zhang <https://github.com/wz337>`__, `Wanchao Liang <https://github.com/wanchaol>`__
Member

Please add a translator (역자) entry below the author line.

DeviceMesh is useful when working with multi-dimensional parallelism (i.e. 3-D parallel) where parallelism composability is required. For example, when your parallelism solutions require both communication across hosts and within each host.
The image above shows that we can create a 2D mesh that connects the devices within each host, and connects each device with its counterpart on the other hosts in a homogeneous setup.
DeviceMesh는 여러 병렬화 방식을 조합(composability)해야 하는 다차원 병렬화(예: 3-D 병렬)를 다룰 때 유용합니다. 예를 들어, 병렬화 방식이 호스트 간 통신과 각 호스트 내부의 통신을 모두 요구하는 경우가 그렇습니다.
위 이미지는 균일한 환경에서 각 호스트 내부의 디바이스를 연결하고, 각 디바이스를 다른 호스트의 대응 디바이스와 연결하는 2D 메시를 만들 수 있음을 보여줍니다.
Member

"균일한 환경" (uniform environment) is fine, but changing it to "동일한 구성의 환경" (identically configured environment) would read more naturally.

First, we need to manually calculate the shard group and replicate group. Then, we need to assign the correct shard and
replicate group to each rank.
DeviceMesh가 없다면, 어떤 병렬화를 적용하기 전에 각 프로세스마다 NCCL 통신기와 CUDA 디바이스를 직접 설정해야 하며, 이는 꽤 복잡한 작업입니다.
다음 코드는 :class:`DeviceMesh` 없이 하이브리드 샤딩(hybrid sharding) 2-D 병렬 패턴을 설정하는 예시입니다.
Member

How about changing "2-D" to something like "2차원" (two-dimensional)?
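
For readers following this hunk, the manual bookkeeping it describes (computing the shard and replicate groups, then assigning the right pair to each rank) can be sketched in plain Python. This is only an illustration of the rank arithmetic, not the tutorial's code: the helper names `hybrid_shard_groups` and `groups_for_rank` are hypothetical, and the layout of 8 ranks as 2 hosts with 4 consecutively numbered devices each is an assumption.

```python
def hybrid_shard_groups(world_size, devices_per_host):
    """Return (shard_groups, replicate_groups) as lists of rank lists.

    Assumes ranks are numbered consecutively within each host.
    """
    if world_size % devices_per_host != 0:
        raise ValueError("world_size must be a multiple of devices_per_host")
    num_hosts = world_size // devices_per_host
    # One shard group per host: the consecutive ranks that share that host.
    shard_groups = [
        list(range(h * devices_per_host, (h + 1) * devices_per_host))
        for h in range(num_hosts)
    ]
    # One replicate group per local device slot: the same slot on every host.
    replicate_groups = [
        list(range(d, world_size, devices_per_host))
        for d in range(devices_per_host)
    ]
    return shard_groups, replicate_groups


def groups_for_rank(rank, shard_groups, replicate_groups):
    """Pick the single shard group and replicate group that contain `rank`."""
    shard = next(g for g in shard_groups if rank in g)
    replicate = next(g for g in replicate_groups if rank in g)
    return shard, replicate


shard_groups, replicate_groups = hybrid_shard_groups(world_size=8, devices_per_host=4)
print(shard_groups)      # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(replicate_groups)  # [[0, 4], [1, 5], [2, 6], [3, 7]]
print(groups_for_rank(5, shard_groups, replicate_groups))  # ([4, 5, 6, 7], [1, 5])
```

In a real job each of these rank lists would still have to be turned into a process group on every rank, which is the part a device mesh is meant to take over.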

With the help of :func:`init_device_mesh`, we can accomplish the above 2D setup in just two lines, and we can still
access the underlying :class:`ProcessGroup` if needed.
:func:`init_device_mesh` 를 활용하면 위의 2D 설정을 단 두 줄로 끝낼 수 있고, 필요할 때는
내부의 :class:`ProcessGroup` 에도 그대로 접근할 수 있습니다.
Member

I think "그대로" (as-is) can be dropped here.

--------------------------------------------------------
When working with large scale training, you might have more complex custom parallel training composition. For example, you may need to slice out sub-meshes for different parallelism solutions.
DeviceMesh allows users to slice child mesh from the parent mesh and re-use the NCCL communicators already created when the parent mesh is initialized.
대규모 학습 환경에서는 더 복잡한 사용자 정의 병렬 학습 구성을 다뤄야 할 수도 있습니다. 예를 들어, 서로 다른 병렬화 방식에 맞춰 하위 메시(sub-mesh)를 잘라내야 할 수 있습니다.
Member

์ž˜๋ผ๋‚ด๋Š”๊ฒŒ ์˜๋ฏธ๋Š” ๋งž๋Š”๋ฐ ์กฐ๊ธˆ ๋” ์˜์—ญํ•ด๋„ ์ข‹์„๊ฑฐ ๊ฐ™์Šต๋‹ˆ๋‹ค
ํ•˜์œ„ ๋ฉ”์‹œ๋ฅผ ๋‚˜๋ˆ„์–ด ์‚ฌ์šฉํ•ด์•ผ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค ์ •๋„๋Š” ์–ด๋–จ๊นŒ์š”
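
As background for the wording discussion, the "slicing" in this hunk means selecting, for a given rank, the ranks that vary along one named mesh dimension while every other coordinate stays fixed. The plain-Python sketch below models only that selection. It is not the DeviceMesh API and does not model communicator reuse; the function names, the 3-D shape `(2, 2, 2)`, the dimension names, and the row-major rank layout are all assumptions made for illustration.

```python
from itertools import product


def build_mesh_coords(shape):
    """Map each rank to its mesh coordinate, assuming a row-major layout."""
    coords = product(*(range(n) for n in shape))
    return {rank: coord for rank, coord in enumerate(coords)}


def slice_submesh(shape, dim_names, keep, rank):
    """Ranks that share `rank`'s coordinate on every dimension except `keep`."""
    coords = build_mesh_coords(shape)
    axis = dim_names.index(keep)
    mine = coords[rank]
    return [
        r
        for r, c in coords.items()
        if all(c[i] == mine[i] for i in range(len(shape)) if i != axis)
    ]


shape = (2, 2, 2)
dim_names = ("replicate", "shard", "tp")  # hypothetical dimension names
print(slice_submesh(shape, dim_names, "tp", rank=5))         # [4, 5]
print(slice_submesh(shape, dim_names, "shard", rank=5))      # [5, 7]
print(slice_submesh(shape, dim_names, "replicate", rank=5))  # [1, 5]
```

Seen this way, each parallelism scheme works on its own 1-D slice of the parent mesh, which is closer to "dividing and using" sub-meshes than to cutting pieces away.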

@ehdtjr ehdtjr requested a review from hyoyoung May 29, 2026 17:01

4 participants