docs(table-design): add default-first Partitioning & Bucketing landing #3940

Open
dataroaring wants to merge 10 commits into apache:master from dataroaring:docs-partitioning-default-first
@dataroaring

Contributor

What

  • Add a default-first landing page for Table Design > Partitioning & Bucketing. It opens with a recommended CREATE TABLE (auto partition by time + BUCKETS AUTO), a "when to customize" decision table, and links to the how-tos.
  • Make it the category landing (was the 37KB basic-concepts page).
  • Reorder the how-tos default-first: auto, then dynamic, then manual (manual was listed first).
  • Repurpose basic-concepts as the "How It Works" explanation page it already is (title + intro updated; body unchanged).

Why

The section previously opened on basic-concepts (~37KB), which mixes core concepts, the first CREATE TABLE, advanced modes, design recommendations, and operations in one page. A reader who just wanted a sensible partitioning setup had to read the full model first, and the sidebar surfaced the least automated option (manual partitioning) before the recommended automated one.

Leading with a recommended default and a short decision guide gets users to a working, well-distributed table quickly, while the full model stays one click away in the dedicated explanation page.

Scope

  • English only.
  • Applies to both current docs (docs/) and versioned_docs/version-4.x; both sidebars updated.
  • New doc id: table-design/data-partitioning/overview.
  • basic-concepts keeps its URL (still linked from the landing and the how-tos), so no links break.
  • Deeper de-duplication (advanced partition/bucket sections in basic-concepts vs the auto/dynamic/manual/bucketing pages) is intentionally left for a follow-up, since it needs section-by-section verification before removing content.

Validation

  • version-4.x-sidebars.json parses as valid JSON.
  • All six referenced doc ids resolve to files.
  • Recommended CREATE TABLE uses the documented AUTO PARTITION BY RANGE (date_trunc(...)) + BUCKETS AUTO syntax.
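
For reviewers who want to see the shape of the recommended default, here is a minimal sketch of a CREATE TABLE that combines the two clauses named above. Only AUTO PARTITION BY RANGE (date_trunc(...)) and BUCKETS AUTO come from this PR; the table name, columns, key model, daily granularity, and replication setting are hypothetical and chosen only to make the statement complete.

```sql
-- Sketch only: example_events and its columns are hypothetical names.
CREATE TABLE example_events (
    -- Partition column; declared NOT NULL, which auto RANGE partitioning
    -- is understood to require.
    event_time DATETIME NOT NULL,
    user_id    BIGINT   NOT NULL,
    event_type VARCHAR(64)
)
DUPLICATE KEY(event_time, user_id)
-- Partitions are created on demand, one per day of event_time (assumed granularity).
AUTO PARTITION BY RANGE (date_trunc(event_time, 'day')) ()
-- Bucket count is left to the system rather than fixed by hand.
DISTRIBUTED BY HASH(user_id) BUCKETS AUTO
PROPERTIES (
    -- Assumption: single-replica test cluster; adjust for real deployments.
    "replication_num" = "1"
);
```

The empty parentheses after the partition expression mean no partitions are predefined; they are expected to appear as data arrives.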

The Partitioning & Bucketing section opened on a 37KB concept page
(basic-concepts) and listed the most manual option first, so a reader had to
absorb the full model before seeing a recommended configuration.

Add a default-first landing page that leads with a recommended CREATE TABLE
(auto partition by time + BUCKETS AUTO), a decision table for when to
customize, and links to the how-tos. Make it the category landing, reorder the
how-tos default-first (auto, then dynamic, then manual), and repurpose
basic-concepts as the 'How It Works' explanation page it already is.

Content of basic-concepts is unchanged apart from its title and intro; deeper
de-duplication (advanced modes vs the auto/dynamic/manual pages) is left for a
follow-up. Applies to both current docs and version-4.x.
…ts page

Remove the three CREATE TABLE Tab examples in 'Advanced: Partition Modes' and
the Auto Bucketing example, which duplicate the dedicated auto/dynamic/manual
partitioning and data-bucketing pages; replace with links. Keep the partition-
mode comparison table, Colocate, and all design recommendations (FE/BE tablet
limits, bucket-count guidance), which are not duplicated elsewhere. Removes the
now-unused Tabs imports. ~140 lines lighter.

Add the Chinese (zh-CN) partition/bucket landing page and apply the same
default-first restructure and example trim to the Chinese basic-concepts page,
matching the English changes. Current docs and version-4.x.

@dataroaring

Contributor Author

Update: added the Chinese counterparts, so this PR is now bilingual (EN + zh-CN), covering both current and version-4.x. The "English only" note in the description is superseded.

Plainer wording and shorter sentences on the partitioning overview and the
intro of the concepts page. Meaning unchanged.
…nding

Use conditional 'If .../for ...' phrasing instead of comma-splice clauses.
Surface the cross-cutting 'expire old data' need on the partitioning landing
with a comparison: dynamic partitioning uses dynamic_partition.start (time-based
rolling window) and auto RANGE partitioning uses partition.retention_count
(count-based). Note the deprecated auto+dynamic combo, and that retention drops
data whereas tiered storage migrates it. Links to each mode's page for detail;
no standalone page (the two mechanisms differ and are coupled to their modes).
EN + zh-CN.

Both dynamic and auto retention keep the most recent time-ordered partitions;
they differ only in how the limit is expressed: dynamic_partition.start is a
time window, partition.retention_count is a partition count. Note they are
effectively equivalent for regular time partitions and diverge for irregular
granularity or stale data. Corrects the earlier 'time-based vs count-based'
overstatement. EN + zh-CN.
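
To make the two ways of expressing the retention limit concrete, the sketch below sets roughly a week of daily partitions in each mode. The property names dynamic_partition.start and partition.retention_count are taken from the commit messages above; the table names, columns, values, and every other property are illustrative assumptions, and the exact boundary behaviour should be checked against each mode's page.

```sql
-- Dynamic partitioning: the limit is a time window relative to "now".
-- example_dynamic and its columns are hypothetical.
CREATE TABLE example_dynamic (
    event_date DATE   NOT NULL,
    user_id    BIGINT NOT NULL
)
DUPLICATE KEY(event_date, user_id)
PARTITION BY RANGE(event_date) ()
DISTRIBUTED BY HASH(user_id) BUCKETS 10
PROPERTIES (
    "dynamic_partition.enable"    = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start"     = "-7",  -- drop partitions older than about 7 days
    "dynamic_partition.end"       = "3",   -- pre-create 3 days ahead
    "dynamic_partition.prefix"    = "p",
    "dynamic_partition.buckets"   = "10",
    "replication_num"             = "1"    -- assumption: single-replica test cluster
);

-- Auto RANGE partitioning: the limit is a number of partitions.
-- example_auto and its columns are hypothetical.
CREATE TABLE example_auto (
    event_date DATE   NOT NULL,
    user_id    BIGINT NOT NULL
)
DUPLICATE KEY(event_date, user_id)
AUTO PARTITION BY RANGE (date_trunc(event_date, 'day')) ()
DISTRIBUTED BY HASH(user_id) BUCKETS AUTO
PROPERTIES (
    "partition.retention_count" = "7",  -- keep the 7 most recent partitions
    "replication_num"           = "1"   -- assumption: single-replica test cluster
);
```

With one partition per day and data arriving every day, both tables hold about the same set of recent partitions. If loading stops for a month, the time window keeps advancing while the count-based limit keeps whatever the latest seven partitions are, which is the divergence the commit message describes.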
…odes'

Promote Auto Partitioning as the top-level recommended mode; group the two
alternatives (manual for explicit control, dynamic which Auto supersedes) under
a collapsed 'Other Partition Modes' category, so the partitioning section shows
fewer entries by default. Not labeled 'Legacy' because manual partitioning is
not deprecated (it is the explicit-control / LIST option).
…rseded

In the overview decision table, recommend auto partitioning and route the
rolling-window case to it (auto + partition.retention_count), noting dynamic is
superseded. Keep manual for the schemes auto cannot express (custom/irregular
ranges, numeric-column ranges, grouped LIST values). EN + zh-CN.
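
As an illustration of the cases routed to manual partitioning, the sketch below shows grouped LIST values and uneven numeric ranges. All table names, columns, partition names, and boundary values are hypothetical; it is meant only to show what explicit partition definitions look like, not to reproduce the examples on the manual partitioning page.

```sql
-- Grouped LIST values: several region codes share one partition.
CREATE TABLE example_orders_by_region (
    region   VARCHAR(16) NOT NULL,
    order_id BIGINT      NOT NULL
)
DUPLICATE KEY(region, order_id)
PARTITION BY LIST(region) (
    PARTITION p_apac VALUES IN ("cn", "jp", "sg"),
    PARTITION p_emea VALUES IN ("de", "fr", "uk")
)
DISTRIBUTED BY HASH(order_id) BUCKETS AUTO
PROPERTIES ("replication_num" = "1");  -- assumption: single-replica test cluster

-- Numeric-column ranges with irregular widths.
CREATE TABLE example_accounts_by_id (
    account_id BIGINT NOT NULL,
    balance    DECIMAL(18, 2)
)
DUPLICATE KEY(account_id)
PARTITION BY RANGE(account_id) (
    PARTITION p_small VALUES LESS THAN ("1000"),
    PARTITION p_large VALUES LESS THAN ("1000000")
)
DISTRIBUTED BY HASH(account_id) BUCKETS AUTO
PROPERTIES ("replication_num" = "1");  -- assumption: single-replica test cluster
```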

Drop the 'Other Partition Modes' grouping (manual is not legacy, so it should
not be hidden in a collapsed group). Keep a flat list ordered auto -> manual ->
dynamic, and mark dynamic partitioning as legacy via a sidebar label '(Legacy)'
and a strengthened callout pointing to auto as its successor. EN + zh-CN.