
HDDS-14866. Enhance DiskBalancer Report to show individual volume's density #9969

Merged
ChenSammi merged 12 commits into apache:master from Gargi-jais11:HDDS-14866
Mar 31, 2026

@Gargi-jais11
Contributor

@Gargi-jais11 Gargi-jais11 commented Mar 24, 2026

What changes were proposed in this pull request?

The DiskBalancer report today shows only:

ozone admin datanode diskbalancer report --in-service-datanodes
Report result:
Datanode                                VolumeDensity
dn-hostname-3 (10.141.248.70:19864)     0.09267551461620249
dn-hostname-1 (10.141.128.135:19864)    0.06619677701803184
dn-hostname-2 (10.141.126.8:19864)      0.026044182616493772

So users see only a single aggregate VolumeDensity per datanode, with no per-disk breakdown.
Given the report above, a user who runs DiskBalancer with a threshold lower than 10%, say 5%, would expect balancing to start on DN-3 and DN-1. It does not start, which creates the impression that DiskBalancer is not working correctly.
This is because the threshold is actually checked against each volume's utilisation, i.e. whether it is above, below, or within the threshold range.
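
To make the mismatch concrete, here is a minimal sketch with hypothetical utilisation values (not taken from any real datanode). It assumes the aggregate density is the sum of each volume's absolute deviation from the ideal usage, and that with equal capacities the ideal usage is simply the mean utilisation.

```python
# Hypothetical utilisations (used / capacity) for three equally sized volumes.
utilizations = [0.454, 0.500, 0.546]
threshold = 0.05  # a 5% threshold, expressed as a ratio

# Assumption: equal capacities, so the ideal usage is the mean utilisation.
ideal_usage = sum(utilizations) / len(utilizations)

# Per-volume density: absolute deviation from the ideal usage.
densities = [abs(u - ideal_usage) for u in utilizations]
aggregate_density = sum(densities)

# A volume is only a balancing candidate if it lies outside ideal +/- threshold.
out_of_range = [u for u in utilizations if abs(u - ideal_usage) > threshold]

print(f"aggregate density = {aggregate_density:.3f}")
print(f"volumes outside range = {len(out_of_range)}")
```

Here the aggregate density is larger than the 5% threshold, yet no single volume deviates by more than 5%, so nothing would be moved.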

Proposed Solution:
We should also show each volume's density, along with each volume's utilisation and pre-allocated container bytes.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14866

How was this patch tested?

Added a unit test and tested manually as well.
DiskBalancer Report before patch:

ozone admin datanode diskbalancer report --in-service-datanodes
Report result:
Datanode                                VolumeDensity
dn-hostname-3 (10.141.248.70:19864)     0.09267551461620249
dn-hostname-1 (10.141.128.135:19864)    0.06619677701803184
dn-hostname-2 (10.141.126.8:19864)      0.026044182616493772

// json
bash-5.1$ ozone admin datanode diskbalancer report --in-service-datanodes --json
[ {
  "datanode" : "dn-hostname-3 (10.141.248.70:19864)",
  "action" : "report",
  "status" : "success",
  "volumeDensity" : 0.09267551461620249
}, {
  "datanode" : "dn-hostname-1 (10.141.128.135:19864)",
  "action" : "report",
  "status" : "success",
  "volumeDensity" : 0.06619677701803184
}, {
  "datanode" : "dn-hostname-2 (10.141.126.8:19864)",
  "action" : "report",
  "status" : "success",
  "volumeDensity" : 0.026044182616493772
} ]

DiskBalancer Report after enhancement:

bash-5.1$ ozone admin datanode diskbalancer report --in-service-datanodes
Report result:
Datanode: ozone-datanode-2.ozone_default (172.18.0.8:19864)
Aggregate VolumeDataDensity: 0.001985939360415162
IdealUsage: 0.12334072 | Threshold: 10.0% | ThresholdRange: (0.02334072, 0.22334072)

Volume Details -:

StorageID                                     StoragePath                                TotalCapacity       UsedSpace   Container Pre-AllocatedSpace   EffectiveUsedSpace     Utilization   VolumeDensity
DS-1f7c1811-c0f2-4b47-83aa-4fad46aaba41       /data/hdds1/hdds                              1006.75 GB         5.43 GB                            0 B            123.17 GB      0.12234807      0.00099265
DS-d82ba295-999c-4808-935e-e0f043388363       /data/hdds2/hdds                              1006.75 GB         6.47 GB                         892 MB            124.17 GB      0.12334039      0.00000032
DS-1b9db594-cebc-46b7-b0c1-19f587b93181       /data/hdds3/hdds                              1006.75 GB         4.79 GB                        1.74 GB            125.17 GB      0.12433369      0.00099297

-------

Datanode: ozone-datanode-1.ozone_default (172.18.0.9:19864)
Aggregate VolumeDataDensity: 1.1510817523735506E-4
IdealUsage: 0.12332519 | Threshold: 10.0% | ThresholdRange: (0.02332519, 0.22332519)

Volume Details -:

StorageID                                     StoragePath                                TotalCapacity       UsedSpace   Container Pre-AllocatedSpace   EffectiveUsedSpace     Utilization   VolumeDensity
DS-61468f15-6b59-47f0-a9e0-807df0b35e53       /data/hdds1/hdds                              1006.75 GB         5.56 GB                         887 MB            124.22 GB      0.12338210      0.00005691
DS-b0380718-d57c-4706-a640-6fc3e7c78da0       /data/hdds2/hdds                              1006.75 GB         4.65 GB                         885 MB            124.10 GB      0.12326764      0.00005755
DS-2ea9c51e-9bc9-4328-b567-b0a1fe975e01       /data/hdds3/hdds                              1006.75 GB         6.48 GB                         886 MB            124.16 GB      0.12332584      0.00000065

-------

Datanode: ozone-datanode-3.ozone_default (172.18.0.6:19864)
Aggregate VolumeDataDensity: 8.147096356556083E-5
IdealUsage: 0.12333837 | Threshold: 10.0% | ThresholdRange: (0.02333837, 0.22333837)

Volume Details -:

StorageID                                     StoragePath                                TotalCapacity       UsedSpace   Container Pre-AllocatedSpace   EffectiveUsedSpace     Utilization   VolumeDensity
DS-9edbbab2-a426-41ea-ad27-34aae4b4b173       /data/hdds1/hdds                              1006.75 GB         6.46 GB                         888 MB            124.21 GB      0.12337910      0.00004074
DS-44aaa14b-6a75-4ce5-a114-d80b288a3b83       /data/hdds2/hdds                              1006.75 GB         5.55 GB                         887 MB            124.13 GB      0.12329769      0.00004068
DS-a7fcc373-7e78-4afc-acc1-667edb80d980       /data/hdds3/hdds                              1006.75 GB         4.65 GB                         887 MB            124.17 GB      0.12333831      0.00000005


Note:
  - Aggregate VolumeDataDensity: Sum of per-volume density (deviation from ideal); higher means more imbalance.
  - IdealUsage: Target utilization ratio (0-1) when volumes are evenly balanced.
  - ThresholdRange: Acceptable deviation (percent); volumes within IdealUsage +/- Threshold are considered balanced.
  - VolumeDensity: Deviation of a particular volume's utilization from IdealUsage.
  - Utilization: Ratio of actual used space to capacity (0-1) for a particular volume.
  - TotalCapacity: Total volume capacity.
  - UsedSpace: Ozone used space.
  - Container Pre-AllocatedSpace: Space reserved for containers not yet written to disk.
  - EffectiveUsedSpace: This is the actual used space of volume which is visible to the diskBalancer : (ozoneCapacity minus ozoneAvailable) + containerPreAllocatedSpace + move delta for source volume.
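
As a quick cross-check of the definitions in the note, the sketch below recomputes the per-volume density and the aggregate for ozone-datanode-2 from the rounded values printed in the table above. It assumes VolumeDensity is the absolute difference between Utilization and IdealUsage; because the inputs are already rounded to 8 digits, the last digit can differ slightly from the report.

```python
# Values copied from the text report for ozone-datanode-2 above.
ideal_usage = 0.12334072
utilization = {
    "/data/hdds1/hdds": 0.12234807,
    "/data/hdds2/hdds": 0.12334039,
    "/data/hdds3/hdds": 0.12433369,
}

# Assumption: VolumeDensity = |Utilization - IdealUsage|.
volume_density = {path: abs(u - ideal_usage) for path, u in utilization.items()}

# Aggregate VolumeDataDensity is described as the sum of per-volume densities.
aggregate = sum(volume_density.values())

for path, density in volume_density.items():
    print(f"{path}  {density:.8f}")
print(f"aggregate  {aggregate:.8f}")
```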


// json output

bash-5.1$ ozone admin datanode diskbalancer report --in-service-datanodes --json
[ {
  "datanode" : "ozone-datanode-2.ozone_default (172.18.0.8:19864)",
  "action" : "report",
  "status" : "success",
  "volumeDensity" : 0.0017305026794109252,
  "idealUsage" : "0.12493073",
  "threshold %" : 10.0,
  "thresholdRange" : "(0.02493073, 0.22493073)",
  "volumes" : [ {
    "storageId" : "DS-1f7c1811-c0f2-4b47-83aa-4fad46aaba41",
    "storagePath" : "/data/hdds1/hdds",
    "totalCapacity" : "1006.75 GB",
    "usedSpace" : "5.43 GB",
    "containerPreAllocatedSpace" : "0 B",
    "effectiveUsedSpace" : "124.90 GB",
    "utilization" : 0.1240654821201758,
    "volumeDensity" : 8.652513397054695E-4
  }, {
    "storageId" : "DS-d82ba295-999c-4808-935e-e0f043388363",
    "storagePath" : "/data/hdds2/hdds",
    "totalCapacity" : "1006.75 GB",
    "usedSpace" : "6.47 GB",
    "containerPreAllocatedSpace" : "892 MB",
    "effectiveUsedSpace" : "125.77 GB",
    "utilization" : 0.12493073345988127,
    "volumeDensity" : 0.0
  }, {
    "storageId" : "DS-1b9db594-cebc-46b7-b0c1-19f587b93181",
    "storagePath" : "/data/hdds3/hdds",
    "totalCapacity" : "1006.75 GB",
    "usedSpace" : "4.79 GB",
    "containerPreAllocatedSpace" : "1.74 GB",
    "effectiveUsedSpace" : "126.65 GB",
    "utilization" : 0.12579598479958673,
    "volumeDensity" : 8.652513397054556E-4
  } ]
}, {
  "datanode" : "ozone-datanode-1.ozone_default (172.18.0.9:19864)",
  "action" : "report",
  "status" : "success",
  "volumeDensity" : 1.940025425348213E-6,
  "idealUsage" : "0.12492491",
  "threshold %" : 10.0,
  "thresholdRange" : "(0.02492491, 0.22492491)",
  "volumes" : [ {
    "storageId" : "DS-61468f15-6b59-47f0-a9e0-807df0b35e53",
    "storagePath" : "/data/hdds1/hdds",
    "totalCapacity" : "1006.75 GB",
    "usedSpace" : "5.56 GB",
    "containerPreAllocatedSpace" : "887 MB",
    "effectiveUsedSpace" : "125.77 GB",
    "utilization" : 0.1249258833963179,
    "volumeDensity" : 9.700127126671676E-7
  }, {
    "storageId" : "DS-b0380718-d57c-4706-a640-6fc3e7c78da0",
    "storagePath" : "/data/hdds2/hdds",
    "totalCapacity" : "1006.75 GB",
    "usedSpace" : "4.65 GB",
    "containerPreAllocatedSpace" : "885 MB",
    "effectiveUsedSpace" : "125.77 GB",
    "utilization" : 0.12492394337089255,
    "volumeDensity" : 9.700127126810454E-7
  }, {
    "storageId" : "DS-2ea9c51e-9bc9-4328-b567-b0a1fe975e01",
    "storagePath" : "/data/hdds3/hdds",
    "totalCapacity" : "1006.75 GB",
    "usedSpace" : "6.48 GB",
    "containerPreAllocatedSpace" : "886 MB",
    "effectiveUsedSpace" : "125.77 GB",
    "utilization" : 0.12492491338360523,
    "volumeDensity" : 0.0
  } ]
}, {
  "datanode" : "ozone-datanode-3.ozone_default (172.18.0.6:19864)",
  "action" : "report",
  "status" : "success",
  "volumeDensity" : 1.2933502835793531E-6,
  "idealUsage" : "0.12492621",
  "threshold %" : 10.0,
  "thresholdRange" : "(0.02492621, 0.22492621)",
  "volumes" : [ {
    "storageId" : "DS-9edbbab2-a426-41ea-ad27-34aae4b4b173",
    "storagePath" : "/data/hdds1/hdds",
    "totalCapacity" : "1006.75 GB",
    "usedSpace" : "6.46 GB",
    "containerPreAllocatedSpace" : "888 MB",
    "effectiveUsedSpace" : "125.77 GB",
    "utilization" : 0.12492685340903058,
    "volumeDensity" : 6.466751417827377E-7
  }, {
    "storageId" : "DS-44aaa14b-6a75-4ce5-a114-d80b288a3b83",
    "storagePath" : "/data/hdds2/hdds",
    "totalCapacity" : "1006.75 GB",
    "usedSpace" : "5.55 GB",
    "containerPreAllocatedSpace" : "887 MB",
    "effectiveUsedSpace" : "125.77 GB",
    "utilization" : 0.1249258833963179,
    "volumeDensity" : 3.233375708983077E-7
  }, {
    "storageId" : "DS-a7fcc373-7e78-4afc-acc1-667edb80d980",
    "storagePath" : "/data/hdds3/hdds",
    "totalCapacity" : "1006.75 GB",
    "usedSpace" : "4.65 GB",
    "containerPreAllocatedSpace" : "887 MB",
    "effectiveUsedSpace" : "125.77 GB",
    "utilization" : 0.1249258833963179,
    "volumeDensity" : 3.233375708983077E-7
  } ]
} ]
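
For anyone consuming the JSON form from a script, the following is a rough sketch of how the per-volume fields could be used to list volumes outside the threshold range. The field names are taken from the sample above and may differ in the merged version; the helper name `volumes_outside_range` and the inclusive range boundaries are assumptions made for illustration.

```python
import json

# Trimmed sample in the shape of the --json output shown above.
sample = """
[ {
  "datanode" : "ozone-datanode-2.ozone_default (172.18.0.8:19864)",
  "idealUsage" : "0.12493073",
  "threshold %" : 10.0,
  "volumes" : [
    { "storagePath" : "/data/hdds1/hdds", "utilization" : 0.1240654821201758 },
    { "storagePath" : "/data/hdds2/hdds", "utilization" : 0.12493073345988127 },
    { "storagePath" : "/data/hdds3/hdds", "utilization" : 0.12579598479958673 }
  ]
} ]
"""


def volumes_outside_range(report):
    """Return (datanode, storagePath) pairs outside idealUsage +/- threshold."""
    flagged = []
    for dn in report:
        ideal = float(dn["idealUsage"])        # a string in the sample output
        threshold = dn["threshold %"] / 100.0  # percent -> ratio
        low, high = ideal - threshold, ideal + threshold
        for vol in dn["volumes"]:
            # Assumption: the boundaries themselves count as "within range".
            if not (low <= vol["utilization"] <= high):
                flagged.append((dn["datanode"], vol["storagePath"]))
    return flagged


print(volumes_outside_range(json.loads(sample)))
```

With the 10% threshold from the sample, no volume is flagged, which matches the observation that DiskBalancer has nothing to move on this datanode.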

@Gargi-jais11 Gargi-jais11 marked this pull request as ready for review March 24, 2026 07:14
@sreejasahithi
Contributor

@Gargi-jais11, can you please add the before and after JSON output?

Contributor

@sreejasahithi sreejasahithi left a comment


Thanks @Gargi-jais11 for working on this. I left a few comments.

@Gargi-jais11
Contributor Author

@ChenSammi Please review this PR.

@ChenSammi
Contributor

@Gargi-jais11, why do we show only "Pre-Allocated Container Bytes" as the usage detail? Imagine you are a user wondering why no container is moved: what steps would you take to investigate the issue based on these new outputs? Which outputs would be most useful?

@Gargi-jais11
Contributor Author

@ChenSammi The utilisation shown for each DN is definitely what we need the user to understand. But in the DN UI, the used space shown for each volume is always less than the actual used space, because the actual figure also includes pre-allocated and reserved space. So I am showing the pre-allocated bytes because it will help the user understand why, for example, a volume with a capacity of 10GB shows 3GB of used bytes but has a utilisation of 50%: the other 2GB is occupied by pre-allocated space.

If you suggest, we can also add the usedBytes for that volume, which would clearly show the user that usedBytes + pre-allocated = utilisation for that DN.

@Gargi-jais11
Contributor Author

2026-03-06 12:27:02,444 DEBUG [DiskBalancerService#2]-org.apache.hadoop.ozone.container.diskbalancer.policy.DefaultVolumeChoosingPolicy: Disk balancing state - idealUsage=0.5094568501, thresholdPercentage=10.0%, thresholdRange=(0.4094568501, 0.6094568501), containerSize=5368709120
2026-03-06 12:27:02,444 DEBUG [DiskBalancerService#2]-org.apache.hadoop.ozone.container.diskbalancer.policy.DefaultVolumeChoosingPolicy: Volume[0] - disk=DS-bb305d7f-d780-4832-962b-1174fc11757d, utilization=0.5027842722, capacity=53681722491, effectiveUsed=26990325773, available=53534941184, usableSpace=5216560238, committedBytes=26843544466, delta=0
2026-03-06 12:27:02,444 DEBUG [DiskBalancerService#2]-org.apache.hadoop.ozone.container.diskbalancer.policy.DefaultVolumeChoosingPolicy: Volume[1] - disk=DS-baa06cd6-702b-49fc-9887-e241bce06863, utilization=0.5027846478, capacity=53681722491, effectiveUsed=26990345938, available=53534920704, usableSpace=5216540073, committedBytes=26843544151, delta=0
2026-03-06 12:27:02,445 DEBUG [DiskBalancerService#2]-org.apache.hadoop.ozone.container.diskbalancer.policy.DefaultVolumeChoosingPolicy: Volume[2] - disk=DS-ef78e643-5126-4fa0-8c30-8ff15564f77f, utilization=0.5228016301, capacity=53681722491, effectiveUsed=28064892027, available=25616830464, usableSpace=4141993984, committedBytes=0, delta=0

Take the example above: if a user sets a 10% threshold, DiskBalancer won't start. Per the debug log, all volumes are within the threshold range, so no containers are moved.
But what the user sees in the DN UI for each volume's utilisation is:

Disk1-50%
Disk2-1%
Disk3-1%

So with a 10% threshold the user assumes balancing should start. Showing the utilisation of each volume would give the clear picture. As per the log above, the utilisation of every volume on the DN is somewhere around 50-52%:

Disk1-50%
Disk2-50%
Disk3-52%

So the user will ask why utilisation is high even though the used bytes shown for a particular volume are very low. That is why I added the pre-allocated container bytes: they also contribute to the used bytes and make the utilisation high.
I hope this clears up why I am also adding pre-allocated bytes.
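
The range check described above can be reproduced from the DEBUG log values. The sketch below is only an illustration, not the DefaultVolumeChoosingPolicy code: it assumes idealUsage is total effectiveUsed divided by total capacity (which is consistent with the logged idealUsage) and treats the range boundaries as inclusive.

```python
# (capacity, effectiveUsed) in bytes, copied from the DEBUG log lines above.
volumes = {
    "DS-bb305d7f-d780-4832-962b-1174fc11757d": (53681722491, 26990325773),
    "DS-baa06cd6-702b-49fc-9887-e241bce06863": (53681722491, 26990345938),
    "DS-ef78e643-5126-4fa0-8c30-8ff15564f77f": (53681722491, 28064892027),
}
threshold = 10.0 / 100.0  # thresholdPercentage=10.0%

# Assumption: idealUsage = sum(effectiveUsed) / sum(capacity).
total_capacity = sum(cap for cap, _ in volumes.values())
total_used = sum(used for _, used in volumes.values())
ideal_usage = total_used / total_capacity
low, high = ideal_usage - threshold, ideal_usage + threshold

states = {}
for disk, (cap, used) in volumes.items():
    utilization = used / cap
    states[disk] = "within range" if low <= utilization <= high else "outside range"
    print(f"{disk}  utilization={utilization:.10f}  {states[disk]}")

print(f"idealUsage={ideal_usage:.10f}  thresholdRange=({low:.10f}, {high:.10f})")
```

All three volumes land within the range, so no source or destination volume is picked, even though the DN UI numbers suggest a large imbalance.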

@ChenSammi
Contributor

Datanode debug logging is not enabled by default. Does the DN web UI show each volume's capacity and used space?
Since we already provide this new command, mainly for debugging purposes, it will be more user-friendly if we provide all the info in one command, instead of the user having to visit different places to get the full picture.

BTW, can we limit the precision of Utilization and ThresholdRange? Maybe 5 digits, for example 0.00001, is enough. Or is the current 20-digit precision more helpful for debugging?

@Gargi-jais11
Contributor Author

> Datanode debug logging is not enabled by default. Does the DN web UI show each volume's capacity and used space? Since we already provide this new command, mainly for debugging purposes, it will be more user-friendly if we provide all the info in one command, instead of the user having to visit different places to get the full picture.
>
> BTW, can we limit the precision of Utilization and ThresholdRange? Maybe 5 digits, for example 0.00001, is enough. Or is the current 20-digit precision more helpful for debugging?

The DN web UI does show the capacity and used space of each volume, but the used space shown there is not what the DN actually sees. It shows only the current used bytes of that volume and does not include the pre-allocated bytes, which contribute a lot to the volume's used space.
I agree with you that it will be good to show all the info in one command. I will add used space and total capacity as well for each volume.

StorageID                 StoragePath               TotalCapacity           UsedSpace         Container Pre-AllocatedBytes                          EffectiveUsedSpace    Utilization    VolumeDensity

Do the above columns look good? I believe these are enough to analyse why volumes are not chosen.

A precision of 5 is too low; we can keep at least 10 digits for utilisation and threshold range.

@ChenSammi
Contributor

Is EffectiveUsedSpace equal to TotalCapacity - UsedSpace + Pre-AllocatedBytes? If yes, then I think this one can be optional. I'm fine whether you show it or not. The rest looks good to me.

Contributor

@sreejasahithi sreejasahithi left a comment


Thanks @Gargi-jais11 for updating the patch.
Overall LGTM.

+1, I feel it would be useful for users if we include UsedSpace, EffectiveUsedSpace, and Utilization in both the text and JSON output.

@sreejasahithi
Contributor

> Is EffectiveUsedSpace equal to TotalCapacity - UsedSpace + Pre-AllocatedBytes?

@ChenSammi, I believe EffectiveUsedSpace is UsedSpace + Pre-AllocatedBytes.

@Gargi-jais11
Contributor Author

Gargi-jais11 commented Mar 27, 2026

> Is EffectiveUsedSpace equal to TotalCapacity - UsedSpace + Pre-AllocatedBytes? If yes, then I think this one can be optional. I'm fine whether you show it or not. The rest looks good to me.

effectiveUsed = usage.getCapacity() - usage.getAvailable() + committed + required
Here committed is the pre-allocated bytes, and required is the move delta for a source volume, i.e. space that will be freed up.
Okay, I will update it then.
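
As a sanity check, the formula can be evaluated against the DEBUG log quoted earlier in this thread. This is only a sketch: `effective_used` is a hypothetical helper written for this comment, not a method in the Ozone code base, and `required` is taken as 0 because the log shows delta=0.

```python
def effective_used(capacity, available, committed, required=0):
    """capacity - available + committed + required, all in bytes."""
    return capacity - available + committed + required


# Volume[0] and Volume[2] from the DEBUG log (delta=0 for both).
v0 = effective_used(capacity=53681722491, available=53534941184,
                    committed=26843544466)
v2 = effective_used(capacity=53681722491, available=25616830464,
                    committed=0)

print(v0, v2)
```

Both results match the effectiveUsed values printed in the log (26990325773 and 28064892027). For Volume[0] almost all of the effective usage comes from committed (pre-allocated) bytes, which is why the DN UI shows it as nearly empty.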

@Gargi-jais11
Contributor Author

@ChenSammi I have updated the DiskBalancer report; you can have a fresh look now.

Contributor

Copilot AI left a comment


Pull request overview

Enhances the ozone admin datanode diskbalancer report output to include per-volume utilization details (and corresponding JSON fields), reducing confusion caused by only showing an aggregate density per datanode.

Changes:

  • Extend DatanodeDiskBalancerInfoProto with idealUsage and per-volume VolumeReportProto details.
  • Populate and transmit per-volume report data from datanode DiskBalancer service to CLI.
  • Update CLI report formatting (text + JSON) and expand unit tests.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.

File | Description
hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/datanode/DiskBalancerReportSubcommand.java | Adds per-volume table output and richer JSON report fields.
hadoop-ozone/cli-admin/src/test/java/org/apache/hadoop/hdds/scm/cli/datanode/TestDiskBalancerSubCommands.java | Updates report JSON assertions and extends random report proto generation with volume info.
hadoop-hdds/interface-client/src/main/proto/hdds.proto | Adds VolumeReportProto and new fields on DatanodeDiskBalancerInfoProto.
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/diskbalancer/DiskBalancerService.java | Populates idealUsage and per-volume report data from volume usage snapshots.
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/diskbalancer/DiskBalancerProtocolServer.java | Sends idealUsage and volumeInfo over the DiskBalancer protocol.
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/diskbalancer/DiskBalancerInfo.java | Adds report-only fields to hold idealUsage and volumeInfo.
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/diskbalancer/TestDiskBalancerProtocolServer.java | Extends protocol-server tests to validate the new report fields.


ChenSammi and others added 5 commits March 30, 2026 12:12
…m/cli/datanode/DiskBalancerReportSubcommand.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…m/cli/datanode/DiskBalancerReportSubcommand.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…m/cli/datanode/DiskBalancerReportSubcommand.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…m/cli/datanode/TestDiskBalancerSubCommands.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ozone/container/diskbalancer/DiskBalancerService.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@Gargi-jais11
Contributor Author

@ChenSammi Please review. I have resolved the Copilot comments.

@ChenSammi ChenSammi self-requested a review March 30, 2026 08:15
@ChenSammi ChenSammi merged commit 1b83e16 into apache:master Mar 31, 2026
45 checks passed