
Track ZooKeeper property creation and modification times #6462

Open
DomGarguilo wants to merge 1 commit into apache:main from DomGarguilo:addModTimeTableProps

@DomGarguilo

Member

Fixes #6254

  • adds per-property created and modified metadata to VersionedProperties
    • this metadata is preserved across set, replace, and remove operations
  • for existing property data in ZooKeeper, metadata is backfilled using the blob timestamp
  • this new metadata is displayed in zk info-viewer --print-props

Here is example output from running accumulo zk info-viewer --print-props -t ts_meta_test on a test table "ts_meta_test" with one custom property set:

-----------------------------------------------
Report Time: 2026-07-02T17:55:34.493156802Z
-----------------------------------------------
ZooKeeper properties for instance ID: fa04c4d8-ae79-42c2-8d9f-e3200a19079e

Tables:
Name: ts_meta_test, Data Version:6, Data Timestamp: 2026-07-02T17:55:29.749431901Z:
  table.constraint.1=org.apache.accumulo.core.data.constraints.DefaultKeySizeConstraint
    created:  2026-07-02T17:34:09.334956155Z
    modified: 2026-07-02T17:34:09.334956155Z
  table.custom.owner=domG
    created:  2026-07-02T17:34:53.6940349Z
    modified: 2026-07-02T17:55:07.085812366Z
  table.iterator.majc.vers=20,org.apache.accumulo.core.iterators.user.VersioningIterator
    created:  2026-07-02T17:34:09.334956155Z
    modified: 2026-07-02T17:34:09.334956155Z
  table.iterator.majc.vers.opt.maxVersions=1
    created:  2026-07-02T17:34:09.334956155Z
    modified: 2026-07-02T17:34:09.334956155Z
  table.iterator.minc.vers=20,org.apache.accumulo.core.iterators.user.VersioningIterator
    created:  2026-07-02T17:34:09.334956155Z
    modified: 2026-07-02T17:34:09.334956155Z
  table.iterator.minc.vers.opt.maxVersions=1
    created:  2026-07-02T17:34:09.334956155Z
    modified: 2026-07-02T17:34:09.334956155Z
  table.iterator.scan.vers=20,org.apache.accumulo.core.iterators.user.VersioningIterator
    created:  2026-07-02T17:34:09.334956155Z
    modified: 2026-07-02T17:34:09.334956155Z
  table.iterator.scan.vers.opt.maxVersions=1
    created:  2026-07-02T17:34:09.334956155Z
    modified: 2026-07-02T17:34:09.334956155Z


-----------------------------------------------

Here are some pasted lines from further testing that show the created and modified times being updated.

In the shell I ran this:

root@uno> createtable ts_meta_test
root@uno ts_meta_test> config -t ts_meta_test -s table.custom.owner=dom
root@uno ts_meta_test> config -t ts_meta_test -s table.custom.property=fooBar
root@uno ts_meta_test> config -t ts_meta_test -s table.custom.owner=domG
root@uno ts_meta_test> config -t ts_meta_test -s table.custom.owner=domG
root@uno ts_meta_test> config -t ts_meta_test -d table.custom.property

In between each step I printed the output:

You can see the initial set creates matching timestamps:

  table.custom.owner=dom
    created:  2026-07-02T17:34:53.6940349Z
    modified: 2026-07-02T17:34:53.6940349Z

You can see adding a second property gives it its own timestamps:

  table.custom.property=fooBar
    created:  2026-07-02T17:54:55.574298612Z
    modified: 2026-07-02T17:54:55.574298612Z

You can see changing owner preserves created and updates modified:

  table.custom.owner=domG
    created:  2026-07-02T17:34:53.6940349Z
    modified: 2026-07-02T17:55:07.085812366Z

You can see setting owner=domG again does not update modified:

  Data Version:5
  Data Timestamp: 2026-07-02T17:55:16.216132303Z

  table.custom.owner=domG
    created:  2026-07-02T17:34:53.6940349Z
    modified: 2026-07-02T17:55:07.085812366Z

You can see deleting table.custom.property removes it from the output:

  table.custom.owner=domG
    created:  2026-07-02T17:34:53.6940349Z
    modified: 2026-07-02T17:55:07.085812366Z
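
The timestamp behavior observed in the walkthrough above can be summarized in a small sketch. This is a hypothetical illustration, not the code in this PR: the class TimestampedProps, its Entry record, and the set/remove/get methods are names invented here, and the sketch does not model the ZooKeeper data version or blob timestamp (which, per the output above, still advance when the same value is set again).

```java
import java.time.Instant;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; this is NOT the PR's VersionedProperties code. */
final class TimestampedProps {

  /** A property value plus the two timestamps shown in the info-viewer output. */
  record Entry(String value, Instant created, Instant modified) {}

  private final Map<String, Entry> entries = new TreeMap<>();

  /** Sets a property and returns true if the stored entry changed. */
  boolean set(String key, String value, Instant now) {
    Entry old = entries.get(key);
    if (old == null) {
      // First set: created and modified match.
      entries.put(key, new Entry(value, now, now));
      return true;
    }
    if (old.value().equals(value)) {
      // Same value set again: both timestamps are left alone.
      return false;
    }
    // Value changed: created is preserved, modified advances.
    entries.put(key, new Entry(value, old.created(), now));
    return true;
  }

  /** Removes a property together with its timestamps. */
  boolean remove(String key) {
    return entries.remove(key) != null;
  }

  Entry get(String key) {
    return entries.get(key);
  }

  public static void main(String[] args) {
    TimestampedProps props = new TimestampedProps();
    // Replays the shell steps above with timestamps truncated to seconds.
    props.set("table.custom.owner", "dom", Instant.parse("2026-07-02T17:34:53Z"));
    props.set("table.custom.property", "fooBar", Instant.parse("2026-07-02T17:54:55Z"));
    props.set("table.custom.owner", "domG", Instant.parse("2026-07-02T17:55:07Z"));
    props.set("table.custom.owner", "domG", Instant.parse("2026-07-02T17:55:16Z"));
    props.remove("table.custom.property");
    System.out.println(props.get("table.custom.owner"));
    System.out.println(props.get("table.custom.property"));
  }
}
```

Passing the clock value in as a parameter keeps the sketch deterministic, which makes the "same value does not update modified" case easy to check.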

@DomGarguilo DomGarguilo self-assigned this Jul 2, 2026
@DomGarguilo DomGarguilo added this to the 4.0.0 milestone Jul 2, 2026
@DomGarguilo

Member Author

Before merging, I want to discuss the trade-off of storing this extra metadata in ZooKeeper. While it can be helpful to see the create and edit timestamps for individual props, we have to assess whether it's worth the increased amount of data written to ZooKeeper per prop.

The props blob is gzip compressed, so things should compress reasonably well, but we are still adding two extra timestamps per key.

I do not have a good sense of how much this will affect a real-world deployment, so any insights or opinions here would be helpful.
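
One rough way to get a feel for the overhead is to gzip a synthetic blob with and without two timestamps per key and compare the sizes. The sketch below is only that: the text layout (key=value lines followed by created= and modified= lines), the class PropBlobSizeEstimate, and the generated keys and values are all assumptions made for illustration, and the PR's real serialization format may compress differently.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.zip.GZIPOutputStream;

/** Hypothetical size estimate; the blob layout here is NOT the PR's encoding. */
final class PropBlobSizeEstimate {

  /** Returns the gzip-compressed size of the text in bytes. */
  static int gzipSize(String text) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
      gz.write(text.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // The gzip stream is closed at this point, so the trailer has been written.
    return bytes.size();
  }

  /** Builds a synthetic blob, optionally with two timestamps per key. */
  static String buildBlob(int propCount, boolean withTimestamps) {
    Instant base = Instant.parse("2026-07-02T17:34:09.334956155Z");
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < propCount; i++) {
      sb.append("table.custom.prop").append(i).append("=value").append(i).append('\n');
      if (withTimestamps) {
        // Distinct timestamps per key, which compress worse than identical ones.
        Instant created = base.plusNanos(i * 1_000_003L);
        Instant modified = created.plusSeconds(i * 37L);
        sb.append("created=").append(created).append('\n');
        sb.append("modified=").append(modified).append('\n');
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    for (int n : new int[] {10, 100, 1000}) {
      int plain = gzipSize(buildBlob(n, false));
      int stamped = gzipSize(buildBlob(n, true));
      System.out.printf("%d props: %d bytes without timestamps, %d bytes with%n",
          n, plain, stamped);
    }
  }
}
```

Using a distinct timestamp for every key makes the estimate less favorable than a freshly created table would be; in the example output above, most properties share a single creation time, which gzip handles well.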

@ctubbsii

ctubbsii commented Jul 3, 2026

Member

The way I think about the utility of this is by analogy to a config file. Each of these config blobs already has a version (modification time) that is updated whenever the blob is modified, similar to a config file's modification time on a filesystem. We don't generally expect to be able to open a config file on a filesystem and check the timestamp of when each line in the file was modified, and that's fine, because we don't really need that. If we do, we depend on an external service to track changes as patch files or similar (like svn or git). Even if the extra storage isn't that much in ZK, there's also a question of increased code complexity vs. utility.

I think the utility of having the granular versioning baked in is low relative to the increased code quantity/complexity. It is probably simpler, and satisfies user interest better, to just log configuration changes on the server side when an RPC request comes in to change the config. Whether it's an API request to remove a property, set a property, or modify the properties, we can easily log the request on the server side in a very simple manner, without the complexity of tracking the metadata for when and what was changed. Users can collect and analyze those logs if they want to maintain a history of changes. That could also be done very easily in 2.1.

That said, I need to spend a little more time looking into this implementation. I was thinking about a couple of edge cases, and need to look more carefully to see how this PR handles those.
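
The server-side logging alternative described above could look roughly like the following. This is a minimal sketch under stated assumptions: ConfigChangeAudit, its Op enum, the log line layout, and the "config.audit" logger name are all hypothetical, and java.util.logging is used only to keep the example self-contained; it is not a claim about how Accumulo's RPC handlers or logging are structured.

```java
import java.time.Instant;
import java.util.logging.Logger;

/** Hypothetical audit logger for config-change requests; not Accumulo code. */
final class ConfigChangeAudit {

  private static final Logger LOG = Logger.getLogger("config.audit");

  /** The kinds of property change requests mentioned in the discussion. */
  enum Op { SET, REMOVE, MODIFY }

  /** Formats one audit line; the layout is an assumption for illustration. */
  static String format(Instant when, String user, String scope, Op op, String key,
      String value) {
    // A removal has no new value to report.
    String valuePart = op == Op.REMOVE ? "" : "=" + value;
    return String.format("%s user=%s scope=%s op=%s %s%s", when, user, scope, op, key,
        valuePart);
  }

  /** Would be called by the (hypothetical) handler that receives the request. */
  static void record(String user, String scope, Op op, String key, String value) {
    LOG.info(format(Instant.now(), user, scope, op, key, value));
  }

  public static void main(String[] args) {
    record("root", "table:ts_meta_test", Op.SET, "table.custom.owner", "domG");
    record("root", "table:ts_meta_test", Op.REMOVE, "table.custom.property", null);
  }
}
```

With this approach the history lives in the logs rather than in ZooKeeper, so nothing extra is stored per property, at the cost of needing log collection to reconstruct when a given property last changed.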


Development

Successfully merging this pull request may close these issues.

Determine when table properties have been set or modified

2 participants