Skip to content

feat(hive): add HmsClient connection lifecycle and URI parsing#796

Open
MisterRaindrop wants to merge 2 commits into
apache:mainfrom
MisterRaindrop:feat/hive-hms-client
Open

feat(hive): add HmsClient connection lifecycle and URI parsing#796
MisterRaindrop wants to merge 2 commits into
apache:mainfrom
MisterRaindrop:feat/hive-hms-client

Conversation

@MisterRaindrop

@MisterRaindrop MisterRaindrop commented Jul 1, 2026

Copy link
Copy Markdown
Contributor
  • ParseHmsUris() — tolerant HMS URI parser: thrift://host:port, bare host:port, default port 9083, and comma-separated HA lists. Rejects empty hosts and invalid ports.
  • HmsClient::Connect() — sets up the TSocket → TTransport → TBinaryProtocol → ThriftHiveMetastoreClient chain (transport and timeouts from HiveCatalogProperties), translating Thrift exceptions to kIOError. Thrift types stay out of the public header via a pImpl.
  • CI — new Ubuntu Hive job builds with -DICEBERG_BUILD_HIVE=ON (bundled Thrift) and runs hive_catalog_test.

Add the first iceberg_hive code that actually talks to a Hive Metastore
over Thrift, mirroring the structure of iceberg-rust's HmsCatalog::new
while keeping every Thrift type out of the public header via a pImpl.

  * HmsEndpoint + ParseHmsUris(): tolerant URI parser that accepts
    `thrift://host:port`, bare `host:port`, missing-port (defaults to
    9083), and comma-separated lists for HA failover, with whitespace
    around each segment stripped. Returns InvalidArgument for empty
    hosts, non-numeric or out-of-range ports, and empty list segments.

  * HmsClient::Connect(): wires TSocket -> TBufferedTransport /
    TFramedTransport (selected by HiveCatalogProperties::ThriftTransport)
    -> TBinaryProtocol -> ThriftHiveMetastoreClient. Connect / socket
    timeouts come from the properties. Connection failures are caught
    and translated into ErrorKind::kIOError so callers never see raw
    Thrift exceptions. The dtor best-effort-closes the transport.

  * HmsClient::Impl holds the Thrift state in destruction-order
    (client -> protocol -> transport -> socket) so teardown is clean.

  * src/iceberg/test/hms_client_test.cc adds 15 GoogleTest cases:
    11 covering ParseHmsUris (single, multi-HA, default port, scheme
    prefix, whitespace, empty/bad host/port edges) and 4 covering
    HmsClient::Connect's error paths (missing URI, bad URI, invalid
    transport mode, unreachable HMS).

  * src/iceberg/test/CMakeLists.txt gains an add_hive_iceberg_test()
    helper (mirroring add_rest_iceberg_test) and the hive_catalog_test
    target gated on ICEBERG_BUILD_HIVE.

`ctest --test-dir build --output-on-failure` now reports 17/17 passing
(16 previous + 1 new hive_catalog_test with 15 cases inside).

Part of the iceberg-cpp HiveCatalog port (C06).
@MisterRaindrop MisterRaindrop marked this pull request as ready for review July 1, 2026 06:54
Add an Ubuntu CI job that configures with -DICEBERG_BUILD_HIVE=ON (via
ICEBERG_EXTRA_CMAKE_ARGS) so the iceberg_hive library and
hive_catalog_test are compiled and run, and enable ICEBERG_BUILD_HIVE in
the cpp-linter build so clang-tidy resolves hms_client.cc's Thrift
include paths from the compilation database. Thrift comes from Arrow's
bundled build (ICEBERG_BUNDLE_THRIFT=ON, the default), so no system
Thrift install is required.
@MisterRaindrop MisterRaindrop force-pushed the feat/hive-hms-client branch from e2817ce to 31299ec Compare July 1, 2026 07:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant