All notable changes to this project will be documented in this file.
- Document Helm deployed RBAC permissions and remove unnecessary permissions (#770).
- BREAKING: `configOverrides` now only accepts the known config files (`hdfs-site.xml`, `core-site.xml`, `hadoop-policy.xml`, `ssl-server.xml`, `ssl-client.xml` and `security.properties`). Previously, arbitrary file names were silently accepted and ignored (#777).
- Bump `stackable-operator` to 0.110.1 (#777).
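The stricter `configOverrides` behaviour amounts to checking each override's file name against a fixed allow-list. The Python sketch below shows that check. It is a minimal illustration only: the operator itself is written in Rust, and the helper name `validate_config_overrides` is hypothetical.

```python
# Known config files accepted by `configOverrides` (list taken from the entry above).
ALLOWED_CONFIG_FILES = frozenset({
    "hdfs-site.xml",
    "core-site.xml",
    "hadoop-policy.xml",
    "ssl-server.xml",
    "ssl-client.xml",
    "security.properties",
})


def validate_config_overrides(overrides: dict[str, dict[str, str]]) -> None:
    """Raise ValueError if any override targets an unknown file.

    Hypothetical helper: before this change, unknown file names were
    silently accepted and ignored instead of being rejected.
    """
    unknown = sorted(set(overrides) - ALLOWED_CONFIG_FILES)
    if unknown:
        raise ValueError(f"unsupported configOverrides file(s): {', '.join(unknown)}")


# A known file passes; a typo such as "hdfs-site.xm" would now raise ValueError.
validate_config_overrides({"hdfs-site.xml": {"dfs.replication": "1"}})
```

Rejecting unknown names up front surfaces typos immediately. Under the old behaviour, such an override was silently dropped.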
- Added permissions required by Topology Provider (#738).
- Add conversion webhook (#753).
- Support `objectOverrides` using `.spec.objectOverrides`. See the objectOverrides concepts page for details (#741).
- Bump stackable-operator to 0.108.0, and strum to 0.28 (#760, #764).
- Gracefully shut down all concurrent tasks by forwarding the SIGTERM signal (#747).
- Added a warning and exit condition to the format-namenodes container script to check for corrupted data after formatting (#751).
- Fix "404 page not found" error for the initial object list (#764).
- Fix some shell output of init-containers not being logged properly and therefore not being aggregated (#746).
- Helm: Allow Pod `priorityClassName` to be configured (#713).
- Add end-of-support checker (#718).
  - `EOS_CHECK_MODE` (`--eos-check-mode`) to set the EoS check mode. Currently, only "offline" is supported.
  - `EOS_INTERVAL` (`--eos-interval`) to set the interval in which the operator checks if it is EoS.
  - `EOS_DISABLED` (`--eos-disabled`) to disable the EoS checker completely.
- Add `prometheus.io/path|port|scheme` annotations to the metrics service (#721).
- The `prometheus.io/scrape` label was moved to the metrics service (#721).
- The headless service now only exposes product / data ports, the metrics service only metrics ports (#721, #726).
- Bump stackable-operator to `0.100.1` and product-config to `0.8.0` (#722).
- Bump testing-tools to `0.3.0-stackable0.0.0-dev` (#740).
- Adds new telemetry CLI arguments and environment variables (#672).
  - Use `--file-log-max-files` (or `FILE_LOG_MAX_FILES`) to limit the number of log files kept.
  - Use `--file-log-rotation-period` (or `FILE_LOG_ROTATION_PERIOD`) to configure the frequency of rotation.
  - Use `--console-log-format` (or `CONSOLE_LOG_FORMAT`) to set the format to `plain` (default) or `json`.
- The operator now defaults to `AES/CTR/NoPadding` for `dfs.encrypt.data.transfer.cipher.suite` to improve security and performance (#693).
- The built-in Prometheus servlet is now enabled and metrics are exposed under the `/prom` path of all UI services (#695).
- Add several properties to `hdfs-site.xml` and `core-site.xml` that improve general performance and reliability (#696).
- Add RBAC rule to helm template for automatic cluster domain detection (#699).
- BREAKING: Replace stackable-operator `initialize_logging` with stackable-telemetry `Tracing` (#661, #668, #672).
  - The console log level was set by `HDFS_OPERATOR_LOG`, and is now set by `CONSOLE_LOG_LEVEL`.
  - The file log level was set by `HDFS_OPERATOR_LOG`, and is now set by `FILE_LOG_LEVEL`.
  - The file log directory was set by `HDFS_OPERATOR_LOG_DIRECTORY`, and is now set by `FILE_LOG_DIRECTORY` (or via `--file-log-directory <DIRECTORY>`).
  - Replace stackable-operator `print_startup_string` with `tracing::info!` with fields.
- BREAKING: Inject the vector aggregator address into the vector config using the env var `VECTOR_AGGREGATOR_ADDRESS` instead of having the operator write it to the vector config (#671).
- test: Bump to Vector `0.46.1` (#677).
- BREAKING: Previously, this operator hardcoded the UID and GID of the Pods being created to 1000/0; this has changed now (#683).
  - The `runAsUser` and `runAsGroup` fields will not be set anymore by the operator.
  - The defaults from the docker images themselves will now apply, which will be different from 1000/0 going forward.
  - This is marked as breaking because tools and policies might exist that require these fields to be set.
- Use versioned common structs (#684).
- BREAKING: remove legacy service account binding for cluster role nodes (#697).
- BREAKING: Bump stackable-operator to 0.94.0 and update other dependencies (#699).
  - The default Kubernetes cluster domain name is now fetched from the kubelet API unless explicitly configured.
  - This requires operators to have the RBAC permission to get nodes/proxy in the apiGroup "". The helm-chart takes care of this.
  - The CLI argument `--kubernetes-node-name` or env variable `KUBERNETES_NODE_NAME` needs to be set. The helm-chart takes care of this.
- The operator helm-chart now grants RBAC `patch` permissions on `events.k8s.io/events`, so events can be aggregated (e.g. "error happened 10 times over the last 5 minutes") (#700).
- Use `json` file extension for log files (#667).
- Fix a bug where changes to ConfigMaps that are referenced in the HdfsCluster spec didn't trigger a reconciliation (#671).
- Allow uppercase characters in domain names (#699).
- Remove support for HDFS `3.3.4`, `3.3.6`, and `3.4.0` (#675).
- Remove the `lastUpdateTime` field from the stacklet status (#699).
- Remove role binding to legacy service accounts (#699).
- The lifetime of auto-generated TLS certificates is now configurable with the role and roleGroup config property `requestedSecretLifetime`. This helps reduce frequent Pod restarts (#619).
- Run a `containerdebug` process in the background of each HDFS container to collect debugging information (#629).
- Support configuring JVM arguments (#636).
- Aggregate emitted Kubernetes events on the CustomResources (#643).
- Add support for version `3.4.1` (#656).
- Bump `stackable-operator` to 0.87.0 and `stackable-versioned` to 0.6.0 (#655).
- Switch the WebUI liveness probe from `httpGet` to checking the tcp socket. This helps with setups where configOverrides are used to enable security on the HTTP interfaces. Such setups return `401` HTTP responses (instead of `200`), which previously failed the liveness checks.
- Set the JVM argument `-Xms` in addition to `-Xmx` (with the same value). This ensures consistent JVM configs across our products (#636).
- Default to OCI for image metadata and product image selection (#640).
- BREAKING: Use distinct ServiceAccounts for the Stacklets, so that multiple Stacklets can be deployed in one namespace. Existing Stacklets will use the newly created ServiceAccounts after restart (#616).
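The liveness probe switch above comes down to what each probe type checks. A Kubernetes `httpGet` probe only passes for status codes 200 to 399, so a secured UI answering `401` fails it. A `tcpSocket` probe only needs the port to accept a connection. The following Python sketch imitates both checks. The helper names are hypothetical, and Kubernetes performs the real probes itself.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Like a `tcpSocket` probe, this never looks at HTTP status codes,
    so a UI that would answer `401` still counts as alive.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_probe_passes(status_code: int) -> bool:
    """An `httpGet` probe succeeds only for status codes 200-399."""
    return 200 <= status_code < 400
```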
- The operator can now run on Kubernetes clusters using a non-default cluster domain. Use the env var `KUBERNETES_CLUSTER_DOMAIN` or the operator Helm chart property `kubernetesClusterDomain` to set a non-default cluster domain (#591).
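Kubernetes Service DNS names follow the pattern `<service>.<namespace>.svc.<cluster-domain>`. The configured cluster domain therefore changes every fully qualified address the operator hands out. Below is a minimal Python sketch of that lookup. It assumes a fallback to `cluster.local` when `KUBERNETES_CLUSTER_DOMAIN` is unset, and the helper `service_fqdn` is hypothetical.

```python
import os
from collections.abc import Mapping
from typing import Optional


def service_fqdn(service: str, namespace: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Build the fully qualified DNS name of a Kubernetes Service.

    Assumption: `cluster.local` is used when KUBERNETES_CLUSTER_DOMAIN is unset.
    """
    if env is None:
        env = os.environ
    domain = env.get("KUBERNETES_CLUSTER_DOMAIN", "cluster.local")
    return f"{service}.{namespace}.svc.{domain}"
```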
- Reduce CRD size from `1.4MB` to `136KB` by accepting arbitrary YAML input instead of the underlying schema for the following fields (#574):
  - `podOverrides`
  - `affinity`
- An invalid `HdfsCluster` doesn't cause the operator to stop functioning (#594).
- Add experimental support for version `3.4.0` (#545, #557). We do NOT support upgrading from 3.3 to 3.4 yet!
- Bump `stackable-operator` from `0.64.0` to `0.70.0` (#546).
- Bump `product-config` from `0.6.0` to `0.7.0` (#546).
- Bump other dependencies (#549).
- Revert changing the getting started script to use the listener class `cluster-internal` (#492) (#493).
- Fix HDFS pods crashing on launch when any port names contain dashes (#517).
- Add labels to ephemeral (listener) volumes. These allow `stackablectl stack list` to display datanode endpoints (#534).
- Fix processing of corrupted log events; if errors occur, the error messages are added to the log event (#536).
- Added rack awareness support via topology provider implementation (#429, #495).
- More CRD documentation (#433).
- Support for exposing HDFS clusters to clients outside of Kubernetes (#450).
- Helm: support labels in values.yaml (#460).
- Add support for OPA authorizer (#474).
- Use new label builders (#454).
- Change the liveness probes to use the web UI port and to fail after one minute (#491).
- Update the getting started script to use the listener class `cluster-internal` (#492).
- [BREAKING] `.spec.clusterConfig.listenerClass` has been split into `.spec.nameNodes.config.listenerClass` and `.spec.dataNodes.config.listenerClass`; migration will be required when using `external-unstable` (#450, #462).
- [BREAKING] Removed legacy node selector on roleGroups (#454).
- Change default value of `dfs.ha.nn.not-become-active-in-safemode` from `true` to `false` (#458).
- Removed support for Hadoop `3.2` (#475).
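Hadoop reads settings such as `dfs.ha.nn.not-become-active-in-safemode` from `<property>` elements inside a `<configuration>` root in `hdfs-site.xml`. The Python sketch below renders that layout with the standard library. It is a minimal illustration only: the operator's own config builder is written in Rust, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def render_hadoop_config(properties: dict[str, str]) -> str:
    """Render properties in Hadoop's <configuration><property> XML layout."""
    root = ET.Element("configuration")
    for name, value in sorted(properties.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Render the new default described in the entry above.
print(render_hadoop_config({"dfs.ha.nn.not-become-active-in-safemode": "false"}))
```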
- Include HDFS principals `dfs.journalnode.kerberos.principal`, `dfs.namenode.kerberos.principal` and `dfs.datanode.kerberos.principal` in the discovery ConfigMap in case Kerberos is enabled (#451).
- User provided env overrides now work as expected (#499).
- Default stackableVersion to operator version (#381).
- Configuration overrides for the JVM security properties, such as DNS caching (#384).
- Support PodDisruptionBudgets (#394).
- Support graceful shutdown (#407).
- Added support for 3.2.4, 3.3.6 (#409).
- `vector` `0.26.0` -> `0.33.0` (#378, #409).
- Let secret-operator handle certificate conversion (#392).
- `operator-rs` `0.44.0` -> `0.55.0` (#381, #394, #404, #405, #409).
- Consolidate Rust workspace members (#425).
- Don't default roleGroup replicas to zero when not specified (#402).
- [BREAKING] Removed field `autoFormatFs`, which was never read (#422).
- Removed support for 3.3.1, 3.3.3 (#409).
- Add support for enabling secure mode with Kerberos (#334).
- Generate OLM bundle for Release 23.4.0 (#350).
- Missing CRD defaults for `status.conditions` field (#354).
- Set explicit resources on all containers (#359).
- Support podOverrides (#368).
- Operator-rs: `0.40.2` -> `0.44.0` (#349, #372).
- Use 0.0.0-dev product images for testing (#351)
- Use testing-tools 0.2.0 (#351)
- Run as root group (#353).
- Added kuttl test suites (#364)
- Increase the size limit of the log volumes (#372)
- Deploy default and support custom affinities (#319).
- Added OLM bundle files (#328).
- Extend cluster resources for status and cluster operation (paused, stopped) (#337).
- Cluster status conditions (#339).
- [BREAKING] Moved top level config option to `clusterConfig` (#326).
- [BREAKING] Support specifying Service type. This enables us to later switch to using `ListenerClasses` for the exposure of Services without another breaking change. This change is breaking because, for security reasons, we default to the `cluster-internal` ListenerClass. If you need your cluster to be accessible from outside of Kubernetes, you need to set `clusterConfig.listenerClass` to `external-unstable` (#340).
- `operator-rs` `0.36.0` -> `0.40.2` (#326, #337, #341, #342).
- Use `build_rbac_resources` from operator-rs (#342).
- Avoid empty log events dated to 1970-01-01 and improve the precision of the log event timestamps (#341).
- Removed the `--debug` flag for HDFS container start up (#332).
- [BREAKING] Use Product image selection instead of version. `spec.version` has been replaced by `spec.image` (#281).
- Updated stackable image versions (#271).
- Fix the previously ignored node selector on role groups (#286).
- `operator-rs` `0.25.2` -> `0.30.2` (#276, #286, #290).
- Replaced `thiserror` with `snafu` (#290).
- `operator-rs` `0.24.0` -> `0.25.2` (#249).
- Set specified resource request and limit on namenode main container (#259).
- Include chart name when installing with a custom release name (#205).
- Added OpenShift compatibility (#225).
- Add recommended labels to NodePort services (#240).
- The possibility to specify `configOverrides` and `envOverrides` (#122).
- Reconciliation errors are now reported as Kubernetes events (#130).
- Use cli argument `watch-namespace` / env var `WATCH_NAMESPACE` to specify a single namespace to watch (#134).
- Config builder for `hdfs-site.xml` and `core-site.xml` (#150).
- Discovery configmap that exposes the namenode services for clients to connect (#150).
- Documented service discovery for namenodes (#150).
- Publish warning events when role replicas don't meet certain minimum requirements (#162).
- PVCs for data storage, cpu and memory limits are now configurable (#164).
- Fix environment variable names according to https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html#Configuring_Environment_of_Hadoop_Daemons (#164).
- `operator-rs` `0.10.0` -> `0.15.0` (#130, #134, #148).
- `HADOOP_OPTS` for jmx exporter specified to `HADOOP_NAMENODE_OPTS`, `HADOOP_DATANODE_OPTS` and `HADOOP_JOURNALNODE_OPTS` to fix cli tool (#148).
- [BREAKING] Specifying the product version has been changed to adhere to ADR018. Instead of just specifying the product version, you will now have to add the Stackable image version as well, so `version: 3.5.8` becomes (for example) `version: 3.5.8-stackable0.1.0` (#180).
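Under this scheme, one version string carries both the product version and the Stackable image version, joined by `-stackable`. Here is a minimal Python sketch that splits such a string. It assumes exactly that separator, and the helper `split_product_version` is hypothetical.

```python
def split_product_version(version: str) -> tuple[str, str]:
    """Split e.g. "3.5.8-stackable0.1.0" into ("3.5.8", "0.1.0").

    Assumption: the two parts are joined by the literal "-stackable".
    """
    product, sep, stackable = version.partition("-stackable")
    if not sep or not product or not stackable:
        raise ValueError(f"expected <product>-stackable<image>, got {version!r}")
    return product, stackable
```

A bare product version such as `3.5.8` raises an error here. That mirrors the entry above: the image version is now mandatory.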
- Monitoring scraping label `prometheus.io/scrape: true` (#104).
- Complete rewrite to use `StatefulSets`, `hostPath` volumes and the Kubernetes overlay network (#68).
- `operator-rs` `0.9.0` → `0.10.0` (#104).
- `operator-rs` `0.3.0` → `0.4.0` (#20).
- Adapted pod image and container command to docker image (#20).
- Adapted documentation to represent new workflow with docker images (#20).
- Switched to operator-rs tag 0.3.0 (#13)