Because we cannot rely on production data in this example, the program sets up some
- * Kafka-backed tables with data during the {@code setup} phase.
+ * An example that illustrates how to structure, test, and deploy a table program for production use
+ * in a CI/CD pipeline.
  *
- * Afterward, the program can operate in two modes: one for integration testing ({@code test}
- * phase) and one for deployment ({@code deploy} phase).
+ * The example separates two concerns that often end up entangled:
  *
- * A CI/CD workflow could execute the following:
+ *
- * export EXAMPLE_JAR=./target/flink-table-api-java-examples-1.0.jar
- * export EXAMPLE_CLASS=io.confluent.flink.examples.table.Example_08_IntegrationAndDeployment
- * java -jar $EXAMPLE_JAR $EXAMPLE_CLASS setup
- * java -jar $EXAMPLE_JAR $EXAMPLE_CLASS test
- * java -jar $EXAMPLE_JAR $EXAMPLE_CLASS deploy
- *
+ * The program is configured with {@link ConfluentSettings#fromArgs(String[])}, which reads
+ * configuration from command-line arguments, with environment variables as a fallback. This also
+ * enables the plugin's built-in CI/CD lifecycle actions: when the JAR is run with an action as the
+ * first argument ({@code list}, {@code describe}, {@code resume}, {@code stop}, or {@code delete}),
+ * the plugin executes the action and exits before the deployment logic runs, so the same JAR both
+ * deploys and manages (see {@code .github/workflows-examples/manage.yml}).
  *
- * NOTE: This example requires write access to a Kafka cluster. Fill out the given variables
- * below with target catalog/database if this is fine for you.
+ * The statement name and application name are deployment configuration, not source constants:
+ * the pipeline provides them via {@code --statement-name} / {@code --application-name}, which keeps
+ * a single source of truth for both deploy and management and is required for the lifecycle actions
+ * (they read the name at startup, before {@code main()} runs). A program that submits several
+ * statements instead names each one in code via {@link
+ * ConfluentTools#setStatementName(TableEnvironment, String)}.
  *
- * ALSO NOTE: The example submits an unbounded background statement. Make sure to stop the
- * statement in the Web UI afterward to clean up resources.
+ * Re-running the deployment with unchanged code is idempotent. When the pipeline changed, pass
+ * {@code --on-conflict replace} to replace the existing statement; see the README's CI/CD section
+ * for what that means for stateful pipelines. The README also covers configuration, environment
+ * promotion, and the full set of workflow steps.
  *
- * The complete CI/CD workflow performs the following steps:
+ * NOTE: This example requires write access to a Kafka cluster, configured via the environment
+ * variables TARGET_CATALOG (environment name) and TARGET_DATABASE (Kafka cluster name).
  *
- *
+ * ALSO NOTE: The example submits an unbounded background statement. Use the lifecycle actions
+ * (see {@code .github/workflows-examples/manage.yml}) or the Web UI to stop and delete the
+ * statement afterward to clean up resources.
  */
  public class Example_08_IntegrationAndDeployment {
- // Fill this with an environment you have write access to
- static final String TARGET_CATALOG = "";
-
- // Fill this with a Kafka cluster you have write access to
- static final String TARGET_DATABASE = "";
-
- // Fill this with names of the Kafka Topics you want to create
- static final String SOURCE_TABLE = "ProductsMock";
+ // Name of the table that stores the results
  static final String TARGET_TABLE = "VendorsPerBrand";
- // The following SQL will be tested on a finite subset of data before
- // it gets deployed to production.
- // In production, it will run on unbounded input.
- // The '%s' parameterizes the SQL for testing.
- static final String SQL =
- "SELECT brand, COUNT(*) AS vendors FROM ProductsMock %s GROUP BY brand";
-
- // All logic is defined in a main() method. It can run both in an IDE or CI/CD system.
- public static void main(String[] args) throws Exception {
- if (args.length == 0) {
- throw new IllegalArgumentException(
- "No mode specified. Possible values are 'setup', 'test', or 'deploy'.");
- }
-
- EnvironmentSettings settings = ConfluentSettings.fromResource("/cloud.properties");
- TableEnvironment env = TableEnvironment.create(settings);
- env.useCatalog(TARGET_CATALOG);
- env.useDatabase(TARGET_DATABASE);
-
- String mode = args[0];
- switch (mode) {
- case "setup":
- setupProgram(env);
- break;
- case "test":
- testProgram(env);
- break;
- case "deploy":
- deployProgram(env);
- break;
- default:
- throw new IllegalArgumentException("Unknown mode: " + mode);
+ /**
+ * The pipeline logic under test: counts the number of vendors per brand.
+ *
+ * This class must not reference any {@code io.confluent.flink.plugin} classes so that unit
+ * tests can run it on Apache Flink without the plugin on the classpath.
+ */
+ public static class VendorsPerBrand {
+ public static Table buildPipeline(Table products) {
+ return products.groupBy($("brand")).select($("brand"), lit(1).count().as("vendors"));
}
}
- // --------------------------------------------------------------------------------------------
- // Setup Phase
- // --------------------------------------------------------------------------------------------
+ // The main() method performs the deployment, unless an action argument is present, in which
+ // case ConfluentSettings.fromArgs(...) below executes that action and exits before the rest
+ // of this method runs.
+ public static void main(String[] args) {
+ // In GitHub Actions, the connection variables map naturally to repository or environment
+ // secrets; see the README for the full list.
+ EnvironmentSettings settings = ConfluentSettings.fromArgs(args);
+ TableEnvironment env = TableEnvironment.create(settings);
- private static void setupProgram(TableEnvironment env) throws Exception {
- System.out.println("Running setup...");
+ env.useCatalog(requireEnv("TARGET_CATALOG"));
+ env.useDatabase(requireEnv("TARGET_DATABASE"));
- System.out.println("Creating table..." + SOURCE_TABLE);
- // Create a mock table that has exactly the same schema as the example `products` table.
- // The LIKE clause is very convenient for this task which is why we use SQL here.
- // Since we use little data, a bucket of 1 is important to satisfy the `scan.bounded.mode`
- // during testing.
+ System.out.println("Creating table... " + TARGET_TABLE);
+ // The pipeline owns its output table and creates it on the first deployment.
env.executeSql(
String.format(
"CREATE TABLE IF NOT EXISTS `%s`\n"
- + "DISTRIBUTED INTO 1 BUCKETS\n"
- + "LIKE `examples`.`marketplace`.`products` (EXCLUDING OPTIONS)",
- SOURCE_TABLE));
-
- System.out.println("Start filling table...");
- // Let Flink copy generated data into the mock table. Note that the statement is unbounded
- // and submitted as a background statement by default.
- TableResult pipelineResult =
- env.from("`examples`.`marketplace`.`products`")
- .select(withAllColumns())
- .insertInto(SOURCE_TABLE)
- .execute();
-
- System.out.println("Waiting for at least 200 elements in table...");
- // We start a second Flink statement for monitoring how the copying progresses
- TableResult countResult = env.from(SOURCE_TABLE).select(lit(1).count()).as("c").execute();
- // This waits for the condition to be met:
- try (CloseableIterator
+ * While the unit tests (see {@code Example_08_IntegrationAndDeploymentTest}) verify the logic
+ * locally, these tests verify it on the real service: the exact Confluent SQL semantics, the
+ * Confluent catalog, and Kafka-backed tables.
+ *
+ * The tests run during {@code ./mvnw verify} and fail fast when the required environment
+ * variables are not set, so a CI pipeline cannot silently skip its verification step and still
+ * report success. Builds without Confluent Cloud credentials skip them with {@code ./mvnw verify
+ * -DskipITs}. They require the standard connection variables (see the README's "Via Environment
+ * Variables" section) plus TARGET_CATALOG and TARGET_DATABASE pointing to an environment and Kafka
+ * cluster with write access.
+ *
+ * Because we cannot rely on production data in this example, the test fixture creates a mock
+ * Kafka-backed table and fills it with data from the marketplace examples table. Dynamic options
+ * then make the table bounded, so the pipeline terminates and its result can be asserted.
+ *
+ * NOTE: Running these tests from the IDE requires the opposite classpath exclusion to the unit
+ * tests (excluding the {@code flink-table-planner-loader} JAR) and the environment variables set in
+ * the run configuration; see the README's testing section.
+ */
+class Example_08_IntegrationAndDeploymentIT {
+
+ // Name of the mock Kafka topic that emulates the production input
+ static final String SOURCE_TABLE = "ProductsMock";
+
+ static TableEnvironment env;
+
+ @BeforeAll
+ // The timeout runs the setup in a separate thread so that it can be interrupted even while
+ // blocked on statement results, e.g. when the compute pool has no capacity for the fill
+ // statement. Without it, a stuck setup would hang until the CI job timeout.
+ @Timeout(value = 15, unit = TimeUnit.MINUTES, threadMode = Timeout.ThreadMode.SEPARATE_THREAD)
+ static void setUpMockTable() throws Exception {
+ requireEnvironment();
+ env = TableEnvironment.create(ConfluentSettings.fromGlobalVariables());
+ env.useCatalog(System.getenv("TARGET_CATALOG"));
+ env.useDatabase(System.getenv("TARGET_DATABASE"));
+
+ System.out.println("Creating table... " + SOURCE_TABLE);
+ // Create a mock table that has exactly the same schema as the example `products` table.
+ // The LIKE clause is very convenient for this task, which is why we use SQL here.
+ // Because the table holds little data, a single bucket is important to satisfy the
+ // `scan.bounded.mode` during testing.
+ env.executeSql(
+ String.format(
+ "CREATE TABLE IF NOT EXISTS `%s`\n"
+ + "DISTRIBUTED INTO 1 BUCKETS\n"
+ + "LIKE `examples`.`marketplace`.`products` (EXCLUDING OPTIONS)",
+ SOURCE_TABLE));
+
+ System.out.println("Start filling table...");
+ // Let Flink copy generated data into the mock table. Note that the statement is
+ // unbounded and submitted as a background statement by default.
+ TableResult pipelineResult =
+ env.from("`examples`.`marketplace`.`products`")
+ .select(withAllColumns())
+ .insertInto(SOURCE_TABLE)
+ .execute();
+
+ long count = 0;
+ try {
+ System.out.println("Waiting for at least 200 elements in table...");
+ // A second Flink statement monitors how the copying progresses. The foreground
+ // statement is stopped automatically when its iterator is closed.
+ TableResult countResult =
+ env.from(SOURCE_TABLE).select(lit(1).count()).as("c").execute();
+ try (CloseableIterator
+ * These tests run entirely locally on Apache Flink with mock data from {@code fromValues()}. No
+ * Confluent Cloud connectivity, credentials, or compute pool are required, which makes them
+ * suitable for fast feedback during development and for CI runs on pull requests.
+ *
+ * NOTE: The Confluent plugin and the Apache Flink planner cannot share a runtime classpath (both
+ * register Executor and Planner factories under the identifier 'default'), so these tests must be
+ * executed via {@code ./mvnw test}, where the surefire configuration excludes the plugin. Running
+ * them directly from the IDE fails with "Multiple factories for identifier 'default'"; see the
+ * README's testing section for the IDE run-configuration setup.
+ *
+ * ALSO NOTE: Running locally on Apache Flink is not identical to running on Confluent Cloud.
+ * Confluent-specific features such as the {@code $rowtime} system column, the Confluent catalog,
+ * and Confluent SQL extensions are not available locally. Use the integration tests (see {@code
+ * Example_08_IntegrationAndDeploymentIT}) to verify behavior against the real service.
+ */
+class Example_08_IntegrationAndDeploymentTest {
+
+ private static Table mockProducts(TableEnvironment env) {
+ return env.fromValues(
+ DataTypes.ROW(
+ DataTypes.FIELD("name", DataTypes.STRING()),
+ DataTypes.FIELD("brand", DataTypes.STRING())),
+ row("MacBook", "Apple"),
+ row("iPhone", "Apple"),
+ row("Galaxy", "Samsung"));
+ }
+
+ @Test
+ void countsVendorsPerBrandInBatchMode() throws Exception {
+ // Batch mode computes the final result over the finite mock data, which makes
+ // assertions straightforward.
+ TableEnvironment env = TableEnvironment.create(EnvironmentSettings.inBatchMode());
+
+ Table result =
+ Example_08_IntegrationAndDeployment.VendorsPerBrand.buildPipeline(
+ mockProducts(env));
+
+ assertThat(collectRows(result))
+ .containsExactlyInAnyOrder(Row.of("Apple", 2L), Row.of("Samsung", 1L));
+ }
+
+ @Test
+ void countsVendorsPerBrandInStreamingMode() throws Exception {
+ // Streaming mode emits a changelog: an insert for the first product of a brand,
+ // followed by update_before/update_after pairs as more products arrive. This mirrors
+ // how the statement behaves on Confluent Cloud.
+ TableEnvironment env = TableEnvironment.create(EnvironmentSettings.inStreamingMode());
+
+ Table result =
+ Example_08_IntegrationAndDeployment.VendorsPerBrand.buildPipeline(
+ mockProducts(env));
+
+ List
+ * UDFs are plain Java classes, so their logic can be tested with JUnit alone: no Apache Flink,
+ * no Confluent Cloud connectivity, and no artifact upload are required. This is the fastest test
+ * tier and should cover the bulk of a UDF's business logic before it is registered and exercised on
+ * Confluent Cloud.
+ */
+class Example_09_FunctionsTest {
+
+ @Test
+ void customTaxReturnsRatePerLocation() {
+ Example_09_Functions.CustomTax tax = new Example_09_Functions.CustomTax();
+
+ assertThat(tax.eval("USA")).isEqualTo(10);
+ assertThat(tax.eval("EU")).isEqualTo(5);
+ assertThat(tax.eval("Mars")).isEqualTo(0);
+ }
+
+ @Test
+ void explodeEmitsOneRowPerElement() {
+ Example_09_Functions.Explode explode = new Example_09_Functions.Explode();
+
+ // Table functions emit rows via a collector, which tests can replace with a list
+ List