Using Liquibase with Databricks SQL

A data lakehouse is a new, open data management architecture that combines the flexibility, cost-efficiency, and scale of data lakes with the data management and ACID transactions of data warehouses, enabling business intelligence (BI) and machine learning (ML) on all data.

The lakehouse architecture and Databricks SQL bring cloud data warehousing capabilities to your data lakes. Using familiar data structures, relations, and management tools, you can model a highly performant, cost-effective data warehouse that runs directly on your data lake.

For more information on Databricks, see the Databricks website.

Prerequisites

Set up Liquibase

  1. Install Java.
  2. Dive into Liquibase concepts with an Introduction to Liquibase.
  3. Download and install Liquibase on your machine.
  4. (Liquibase Pro users only) Enable Liquibase Pro capabilities. To apply a Liquibase Pro key to your project, add the following property to the Liquibase properties file:

    liquibase.licenseKey: <paste key here>

Set up Databricks

  1. Create a Databricks account and workspace.

    If you don't already have a Databricks account and workspace, follow the Databricks Getting Started instructions.

  2. Navigate to your Workspaces tab and select Open Workspace in the upper right of the page.

    'Workspaces' tab contents showing authentication, storage, and connection info about the current workspace

  3. Create a SQL Warehouse.

    If you don't have a SQL Warehouse set up, follow the Databricks instructions on Creating a SQL Warehouse.

  4. Create a catalog.

    If you don't already have a catalog set up, follow the Databricks instructions on Create and Manage Catalogs.

  5. In the left navigation, select SQL Editor.

    Enter your SQL to create your database (also called a schema), and select Run.

    CREATE DATABASE IF NOT EXISTS <catalog_name>.<database_name>;

    'SQL Editor' tab contents showing the SQL query to create a new database

Your database is configured and ready to use.

Install drivers

Databricks Pro Extension users

Download the Liquibase Pro Databricks extension (liquibase-commercial-databricks-<version>.jar) from Maven Central.

You do not need to install a separate JDBC driver. One is included in liquibase-commercial-databricks.jar.

However, to avoid conflicts between different JDBC driver versions, we recommend that you delete DatabricksJDBC42.jar or databricks-jdbc-<version>.jar from the Liquibase classpath (the liquibase/lib, liquibase/internal/lib, and liquibase/internal/extensions folders).
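The check described above can be sketched in Python. This is a minimal illustration, not part of Liquibase: the function name `find_conflicting_drivers` and the way the install directory is passed in are assumptions, and the script only lists candidate files so that you can review them before deleting anything.

```python
from fnmatch import fnmatch
from pathlib import Path

# Classpath folders named in the text, relative to the Liquibase install directory.
CLASSPATH_DIRS = ("lib", "internal/lib", "internal/extensions")
# File name patterns for the standalone Databricks JDBC drivers named in the text.
DRIVER_PATTERNS = ("DatabricksJDBC42.jar", "databricks-jdbc-*.jar")


def find_conflicting_drivers(liquibase_home):
    """Return standalone Databricks JDBC driver JARs found in the classpath folders."""
    home = Path(liquibase_home)
    found = []
    for rel in CLASSPATH_DIRS:
        folder = home / rel
        if not folder.is_dir():
            continue  # Folder layout differs between installs; skip what is missing.
        for jar in sorted(folder.iterdir()):
            if any(fnmatch(jar.name, pattern) for pattern in DRIVER_PATTERNS):
                found.append(jar)
    return found
```

Calling `find_conflicting_drivers("<liquibase_install_dir>")` returns a list of paths; an empty list suggests that no separate driver is present in those folders.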

Note: If you are running your project on macOS or Linux, you might need to run the following command in your terminal (you can add it to your Bash profile as well) to allow the dependencies to work properly: export JAVA_OPTS=--add-opens=java.base/java.nio=ALL-UNNAMED

Databricks Open Source Extension users

  1. Download the JAR files:
  2. Place your JAR files in the <liquibase_install_dir>/lib directory:
    • DatabricksJDBC42.jar
    • liquibase-databricks-<version>.jar

Note: If you are running your project on macOS or Linux, you might need to run the following command in your terminal (you can add it to your Bash profile as well) to allow the dependencies to work properly: export JAVA_OPTS=--add-opens=java.base/java.nio=ALL-UNNAMED

Maven Users

If you use Maven, note that this database does not provide its driver JAR on a public Maven repository, so you must install a local copy and add it as a dependency to your pom.xml file:

<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>databricks-jdbc</artifactId>
    <version>[2.7.1,)</version>
</dependency>

<!--Only if you use the OSS extension-->
<dependency>
    <groupId>org.liquibase.ext</groupId>
    <artifactId>liquibase-databricks</artifactId>
    <version>[1.4.1,)</version>
</dependency>

<!--Only if you use the Pro extension-->
<dependency>
    <groupId>org.liquibase.ext</groupId>
    <artifactId>liquibase-commercial-databricks</artifactId>
    <version>[1.0.0,)</version>
</dependency>

Verify Installation

Run the following command to confirm you have successfully installed everything:

liquibase --version

Review the libraries listing output for the two newly installed JAR files: DatabricksJDBC42.jar and liquibase-databricks-<version>.jar.

Liquibase console output showing that the correct JAR files are installed

Database Connection

Configure Connection

  1. Specify the database URL in the liquibase.properties file (defaults file), along with other properties you want to set a default value for. Liquibase does not parse the URL. You can either specify the full database connection string or specify the URL using your database's standard connection format:

    liquibase.command.url: jdbc:databricks://<server_hostname>:443;AuthMech=3;httpPath=/sql/1.0/warehouses/<your_warehouse_id>;ConnCatalog=<your_catalog>;ConnSchema=<your_schema>;

    Your base JDBC connection string can be found on the SQL Warehouses -> your_warehouse -> Connection details tab. For more information, see Databricks JDBC Driver.

    Note: Starting with Databricks JDBC driver version 2.7.1, which was included in databricks-commercial 1.0.0, username/token authentication is no longer supported. If a token is included in the URL, configure the URL to contain UID=token. Before 2.7.1, you could specify a username or email in the liquibase.command.username property and the token in the password property. This is no longer allowed.

  2. Specify your username and password in the liquibase.properties file (defaults file):

    1. The username is the literal value "token" for the User or Service Principal that you want to manage Liquibase.

      # Enter the username for your Target database.
      liquibase.command.username: token

    2. The password is the token for the User or Service Principal that you want to authenticate. This is usually passed in dynamically using frameworks like GitHub Actions + Secrets.

      # Enter the password for your Target database.
      liquibase.command.password: <your_token_here>

    Tip: To find or set up your Databricks user token, first log into your Databricks workspace. Then select Settings > User > Developer > Access Token > "Manage".
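As an illustration of the token-based settings above, the following Python sketch assembles the JDBC URL in the documented format and renders the three connection properties. The function names `build_jdbc_url` and `render_properties` are hypothetical helpers, not a Liquibase or Databricks API, and the sketch assumes the values contain no characters that need escaping in a properties file.

```python
def build_jdbc_url(server_hostname, warehouse_id, catalog, schema, port=443):
    """Assemble a Databricks JDBC URL in the format shown in the text."""
    params = [
        ("AuthMech", "3"),
        ("httpPath", f"/sql/1.0/warehouses/{warehouse_id}"),
        ("ConnCatalog", catalog),
        ("ConnSchema", schema),
    ]
    # Each parameter is written as key=value followed by a semicolon.
    tail = "".join(f"{key}={value};" for key, value in params)
    return f"jdbc:databricks://{server_hostname}:{port};{tail}"


def render_properties(url, token):
    """Render the URL, the fixed username 'token', and the token as properties lines."""
    lines = [
        f"liquibase.command.url: {url}",
        "liquibase.command.username: token",
        f"liquibase.command.password: {token}",
    ]
    return "\n".join(lines) + "\n"
```

In a CI pipeline, the token would typically come from a secret store or an environment variable rather than being written into a file that is committed to version control.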

Note: The Liquibase Pro 1.0.0 extension for Databricks only supports OAuth M2M (machine-to-machine) authentication. Other OAuth authentication methods, including OAuth token support, are not supported. OAuth is not supported in the Liquibase Open Source extension.

  1. Configure your properties file, environment variables, or command line parameters in the following format:

    # Required parameters
    liquibase.command.url: jdbc:databricks://<your_workspace_host_name>:443
    liquibase.databricks.authMechanism=OAUTH
    liquibase.databricks.oauth.clientId={clientIdValue}
    liquibase.databricks.oauth.clientSecret={clientSecretValue}
    liquibase.databricks.httpPath={httpPath}

  2. (Optional) Specify the following parameters in your properties file, environment variables, or command line:

    # Optional parameters
    liquibase.databricks.schema={schemaName}
    liquibase.databricks.catalog={catalogName}
    liquibase.databricks.oauth.authFlow=1

    Note: If you specify liquibase.databricks.authMechanism=OAUTH, by default Liquibase sets AuthMech=11; and Auth_Flow=1; in your connection URL. For more information, see Authentication settings for the Databricks ODBC Driver.

    For more information about Databricks parameters, including alternative ways to specify them, see Liquibase Parameters for Databricks.
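If you prefer environment variables over a properties file, the names can be derived from the property names. The sketch below assumes the usual Liquibase convention (uppercase, dots replaced by underscores, camelCase words split by underscores, as in LIQUIBASE_COMMAND_CHANGELOG_FILE); the helper name `to_env_var` is hypothetical, and the resulting names for the Databricks parameters should be confirmed against Liquibase Parameters for Databricks.

```python
import re


def to_env_var(property_name):
    """Convert a Liquibase property name to the assumed environment variable form."""
    # Insert an underscore at each lowercase-to-uppercase boundary (clientId -> client_Id).
    split_words = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", property_name)
    # Replace dots with underscores and uppercase the whole name.
    return split_words.replace(".", "_").upper()


if __name__ == "__main__":
    for name in (
        "liquibase.command.url",
        "liquibase.databricks.authMechanism",
        "liquibase.databricks.oauth.clientId",
        "liquibase.databricks.oauth.clientSecret",
        "liquibase.databricks.httpPath",
    ):
        print(f"{name} -> {to_env_var(name)}")
```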

Test Connection

  1. Create a text file called changelog (.sql, .yaml, .json, or .xml) in your project directory and add a changeset.

    If you already created a changelog using the init project command, you can use that instead of creating a new file. When adding onto an existing changelog, be sure to only add the changeset and to not duplicate the changelog header.

    --liquibase formatted sql
    
    --changeset your.name:1
    CREATE TABLE test_table (test_id INT NOT NULL, test_column INT, PRIMARY KEY (test_id))

    Tip: Formatted SQL changelogs generated from Liquibase versions before 4.2.0 might cause issues because of the lack of space after a double dash ( -- ). To fix this, add a space after the double dash. For example: -- liquibase formatted sql instead of --liquibase formatted sql and -- changeset myname:create-table instead of --changeset myname:create-table.

    databaseChangeLog:
       - changeSet:
           id: 1
           author: your.name
           changes:
           - createTable:
               tableName: test_table
               columns:
               - column:
                   name: test_id
                   type: INT
                   constraints:
                       primaryKey:  true
                       nullable:  false
               - column:
                   name: test_column
                   type: INT

    {
      "databaseChangeLog": [
        {
          "changeSet": {
            "id": "1",
            "author": "your.name",
            "changes": [
              {
                "createTable": {
                  "tableName": "test_table",
                  "columns": [
                    {
                      "column": {
                        "name": "test_id",
                        "type": "INT",
                        "constraints": {
                          "primaryKey": true,
                          "nullable": false
                        }
                      }
                    },
                    {
                      "column": {
                        "name": "test_column",
                        "type": "INT"
                      }
                    }
                  ]
                }
              }
            ]
          }
        }
      ]
    }

    <?xml version="1.0" encoding="UTF-8"?>
    <databaseChangeLog
        xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:ext="http://www.liquibase.org/xml/ns/dbchangelog-ext"
        xmlns:pro="http://www.liquibase.org/xml/ns/pro"
        xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
            http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-latest.xsd
            http://www.liquibase.org/xml/ns/dbchangelog-ext
            http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-ext.xsd
            http://www.liquibase.org/xml/ns/pro
            http://www.liquibase.org/xml/ns/pro/liquibase-pro-latest.xsd">
    
        <changeSet id="1" author="your.name">
            <createTable tableName="test_table">
                <column name="test_id" type="int">
                    <constraints primaryKey="true" nullable="false" />
                </column>
                <column name="test_column" type="int"/>
            </createTable>
        </changeSet>
    
    </databaseChangeLog>
  2. Navigate to your project folder in the CLI and run the Liquibase status command to see whether the connection is successful:

    liquibase status --username=test --password=test --changelog-file=<changelog.xml>

    Note: You can specify arguments in the CLI or keep them in the Liquibase properties file.

    If your connection is successful, you'll see a message like this:

    4 changesets have not been applied to <your_connection_url>
    Liquibase command 'status' was executed successfully.

    If you receive the following error, the version of Java that you're using doesn't include the required SSL certificate:

    Connection could not be created to jdbc:databricks://...; with driver com.databricks.client.jdbc.Driver. [Databricks][JDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target.

    To resolve this error, upgrade Java to a more recent version.

  3. Inspect the deployment SQL with the update-sql command:

    liquibase update-sql --changelog-file=<changelog.xml>

    If the SQL that Liquibase generates isn't what you expect, review your changelog file and make any necessary adjustments.

  4. Execute these changes against your database with the update command:

    liquibase update --changelog-file=<changelog.xml>

    If your update is successful, Liquibase runs each changeset and displays a summary message ending with:

    Liquibase: Update has been successful.
    Liquibase command 'update' was executed successfully.

  5. From a database UI tool, verify that your database contains the test_table object you added, along with the DATABASECHANGELOG and DATABASECHANGELOGLOCK tables.
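The status, update-sql, and update commands above can be scripted in the same order. The following Python sketch is one possible wrapper, not an official tool: the function names are hypothetical, it assumes that liquibase is on your PATH and that the connection settings live in the Liquibase properties file, and it defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess

# The three commands from the steps above, in the order they are run.
COMMANDS = ("status", "update-sql", "update")


def liquibase_commands(changelog_file):
    """Return the status, update-sql, and update invocations as argument lists."""
    flag = f"--changelog-file={changelog_file}"
    return [["liquibase", name, flag] for name in COMMANDS]


def run_sequence(changelog_file, dry_run=True):
    """Print each command, or execute it and stop at the first failure."""
    for argv in liquibase_commands(changelog_file):
        if dry_run:
            print(shlex.join(argv))
        else:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(argv, check=True)
```

Running everything unattended skips the manual review of the update-sql output, so the dry run is the safer default while you are still validating a changelog.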

Now you're ready to start making deployments with Liquibase!

Troubleshooting

Incomplete JDBC URL: [Databricks][DatabricksJDBCDriver](500540)

Condition

You've received this error message:

Unexpected error running Liquibase: 
Error executing SQL SELECT MD5SUM FROM main.default.DATABASECHANGELOG WHERE MD5SUM IS NOT NULL: [Databricks][JDBCDriver](500540) Error caught in BackgroundFetcher. Foreground thread ID: 1. Background thread ID: 20. 
Error caught: Could not initialize class com.databricks.client.jdbc42.internal.apache.arrow.memory.util.MemoryUtil.

Cause

If you use v1.1.3 of the Liquibase Open Source Databricks extension, you may receive this error when running Liquibase.

Remedy

To resolve this, append ;UserAgentEntry=Liquibase;EnableArrow=0; to your JDBC URL. For example, using username/password authentication:

jdbc:databricks://<host>:<port>/<schema>;AuthMech=3;httpPath=/sql/1.0/warehouses/<warehouse>;ConnCatalog=<catalog>;UserAgentEntry=Liquibase;EnableArrow=0;
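If the URL is generated by a script, the two parameters can be appended without duplicating ones that are already present. This is a minimal sketch with a hypothetical helper name, `with_params`; it assumes the URL uses semicolon-separated key=value pairs as shown above and treats parameter names as case-insensitive, which is an assumption about the driver.

```python
def with_params(url, **params):
    """Append key=value pairs to a Databricks JDBC URL, skipping keys already set."""
    # Split on semicolons and drop the empty piece left by a trailing semicolon.
    head, *existing = [part for part in url.split(";") if part]
    # Parameter names already in the URL, compared case-insensitively (assumption).
    present = {part.split("=", 1)[0].lower() for part in existing if "=" in part}
    extra = [f"{key}={value}" for key, value in params.items() if key.lower() not in present]
    return ";".join([head, *existing, *extra]) + ";"
```

For example, `with_params(url, UserAgentEntry="Liquibase", EnableArrow="0")` leaves a URL unchanged if it already carries both parameters.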

Connection could not be created to <MY_URL> with driver: [Databricks][JDBCDriver](500174)

Condition

You've received this error message:

ERROR: Exception Primary Reason: Connection could not be created to <MY_URL> with driver com.databricks.client.jdbc.Driver. [Databricks][JDBCDriver](500174) Connection property UID has invalid value of <your-email-address>. Valid values are: token.

Cause

If you're using Databricks JDBC driver 2.7.1 or higher, it's possible that token authentication support for your Databricks extension has changed.

Remedy

Verify which version of the Databricks JDBC driver you are using. If you're using driver version 2.7.1 or higher, specify username: token and password: <PAT>, or specify UID=token;PWD=<PAT> in your URL to connect to Databricks.

Missing SSL certificate: [Databricks][JDBCDriver](500593)

Condition

You've received this error message:

Connection could not be created to jdbc:databricks://...; with driver 
com.databricks.client.jdbc.Driver.  

[Databricks][JDBCDriver](500593) Communication link failure. Failed to connect to server. 
Reason: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: 
PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: 
unable to find valid certification path to requested target.

Cause

You are receiving this error message because the required SSL certificate is not available in Java versions before 1.8.

Remedy

To resolve this, upgrade Java to version 1.8 or higher.
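To confirm which Java version is in use, you can inspect the output of java -version, which is written to standard error. The sketch below parses that text; the helper name `java_feature_version` is hypothetical, and it assumes the two common formats, the legacy "1.8.0_x" style and the newer "17.0.2" style.

```python
import re


def java_feature_version(version_output):
    """Extract the Java feature version (8, 11, 17, ...) from `java -version` text."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None  # Unrecognized format; inspect the output manually.
    major = int(match.group(1))
    # Legacy releases report 1.x, where x is the feature version (1.8 means Java 8).
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major
```

One way to obtain the text is `subprocess.run(["java", "-version"], capture_output=True, text=True).stderr`; a result below 8 corresponds to the versions that the cause above describes.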
