Using Liquibase with Databricks SQL
A data lakehouse is a new, open data management architecture that combines the flexibility, cost-efficiency, and scale of data lakes with the data management and ACID transactions of data warehouses, enabling business intelligence (BI) and machine learning (ML) on all data.
The lakehouse architecture and Databricks SQL bring cloud data warehousing capabilities to your data lakes. Using familiar data structures, relations, and management tools, you can model a highly performant, cost-effective data warehouse that runs directly on your data lake.
For more information on Databricks, see the Databricks website.
Prerequisites
Set up Liquibase
- Install Java.
- Dive into Liquibase concepts with an Introduction to Liquibase.
- Download and install Liquibase on your machine.
- (Liquibase Pro users only) Enable Liquibase Pro capabilities. To apply a Liquibase Pro key to your project, add the following property to the Liquibase properties file:
liquibase.licenseKey: <paste key here>
Set up Databricks
- Create a Databricks account and workspace. If you don't already have a Databricks account and workspace, follow the Databricks Getting Started instructions.
- Navigate to your Workspaces tab and click the Open Workspace button in the upper right of the page.
- Create a SQL Warehouse. If you don't have a SQL Warehouse set up, follow the Databricks instructions on Creating a SQL Warehouse.
- Create a catalog. If you don't already have a catalog set up, follow the Databricks instructions on Create and Manage Catalogs.
- Click the SQL Editor option in the left navigation, enter your SQL to create your database (also called a schema), and click the Run button:
CREATE DATABASE IF NOT EXISTS <catalog_name>.<database_name>;
Your database is configured and ready to use.
Install Drivers
All Users
To use Databricks with Liquibase, you need to install two additional JAR files.
- Download the JAR files:
  - Download the Databricks JDBC driver (DatabricksJDBC42-<version>.zip) from the driver download site and unzip the folder to locate the DatabricksJDBC42.jar file.
  - Download the Liquibase Databricks extension (liquibase-databricks-<version>.jar) from the GitHub "Assets" listed at the end of the release notes.
- Place your JAR files in the <liquibase_install_dir>/lib directory:
  DatabricksJDBC42.jar
  liquibase-databricks-<version>.jar
Note: If you are running your project on macOS or Linux, you might need to run the following command in your terminal (you can also add it to your Bash profile) to allow the dependencies to work properly:
export JAVA_OPTS=--add-opens=java.base/java.nio=ALL-UNNAMED
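To make the setting from the note above persistent, one option is to append the export line to your shell profile. The following is a minimal sketch, not part of Liquibase: the function name add_java_opts is hypothetical, and the profile path you pass in depends on your shell setup.

```shell
# Hypothetical helper (not part of Liquibase): append the JAVA_OPTS export
# to a shell profile file, but only if that exact line is not already there.
add_java_opts() {
  profile="$1"
  line='export JAVA_OPTS=--add-opens=java.base/java.nio=ALL-UNNAMED'
  # grep -F: fixed string, -x: match the whole line, -q: no output
  if ! grep -Fxq -e "$line" "$profile" 2>/dev/null; then
    printf '%s\n' "$line" >> "$profile"
  fi
}

# Example call (commented out so that nothing is modified by accident):
# add_java_opts "$HOME/.bash_profile"
```

Calling the function more than once leaves a single copy of the line in the file, so it should be safe to rerun from a setup script.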
Maven Users
If you use Maven, note that this database does not provide its driver JAR on a public Maven repository, so you must install a local copy and add it as a dependency to your pom.xml file:
<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>databricks-jdbc</artifactId>
    <version>[2.6.36,)</version>
</dependency>
<dependency>
    <groupId>org.liquibase.ext</groupId>
    <artifactId>liquibase-databricks</artifactId>
    <version>[1.1.4,)</version>
</dependency>
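Installing a local copy of the driver can be done with the standard Maven install:install-file goal. The sketch below assumes you have already downloaded and unzipped the driver; the function name install_databricks_driver is hypothetical, and the version you pass should match the driver you downloaded and fall inside the version range declared in your pom.xml.

```shell
# Hypothetical helper: install a downloaded driver JAR into the local Maven
# repository under the com.databricks:databricks-jdbc coordinates.
# $1 = path to DatabricksJDBC42.jar, $2 = driver version (for example 2.6.36)
install_databricks_driver() {
  jar_path="$1"
  driver_version="$2"
  mvn install:install-file \
    -Dfile="$jar_path" \
    -DgroupId=com.databricks \
    -DartifactId=databricks-jdbc \
    -Dversion="$driver_version" \
    -Dpackaging=jar
}

# Example call (commented out; the path and version are placeholders):
# install_databricks_driver "$HOME/Downloads/DatabricksJDBC42.jar" 2.6.36
```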
Verify Installation
Run the following command to confirm you have successfully installed everything:
liquibase --version
Review the libraries listing output for the two newly installed JAR files: DatabricksJDBC42.jar and liquibase-databricks-<version>.jar.
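If the libraries listing does not show the expected entries, it can help to confirm that the files are actually in the lib directory. The following is a rough sketch of such a check; the function name check_databricks_jars is hypothetical, and you pass your own <liquibase_install_dir>/lib path as the argument.

```shell
# Hypothetical helper: report whether both JAR files are present in a
# Liquibase lib directory. Returns 0 if both are found, 1 otherwise.
# $1 = path to <liquibase_install_dir>/lib
check_databricks_jars() {
  lib_dir="$1"
  result=0
  if [ ! -f "$lib_dir/DatabricksJDBC42.jar" ]; then
    echo "missing: DatabricksJDBC42.jar"
    result=1
  fi
  # The extension file name contains a version number, so match by pattern.
  ext_found=0
  for candidate in "$lib_dir"/liquibase-databricks-*.jar; do
    if [ -f "$candidate" ]; then
      ext_found=1
    fi
  done
  if [ "$ext_found" -eq 0 ]; then
    echo "missing: liquibase-databricks-<version>.jar"
    result=1
  fi
  return "$result"
}

# Example call (commented out; replace the path with your install directory):
# check_databricks_jars "/opt/liquibase/lib"
```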
Database Connection
Configure Connection
- Specify the database JDBC URL in the liquibase.properties file (defaults file), along with other properties you want to set a default value for. Liquibase does not parse the URL.
  liquibase.command.url: jdbc:databricks://<your_workspace_host_name>:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/<your_warehouse_id>;ConnCatalog=<your_catalog>;ConnSchema=<your_schema>;
Note: Your base JDBC connection string can be found on the SQL Warehouses -> your_warehouse -> Connection details tab.
Note: Additional information on specifying the Databricks JDBC connection can be found in the Databricks JDBC Driver documentation.
- Specify your username and password in the liquibase.properties file (defaults file):
  - The username is the literal string "token", whether you authenticate as a User or as a Service Principal.
  - The password is the access token of the User or Service Principal that Liquibase should authenticate as. It is usually passed in dynamically, for example with GitHub Actions and secrets.
  # Enter the username for your Target database.
  liquibase.command.username: token
  # Enter the password for your Target database.
  liquibase.command.password: <your_token_here>
Tip: To find or set up your Databricks user token, first log into your Databricks workspace. Then select Settings > User > Developer > Access Token > "Manage".
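When the token is injected at run time, for example by a CI system, the connection settings can be supplied through Liquibase's environment variables (LIQUIBASE_COMMAND_URL, LIQUIBASE_COMMAND_USERNAME, LIQUIBASE_COMMAND_PASSWORD) instead of being written to liquibase.properties. The sketch below shows one way to do that. The function names and the DATABRICKS_* variable names are assumptions made for illustration, not names defined by Liquibase or Databricks.

```shell
# Hypothetical helper: assemble the JDBC URL shown above from its parts.
# $1 = workspace host, $2 = SQL warehouse ID, $3 = catalog, $4 = schema
build_databricks_url() {
  printf 'jdbc:databricks://%s:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/%s;ConnCatalog=%s;ConnSchema=%s;\n' \
    "$1" "$2" "$3" "$4"
}

# Hypothetical helper: export the Liquibase connection settings from
# DATABRICKS_* variables (assumed names) set by you or your CI system.
configure_liquibase_env() {
  if [ -z "${DATABRICKS_TOKEN:-}" ]; then
    echo "DATABRICKS_TOKEN is not set" >&2
    return 1
  fi
  LIQUIBASE_COMMAND_URL=$(build_databricks_url \
    "$DATABRICKS_HOST" "$DATABRICKS_WAREHOUSE_ID" \
    "$DATABRICKS_CATALOG" "$DATABRICKS_SCHEMA")
  # The username is always the literal string "token".
  LIQUIBASE_COMMAND_USERNAME=token
  LIQUIBASE_COMMAND_PASSWORD=$DATABRICKS_TOKEN
  export LIQUIBASE_COMMAND_URL LIQUIBASE_COMMAND_USERNAME LIQUIBASE_COMMAND_PASSWORD
}
```

Keeping the token in an environment variable means it does not have to be committed alongside the changelog.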
Test Connection
- Create a text file called changelog (.sql, .yaml, .json, or .xml) in your project directory and add a changeset. If you already created a changelog using the init project command, you can use that instead of creating a new file. When adding onto an existing changelog, be sure to add only the changeset and not to duplicate the changelog header.
- Navigate to your project folder in the CLI and run the Liquibase status command to see whether the connection is successful.
- Inspect the deployment SQL with the update-sql command.
- Then execute these changes to your database with the update command.
- From a database UI tool, ensure that your database contains the test_table object you added along with the DATABASECHANGELOG table and DATABASECHANGELOGLOCK table.
--liquibase formatted sql
--changeset your.name:1
CREATE TABLE test_table (test_id INT NOT NULL, test_column INT, PRIMARY KEY (test_id))
Tip: Formatted SQL changelogs generated from Liquibase versions before 4.2.0 might cause issues because of the lack of space after a double dash (--). To fix this, add a space after the double dash. For example: -- liquibase formatted sql instead of --liquibase formatted sql, and -- changeset myname:create-table instead of --changeset myname:create-table.
databaseChangeLog:
  - changeSet:
      id: 1
      author: your.name
      changes:
        - createTable:
            tableName: test_table
            columns:
              - column:
                  name: test_id
                  type: INT
                  constraints:
                    primaryKey: true
                    nullable: false
              - column:
                  name: test_column
                  type: INT
{
  "databaseChangeLog": [
    {
      "changeSet": {
        "id": "1",
        "author": "your.name",
        "changes": [
          {
            "createTable": {
              "tableName": "test_table",
              "columns": [
                {
                  "column": {
                    "name": "test_id",
                    "type": "INT",
                    "constraints": {
                      "primaryKey": true,
                      "nullable": false
                    }
                  }
                },
                {
                  "column": {
                    "name": "test_column",
                    "type": "INT"
                  }
                }
              ]
            }
          }
        ]
      }
    }
  ]
}
<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ext="http://www.liquibase.org/xml/ns/dbchangelog-ext"
    xmlns:pro="http://www.liquibase.org/xml/ns/pro"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-latest.xsd
        http://www.liquibase.org/xml/ns/dbchangelog-ext
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-ext.xsd
        http://www.liquibase.org/xml/ns/pro
        http://www.liquibase.org/xml/ns/pro/liquibase-pro-latest.xsd">
    <changeSet id="1" author="your.name">
        <createTable tableName="test_table">
            <column name="test_id" type="int">
                <constraints primaryKey="true" nullable="false" />
            </column>
            <column name="test_column" type="int"/>
        </createTable>
    </changeSet>
</databaseChangeLog>
liquibase status --username=test --password=test --changelog-file=<changelog.xml>
Note: You can specify arguments in the CLI or keep them in the Liquibase properties file.
If your connection is successful, you'll see a message like this:
4 changesets have not been applied to <your_connection_url>
Liquibase command 'status' was executed successfully.
liquibase update-sql --changelog-file=<changelog.xml>
If the SQL that Liquibase generates isn't what you expect, you should review your changelog file and make any necessary adjustments.
liquibase update --changelog-file=<changelog.xml>
If your update is successful, Liquibase runs each changeset and displays a summary message ending with:
Liquibase: Update has been successful.
Liquibase command 'update' was executed successfully.
Now you're ready to start making deployments with Liquibase!
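If you run these checks often, the status and update-sql steps can be chained in a small wrapper so that the SQL preview is only produced when the connection check succeeds. This is a sketch under the assumption that the connection settings already come from your liquibase.properties file or environment; the function name preview_changelog is hypothetical. The update command is left as a separate, deliberate step after you have reviewed the generated SQL.

```shell
# Hypothetical wrapper: check the connection, then print the SQL that an
# update would run. Stops at the first failing step.
# $1 = path to the changelog file
preview_changelog() {
  changelog="$1"
  liquibase status --changelog-file="$changelog" &&
    liquibase update-sql --changelog-file="$changelog"
}

# Example call (commented out; the file name is a placeholder):
# preview_changelog changelog.sql
```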
Troubleshooting
Missing SSL certificate: [Databricks][JDBCDriver](500593)
If you use Java 1.8 or earlier, you may receive this error message when connecting Liquibase to Databricks:
Connection could not be created to jdbc:databricks://...; with driver
com.databricks.client.jdbc.Driver.
[Databricks][JDBCDriver](500593) Communication link failure. Failed to connect to server.
Reason: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException:
PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException:
unable to find valid certification path to requested target.
You receive this error message because the required SSL certificate is not available in that version of Java. To resolve this, upgrade Java to a more recent version.
Incomplete JDBC URL: [Databricks][DatabricksJDBCDriver](500540)
If you use v1.1.3 of the Liquibase Databricks extension, you may receive this error when running Liquibase:
Unexpected error running Liquibase:
Error executing SQL SELECT MD5SUM FROM main.default.DATABASECHANGELOG WHERE MD5SUM IS NOT NULL: [Databricks][JDBCDriver](500540) Error caught in BackgroundFetcher. Foreground thread ID: 1. Background thread ID: 20.
Error caught: Could not initialize class com.databricks.client.jdbc42.internal.apache.arrow.memory.util.MemoryUtil.
To resolve this, append ;UserAgentEntry=Liquibase;EnableArrow=0; to your JDBC URL. For example:
jdbc:databricks://<host>:<port>/<schema>;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/<warehouse>;ConnCatalog=<catalog>;UserAgentEntry=Liquibase;EnableArrow=0;
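If the JDBC URL is assembled in a script, the workaround can be applied programmatically. The following sketch appends the two parameters only when EnableArrow is not already present, and avoids producing a doubled semicolon. The function name append_arrow_workaround is hypothetical.

```shell
# Hypothetical helper: append the UserAgentEntry and EnableArrow parameters
# to a Databricks JDBC URL unless EnableArrow is already set.
# $1 = existing JDBC URL
append_arrow_workaround() {
  url="$1"
  case "$url" in
    *"EnableArrow="*)
      # Already configured: return the URL unchanged.
      printf '%s\n' "$url"
      return 0
      ;;
  esac
  # Make sure the URL ends with exactly one semicolon before appending.
  case "$url" in
    *";") ;;
    *) url="$url;" ;;
  esac
  printf '%sUserAgentEntry=Liquibase;EnableArrow=0;\n' "$url"
}
```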