Using Liquibase with Databricks Platforms
Databricks SQL is a fully managed analytics data warehouse that enables users to run SQL queries on large datasets stored in cloud-based data lakes. It is designed for data analysts and engineers who need to perform data exploration, transformation, and analysis. For more information, see Databricks SQL Documentation.
You can deploy most standard Liquibase Change Types to Databricks. Liquibase supports Databricks-specific functionality with some additional Change Types available in the Liquibase extension for Databricks. For a list of supported features, see the "Features" section.
Getting started tutorial
To learn how to install, configure, and use the Liquibase Databricks extension with your instance of Databricks, see Using Liquibase with Databricks SQL. This page contains driver download links, permissions guidance, and a sample changelog to use for a test deployment.
Verified database versions
| Liquibase Databricks extension version | Liquibase version required | Databricks SQL versions verified |
|---|---|---|
| 1.4.0 | 4.30.0+ | |
| 1.3.0 | 4.29.1+ | |
| 1.2.0 | 4.27.0+ | |
| 1.1.3 | 4.26.0+ | |
| 1.1.0–1.1.2 | 4.25.0+ | |
| 1.0.0–1.0.1 | 4.23.2+ | |
Features
Supported Liquibase Pro features include:
- Policy Checks: automatically analyze your changelogs for desired format and behavior to increase deployment success rates and uphold security best practices
- Secrets Management: keep your authentication data secure by integrating with third-party secrets vaults
- Structured Logging: improve your database observability by easily reading Liquibase data in your favorite analytics tool
- Operation Reports: generate reports of operations you perform on your database
- Flow Files: create repeatable, portable, and platform-independent Liquibase workflows to run in any of your CI/CD tools
- DATABASECHANGELOGHISTORY table (DBCLH): record a history of all changes you make to the database, including changes that are not tracked by the DATABASECHANGELOG table (DBCL)
- Remote file access: centralize file management with AWS S3 to build a reusable repository of Liquibase files you can update and retrieve
- Targeted rollback: avoid collateral damage by specifying which changesets in your changelog to undo
- Stored Logic: capture stored logic objects like procedures, functions, packages, and triggers
Supported Change Types
Databricks-specific Change Types available in Liquibase Open Source and Liquibase Pro:
- `alterCluster`: alter a clustered table
- `alterTableProperties`: alter existing properties on a table
- `alterViewProperties`: alter existing properties on a view
- `analyzeTable`: analyze a table to improve query performance
- `optimizeTable`: optimize table layout to improve performance
- `vacuumTable`: remove unused files in a table directory
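As an illustration, a changeset could run routine Delta table maintenance with the extension's change types. This is a hedged sketch: the `databricks` namespace URI, schema location, and attribute names follow common Liquibase extension conventions but may differ by extension version, and the `events` table is hypothetical.

```xml
<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:databricks="http://www.liquibase.org/xml/ns/databricks"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-latest.xsd">

    <changeSet id="events-maintenance-1" author="example.author">
        <!-- Compact small files to improve read performance (assumed syntax) -->
        <databricks:optimizeTable tableName="events"/>
        <!-- Remove data files no longer referenced by the table (assumed syntax) -->
        <databricks:vacuumTable tableName="events"/>
    </changeSet>
</databaseChangeLog>
```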
Liquibase Change Types that accept Databricks attributes or sub-tags:
- `createTable`: create a table
  - Databricks sub-tag: `extendedTableProperties`: specify additional properties on a table, including clusters and partitions
- `createView`: create a view on a Databricks table
  - Databricks attribute: `tblProperties`: similar to `extendedTableProperties`, but for a view
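A changeset fragment could combine these, for example. The element names come from the list above, but the exact attribute spelling (such as `clusterColumns`) and the table and view names are assumptions; this sketch also assumes the `databricks` XML namespace is declared on the changelog root.

```xml
<changeSet id="create-sales-1" author="example.author">
    <createTable tableName="sales">
        <column name="id" type="BIGINT"/>
        <column name="amount" type="DECIMAL(10,2)"/>
        <column name="sale_date" type="DATE"/>
        <!-- Databricks sub-tag with extra table properties (attribute names assumed) -->
        <databricks:extendedTableProperties
            clusterColumns="id, sale_date"/>
    </createTable>
</changeSet>

<changeSet id="create-sales-view-1" author="example.author">
    <!-- Databricks attribute on createView (attribute placement assumed) -->
    <createView viewName="recent_sales"
        databricks:tblProperties="'owner_team' = 'analytics'">
        SELECT * FROM sales WHERE sale_date &gt;= current_date() - INTERVAL 30 DAYS
    </createView>
</changeSet>
```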
Liquibase Change Types:
- `addColumn`
- `addForeignKeyConstraint`
- `addLookupTable`
- `addNotNullConstraint`
- `addPrimaryKey`
- `createFunction`
- `createProcedure`
- `createTable`
- `createView`
- `delete`
- `dropAllForeignKeyConstraints`
- `dropColumn`
- `dropForeignKeyConstraint`
- `dropFunction`
- `dropNotNullConstraint`
- `dropPrimaryKey`
- `dropProcedure`
- `dropTable`
- `dropView`
- `executeCommand`
- `insert`
- `loadData`
- `loadUpdateData`
- `mergeColumns`
- `modifyDataType`
- `modifySql`
- `renameColumn`
- `renameTable`
- `renameView`
- `setColumnRemarks`
- `setTableRemarks`
- `sql`
- `sqlFile`
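These standard Change Types use the same changelog syntax on Databricks as on other platforms. As a minimal sketch (the `sales` table and column names are hypothetical), a changeset might add a column and a table comment:

```xml
<changeSet id="add-status-column-1" author="example.author">
    <!-- Standard Liquibase Change Types deployed against a Databricks table -->
    <addColumn tableName="sales">
        <column name="status" type="STRING"/>
    </addColumn>
    <setTableRemarks tableName="sales"
        remarks="Point-of-sale transactions"/>
</changeSet>
```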
Supported commands
You can use the Liquibase CLI commands or Liquibase Maven goals with Databricks:
- Update Commands
- Rollback Commands
- Init Commands
- Change Tracking Commands
- Database Inspection Commands
- Utility Commands
- Flow Commands
- Policy Checks Commands
Data type handling
For information about how Liquibase handles data types in Databricks, see Liquibase Data Type Handling.
Limitations
- You can deploy changes to Databricks using YAML, JSON, and XML changelogs, but not SQL changelogs. To write SQL directly, you can use the `sql` and `sqlFile` Change Types.
- Liquibase does not include the values of `tableFormat` and `partitionColumns` in files generated by the database inspection commands `diff-changelog` and `generate-changelog`. Databricks cannot add new partition columns to an existing table, nor can it change the table's format or location. Deploying these attributes against existing tables would fail, so Liquibase omits them from the changesets it generates.
- Liquibase does not include all auto-increment (identity) information in files generated by `diff-changelog`. This information is omitted when a table has the same name in both the source and target databases but different values for `autoIncrement`. Also, the `addAutoIncrement` Change Type is not supported for Databricks.
  - However, files generated by the `diff` command do include auto-increment information even when the tables have different values for `autoIncrement`.
- Databricks does not support directly adding a new column with a default value to a table that already contains columns with default values. In Liquibase Databricks 1.4.0, you cannot use the `addDefaultValue` Change Type in a YAML, JSON, or XML changelog to add a default value to an existing Databricks table column. Instead, specify the SQL for `addDefaultValue` in a Formatted SQL changelog, or use the `sql` or `sqlFile` Change Types in a YAML, JSON, or XML changelog. For syntax examples, see the "Troubleshooting" section of the `addDefaultValue` page.
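The `sql` workaround for a default value might look like the following sketch. The `sales` table and `status` column are hypothetical; the `ALTER TABLE ... ALTER COLUMN ... SET DEFAULT` statement is standard Databricks SQL, though note that Databricks requires the table to support column defaults (the `delta.feature.allowColumnDefaults` table feature) for it to succeed.

```xml
<changeSet id="set-default-status-1" author="example.author">
    <!-- Raw SQL instead of the unsupported addDefaultValue Change Type -->
    <sql>
        ALTER TABLE sales ALTER COLUMN status SET DEFAULT 'pending';
    </sql>
    <rollback>
        <sql>ALTER TABLE sales ALTER COLUMN status DROP DEFAULT;</sql>
    </rollback>
</changeSet>
```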