Using Liquibase with Databricks Platforms
Databricks SQL is a fully managed analytics data warehouse that enables users to run SQL queries on large datasets stored in cloud-based data lakes. It is designed for data analysts and engineers who need to perform data exploration, transformation, and analysis. For more information, see Databricks SQL Documentation.
You can deploy most standard Liquibase Change Types to Databricks. Liquibase supports Databricks-specific functionality with some additional Change Types available in the Liquibase extension for Databricks. For a list of supported features, see the "Features" section.
Getting started tutorial
To learn how to install, configure, and use the Liquibase Databricks extension with your instance of Databricks, see Using Liquibase with Databricks SQL. This page contains driver download links, permissions guidance, and a sample changelog to use for a test deployment.
Verified database versions
| Liquibase Databricks extension version | Liquibase version required | Databricks SQL versions verified |
|---|---|---|
| 1.4.0 | 4.30.0+ | |
| 1.3.0 | 4.29.1+ | |
| 1.2.0 | 4.27.0+ | |
| 1.1.3 | 4.26.0+ | |
| 1.1.0–1.1.2 | 4.25.0+ | |
| 1.0.0–1.0.1 | 4.23.2+ | |
Features
Supported Liquibase Pro features include:
- Policy Checks: automatically analyze your changelogs for desired format and behavior to increase deployment success rates and uphold security best practices
- Secrets Management: keep your authentication data secure by integrating with third-party secrets vaults
- Structured Logging: improve your database observability by easily reading Liquibase data in your favorite analytics tool
- Operation Reports: generate reports of operations you perform on your database
- Flow Files: create repeatable, portable, and platform-independent Liquibase workflows to run in any of your CI/CD tools
- DATABASECHANGELOGHISTORY table (DBCLH): record a history of all changes you make to the database, including changes that are not tracked by the DATABASECHANGELOG table (DBCL)
- Remote file access: centralize file management with AWS S3 to build a reusable repository of Liquibase files you can update and retrieve
- Targeted rollback: avoid collateral damage by specifying which changesets in your changelog to undo
- Stored Logic: capture stored logic objects like procedures, functions, packages, and triggers
Supported Change Types
Databricks-specific Change Types available in Liquibase Open Source and Liquibase Pro:
- `alterCluster`: alter a clustered table
- `alterTableProperties`: alter existing properties on a table
- `alterViewProperties`: alter existing properties on a view
- `analyzeTable`: analyze a table to improve query performance
- `optimizeTable`: optimize table layout to improve performance
- `vacuumTable`: remove unused files in a table directory
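As an illustration, a changeset could run routine Delta table maintenance with the extension's change types. This is a hedged sketch: the `databricks` namespace URI, schema location, and attribute names follow common Liquibase extension conventions but may differ by extension version, and the `events` table is hypothetical.

```xml
<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:databricks="http://www.liquibase.org/xml/ns/databricks"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-latest.xsd">

    <changeSet id="events-maintenance-1" author="example.author">
        <!-- Compact small files to improve read performance (assumed syntax) -->
        <databricks:optimizeTable tableName="events"/>
        <!-- Remove data files no longer referenced by the table (assumed syntax) -->
        <databricks:vacuumTable tableName="events"/>
    </changeSet>
</databaseChangeLog>
```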
Liquibase Change Types that accept Databricks attributes or sub-tags:
- `createTable`: create a table
  - Databricks sub-tag: `extendedTableProperties`: specify additional properties on a table, including clusters and partitions
- `createView`: create a view on a Databricks table
  - Databricks attribute: `tblProperties`: similar to `extendedTableProperties`, but for a view
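A changeset fragment could combine these, for example. The element names come from the list above, but the exact attribute spelling (such as `clusterColumns`) and the table and view names are assumptions; this sketch also assumes the `databricks` XML namespace is declared on the changelog root.

```xml
<changeSet id="create-sales-1" author="example.author">
    <createTable tableName="sales">
        <column name="id" type="BIGINT"/>
        <column name="amount" type="DECIMAL(10,2)"/>
        <column name="sale_date" type="DATE"/>
        <!-- Databricks sub-tag with extra table properties (attribute names assumed) -->
        <databricks:extendedTableProperties
            clusterColumns="id, sale_date"/>
    </createTable>
</changeSet>

<changeSet id="create-sales-view-1" author="example.author">
    <!-- Databricks attribute on createView (attribute placement assumed) -->
    <createView viewName="recent_sales"
        databricks:tblProperties="'owner_team' = 'analytics'">
        SELECT * FROM sales WHERE sale_date &gt;= current_date() - INTERVAL 30 DAYS
    </createView>
</changeSet>
```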
Liquibase Change Types:
- `addColumn`
- `addForeignKeyConstraint`
- `addLookupTable`
- `addNotNullConstraint`
- `addPrimaryKey`
- `createFunction`
- `createProcedure`
- `createTable`
- `createView`
- `delete`
- `dropAllForeignKeyConstraints`
- `dropColumn`
- `dropForeignKeyConstraint`
- `dropFunction`
- `dropNotNullConstraint`
- `dropPrimaryKey`
- `dropProcedure`
- `dropTable`
- `dropView`
- `executeCommand`
- `insert`
- `loadData`
- `loadUpdateData`
- `mergeColumns`
- `modifyDataType`
- `modifySql`
- `renameColumn`
- `renameTable`
- `renameView`
- `setColumnRemarks`
- `setTableRemarks`
- `sql`
- `sqlFile`
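These standard Change Types use the same changelog syntax on Databricks as on other platforms. As a minimal sketch (the `sales` table and column names are hypothetical), a changeset might add a column and a table comment:

```xml
<changeSet id="add-status-column-1" author="example.author">
    <!-- Standard Liquibase Change Types deployed against a Databricks table -->
    <addColumn tableName="sales">
        <column name="status" type="STRING"/>
    </addColumn>
    <setTableRemarks tableName="sales"
        remarks="Point-of-sale transactions"/>
</changeSet>
```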
Supported commands
You can use the Liquibase CLI commands or Liquibase Maven goals with Databricks:
- Update Commands
- Rollback Commands
- Init Commands
- Change Tracking Commands
- Database Inspection Commands
- Utility Commands
- Flow Commands
- Policy Checks Commands
Data type handling
For information about how Liquibase handles data types in Databricks, see Liquibase Data Type Handling.
Limitations
- You can deploy changes to Databricks using YAML, JSON, and XML changelogs, but not SQL changelogs. To write SQL directly, you can use the `sql` and `sqlFile` Change Types.
- Liquibase does not include the values of `tableFormat` and `partitionColumns` in files generated by the database inspection commands `diff-changelog` and `generate-changelog`. Databricks cannot add new partition columns to an existing table, nor can it change the table's format or location. Deploying these attributes against existing tables would fail, so Liquibase omits them from the changesets it generates.
- Liquibase does not include all auto-increment (identity) information in files generated by `diff-changelog`. This information is omitted when a table has the same name in both the source and target databases but different values for `autoIncrement`. Also, the `addAutoIncrement` Change Type is not supported for Databricks.
  - However, files generated by the `diff` command do include auto-increment information even when the tables have different values for `autoIncrement`.
- Databricks does not support directly adding a new column with a default value to a table that already contains columns with default values. In Liquibase Databricks 1.4.0, you cannot use the `addDefaultValue` Change Type in a YAML, JSON, or XML changelog to add a default value to an existing Databricks table column. Instead, specify the SQL for `addDefaultValue` in a Formatted SQL changelog, or use the `sql` or `sqlFile` Change Types in a YAML, JSON, or XML changelog. For syntax examples, see the "Troubleshooting" section of the `addDefaultValue` page.
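The `sql` workaround for a default value might look like the following sketch. The `sales` table and `status` column are hypothetical; the `ALTER TABLE ... ALTER COLUMN ... SET DEFAULT` statement is standard Databricks SQL, though note that Databricks requires the table to support column defaults (the `delta.feature.allowColumnDefaults` table feature) for it to succeed.

```xml
<changeSet id="set-default-status-1" author="example.author">
    <!-- Raw SQL instead of the unsupported addDefaultValue Change Type -->
    <sql>
        ALTER TABLE sales ALTER COLUMN status SET DEFAULT 'pending';
    </sql>
    <rollback>
        <sql>ALTER TABLE sales ALTER COLUMN status DROP DEFAULT;</sql>
    </rollback>
</changeSet>
```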