vacuumTable
vacuumTable
is a Change Type in the Liquibase Open Source Databricks extension that vacuums a table.
Uses
Every table in databricks has a physical storage location containing files associated with the table (data and metadata). You can use this Change Type to recursively remove (vacuum) unused files from a table directory. Unused files are defined as files not referenced by the table. Liquibase lets you specify the number of hours back to keep (retain) files, even if they aren't referenced. It is a best practice to set a retention threshold of at least seven days (168 hours).
Vacuuming can significantly reduce storage use and therefore cloud storage costs. It also guarantees that outdated files (data or metadata that has since been changed) are permanently removed from your storage system.
Warning: vacuumTable
deletes files from physical storage. You cannot run a Liquibase rollback to restore those files because they are not stored in the DATABASECHANGELOG table and cannot be retrieved. However, you can roll back any schema changes you made, such as the value of retentionHours
.
Run vacuumTable
To run this Change Type, follow these steps:
- Add the Change Type to your changeset, as shown in the examples on this page.
- Specify any required attributes. Use the table on this page to see which ones your database requires.
- Deploy your changeset by running the
update
command:
liquibase update
Available attributes
Tip: You must specify all top-level attributes marked as required. If you specify an optional attribute, you must also specify any nested attributes that it requires.
Name | Type | Description | Requirement |
---|---|---|---|
tableName
|
String | Name of the table to vacuum. | Required |
retentionHours
|
Integer | Number of hours into the past to retain files from. Any files created within the retention threshold are retained; any older files are deleted. Cannot be negative. Default: 168 .Warning: Databricks recommends setting a retention threshold of at least seven days (168 hours). Otherwise, you risk deleting files still in use by concurrent processes. If you specify a value under 168, Databricks throws an error. |
Optional |
Note: The DRY RUN
SQL clause is not supported for the vacuumTable
Change Type.
Examples
databaseChangeLog:
- changeSet:
id: 2
author: your.name
changes:
- vacuumTable:
tableName: test_vacuum_table
retentionHours: 192
rollback:
- vacuumTable:
tableName: test_vacuum_table
retentionHours: 168
{
"databaseChangeLog": [
{
"changeSet": {
"id": "2",
"author": "your.name",
"changes": [
{
"vacuumTable": {
"tableName": "test_vacuum_table",
"retentionHours": "192"
}
}
],
"rollback": [
{
"vacuumTable": {
"tableName": "test_vacuum_table",
"retentionHours": "168"
}
}
]
}
}
]
}
<databaseChangeLog
xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:databricks="http://www.liquibase.org/xml/ns/databricks"
xmlns:pro="http://www.liquibase.org/xml/ns/pro"
xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-latest.xsd
http://www.liquibase.org/xml/ns/databricks
http://www.liquibase.org/xml/ns/databricks/liquibase-databricks-latest.xsd
http://www.liquibase.org/xml/ns/pro
http://www.liquibase.org/xml/ns/pro/liquibase-pro-latest.xsd">
<changeSet id="2" author="your.name">
<databricks:vacuumTable tableName="test_vacuum_table" retentionHours="192"/>
<rollback>
<databricks:vacuumTable tableName="test_vacuum_table" retentionHours="168"/>
</rollback>
</changeSet>
</databaseChangeLog>
Database support
This Change Type is only supported for Databricks. It does not support auto rollback.