vacuumTable

vacuumTable is a Change Type in the Liquibase Open Source Databricks extension that vacuums a table.

Uses

Every table in databricks has a physical storage location containing files associated with the table (data and metadata). You can use this Change Type to recursively remove (vacuum) unused files from a table directory. Unused files are defined as files not referenced by the table. Liquibase lets you specify the number of hours back to keep (retain) files, even if they aren't referenced. It is a best practice to set a retention threshold of at least seven days (168 hours).

Vacuuming can significantly reduce storage use and therefore cloud storage costs. It also guarantees that outdated files (data or metadata that has since been changed) are permanently removed from your storage system.

Note: vacuumTable deletes files from physical storage. You cannot run a Liquibase rollback to restore those files because they are not stored in the DATABASECHANGELOG table and cannot be retrieved. However, you can roll back any schema changes you made, such as the value of retentionHours.

Run vacuumTable

To run this Change Type, follow these steps:

  1. Add the Change Type to your changeset, as shown in the examples on this page.
  2. Specify any required attributes. Use the table on this page to see which ones your database requires.
  3. Deploy your changeset by running the update command:
  4. liquibase update

Available attributes

Tip: You must specify all top-level attributes marked as required. If you specify an optional attribute, you must also specify any nested attributes that it requires.

Name Type Description Requirement
tableName String Name of the table to vacuum. Required
retentionHours Integer Number of hours into the past to retain files from. Any files created within the retention threshold are retained; any older files are deleted. Default: 168.

Warning: Databricks recommends setting a retention threshold of at least seven days (168 hours). Otherwise, you risk deleting files still in use by concurrent processes.

Optional

Note: The DRY RUN SQL clause is not supported for the vacuumTable Change Type.

Examples

databaseChangeLog:
  - changeSet:
      id: 2
      author: your.name
      changes:
        - vacuumTable:
            tableName: test_vacuum_table
            retentionHours: 192
      rollback:
        - vacuumTable:
            tableName: test_vacuum_table
            retentionHours: 168
{
  "databaseChangeLog": [
    {
      "changeSet": {
        "id": "2",
        "author": "your.name",
        "changes": [
          {
            "vacuumTable": {
              "tableName": "test_vacuum_table",
              "retentionHours": "192"
            }
          }
        ],
        "rollback": [
          {
            "vacuumTable": {
              "tableName": "test_vacuum_table",
              "retentionHours": "168"
            }
          }
        ]
      }
    }
  ]
}
<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:databricks="http://www.liquibase.org/xml/ns/databricks"
    xmlns:pro="http://www.liquibase.org/xml/ns/pro"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-latest.xsd
        http://www.liquibase.org/xml/ns/databricks
        http://www.liquibase.org/xml/ns/databricks/liquibase-databricks-latest.xsd
        http://www.liquibase.org/xml/ns/pro
        http://www.liquibase.org/xml/ns/pro/liquibase-pro-latest.xsd">

    <changeSet id="2" author="your.name">
        <databricks:vacuumTable tableName="test_vacuum_table" retentionHours="192"/>

        <rollback>
            <databricks:vacuumTable tableName="test_vacuum_table" retentionHours="168"/>
        </rollback>
    </changeSet>

</databaseChangeLog>

Database support

This Change Type is only supported for Databricks. It does not support auto rollback.

Related links