Trimming changelog files

Veteran Liquibase users often ask how to clear out a changelog file that is unwieldy. This guide covers how to address this common occurrence.

You add individual change sets to your changelog file to make changes to your database. Over time, those changes can build up to thousands of entries, many of which are now redundant or inefficient. These situations can look like a table created and then deleted or, a table was created and then columns were added individually instead of creating a table with all the columns.

How to simplify the clutter

We strongly suggest you ask yourself if simplification is necessary. The changelog was built over a long period of time and it was ran and tested countless times. Editing the changelog file introduces risk which has a cost of its own. Do the performance or file size concerns you have outweigh the risk of messing with a script that you know works? If they do, please proceed and find the scenario and solution that best fits your own circumstances and apply them accordingly.

Scenario and solution 1: Changelog File Size

Often the changelog file is so large that your editor chokes on it, or there are too many merge conflicts. The best way to handle this is to simply break up your changelog file into multiple files. Instead of having a single changelog.xml file with everything in it, create a main.changelog.xml file which uses the tag to reference other changelog files.

<databaseChangeLog
	xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xmlns:ext="http://www.liquibase.org/xml/ns/dbchangelog-ext"
	xmlns:pro="http://www.liquibase.org/xml/ns/pro"
	xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
		http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-latest.xsd
		http://www.liquibase.org/xml/ns/dbchangelog-ext http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-ext.xsd
		http://www.liquibase.org/xml/ns/pro http://www.liquibase.org/xml/ns/pro/liquibase-pro-latest.xsd">
    <include  file="com/example/news/news.changelog.xml"/>  
    <include  file="com/example/directory/directory.changelog.xml"/>  
</databaseChangeLog>

 

When you run liquibase update against the main.changelog.xml file, changesets in com/example/news/news.changelog.xml will run and then the changesets in com/example/directory/directory.changelog.xml will run. You can break up changesets in whatever manner works best for you. Some break them up by feature, some break them up by release. Find what works best for you.

Scenario and solution 2: Liquibase Update Time Efficiency

Other times, the problem is that liquibase update takes too long. Liquibase tries to be as efficient as possible when comparing the contents of the DATABASECHANGELOG table with the current changelog file and even if there are thousands of already ran changesets, an update command should take just seconds to run. If you find that update takes longer than it should, watch the Liquibase log to determine why.

Perhaps there is an old runAlways="true" changeset that no longer needs to run or there are preconditions which are no longer needed. Running Liquibase with --log-level=INFO or even --log-level=DEBUG can give additional output which can help you determine which changesets are slow. Once you know what is slowing down your update, try to alter just those changesets rather than throwing out the whole changelog and starting from scratch. You will still want to retest your changelog in-depth, but it is a far less risky change.

Scenario and solution 3: Surgically Alter Existing changesets

Some users find that liquibase update works well for incremental updates, but creating a database from scratch takes far too long. Are you re-creating databases often enough that the risk of a change to the creation script makes sense? If you are, look for problem changesets as described above. Indexes often cause performance bottlenecks during database creation, so start with those. If you are creating and updating indexes frequently in your creation process, you may be able to combine those changesets into something more efficient.

Note: When you need to surgically alter your existing changesets, remember how Liquibase works: each changeset has an id, an author, and a file path which together uniquely identifies it. If the DATABASECHANGELOG table has an entry for that changeset it will not run it. If it has an entry, it throws an error if the checksum for the changeset in the file doesn't match what was stored on the last run.

Alter a changelog manually by modification, merge, or removal

Modify:

How you modify your existing changesets will also depend on your environment and where in the changelog the problem changesets are. If you are modifying changesets that have been applied to all of your environments and are now only used on fresh database builds you can treat them differently than if they have been applied to some databases but not yet to others.

Merge:

To merge or modify existing changesets you will be doing a combination of editing existing changesets, removing old changesets, and creating new ones.

Remove:

Removing unneeded changesets is easy because Liquibase doesn't care about DATABASECHANGELOG rows with no corresponding changesets. Just delete out of date changesets and you are done. For example, if you have a changeset that creates the table cart and then another that drops it, just remove both changesets from the file. You must make sure, however, that there are no changesets between the create and the delete that make use of that table or they will fail on a fresh database build. That is an example of how you are introducing risk when changing your changelog file.

Example scenario:

Suppose instead you have a cart table that is created in one changeset, then a promo_code column is created in another and an abandoned flag is created in another.

<databaseChangeLog
	xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xmlns:ext="http://www.liquibase.org/xml/ns/dbchangelog-ext"
	xmlns:pro="http://www.liquibase.org/xml/ns/pro"
	xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
		http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-latest.xsd
		http://www.liquibase.org/xml/ns/dbchangelog-ext http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-ext.xsd
		http://www.liquibase.org/xml/ns/pro http://www.liquibase.org/xml/ns/pro/liquibase-pro-latest.xsd">
    <changeSet  author="nvoxland"  id="1">  
        <createTable  tableName="cart">  
            <column  name="id"  type="int"/>  
        </createTable>  
    </changeSet>  

    <changeSet  author="nvoxland"  id="2">  
        <addColumn  tableName="cart">  
            <column  name="promo_code"  type="varchar(10)"/>  
        </addColumn>  
    </changeSet>  

    <changeSet  author="nvoxland"  id="3">  
        <addColumn  tableName="cart">  
            <column  name="abandoned"  type="boolean"/>  
        </addColumn>  
    </changeSet>  

</databaseChangeLog>

Option 1: 

This will work well if all existing databases have the cart table with the promo_code and abandoned columns already added.

- Combine everything into a new changeset using the existing id="1" .

- Delete the other changesets.

<databaseChangeLog
	xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xmlns:ext="http://www.liquibase.org/xml/ns/dbchangelog-ext"
	xmlns:pro="http://www.liquibase.org/xml/ns/pro"
	xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
		http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-latest.xsd
		http://www.liquibase.org/xml/ns/dbchangelog-ext http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-ext.xsd
		http://www.liquibase.org/xml/ns/pro http://www.liquibase.org/xml/ns/pro/liquibase-pro-latest.xsd">
    <changeSet  author="nvoxland"  id="1">  
        <validCheckSum>7:f24b25ba0fea451728ffbade634f791d</validCheckSum>  
        <createTable  tableName="cart">  
            <column  name="id"  type="int"/>  
            <column  name="promo_code"  type="varchar(10)"/>  
            <column  name="abandoned"  type="boolean"/>  
        </createTable>  
    </changeSet>  
</databaseChangeLog>

Note: Running Liquibase against existing databases just sees that id="1" already ran and does not commit any new script.

Option 2:
Running Liquibase against a blank database will create the cart table with all the columns right away. It's important to note that the additional flag is required. If it was not added, existing databases will throw an error stating that id="1" has changed since it was run.

- Simply apply the validCheckSum tag in the error message to notify Liquibase that you know it changed and the new value is valid.

Option 4:
If you have some databases where the promo_code and/or abandoned columns have not yet been added, follow the below instructions:

- Update the original createTable as before.

- Use preconditions with onFail="MARK_RAN" to handle cases where the old changesets ran.

- Do not add the columns again if the new changesets ran.

<databaseChangeLog
	xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xmlns:ext="http://www.liquibase.org/xml/ns/dbchangelog-ext"
	xmlns:pro="http://www.liquibase.org/xml/ns/pro"
	xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
		http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-latest.xsd
		http://www.liquibase.org/xml/ns/dbchangelog-ext http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-ext.xsd
		http://www.liquibase.org/xml/ns/pro http://www.liquibase.org/xml/ns/pro/liquibase-pro-latest.xsd">				  
    <changeSet  author="nvoxland"  id="1">  
        <validCheckSum>7:f24b25ba0fea451728ffbade634f791d</validCheckSum>  
        <createTable  tableName="cart">  
            <column  name="id"  type="int"/>  
            <column  name="promo_code"  type="varchar(10)"/>  
            <column  name="abandoned"  type="boolean"/>  
        </createTable>  
    </changeSet>  

    <changeSet  author="nvoxland"  id="2">  
        <preConditions  onFail="MARK_RAN">  
            <not><columnExists  tableName="cart"  columnName="promo_code"/></not>  
        </preConditions>  
        <addColumn  tableName="cart">  
            <column  name="promo_code"  type="varchar(10)"/>  
        </addColumn>  
    </changeSet>  

    <changeSet  author="nvoxland"  id="3">  
        <preConditions  onFail="MARK_RAN">  
            <not><columnExists  tableName="cart"  columnName="abandoned"/></not>  
        </preConditions>  
        <addColumn  tableName="cart">  
            <column  name="abandoned"  type="boolean"/>  
        </addColumn>  
    </changeSet>  

</databaseChangeLog>

Option 5:
On existing databases that have all three changesets already ran, Liquibase will just continue on as before. For existing databases that have the old cart definition, it will see that the columns don't exist for id="2" and id="3" and execute as usual. For blank databases, it will create the table with the promo_code and abandoned columns and then in id="2" and id="3" it will see that they are already there and mark that they have ran without re-adding the columns.

Preconditions

Note: Using preconditions will add a performance overhead to your update executions and are ignored in update-sql mode because Liquibase cannot know how applicable they are when changesets have not actually executed. For that reason it is best to avoid them if possible, but definitely use them when needed.

Preconditions also add complexity to your changelog which will require additional testing so keep that in mind when deciding whether to modify your changelog logic. Sometimes it is easiest and safest to wait until all your databases have the columns and then modify the changesets to avoid the preconditions.

The cart/promo_code/abandoned example shows some basic patterns you can use when modifying existing changesets. Similar patters can be used to optimize different types of bottleneck issues. Just remember when one changeset is changed, it can affect other changesets below which may need to be modified as well. This can easily spiral out of control so be mindful of what you are doing.

If you end up finding that it will work best to completely restart your changelog, see How to set up Liquibase with an Existing Project and Multiple Environments which describes how to add Liquibase to an existing project (even if that project was previously managed by Liquibase).