Run Online and Offline Tar Compaction in AEM

After the release of AEM 6 and the introduction of TarMK, operations teams have faced rapid repository growth caused by a known Oak bug, for which Adobe has provided a hotfix (Oak-core 1.0.25). Even with this hotfix installed, it remains difficult to keep repository growth under control. Adobe therefore offers two housekeeping activities to control the size of the AEM repository: online and offline tar compaction. Adobe strongly advises against online tar compaction, because it takes a very long time to compact the repository and affects site performance.

Is your repository growing very rapidly? Looking for a step-by-step guide on how to run online and offline tar compaction in AEM? Then this tutorial is for you.

There are two ways to run tar compaction in AEM: Online Tar Compaction and Offline Tar Compaction. Both are covered in this tutorial.

Note:- Before running any type of compaction, make sure you have taken a backup of your existing repository.

Steps to Run Online Tar Compaction in AEM:-


Adobe does not recommend running online tar compaction, and from AEM 6.2 it is available only under restricted use. Follow the steps below to run online tar compaction:-

Note:- Online compaction does not remove checkpoints, so you have to run a script manually to clear the old checkpoints.

  • Download the correct version of oakrun.jar (refer to the FAQ below on how to find the correct version of oakrun.jar). For example, use Oak 1.0.22 with AEM 6.0, Oak 1.2.7 with AEM 6.1, and Oak 1.2.11 with AEM 6.1 SP1.
  • Shut down your AEM instance.
  • Find old checkpoints and remove unreferenced checkpoints, as shown below for offline tar compaction.
  • Restart your AEM instance.
  • Go to the System Console Config Manager and search for Apache Jackrabbit Oak Segment NodeStore Service.
  • Make sure PausedCompaction is set to false (unchecked), and do the same for CloneBinaries if you also want to run compaction on binaries.


Refer to the SegmentNodeStoreService class for more information about each field, and to the oakrun.jar commands.
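For the download step in the list above, the jar location can be derived from the Oak version. The following is a minimal sketch, assuming the jar is fetched from Maven Central under the org.apache.jackrabbit group; the function name oak_run_url is hypothetical and only used for illustration.

```shell
#!/bin/bash
# Sketch: print the Maven Central URL of the oak-run jar for a given Oak version.
# Assumption: the artifact is org.apache.jackrabbit:oak-run on repo1.maven.org.
# oak_run_url VERSION
oak_run_url() {
  local version="$1"
  echo "https://repo1.maven.org/maven2/org/apache/jackrabbit/oak-run/${version}/oak-run-${version}.jar"
}

# Example (not executed here): download the jar for Oak 1.2.7
#   curl -fLO "$(oak_run_url 1.2.7)"
```

Pass the exact version shown for the Oak bundles on your instance, so that the tool and the repository format match.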

Steps to Run Offline Tar Compaction in AEM:-


Make sure that you are using the version of the oakrun jar that matches your AEM repository version (refer to the FAQ below on how to find the correct version of oakrun.jar). The offline compaction script is divided into 5 parts:-

  • Shutdown AEM Instance
  • Find Old Checkpoints
  • Remove Unreferenced Checkpoints
  • Compact Oak
  • Restart AEM Instance

The script below is for Linux/Unix machines:-


#!/bin/bash
now="$(date +'%d-%m-%Y')"
logfile="compact-$now.log"
installfolder="/data/aem"
aemfolder="$installfolder/crx-quickstart"
oakrun="$installfolder/help/oak-run-1.2.7.jar"

## Shutdown AEM Instance
$aemfolder/bin/stop
echo "AEM Shutdown" >> $installfolder/help/logs/$logfile

## Find old checkpoints
echo "Finding old checkpoints"
java -Dtar.memoryMapped=true -Xmx8g -jar $oakrun checkpoints $aemfolder/repository/segmentstore >> $installfolder/help/logs/$logfile

## Delete unreferenced checkpoints
echo "Deleting unreferenced checkpoints"
java -Dtar.memoryMapped=true -Xmx8g -jar $oakrun checkpoints $aemfolder/repository/segmentstore rm-unreferenced >> $installfolder/help/logs/$logfile

## Run compaction
echo "Running compaction. This may take a while"
java -Dtar.memoryMapped=true -Xmx8g -jar $oakrun compact $aemfolder/repository/segmentstore >> $installfolder/help/logs/$logfile

## Report Completed
echo "Compaction complete. Please check the log at:"
echo "$installfolder/help/logs/$logfile"

## Start AEM back up
echo "Starting up AEM"
$aemfolder/bin/start
echo "AEM Startup" >> $installfolder/help/logs/$logfile

Note:- The above script is meant for Linux/Unix machines, because the -Dtar.memoryMapped option does not work correctly on Windows.
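One thing to watch in the script above is that the stop command may return before the AEM process has fully exited, and oak-run should only touch the segment store once the instance is really down. The following is a rough sketch of a wait step; the helper name wait_for_exit and the pid file location crx-quickstart/conf/cq.pid are assumptions, so check them against your installation.

```shell
#!/bin/bash
# Sketch: wait until a process has exited, or give up after a timeout.
# wait_for_exit PID [TIMEOUT_SECONDS]   (default timeout: 300 seconds)
# Returns 0 once the process is gone, 1 if it is still running at the timeout.
wait_for_exit() {
  local pid="$1"
  local timeout="${2:-300}"
  local waited=0
  # kill -0 sends no signal; it only tests whether the process can be signalled.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Example (not executed here), assuming AEM keeps its pid in conf/cq.pid:
#   aempid="$(cat "$aemfolder/conf/cq.pid")"
#   "$aemfolder/bin/stop"
#   wait_for_exit "$aempid" 600 || { echo "AEM still running, aborting"; exit 1; }
```

Run this as the same user that owns the AEM process, because kill -0 also fails when the caller lacks permission to signal the process, and the sketch would then treat it as stopped.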

Command for Windows Machine-

The command below runs offline tar compaction; you can use similar commands to find and delete checkpoints.



REM Run compaction
echo Running compaction. This may take a while
java -jar oakrun.jar compact install-folder/crx-quickstart/repository/segmentstore


Note:- Use as much heap memory as possible for faster I/O operations. At least eight gigabytes (-Xmx8g) is recommended for most common deployments. You can increase it according to the memory available.

Increase Performance of offline tar compaction:-


I have seen many teams still using old scripts without the -Dtar.memoryMapped option, but you can improve the performance of offline tar compaction by using it, as suggested by Adobe.

Since Oak version 1.0.22, the oak-run tool has introduced several features that aim to increase the performance of the revision cleanup process and keep the maintenance window as short as possible. Below are a few options that we can use:-

-Dtar.memoryMapped:- It is highly recommended that you enable this feature in order to speed up tar compaction. You can set this as true or false.

-Dupdate.limit:- Defines the threshold for the flush of a temporary transaction to disk. The default value is 5000000.

-Dcompress-interval:- Number of compaction map entries to keep until compressing the current map. The default is 1000000. You should increase this value to an even higher number for faster throughput, if enough heap memory is available.

-Dcompaction-progress-log:- The number of compacted nodes that will be logged. The default value is 1500000, which means that the first 1500000 compacted nodes will be logged during the operation. Use this in conjunction with the next parameter documented below.

-Dlogback.configurationFile:- Use a configuration file for logging.

Note:- Memory-mapped file operations (-Dtar.memoryMapped) do not work correctly on Windows, so omit this option there.
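To see how these options fit together on one command line, here is a small dry-run sketch that only prints the compaction command instead of running it. The helper name build_compact_cmd and its argument order are hypothetical, and the defaults simply mirror the values quoted above.

```shell
#!/bin/bash
# Sketch: print (do not run) an oak-run compaction command with tuning options.
# build_compact_cmd OAKRUN_JAR SEGMENTSTORE_DIR [HEAP] [MEMORY_MAPPED] [COMPRESS_INTERVAL]
build_compact_cmd() {
  local oakrun="$1"
  local segmentstore="$2"
  local heap="${3:-8g}"           # value passed to -Xmx
  local mmap="${4:-true}"         # use false on Windows
  local interval="${5:-1000000}"  # value passed to -Dcompress-interval
  echo "java -Dtar.memoryMapped=${mmap} -Dcompress-interval=${interval} -Xmx${heap} -jar ${oakrun} compact ${segmentstore}"
}

# Example (not executed here): a larger heap and compress interval
#   build_compact_cmd /data/aem/help/oak-run-1.2.7.jar \
#     /data/aem/crx-quickstart/repository/segmentstore 16g true 2000000
```

Printing the command first makes it easy to review the flags before a maintenance window. Paths containing spaces would need proper quoting before the printed line is executed.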

Frequently Asked Questions:-


Below are a few questions that might come to mind about tar compaction in AEM.

  • How frequently should you run Offline Revision Cleanup?
    • It depends on the repository growth rate. In general, it is recommended that you perform revision cleanup every 2 weeks for an author instance and once per quarter for a publish instance (I recommend running it monthly on publish; again, it depends on the repository growth rate).
  • What factors determine the duration of the Offline Revision Cleanup?
    • The repository size and the number of revisions that need to be cleaned up determine the duration of the cleanup.
  • How to find the correct version of the oak-run jar?
    • Go to Felix console -> Bundles and search for oak, then note the version mentioned against each Oak bundle. Now go to MVN Repository and download the corresponding oak-run.jar file.
  • What can happen if you do not perform revision cleanup?
    • The AEM instance will run out of disk space, which will cause outages in production. It is highly recommended that you monitor disk usage and perform tar compaction if more than 60% of the disk space is consumed.
  • What is the difference between revision cleanup and version purging or cleanup?
    • Oak revision: Oak organises all content in a tree hierarchy that consists of nodes and properties. Each snapshot or revision of this content tree is immutable, and changes to the tree are expressed as a sequence of new revisions. Typically, each content modification triggers a new revision. For more info, visit http://jackrabbit.apache.org/dev/ngp.html.
    • Version: Versioning creates a “snapshot” of a page at a specific point in time. Typically, a new version is created when a page is activated.
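The disk-usage guideline from the FAQ above (compact once more than 60% of the disk is consumed) can be checked from a cron job or monitoring script. This is a minimal sketch using df; the function names disk_usage_pct and needs_compaction are hypothetical, and the path you pass should be the filesystem that holds your repository.

```shell
#!/bin/bash
# Sketch: report how full the filesystem holding a path is.
# disk_usage_pct PATH  -> prints the used percentage as a bare integer
disk_usage_pct() {
  # df -P prints one header line and one data line; column 5 is the capacity, e.g. "42%".
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# needs_compaction PATH [THRESHOLD]  -> succeeds if usage is above THRESHOLD (default 60)
needs_compaction() {
  local used
  used="$(disk_usage_pct "$1")"
  [ -n "$used" ] && [ "$used" -gt "${2:-60}" ]
}

# Example (not executed here):
#   if needs_compaction /data/aem/crx-quickstart/repository; then
#     echo "Disk usage above 60%, schedule offline tar compaction"
#   fi
```

The check only reports the condition. Scheduling the actual compaction is left to you, since it requires shutting down the instance.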