Run Online and Offline Tar Compaction in AEM
After the release of AEM 6 and the introduction of TarMK, operations teams have faced rapid repository growth caused by a known Oak bug, for which Adobe has provided a hotfix (Oak-core 1.0.25). Even with this hotfix installed, it remains difficult to slow or stop the growth of the repository. Adobe therefore offers two housekeeping activities, online and offline tar compaction, to control the size of the AEM repository. Adobe strongly advises against online tar compaction because it takes a very long time to compact the repository and affects the performance of the site.
Is your repository size increasing very rapidly? Looking for a step-by-step guide on how to run online and offline tar compaction in AEM? Then this tutorial is for you.
There are two ways to run tar compaction in AEM: Online Tar Compaction and Offline Tar Compaction. The following topics are covered in this tutorial:-
- Steps to Run Online Compaction in AEM
- Steps to Run Offline Compaction in AEM
- Increase Performance of offline tar compaction
- Frequently Asked Questions
Note:- Before running any type of compaction, make sure you have taken a backup of your existing repository.
Steps to Run Online Tar Compaction in AEM:-
Adobe does not recommend running online tar compaction, and from AEM 6.2 onward its use is restricted. Follow the steps below to run online tar compaction:-
Note:- Online compaction does not remove checkpoints, so you have to run a script manually to clear the old checkpoints.
- Download the correct version of oakrun.jar (see the FAQ below on how to find the correct version). For example: AEM 6.0 (Oak 1.0.22), AEM 6.1 (Oak 1.2.7), AEM 6.1 SP1 (Oak 1.2.11).
- Shut down your AEM instance.
- Find old checkpoints and remove unreferenced checkpoints, as shown in the offline tar compaction section below.
- Restart your AEM instance.
- Go to the System Console Config Manager and search for Apache Jackrabbit Oak Segment NodeStore Service.
- Make sure PausedCompaction is set to false (unchecked), and likewise CloneBinaries if you also want compaction to cover binaries.
- Go to the maintenance dashboard -> Daily Maintenance Window: http://localhost:4502/libs/granite/operations/content/maintenance/window.html/mnt/overlay/granite/operations/config/maintenance/_granite_daily
- Click on Add Task -> select Revision Clean up from the drop-down and click OK.
- By default, it picks a time around 2 am for running the daily revision cleanup.
- Hover over the window and click the Play button to trigger the revision cleanup.
- Green: the revision cleanup task is scheduled.
- Orange: the revision cleanup task is currently running.
- Red: the revision cleanup task has failed.
Refer to the SegmentNodeStoreService class for more information about each field, and to the oak-run.jar commands.
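The first step above needs an oak-run jar that matches the Oak version of the instance. As a minimal sketch, assuming the jar is fetched from Maven Central, the helper below builds the download URL for a given version. The function name oak_run_url and the choice of repo1.maven.org as the source are assumptions for illustration, not something the steps above prescribe.

```shell
#!/bin/bash
# Build the download URL of the oak-run jar for a given Oak version.
# Assumption: the jar comes from Maven Central (repo1.maven.org),
# where oak-run is published under the org.apache.jackrabbit group.
oak_run_url() {
  local version="$1"
  echo "https://repo1.maven.org/maven2/org/apache/jackrabbit/oak-run/${version}/oak-run-${version}.jar"
}

# Example: print the URL for Oak 1.2.7 (listed above for AEM 6.1).
oak_run_url "1.2.7"

# To download it (not executed here):
#   curl -fLO "$(oak_run_url 1.2.7)"
```

Always take the version from the Oak bundles of your own instance rather than from the examples above, since service packs and hotfixes change it.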
Steps to Run Offline Tar Compaction in AEM:-
Make sure that you are using the version of the oak-run jar that matches the Oak version of your AEM repository (see the FAQ below on how to find the correct version of oakrun.jar). The offline compaction script is divided into 5 parts:-
- Shutdown AEM Instance
- Find Old Checkpoints
- Remove Unreferenced Checkpoints
- Compact Oak
- Restart AEM Instance
The script below is for Linux/Unix machines:-
#!/bin/bash
now="$(date +'%d-%m-%Y')"
logfile="compact-$now.log"
installfolder="/data/aem"
aemfolder="$installfolder/crx-quickstart"
oakrun="$installfolder/help/oak-run-1.2.7.jar"

## Shutdown AEM Instance
"$aemfolder/bin/stop"
echo "AEM Shutdown" >> "$installfolder/help/logs/$logfile"

## Find old checkpoints
echo "Finding old checkpoints"
java -Dtar.memoryMapped=true -Xmx8g -jar "$oakrun" checkpoints "$aemfolder/repository/segmentstore" >> "$installfolder/help/logs/$logfile"

## Delete unreferenced checkpoints
echo "Deleting unreferenced checkpoints"
java -Dtar.memoryMapped=true -Xmx8g -jar "$oakrun" checkpoints "$aemfolder/repository/segmentstore" rm-unreferenced >> "$installfolder/help/logs/$logfile"

## Run compaction
echo "Running compaction. This may take a while"
java -Dtar.memoryMapped=true -Xmx8g -jar "$oakrun" compact "$aemfolder/repository/segmentstore" >> "$installfolder/help/logs/$logfile"

## Report Completed
echo "Compaction complete. Please check the log at:"
echo "$installfolder/help/logs/$logfile"

## Start AEM back up
echo "Starting up AEM"
"$aemfolder/bin/start"
echo "AEM Startup" >> "$installfolder/help/logs/$logfile"
Note:- The script above is meant for Linux/Unix machines, because the -Dtar.memoryMapped option does not work correctly on Windows.
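To see what a compaction run actually achieved, it can help to record the size of the segmentstore before and after the script above. The following is a small sketch of that idea; the helper names dir_size_kb and reclaimed_percent are made up for illustration, and the paths in the usage comment simply reuse the ones from the script.

```shell
#!/bin/bash
# Size of a directory in kilobytes (du -sk is available on Linux/Unix).
dir_size_kb() {
  du -sk "$1" | awk '{print $1}'
}

# Whole-number percentage of space reclaimed, given the sizes (in kB)
# measured before and after compaction.
reclaimed_percent() {
  local before="$1" after="$2"
  if [ "$before" -le 0 ]; then
    echo 0
    return
  fi
  echo $(( (before - after) * 100 / before ))
}

# Usage around the compaction step (not executed here):
#   before=$(dir_size_kb /data/aem/crx-quickstart/repository/segmentstore)
#   ... run the checkpoint cleanup and compaction ...
#   after=$(dir_size_kb /data/aem/crx-quickstart/repository/segmentstore)
#   echo "Reclaimed $(reclaimed_percent "$before" "$after")% of the segmentstore"
```

Appending the result to the same log file as the rest of the script gives a simple history of how much each run reclaimed.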
Command for Windows Machine:-
The command below runs offline tar compaction; the commands for finding and deleting checkpoints can be used in the same way.
## Run compaction
echo "Running compaction. This may take a while"
java -jar oakrun.jar compact install-folder/crx-quickstart/repository/segmentstore
Note:- Use as much heap memory as possible for faster I/O operations. At least eight gigabytes (-Xmx8g) is recommended for most common deployments; you can increase it according to the memory available on the machine.
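One way to pick the -Xmx value is to derive it from the memory that is actually free on the machine. The sketch below does this with a made-up policy of using three quarters of the available memory; that ratio and the helper name suggest_xmx are assumptions for illustration, not Adobe guidance. It prints a warning when the result falls below the eight gigabytes mentioned in the note above.

```shell
#!/bin/bash
# Suggest a -Xmx flag in whole gigabytes from available memory given in kB.
# Assumed policy: use three quarters of what is available.
suggest_xmx() {
  local avail_kb="$1"
  local gb=$(( avail_kb * 3 / 4 / 1024 / 1024 ))
  if [ "$gb" -lt 1 ]; then
    gb=1
  fi
  if [ "$gb" -lt 8 ]; then
    # The note above recommends at least 8 GB for most deployments.
    echo "warning: only ${gb}g available for the heap, below the recommended 8g" >&2
  fi
  echo "-Xmx${gb}g"
}

# Usage on Linux, where /proc/meminfo reports MemAvailable in kB
# (not executed here):
#   avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
#   java -Dtar.memoryMapped=true $(suggest_xmx "$avail_kb") -jar "$oakrun" compact "$aemfolder/repository/segmentstore"
```

Because AEM is stopped during offline compaction, most of the memory it normally holds should be available to oak-run at that point.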
Increase Performance of offline tar compaction:-
Many teams are still using old scripts without the -Dtar.memoryMapped option, but you can improve the performance of offline tar compaction by using it, as suggested by Adobe.
Since Oak version 1.0.22, the oak-run tool has offered several features that aim to speed up the revision cleanup process and keep the maintenance window as short as possible. Below are a few options that we can use:-
-Dtar.memoryMapped:- Enabling this feature is highly recommended in order to speed up tar compaction. It can be set to true or false.
-Dupdate.limit:- Defines the threshold for the flush of a temporary transaction to disk. The default value is 5000000.
-Dcompress-interval:- Number of compaction map entries to keep until compressing the current map. The default is 1000000. You should increase this value to an even higher number for faster throughput, if enough heap memory is available.
-Dcompaction-progress-log:- The number of compacted nodes that will be logged. The default value is 1500000, which means that the first 1500000 compacted nodes will be logged during the operation. Use this in conjunction with the next parameter documented below.
-Dlogback.configurationFile:- Use a configuration file for logging.
Note:- Memory-mapped file operations (-Dtar.memoryMapped) do not work correctly on Windows, so use this option on Linux/Unix only.
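The options above are all passed to the JVM in front of -jar. As a rough sketch of how they combine, the helper below prints a compact command line, switching memory-mapped access off when the shell reports a Windows-like environment. The function name build_compact_cmd, the XMX and COMPRESS_INTERVAL variables, and the uname patterns used to detect Windows shells are assumptions for illustration.

```shell
#!/bin/bash
# Print the oak-run compact command line for the given jar and segmentstore.
# Optional settings (hypothetical names): XMX (default 8g) and
# COMPRESS_INTERVAL (only added when set).
build_compact_cmd() {
  local oakrun="$1"
  local segmentstore="$2"
  local os="${3:-$(uname -s)}"
  local flags="-Dtar.memoryMapped=true"

  # Memory-mapped access does not work correctly on Windows (see note above),
  # so it is disabled for Cygwin/MinGW/MSYS style shells.
  case "$os" in
    CYGWIN*|MINGW*|MSYS*) flags="-Dtar.memoryMapped=false" ;;
  esac

  if [ -n "${COMPRESS_INTERVAL:-}" ]; then
    flags="$flags -Dcompress-interval=${COMPRESS_INTERVAL}"
  fi

  echo "java $flags -Xmx${XMX:-8g} -jar $oakrun compact $segmentstore"
}

# Example: show the command for the paths used in the Linux script.
build_compact_cmd /data/aem/help/oak-run-1.2.7.jar /data/aem/crx-quickstart/repository/segmentstore Linux
```

Printing the command instead of running it makes it easy to review the flags before a maintenance window.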
Frequently Asked Questions:-
Below are a few questions that might come to mind about tar compaction in AEM.
- How frequently should you run Offline Revision Cleanup?
- It depends on the repository growth rate. In general, it is recommended that you perform revision cleanup every 2 weeks for an author instance and once per quarter for a publish instance (I recommend running it monthly on publish, though again this depends on the repository growth rate).
- What factors determine the duration of the Offline Revision Cleanup?
- The repository size and the number of revisions that need to be cleaned up determine the duration of the cleanup.
- How to find the correct version of the oak-run jar?
- Go to the Felix console -> Bundles and search for oak, then note the version shown for each Oak bundle. Now go to the MVN Repository and download the matching oak-run.jar file.
- What can happen if you do not perform revision cleanup?
- The AEM instance will run out of disk space, which will cause outages in production. It is highly recommended that you monitor disk usage and perform tar compaction if more than 60% of the disk space is consumed.
- What is the difference between a revision cleanup and version purging or cleanup?
- Oak revision: Oak organises all content in a tree hierarchy that consists of nodes and properties. Each snapshot or revision of this content tree is immutable, and changes to the tree are expressed as a sequence of new revisions. Typically, each content modification triggers a new revision. For more information, visit http://jackrabbit.apache.org/dev/ngp.html.
- Version: Versioning creates a “snapshot” of a page at a specific point in time. Typically, a new version is created when a page is activated.
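The 60% disk-usage guideline from the FAQ above is easy to check from a script. The following is a minimal sketch, assuming a Linux/Unix host with POSIX df; the helper names disk_used_percent and needs_compaction are hypothetical, and the path in the usage comment is the install folder from the offline script.

```shell
#!/bin/bash
# Percentage of disk space used on the filesystem that holds the given path.
# df -P prints one line per filesystem with the capacity in column 5.
disk_used_percent() {
  df -P "$1" | awk 'NR==2 {gsub("%", "", $5); print $5}'
}

# Succeeds (exit status 0) when usage is above the threshold.
# The default of 60 follows the guideline in the FAQ above.
needs_compaction() {
  local used="$1"
  local threshold="${2:-60}"
  [ "$used" -gt "$threshold" ]
}

# Usage, for example from a daily cron job (not executed here):
#   used=$(disk_used_percent /data/aem)
#   if needs_compaction "$used"; then
#     echo "Disk usage is ${used}%, schedule offline tar compaction"
#   fi
```

This only reports that compaction is due; running it still requires the planned shutdown described in the offline steps.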