AEM Link Checker: Comprehensive Guide

The AEM link checker validates all internal and external links on a page. Its main purpose is to ensure that content authors do not have to worry about bad or broken links on the publish environment. It also lets authors view all valid and invalid links on their website in a single place.

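After completing this tutorial you will have a clear understanding of:-

  • AEM external link checker
  • AEM internal link checker
  • Fixing broken links that the link checker is not able to validate
  • Difference between the AEM link checker and the link rewriter
  • Disabling the link checker in AEM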

AEM External link checker:-


The AEM Link Checker is based on an event handler that is triggered when nodes under /content are created or updated. All content under the selected root path is parsed and its links are validated. Link validation runs asynchronously in the background, and the HTML is updated based on the verification results.

Note:- If you have a large repository (/content) whose links are updated frequently, using the link checker is not advised because of its performance impact. It is triggered periodically and traverses the whole repository to validate links, which can slow down your author instance.

Now let's see how the AEM link checker works:-

  • As soon as an author saves a link on a page, either through the RTE or a custom component, the link checker event handler is triggered.
  • The event handler traverses the /content node and looks for new or updated links. Each link it finds is stored under the /var/linkchecker cache folder.
    var-linkchecker-aem
  • Control then passes to the Day CQ Link Checker Service, which checks its scheduler.period configuration. Once the scheduler period has elapsed, the scheduler validates the syntax and structure of the links against the service configuration, such as the special prefixes to ignore during validation and the pattern used to verify the URL syntax.
    day-cq-link-checker-service-aem
  • Once the syntax is validated, the results are pushed to /etc/linkchecker.html. The links remain in a pending state until the Day CQ Link Checker Task scheduler validates them by making a GET request.
    day-cq-link-checker-task-aem
  • The Day CQ Link Checker Task scheduler runs periodically to recheck the valid and invalid links that are listed under /etc/linkchecker.html.
    • An administrator can configure how often this scheduler runs by updating the Scheduler Period property. Its default value is 3600 seconds.
    • Once triggered, it removes all invalid or unreachable links from /etc/linkchecker.html.
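
The steps above can be pictured as a small state machine. The Python sketch below only illustrates that flow and is not AEM's implementation: the LinkCache class, its method names and the is_reachable callback are hypothetical, and the syntax check is reduced to "has an http(s) scheme and a host".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict
from urllib.parse import urlparse

PENDING, VALID, INVALID = "pending", "valid", "invalid"


@dataclass
class LinkCache:
    """Toy stand-in for the /var/linkchecker cache: URL -> status."""
    entries: Dict[str, str] = field(default_factory=dict)

    def record(self, url: str) -> None:
        # A newly saved link is stored once and starts out as pending.
        self.entries.setdefault(url, PENDING)

    def syntax_check(self) -> None:
        # Simplified syntax/structure validation (assumption: http(s) only).
        for url, status in list(self.entries.items()):
            parsed = urlparse(url)
            well_formed = parsed.scheme in ("http", "https") and bool(parsed.netloc)
            if status == PENDING and not well_formed:
                self.entries[url] = INVALID

    def run_task(self, is_reachable: Callable[[str], bool]) -> None:
        # The periodic task resolves every link that is still pending.
        for url, status in list(self.entries.items()):
            if status == PENDING:
                self.entries[url] = VALID if is_reachable(url) else INVALID


# Example run with a fake reachability check instead of a real GET request.
cache = LinkCache()
for link in ("https://example.com/a", "http//broken", "https://down.example.com/"):
    cache.record(link)
cache.syntax_check()
cache.run_task(lambda url: "down" not in url)
```

Passing the reachability check in as a callback keeps the sketch free of network access, so it can be run anywhere.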

The screenshot below of http://localhost:4502/etc/linkchecker.html shows how the values are fetched from /var/linkchecker and how the link checker list is updated. From this page you can also request re-validation and refresh the status of the links.

analyze-linkchecker

After validation, invalid external links are displayed as shown below:-

invalid-external-link-aem

The AEM link checker is configured using the following four services:-

  • Day CQ Link Checker Info Storage Service – configures the link cache size. The default is 500.
  • Day CQ Link Checker Service – configures the frequency of the background check. The default interval is 5 seconds.
  • Day CQ Link Checker Task – configures the frequency of the background check that validates links.
  • Day CQ Link Checker Transformer – configures the elements that the link checker transforms and rewrites.

AEM internal link checker:-


Internal links are validated as soon as a content author adds an internal link (a repository link, for example /content/we-retail/ca) to a page, either through the RTE or a custom component. If the URL is no longer valid after validation, the link is removed on the publisher or shown as a broken link on the author.

invalid-internal-link-aem

Fixing broken links that link checker is not able to validate:-


Sometimes a link is missing on publish even though it is a valid link. This can happen because the AEM link checker checks links automatically and does not publish a link that it considers broken. Usually this is helpful, since you have a self-monitoring system that prevents you from publishing broken links. It becomes a problem when you know the link is correct but AEM still treats it as broken and does not publish it.

Two types of links need extra configuration before the link checker can validate them:-

  • Links that have a special prefix (ex: href="tel:123-123-1234" or href="*|something|*").
  • Links whose query parameters are filled in during post-processing, which you want to mark as always valid or exclude from validation.

Links having a special prefix:-
  • Go to http://localhost:4502/system/console/configMgr.
  • Search for “Day CQ Link Checker Service” and update the Special Link prefix setting.
  • For example, when “tel:” is added as a prefix, links that start with it are neither checked nor rewritten during syntax and structure validation. A few prefixes are already configured by default: javascript:, data:, mailto:, #, <!--, ${
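
To make the effect of the prefix list concrete, here is a minimal Python sketch of a prefix test. It is an illustration only: the function name has_special_prefix is hypothetical, the default tuple simply repeats the prefixes listed above, and AEM's real matching logic may differ in detail.

```python
# Prefixes listed above as the defaults of the Day CQ Link Checker Service.
DEFAULT_SPECIAL_PREFIXES = ("javascript:", "data:", "mailto:", "#", "<!--", "${")


def has_special_prefix(href, prefixes=DEFAULT_SPECIAL_PREFIXES):
    """Return True if the link starts with a prefix the checker should ignore."""
    return href.strip().startswith(tuple(prefixes))


# "tel:" is not in the default list, so it has to be added explicitly.
custom_prefixes = DEFAULT_SPECIAL_PREFIXES + ("tel:",)
tel_ignored_by_default = has_special_prefix("tel:123-123-1234")
tel_ignored_after_config = has_special_prefix("tel:123-123-1234", custom_prefixes)
```

With the default list the tel: link is still validated. Once “tel:” is added to the list, the link is left alone.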

add-special-link-prefix-link-checker-aem

Links containing a variable that is updated during post-processing:-

These changes have to be made in code: add the x-cq-linkchecker attribute to the markup of the <a> tag. This attribute tells AEM how to process the anchor tag. Let's look at it in more detail:-

  • Add x-cq-linkchecker="valid" to the <a> tag to make sure the link is always marked as valid by CQ. In this case the link checker still checks the link but marks it as valid. (For ex:- <a x-cq-linkchecker="valid" …>)
  • Alternatively, add x-cq-linkchecker="skip" to the <a> tag. In this case the link checker does not check the link's validity at all. (For ex:- <a x-cq-linkchecker="skip" …>)
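
The two attribute values can be summarised with a short Python sketch that reads anchor tags and reports how each link would be treated. It is a simplified model of the behaviour described above, not AEM code: the class LinkCheckerHints, the helper link_actions and the action labels are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class LinkCheckerHints(HTMLParser):
    """Collects (href, action) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.decisions = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        hint = attr_map.get("x-cq-linkchecker")
        if hint == "skip":
            action = "not checked"              # validity is never tested
        elif hint == "valid":
            action = "checked, marked valid"    # result is forced to valid
        else:
            action = "checked normally"
        self.decisions.append((href, action))


def link_actions(markup):
    parser = LinkCheckerHints()
    parser.feed(markup)
    return parser.decisions


sample = (
    '<a x-cq-linkchecker="skip" href="/offers?id=${offerId}">Offer</a>'
    '<a x-cq-linkchecker="valid" href="/content/we-retail/ca">Home</a>'
    '<a href="http://www.google.com/">Search</a>'
)
actions = link_actions(sample)
```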

Difference between AEM link checker and link rewriter:-


The link checker is built for checking the validity of URLs. Its scheduler runs periodically to validate the URLs found under /content in the repository and saves the results under the /var/linkchecker cache folder. All links that have been checked or are pending can be seen at /etc/linkchecker.html. Links that are no longer valid after validation are removed on the publisher or shown as broken links on the author.

The link rewriter is used to rewrite URLs while the HTML is being rendered. It parses the HTML and rewrites the URLs it contains. If you need custom URL rewriting, you can write your own link rewriter by implementing the org.apache.sling.rewriter.Transformer interface.

link-checker-transformer-disable-link-checking-rewriting-aem

Disable link checker in AEM:-


There are two ways to disable the link checker in AEM: through the Felix console, or by overriding the regular expression of the Day CQ Link Checker Service. Follow the steps below to disable the AEM link checker:-

Disabling all link checking by Felix console configuration:-
  • Go to /system/console/configMgr and log in as admin.
  • Find the “Day CQ Link Checker Transformer” configuration.
    link-checker-transformer-disable-link-checking-rewriting-aem
  • Check the “Disable Checking” box and save.
  • Go to /crx/explorer and log in as admin.
  • Open “Content Explorer”.
  • Once all the changes are made, browse to /var/linkchecker.
  • Right-click the node and select “Delete Recursively”.
  • Click “Save All”.

disable-aem-link-checker

Note:- This configuration lets you disable either link checking only or both link checking and link rewriting.

Disabling link checking of URLs using regular expressions:-

The AEM link checker can be configured to ignore either all links or only the links that match a regular expression.

The following configuration is specific to the publish instance. To configure the author instance, change the configuration path from ../config.publish/.. to ../config.author/.. instead. To configure both author and publish, change it to ../config/.. instead.
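
The mapping between target instance and configuration folder can be written down as a tiny Python helper. This is only a sketch of the naming rule described above: the function config_node_path and the project name myapp are hypothetical, and the PID is the Link Checker service PID used later in this section.

```python
LINK_CHECKER_PID = "com.day.cq.rewriter.linkchecker.impl.LinkCheckerImpl"

# Folder suffix per target instance; "both" applies to author and publish.
RUN_MODE_SUFFIX = {"author": ".author", "publish": ".publish", "both": ""}


def config_node_path(project, target="publish", pid=LINK_CHECKER_PID):
    """Build the repository path of the sling:OsgiConfig node for a target."""
    suffix = RUN_MODE_SUFFIX[target]
    return f"/apps/{project}/config{suffix}/{pid}"


publish_path = config_node_path("myapp")            # publish only
author_path = config_node_path("myapp", "author")   # author only
shared_path = config_node_path("myapp", "both")     # author and publish
```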

disable-linkchecker-aem-specific-expression

  • Log in to crx/de as admin.
  • Create a configuration node (with node type sling:OsgiConfig) in the project (/apps/<project-name>/config.publish/{OSGi service PID}).
  • Alternatively, copy the node /libs/cq/linkchecker/config/com.day.cq.rewriter.linkchecker.impl.LinkCheckerImpl into the config folder of your choice (for example /apps/myapp/config.publish).
  • Change the property service.check_override_patterns from “^system/” to “^.”
    • ^system/ :- Ignore checking and rewriting of all external links that start with system/.
    • ^. :- Ignore checking and rewriting of all external links.
    • ^http://www\.google\.com/ :- Ignore checking and rewriting of links to http://www.google.com/.
  • Delete all nodes under /var/linkchecker to stop the link checker from periodically rechecking URLs.
  • If the configuration was done on the author, build a package and install it on your publish instances as well.
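
The three example expressions can be tried out before they are put into the configuration. The Python sketch below assumes that a link is ignored as soon as any configured pattern is found in it, and that these simple patterns behave the same in Python's re module as in AEM's Java regular expressions. The function name is_overridden is hypothetical.

```python
import re


def is_overridden(url, patterns):
    """Return True if any override pattern matches, i.e. the link is ignored."""
    return any(re.search(pattern, url) for pattern in patterns)


default_patterns = ["^system/"]                      # value before the change
google_only = [r"^http://www\.google\.com/"]         # one specific site
everything = ["^."]                                  # any non-empty link

ignored_system = is_overridden("system/console/configMgr", default_patterns)
checked_google = not is_overridden("http://www.google.com/search", default_patterns)
ignored_google = is_overridden("http://www.google.com/search", google_only)
ignored_any = is_overridden("https://example.com/page", everything)
```

The pattern ^. matches the first character of any non-empty link, which is why it switches checking off for every URL.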

Note:- Using “^.” disables all link checking and link rewriting.

 
