Skip to content

Audit

The command emscli:web:audit audits a websites. This command is interruptible and resumable at any moment. This command has 2 outputs:

This command can perfomed multiple audits but not always all of them on every website's urls. See Auditors.

Prior using this command, please login to an elasticms admin: ems:admin:login http://admin.my-elasticms.tld.

You may want to just an Excel report, without indexing in elasticms, without any instance of elasticms: emscli:web:audit http://my-website.tld/ --dry-run

Report

An Excel report is generated after every tested. The report contains information about the current audit, even is the audit has been resumed. Use the --continue flag to resume an audit.

This section reports every broken links. It means http or https links that did returned something else that an 200 as status code.

This section reports links that have been ignored during the audit. As they have been explicitly excluded (see --ignore-regex option). Or as they are not crawlable (such as mailto or javascript urls).

Warnings

This section reports warning message. Such as the URL is case sensitive or if the request has been redirected.

Accessibility

This section reports, by auditors, how many accessibility issues ha ve been raised.

Security

This section reports, by auditors, how many security issues ha ve been raised.

Auditors

For performance reasons; Tika, pa11y and lighthouse (which are all optionals) audits are performed in parallels. But those auditors are heavy and affects the audit's speed. Turn them on carefully.

The request audit

This audit is performed on every crawlable URL (HTTP and HTTPS). Other URLs, such ftp, mailto, javascript are not tested. But they are reported as ignored URLs. See Report.

This audit will index in elasticms:

  • The url
  • The referer's url
  • If the URL is case sensitive
  • If the request is valid
  • If the request returned an error message
  • The status code (i.e. 200)
  • If some security headers are missing
  • The size
  • Compute a hash (in order to detect duplicates)

The HTML crawler audit

This audit is performed on every valid HTML URL's response. This audit is used to introspect internal links in order to add them to the queue.

This audit will index in elasticms:

  • External links
  • The document's title
  • The document's meta title
  • The document's canonical link

This audit is performed on every external link found. It will index in elasticms the:

  • The URL
  • The status code
  • The error message

The pa11Y audit

This auditor can be activated with the flags --pa11y or --all. It only concerns HTML urls. This auditor collect a pa11y accessibility audit and indexes it in elasticms.

The Lighthouse audit

This auditor can be activated with the flags --lighthouse or --all. It only concerns HTML urls. This auditor collect a Lighhouse audit and indexes in elasticms:

  • The accessibility score
  • The performance score
  • The SEO score
  • The best practice score
  • A desktop screenshot
  • Warning message
  • The report itself

The tika audit

This auditor can be activated with the flags --tika or --all. This auditor collect a tika audit and indexes in elasticms:

  • The locale
  • The indexable content
  • The document's title
  • The document's author

This auditor is also used to extract links/urls form non html responses (i.e. PDF, .docx, ...) in order to test if they are broken. And, for internal links, to add them to the queue.

An additional --tika-base-url parameter allows to specify a Tika Server. It's set by default to http://localhost:9998/.

So if you have docker you can easily start an instance of Tika Server compatible with this default value:

bash
docker run -d -p 9998:9998 apache/tika