Tagged PDFs via Org Export

The new version of the WCAG requires PDFs (and all other online content posted to an LMS) to be properly tagged for screen readers. Until recently, this was essentially impossible with LaTeX and, therefore, with Org Mode exports. The result is still not perfect today, but a few simple tweaks get most of the way there.

Context

Web Content Accessibility Guidelines

A new version of the Web Content Accessibility Guidelines (WCAG) is here, and it means that many educational resources need to be accessible by default. Accessibility can no longer be an afterthought. As a forcing function, this may also result in some unfortunate "alternative texts" written only to tick the accessibility check boxes.

Org Mode and LaTeX

Plain-text documents written in Org Mode markup can be exported with relatively little effort to several output formats, including HTML, LaTeX, Beamer, and Markdown. While not quite as versatile as Pandoc, Org Mode supports many formats and can hand a document off to Pandoc for the rest.

PDF and LaTeX

The Portable Document Format (PDF) supports embedded "tags" that tell a screen reader how to process the document, giving the content a sensible reading order. However, documents produced with LaTeX have lacked these tags, and adding them required specialized, closed-source software.

Fortunately, significant progress has been made toward adding tags to LaTeX documents essentially automatically. What remains is to make this available to documents exported from Org Mode.

Adding Tags to PDFs

Tagging a PDF mostly comes down to adding a few lines at the top of the LaTeX document.

Specifically, add the following \DocumentMetadata declaration before the \documentclass line:

\DocumentMetadata{%
  lang = en,
  pdfversion = 2.0,
  pdfstandard = ua-2,
  pdfstandard = a-4,
  testphase = {phase-III, title, table, math, firstaid}
}
\documentclass{article}
% ...

It is also necessary to add language information to the document. This can be achieved with a few additional preamble lines:

\usepackage{polyglossia}
\setdefaultlanguage[variant=US]{english}

Finally, we must use a different LaTeX compiler. This method works best with LuaLaTeX:

latexmk -pdf -lualatex ${document}.tex
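One quick way to confirm that the build produced a tagged file is to inspect it with pdfinfo from poppler-utils, which prints a "Tagged:" field. The sketch below assumes that tool is installed. The helper name is_tagged is made up for illustration, and it reads pdfinfo output from standard input. It only shows that the file is marked as tagged, not that it conforms to PDF/UA.

```shell
# Succeeds when the pdfinfo output on stdin reports "Tagged: yes".
# is_tagged is a hypothetical helper name, not part of poppler or latexmk.
is_tagged() {
  grep -Eq '^Tagged:[[:space:]]+yes'
}

# Assumed usage, with pdfinfo installed and document.pdf already built:
#   pdfinfo document.pdf | is_tagged && echo "tagged" || echo "not tagged"
```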

Adding Tags to PDFs in Org Mode

To accomplish the above for documents that begin life as Org Mode files, we need a few more things. To the best of my knowledge, the Org LaTeX exporter offers no direct way to insert content before the \documentclass line; #+LATEX_HEADER lines are placed after it.

The easiest way, as of this writing, to achieve tagged PDFs within Org Mode is to add a variable and extend org-latex-classes:

(defvar org-latex-metadata
  ""
  "LaTeX preamble metadata, file data to appear _before_ DOCUMENTCLASS...")
(setq org-latex-metadata "\\DocumentMetadata{lang = en, pdfversion = 2.0, pdfstandard = ua-2, pdfstandard = a-4, testphase = {phase-III, title, table, math, firstaid}}")

Then, update or extend the available LaTeX classes so that the metadata is placed before the \documentclass line:

(setq org-latex-classes
      `(("beamer" "\\documentclass[11pt]{beamer}"
         ("\\section{%s}" . "\\section*{%s}")
         ("\\subsection{%s}" . "\\subsection*{%s}")
         ("\\subsubsection{%s}" . "\\subsubsection*{%s}"))
        ("article" ,(concat org-latex-metadata "\n" "\\documentclass[11pt]{article}")
         ("\\section{%s}" . "\\section*{%s}")
         ("\\subsection{%s}" . "\\subsection*{%s}")
         ("\\subsubsection{%s}" . "\\subsubsection*{%s}")
         ("\\paragraph{%s}" . "\\paragraph*{%s}")
         ("\\subparagraph{%s}" . "\\subparagraph*{%s}"))))

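This way, when Emacs exports the source document with the article class, the intermediate LaTeX document begins with the appropriate metadata. Note that the beamer entry above is left unchanged; wrapping its \documentclass string in the same concat form extends the metadata to Beamer presentations.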
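Because \DocumentMetadata has to come before \documentclass, it can be worth checking the exported .tex file before compiling. Here is a small sketch, assuming a POSIX shell with grep and head. The helper name metadata_first is hypothetical. It skips blank lines and comment lines (Org usually writes a "% Created ..." comment at the top of the file) and tests the first remaining line.

```shell
# Succeeds when the first non-blank, non-comment line of the LaTeX source
# on stdin starts with \DocumentMetadata. Hypothetical helper for illustration.
metadata_first() {
  grep -Ev '^[[:space:]]*(%.*)?$' \
    | head -n 1 \
    | grep -Eq '^[[:space:]]*\\DocumentMetadata'
}

# Assumed usage after exporting document.org to document.tex:
#   metadata_first < document.tex && echo "metadata comes first"
```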

Do not forget to add the polyglossia lines from above to the header of your Org documents:

#+LATEX_HEADER: \usepackage{polyglossia}
#+LATEX_HEADER: \setdefaultlanguage[variant=US]{english}

LaTeX PDF Process

You may also find it useful to set org-latex-pdf-process to the following command so that the documents are generated correctly:

(setq org-latex-pdf-process
      '("latexmk -f -pdf -%latex -interaction=nonstopmode -shell-escape -output-directory=%o %f"))

The %latex placeholder is filled in from the LATEX_COMPILER keyword, so documents can specify their own compiler. For accessibility, however, it is likely best to use lualatex.
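To make the placeholders concrete: before running the command, Org replaces %latex with the compiler name, %o with the output directory, and %f with the LaTeX file name. The sketch below imitates that substitution with sed so the resulting command line can be previewed. The helper name expand_pdf_process and the sample arguments are made up for illustration, and the sketch does no shell quoting; Org performs the real substitution itself.

```shell
# Preview the command line for a given compiler, output directory, and file.
# expand_pdf_process is a hypothetical name; arguments must not contain "|" or "&".
expand_pdf_process() {
  template='latexmk -f -pdf -%latex -interaction=nonstopmode -shell-escape -output-directory=%o %f'
  printf '%s\n' "$template" \
    | sed -e "s|%latex|$1|g" -e "s|%o|$2|g" -e "s|%f|$3|g"
}

expand_pdf_process lualatex ./ notes.tex
# prints: latexmk -f -pdf -lualatex -interaction=nonstopmode -shell-escape -output-directory=./ notes.tex
```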

Conclusion

Over time, the process for creating appropriately tagged and accessible PDFs through LaTeX and Beamer has improved significantly.

I still find the tagged Beamer presentations to be slightly broken for presenting, but workable as reference documents. In any case, a larger point of generating two separate documents is to use one for teaching and presenting, and the other as reference notes for students.