(New) Static Site Generation
For a few years, I've been using Hugo for blog generation. Recently, I decided I wanted to take static site generation in a different direction. Specifically, I wanted to use a different source markup and I wanted to write my own tool set for generating the actual HTML.
We'll walk through the motivation for changing the content into a different format and changing the generation process into a completely custom set of scripts.
Motivation
When I first sat down to build a blog 5 years ago, I had a pretty basic set of requirements.
- Lightweight
- Native Markdown Support
- Minimal Dependencies
Hugo met all of these requirements quite well. The templating engine is fairly simple, it supports Markdown, and it's written in Go, so only the built binary is necessary for site generation.
If it fits so well, why change?
It worked well for what I was asking. However, as I wrote more and time went on, Hugo's features became more and more complex and created a mismatch with how I wanted to express the text in the markup. I felt it was going in a direction I neither needed nor wanted. More specifically, the issues surfaced mostly on the Markdown side.
Markdown simply lacks some features in its markup, a gap usually filled by inline blocks of HTML and other hacks that are standard only in some specific "flavor" or translator implementation. Hugo attempts to patch this over with its implementation of shortcode templates; however, these still felt unnatural.
The final nail was discovering Org-mode. I liked the weight of Markdown, but I didn't like its lack of features when needed. I liked reStructuredText; however, I felt it was always too heavy for the documents I was working on.
In finally giving Emacs a full try (another blog post), I discovered Org-mode. It was the exact middle weight I was looking for between Markdown and reST.
Thus, I was in search of a new tool for generating static HTML from a set of source files written in Org-mode.
Org-mode (within Emacs) has a native publish mode, and I had discovered several posts on how people are doing exactly this. However, Org-publish isn't exactly what I was looking for.
Therefore, let's revise the current list of requirements:
- Makefile driven
- Does not require Emacs to generate
That is, I wanted a Makefile that could generate the site contents. Furthermore, and no less importantly, I did not want the Makefile to include lines like emacs --quick --batch ... in its recipes. This obviously creates a bit of a challenge since Org-mode is an Emacs mode.
I decided I could probably generate the content myself with a few scripts and invocations to Pandoc.
High-Level Implementation
The core of the implementation of the new site generation is blog posts written in Org-mode, processed by several shell scripts, using Pandoc to perform the translation from raw Org-mode markup to HTML, all of which is orchestrated by a Makefile.
I'm not going to extol Org-mode's capabilities in this post. There are plenty of resources on it already, none more authoritative than the Org-mode Manual itself.
There are, in fact, some limitations on which Org-mode features are available, because I chose not to involve Emacs itself in the generation.
Along the tour of the implementation, it's important to note a guiding principle in the conversion was not breaking existing links. That is, I was and am satisfied with the folders and slug usage for posts and I didn't want the new version to break existing links.
Detailed Implementation
The easy part is generating each post. This is simply an index.html in the correct folder. The majority of the complexities stem from the summaries and main index page.
Post Content Generation
To generate a blog post's index.html page, we consider the following make target:
```makefile
blog_dir = $(shell $(SCRIPTS_DIR)/org-get-slug.sh $(1))

TEMPLATE_FILES:=$(wildcard templates/*.html)

define BLOG_BUILD_DEF
$(BUILD_DIR)$(call blog_dir,$T):
	mkdir -p $$@

$(BUILD_DIR)$(call blog_dir,$T)/index.html: $T \
		$(TEMPLATE_FILES) \
		Makefile \
		| $(BUILD_DIR)$(call blog_dir,$T)
	$(SCRIPTS_DIR)/generate_post_html.sh $$< > $$@
endef

$(foreach T,$(POSTS_ORG_INPUT),$(eval $(BLOG_BUILD_DEF)))
```
This definition is fairly opaque as written. However, it is expanded once for each post when the foreach function runs. For example, the following targets will be defined for this post:
```makefile
$(BUILD_DIR)/blog/2019/03/static-site-generation:
	mkdir -p $@

$(BUILD_DIR)/blog/2019/03/static-site-generation/index.html: posts/static-site-generation.org \
		$(TEMPLATE_FILES) \
		Makefile \
		| $(BUILD_DIR)/blog/2019/03/static-site-generation
	$(SCRIPTS_DIR)/generate_post_html.sh $< > $@
```
This will create the correct directory for each post, e.g., /blog/2019/03/static-site-generation, and place the translated HTML into this directory as index.html.
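The org-get-slug.sh helper that blog_dir calls is not shown in this post. As a rough sketch of what such a helper might look like (not the actual script), the following assumes the date is written as YYYY-MM-DD in the #+DATE: line and that the file name minus .org is the slug. The function name org_slug and the day of the month in the sample date are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a slug helper; NOT the author's org-get-slug.sh.
# Assumes "#+DATE: YYYY-MM-DD" in the preamble and that the post's file
# name (minus .org) is the slug.

org_slug() {
    local orgin=$1 date year month name
    # Read the value of the first "#+DATE:" line.
    date=$(awk -F': ' '/^#\+DATE:/ { printf "%s", $2; exit }' "$orgin")
    year=${date%%-*}     # text before the first "-"
    month=${date#*-}     # drop the leading "YYYY-"
    month=${month%%-*}   # keep only "MM"
    name=$(basename "$orgin" .org)
    printf '/blog/%s/%s/%s\n' "$year" "$month" "$name"
}

# Usage: build a throwaway post and print its output directory.
tmpdir=$(mktemp -d)
printf '#+TITLE: Demo\n#+DATE: 2019-03-15\n' > "$tmpdir/static-site-generation.org"
org_slug "$tmpdir/static-site-generation.org"
rm -rf "$tmpdir"
```

Deriving the path from the date and file name alone is what keeps existing links stable: as long as neither changes, the post lands in the same folder on every build.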
Note: $(TEMPLATE_FILES) does not actually remain in the expanded rule. During the expansion of the definition, the variable $(TEMPLATE_FILES) is expanded as well, into the list of template files. This is acceptable, however, since it's a static list of files and has no bearing on which post's target is being expanded.
The generate_post_html.sh script is fairly basic:
```bash
#!/usr/bin/env bash
# Generate HTML for blog post

ORGIN=${1}

PROJ_ROOT=$(git rev-parse --show-toplevel)
source "${PROJ_ROOT}/scripts/site-templates.sh"
source "${PROJ_ROOT}/scripts/org-metadata.sh"

DISPLAY_DATE=$(date -d "${DATE}" +'%a %b %d, %Y')
SORT_DATE=$(date -d "${DATE}" +'%Y%t%m%t%d%t')

cat "${HTML_HEADER_FILE}"
cat "${HTML_SUB_HEADER_FILE}"

# Title block: heading, tags, and display date
echo -n "<h1 class=\"title\">${TITLE}</h1>"
echo -n "<div class=\"post-meta\">"
echo -n '<ul class="tags"><li><i class="fa fa-tags"></i></li>'
echo -n "${TAGS}" | awk '{ printf "<li>%s</li>", $0}'
echo -n '</ul>'
echo -n "<h4>${DISPLAY_DATE}</h4></div>"

# Post body, translated from Org-mode to HTML
pandoc --from org \
       --to html \
       "${ORGIN}"

cat "${HTML_FOOTER_FILE}"
```
The org-metadata.sh script reads the Org-mode preamble (the lines starting with #+) and puts the values into variables available to other scripts. For example, TITLE, DATE, and TAGS are pulled out and used to generate the title section of each post. Furthermore, some templates are pulled in to generate the headers and footers of each page. The templates are written directly in HTML and really serve only to factor out content that would otherwise be largely duplicated on every page.
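Only the DATE line of org-metadata.sh appears in this post, so the following is a guess at how the rest of the preamble could be read, not the script itself. The helper org_keyword, the sample tags, and the assumption that tags are space-separated on a #+TAGS: line are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: org_keyword is a hypothetical helper, not part of the
# author's scripts. The sample tags and date below are invented.

# Print the value of the first "#+KEY: value" line in an Org file.
org_keyword() {
    local key=$1 file=$2
    awk -v prefix="#+${key}:" '
        index($0, prefix) == 1 {
            value = substr($0, length(prefix) + 1)
            sub(/^[ \t]+/, "", value)   # drop the space after the colon
            print value
            exit
        }' "$file"
}

# Usage: parse a throwaway post.
post=$(mktemp)
cat > "$post" <<'EOF'
#+TITLE: (New) Static Site Generation
#+DATE: 2019-03-15
#+TAGS: hugo org-mode pandoc

Body text starts here.
EOF

TITLE=$(org_keyword TITLE "$post")
DATE=$(org_keyword DATE "$post")
# One tag per line, so a later awk can wrap each one in <li>...</li>.
TAGS=$(org_keyword TAGS "$post" | tr ' ' '\n')

printf 'title: %s\ndate:  %s\n' "$TITLE" "$DATE"
printf '%s' "$TAGS" | awk '{ printf "<li>%s</li>", $0 } END { print "" }'
rm -f "$post"
```

Matching on the literal prefix rather than splitting on ": " means a title that itself contains a colon is returned whole.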
Summary Page Generation
The summary page is a bit more involved to generate. A few questions had to be answered before it was possible: how to generate the summary text? And how to sort and order posts?
To answer the first question, I dug into how Hugo was generating these summaries. It turns out, it really only takes the first couple hundred characters and calls it the "summary". This depends largely on the content of each post to actually describe the post in the first couple hundred characters. Obviously, this led to some awkward results, especially with links and section headings mixed in.
To achieve similar results, it would be fairly easy to write a script to simply take the first few hundred characters after the preamble and output this into something to be collected for the summary page. However, a better solution is available since we are taking full control over the generation process. Namely, we can put the preview content into a specific Org-mode block to be parsed out and used explicitly for this purpose. If the summary for a post is only a sentence or two, the summary generation process won't start reading extra text; if the summary requires a little more detail, it won't be cut short by an arbitrary read limit.
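The post does not say which Org-mode block holds the preview, so the sketch below assumes a block delimited by #+BEGIN_PREVIEW and #+END_PREVIEW. Both the block name and the org_preview function are assumptions made for illustration; the extraction idea is the same for any block name.

```shell
#!/usr/bin/env bash
# Sketch only: the "PREVIEW" block name and org_preview are assumptions.

# Print the lines between #+BEGIN_PREVIEW and #+END_PREVIEW
# (case-insensitive), excluding the marker lines themselves.
org_preview() {
    awk '
        toupper($0) ~ /^#\+END_PREVIEW/   { inside = 0 }
        inside                            { print }
        toupper($0) ~ /^#\+BEGIN_PREVIEW/ { inside = 1 }
    ' "$1"
}

# Usage: only the two summary lines are printed.
post=$(mktemp)
cat > "$post" <<'EOF'
#+TITLE: Demo
#+BEGIN_PREVIEW
A short summary of the post.
It can run to as many lines as it needs.
#+END_PREVIEW

* First heading
Body text that should not appear in the summary.
EOF

org_preview "$post"
rm -f "$post"
```

Because the block has explicit start and end markers, the summary length is decided by the writer rather than by a character count.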
To generate the preview content, the generate_post_preview.sh script is used:
```bash
#!/usr/bin/env bash
# Generate the intermediate Org-mode preview content for a post

ORGIN=${1}

PROJ_ROOT=$(git rev-parse --show-toplevel)
source "${PROJ_ROOT}/scripts/org-metadata.sh"

echo "${LINKS}"
echo "${PREVIEW}"
```
The LINKS variable is included in this file because we are generating an intermediate file for Pandoc to generate the summary content. Without the LINKS, any links included in the preview section would be broken.
The second question actually turns out to be pretty easy in practice: we parse the #+DATE: line from the preamble and prepend it to the summary content. From the org-metadata.sh script:
```bash
ORGIN=${1}
DATE=$(awk -F': ' '/^#\+DATE:/ { printf "%s", $2}' "${ORGIN}")
```
Then, from the generate_post_summary_html.sh script:
```bash
#!/usr/bin/env bash
# Generate HTML post summary tags

ORGIN=${1}
GENERATED_PREVIEW_FILE=${2}

PROJ_ROOT=$(git rev-parse --show-toplevel)
source "${PROJ_ROOT}/scripts/org-metadata.sh"

DISPLAY_DATE=$(date -d "${DATE}" +'%a %b %d, %Y')
# Year, month, and day as tab-separated sort keys (%t is a tab)
SORT_DATE=$(date -d "${DATE}" +'%Y%t%m%t%d%t')
PREVIEW_CONTENT=$(cat "${GENERATED_PREVIEW_FILE}" | pandoc -f org -t html)

# Everything below is emitted as ONE line so the index can sort by line
echo -n "${SORT_DATE}"
echo -n '<article class="post"><header>'
echo -n "<h2><a href=\"${SLUG}\">${TITLE}</a></h2>"
echo -n "<div class=\"post-meta\">${DISPLAY_DATE}</div></header>"
# Unquoted on purpose: word splitting collapses the preview onto one line
echo -n "<blockquote>$(echo ${PREVIEW_CONTENT})</blockquote>"
echo -n '<ul class="tags"><li><i class="fa fa-tags"></i></li>'
echo -n "${TAGS}" | awk '{ printf "<li>%s</li>", $0}'
echo -n '</ul>'
echo -n '<footer>'
echo -n "<a href=\"${SLUG}\">Read More</a>"
echo -n "</footer>"
echo -n '</article>'  # close the <article> opened above
echo ""
```
Finally, this is all put together with the generate_index_html.sh script:
```bash
#!/usr/bin/env bash
# Generate index.html page

INPUT_FILES=${@}

PROJ_ROOT=$(git rev-parse --show-toplevel)
source "${PROJ_ROOT}/scripts/site-templates.sh"

cat "${HTML_HEADER_FILE}"
echo "<body>"
cat "${HTML_SUB_HEADER_FILE}"

cat ${INPUT_FILES} | sort -r -n -k1 -k2 -k3 | awk -F'\t' '{print $4}'

echo "</body>"
cat "${HTML_FOOTER_FILE}"
```
Specifically, the following line is of interest with respect to properly sorting:
```bash
cat ${INPUT_FILES} | sort -r -n -k1 -k2 -k3 | awk -F'\t' '{print $4}'
```
This uses the tab-separated date fields from before to sort the post summaries, newest first, and then strips those fields so that only the HTML of each summary lands on the index.html page.
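To see the sort step in isolation, here is a small demonstration with made-up summary lines. The dates and article text are invented, and order_summaries is just a hypothetical wrapper around the same sort and awk pipeline.

```shell
#!/usr/bin/env bash
# Demonstration with invented summary lines; order_summaries is a
# hypothetical wrapper, not one of the site's scripts.

# Order "YYYY<TAB>MM<TAB>DD<TAB>html" lines newest-first, then drop
# the three date fields so only the HTML remains.
order_summaries() {
    sort -r -n -k1 -k2 -k3 | awk -F'\t' '{ print $4 }'
}

# printf reuses its format for each group of four arguments,
# producing one tab-separated line per post.
printf '%s\t%s\t%s\t%s\n' \
    2018 11 02 '<article>older post</article>' \
    2019 03 15 '<article>newest post</article>' \
    2019 01 20 '<article>middle post</article>' |
    order_summaries
```

The three posts come out newest, middle, older. This only works because each summary is a single line; a newline inside the preview HTML would split one post across several sort records.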
RSS/XML Generation
I also wanted to keep the RSS/XML feeds going. As it turns out, generating the RSS feed takes essentially the same steps used for generating the summary index.html page.
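The RSS scripts are not shown here, so the following is only a sketch of how the same one-line-per-post pattern could carry over to feed items. The rss_item function, its argument order, and the example.com base URL are all hypothetical, and it relies on GNU date for the -d and -R options.

```shell
#!/usr/bin/env bash
# Sketch only: rss_item and the example.com URL are hypothetical.
# Requires GNU date (-d to parse a date string, -R for RFC 5322 output).

# Emit one sortable line: "YYYY<TAB>MM<TAB>DD<TAB><item>...</item>".
rss_item() {
    local base_url=$1 slug=$2 title=$3 date=$4 sort_date pub_date
    sort_date=$(date -u -d "$date" +'%Y%t%m%t%d%t')   # %t is a tab
    pub_date=$(date -u -R -d "$date")                 # e.g. "Fri, 15 Mar 2019 ..."
    printf '%s<item><title>%s</title><link>%s%s</link><pubDate>%s</pubDate></item>\n' \
        "$sort_date" "$title" "$base_url" "$slug" "$pub_date"
}

# Usage: one feed item for a made-up date.
rss_item 'https://example.com' '/blog/2019/03/static-site-generation' \
    'Static Site Generation' '2019-03-15'
```

The items can then go through the same sort and awk steps as the summaries. A real feed would also need titles escaped for XML, which this sketch skips.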
Future Work
There is a fairly obvious limitation of the summary page generation, though it only becomes a real problem as I write more content: there was and is no archive page, and all posts are put into the index.html summary page.
If and when more posts are written and published, a solution for the first page will be necessary. However, this would have been necessary regardless of whether the blog was generated using Hugo or via the new process.
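One possible direction, sketched here under assumptions and not something the site does today, is to split the sorted summary lines into a first page and an archive. The split_summaries function and the page size are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: split_summaries and the page size are hypothetical.

# Read already-sorted summary lines on stdin; write the first $1 lines
# to file $2 and everything after them to file $3.
split_summaries() {
    local page_size=$1 first_page=$2 archive=$3 sorted
    sorted=$(mktemp)
    cat > "$sorted"                                  # buffer stdin so it can be read twice
    head -n "$page_size" "$sorted" > "$first_page"
    tail -n +"$((page_size + 1))" "$sorted" > "$archive"
    rm -f "$sorted"
}

# Usage: keep the two newest of three made-up summaries on the first page.
workdir=$(mktemp -d)
printf '%s\n' 'newest' 'middle' 'older' |
    split_summaries 2 "$workdir/first.txt" "$workdir/archive.txt"
echo "first page:"; cat "$workdir/first.txt"
echo "archive:";    cat "$workdir/archive.txt"
rm -rf "$workdir"
```

Since the index script already produces one sorted line per post, a split like this could sit between the sort and the final HTML assembly without changing the per-post scripts.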
Parting Thoughts
Like many projects, this was started because I personally was dissatisfied with the current state of options. However, I did not write these scripts to be used directly by someone else. I'm not sure I would necessarily recommend this approach to someone else, unless, of course, they wanted to do it to learn or to otherwise take control of their content. That said, I hope this captures the essence of the scripts, their major functions, and the motivations behind them. The scripts are available, WITHOUT WARRANTY, under the GNU General Public License (version 3).
If you have questions or comments, feel free to reach out to me.