Timed posts with Pelican.

Late last year I posted that I'd migrated my website to a new blogging package called Pelican, which is a static site generator. If you noticed that my site's been screamingly fast lately, that's why. My site doesn't have to be rendered one page at a time with PHP on the server, and it also doesn't use one of Dreamhost's likely overloaded database servers as its back end. However, this brings a couple of drawbacks. Logically, a site made out of static HTML5 pages doesn't have a control panel to log into, so there isn't any way of controlling how the site operates from the site itself. If it doesn't get written in a Markdown-flavored text file, it doesn't happen. That also means that scheduling posts for the future, which is my preferred way of posting new articles, is now a problem that has to be solved and not a feature to be used. It's an easy problem to solve, however.

Pelican-the-site-builder recognizes only three possible statuses for a post: draft (where a post will be rendered and uploaded into a hidden directory), hidden (where a post will be rendered into HTML and uploaded, but not linked to), and published (where an entry or page goes live and can be accessed normally). However, Pelican does something interesting when it encounters a post status it doesn't recognize: it complains and skips the file. Check this out:

The status this post was set to when I took that screenshot was 'scheduled', which is both skipped by Pelican and easy to grep for, like so:

(pelican) {19:23:04 @ Sat Jan 02}
[drwho @ windbringer content] () $ grep '^Status: scheduled' *.md
prearranged-posts-with-pelican.md:Status: scheduled

This output is very easy to capture and use in a loop to do other things, which makes it the first part of our solution. The next part is the "Date:" line in the post's metadata, which states the calendar date and wall-clock time a post went live (or is supposed to go live). This is also easy to extract from each post (reformatted for clarity):

(pelican) {19:29:31 @ Sat Jan 02}
[drwho @ windbringer content] () $ grep '^Date:' $(grep '^Status: scheduled'
    *.md | awk -F: '{print $1}')
Date: 2021-01-04 09:00

Translation: grep all files with the extension .md for the string "Status: scheduled" starting in the first column of the line. Use awk to chop up each line every time it sees a colon (:) and print everything up to the first colon found. Capture the output of that command string and use it as an argument to the outermost command. Now grep for the string "Date:" starting in the first column of each line, and print each match.
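As a variation on the one-liner above, here is a minimal sketch of capturing that output in a loop. It assumes posts are Markdown files with "Status:" and "Date:" metadata lines like the ones shown; list_scheduled is a name I made up for illustration, not anything Pelican provides. It also swaps the awk step for grep -l, which prints only the names of matching files.

```shell
#!/bin/sh
# list_scheduled DIR
# Print "filename<TAB>date" for every post in DIR whose status is "scheduled".
# (Hypothetical helper for illustration; not part of Pelican.)
list_scheduled() {
    dir=$1
    # grep -l prints just the names of the files that contain a match.
    # Errors are discarded so an empty directory produces no output.
    grep -l '^Status: scheduled' "$dir"/*.md 2>/dev/null |
    while IFS= read -r post; do
        # Strip the "Date:" label and keep only the first such line.
        when=$(sed -n 's/^Date:[[:space:]]*//p' "$post" | head -n 1)
        printf '%s\t%s\n' "$post" "$when"
    done
}
```

Calling list_scheduled content/ from the top of the site's checkout should print one line per scheduled post, which is a convenient shape for later steps to read back in.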

So if we add a little plumbing, these two things are what we need to implement timed posts with Pelican. Let's assume that the name of the blog is the same as the name of the repository it's kept in (for example, antarctica.starts.here/), and that the name of the blog's theme is the same as its Git repository (pelican-html5up-striped/). We have a Linux box someplace that either has the Git repositories on it or can clone them from somewhere like GitHub. Also on this box we have some basic tools (which we have if the machine is not completely broken) and an installed copy of Pelican.

The first thing we would want to do is git clone the repository containing the site's content into a temporary directory, and then git clone the repository for the site's theme inside the content checkout, as you would when using Pelican normally. Then we can build a list of every file (blog post) with the status "scheduled" in the content directory; we can save that output in a text file to refer to later. Then, for every filename in that text file we can look at the date and see if it's today's date or not (to make life easier, let's say that we always post new things at the same time every day). If the dates match, we can use sed to rewrite the "Status" line to say "published". After we've done that we can use Pelican to build a new version of the site (make html) and deploy it to the webhost (make deploy). Finally, because all of this happened in a temporary clone, we commit the edited posts and push the changes back to our primary copy of the repository. Otherwise we'd lose the fact that those posts have been published (and committing also means that we don't re-push built files if we don't have to).
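The middle of that process (check the date, flip the status) can be sketched roughly like this. This is not the script I mention below, just an illustration under a few assumptions: the "Date:" line starts with a YYYY-MM-DD date as in the example above, and is_due, publish_post and publish_due are hypothetical names of my own. The clone, make html, make deploy, and commit/push steps are left out because they depend on your setup.

```shell
#!/bin/sh
# is_due POST TODAY
# Succeed if the first "Date:" line of POST falls on TODAY (YYYY-MM-DD).
is_due() {
    post_date=$(sed -n 's/^Date:[[:space:]]*\([0-9-]*\).*/\1/p' "$1" | head -n 1)
    [ "$post_date" = "$2" ]
}

# publish_post POST
# Rewrite "Status: scheduled" to "Status: published" in POST.
# A scratch file is used instead of sed -i, whose syntax differs
# between GNU and BSD sed; cat keeps the post's original permissions.
publish_post() {
    scratch=$(mktemp) &&
    sed 's/^Status: scheduled/Status: published/' "$1" > "$scratch" &&
    cat "$scratch" > "$1"
    rc=$?
    rm -f "$scratch"
    return "$rc"
}

# publish_due DIR [TODAY]
# Publish every scheduled post in DIR dated TODAY (defaults to the
# current date) and print the name of each file that was changed.
publish_due() {
    dir=$1
    today=${2:-$(date +%Y-%m-%d)}
    grep -l '^Status: scheduled' "$dir"/*.md 2>/dev/null |
    while IFS= read -r post; do
        if is_due "$post" "$today"; then
            publish_post "$post" && printf '%s\n' "$post"
        fi
    done
}
```

A wrapper would call publish_due content/ after the two clones, and only bother with the build, deploy, commit and push steps if it printed anything. Passing the date as an optional second argument is there purely so the check can be tried out against a date other than today.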

Now that we've worked out the theory for timed posts, we can take each step of the process and turn it into a shell script. You are more than welcome to write your own (and in fact, I encourage you to do so as a programming exercise). However, if you just want your Pelican site to have timed posts, I've committed the shell script that I use to my public Experiments/ repositories (GitHub) (GitLab) (git.hackers.town) so anyone who wants to can download and run it as a cron job. It's very low maintenance and there isn't any configuration file. Instead, open the script in your favorite text editor and change the values of the variables to suit your setup. If there's anything else that you'd need to do to make this script do what you want, please feel free to add it. Like any reasonably useful tool it's readily customizable to your needs.

Happy hacking, and happy new year.