Rolling My Own RSS Feed

Published on

Estimated reading time: .

My previous post was about writing blogs. This one is about reading them.

Contents

Introduction

The first social media timeline was driven not by a siloed push service like Facebook (2005) or Twitter (2006), but by each individual reader pulling from websites they liked. Websites that provided rolling content published manifest files that machines could track and read, and the most well-known such manifest was RSS.

I have had this blog for two years, and at first I didn’t bother setting up an RSS feed because I didn’t have any posts, and then I didn’t bother because I didn’t have an audience, and now I have an opus and an audience and figured it was time.

I build this website with Middleman and Middleman does not have built-in support for producing an RSS manifest. It doesn’t, technically, even have support for blogging; the blog engine is an extension that got adopted by the main application but is still very much separate from it. Middleman is first and foremost a website compiler, not a blog engine.

So rather than take the time to learn the Middleman APIs and write an extension in Ruby that complements the blog engine, I opted to do things the ugly way. I Googled <ins>DuckDuckWent</ins> for the RSS XML spec, found a bunch of useless websites and finally gave in and opened the W3Schools1 page.

RSS is pretty simple XML. There aren’t that many fields, and what fields there are, are reasonably2 simple to fill out. And thankfully for me, all the information I needed was available programmatically, ready for me to pull from Middleman’s Ruby API at compile time.

The RSS Format

The bare minimum RSS file looks like this:

<? xml version="1.0" encoding="utf-8" ?>
<rss version="2.0">
  <channel>
    <title>My Blog</title>
    <link>https://example.com/</link>
    <description>My very cool blog</description>

    <item>
      <title>My First Post</title>
      <link>https://example.com/my-first-post</link>
      <description>My very first post on my very cool blog</description>
    </item>
  </channel>
</rss>

That’s it. It’s verbose, yes, because XML is a bad format, but you only need six tags to make a valid file, and only three of those hold real information. Let’s walk through them.

<? xml version="1.0" encoding="utf-8" ?>

This is just bookkeeping to tell the XML parser what is going on. You’d better be serving all your content in UTF-8, for reasons I don’t need to delve here. Just, c’mon. It’s date +%Y. Use UTF-8.

<rss version="2.0">
  <channel>
    <!-- all your content -->
  </channel>
</rss>

These two tags set up an RSS feed. You can only have one <channel> in an <rss> and only one <rss> in a file, so, I don’t know why we have both of those tags when the only information they provide is which version of the RSS spec we’re using, but, whatever.

Those tags describe the entire feed.

<rss>
  <channel>
    <item>
      <!-- Individual article details -->
    </item>
  </channel>
</rss>

The <item> tag describes one individual article in your blog. One or more <item>s make up a <channel>. It’s valid to have an empty <channel>, but not terribly interesting!

That’s all the framework to get things set up. Let’s actually, you know, give out some information.

Both <channel> and <item> require the following three tags:

<title>The name of the thing</title>
<link>The location of the thing</link>
<description>A summary of the thing</description>

You must provide these three tags for your <channel>, and for each <item>. Fortunately, they’re easy to find: all articles have a title, as does your website (<title> is one of the few attributes required by HTML5 to exist on every page); all web pages have a link location, and the description is just a short blurb that your blog engine probably encourages you to write.

For me, all this information is bundled in each article by Middleman, and I can just pull it out with Ruby, which makes generating <item> groups for each article on here very easy3.

Extra Information

That’s the bare minimum to make a working RSS file. But you probably have more information about your blog and your articles, and the RSS format will happily take it!

Here’s what my feed.rss looks like (the required stuff is stripped because we already covered it):

<rss xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <atom:link
      href="https://myrrlyn.net/feed.rss"
      rel="self"
      type="application/rss+xml" />
    <category>Programming</category>
    <copyright>2016-present myrrlyn (Alexander Payne)</copyright>
    <generator>Middleman (Ruby)</generator>
    <language>en-us</language>
    <lastBuildDate>Tue, 28 Aug 2018 03:35:42 -0000</lastBuildDate>

    <item>
      <guid>same as link</guid>
      <author>me@example.com (Alexander Payne)</author>
      <pubDate>The date attached to the article</pubDate>
      <category>The tag attached to the article</category>
    </item>
  </channel>
</rss>

The xmlns:atom attribute in <rss>, and the <atom:link> element, allow this RSS file to be digested by an Atom reader, which is a similar standard that fizzled out when RSS did. It’s a nice thing to have, and just lets some more clients read this file.

The <category> tag describe what kind of content is in your channel, and in each individual item. I just copy the tags that show up in the nav section to the right (desktop viewers) or the bottom (phone viewers) into the <item> <category> elements (you can have more than one <category> per <item>) and call it good.

The <copyright> tag is just what it says. Doesn’t need to be anything fancy, or even present, but it’s always worthwhile in this environment.

The <generator> tag says what software made the RSS file.

The <language> tag says in which human language the content is written. It should be one of the standard language-region shortcodes. Because I write in English from the United States, mine is en-us.

The <lastBuildDate> tag is the date (in RFC 822 format, which is bad and wrong but whatever) of the last time the RSS file was generated. I use DateTime.now which turns into the moment of compilation when compiling.

The <guid> tag is just a <link> that really means it. Websites can have more than one path to content, and the commonly distributed path might change to have new things, so the <guid> goes specifically to that text. If this post is informative to you, then you probably have a system where <link> and <guid> are identical.

The <author> tag must be an email, followed by a parenthetical name, the exact opposite of the way Git wants a name followed by a bracketed email. Who needs single standards, am I right?

The <pubDate> tag is the date when the specific item was published.

These are all optional, but provide more context about what you’re serving, and are all probably information you have handy for your whole blog and for each article, so why not include them?

Generation

I will provide code that is specific to my environment: Ruby language, Middleman framework. This will not be directly applicable to you unless you also use this, but it should give you a good starting point for writing your own.

Middleman supports using Embedded Ruby, and will execute the Ruby code placed inside other text files for text manipulation.

Here are the interesting parts my actual feed.rss.erb, which gets compiled down to just feed.rss:

<rss>
  <channel>
    <lastBuildDate><%= DateTime.now.utc.rfc822 %></lastBuildDate>

    <% sitemap.resources %>
    <% .select { |r| r.is_a? Middleman::Blog::BlogArticle } %>
    <% .select { |b| b.path.start_with? "blog" } %>
    <% .sort { |a, b| a.date <=> b.date }.reverse %>
    <% .each do |article| %>
    <item>
      <title><%= article.title %></title>
      <link>https://myrrlyn.net/<%= article.destination_path %></link>
      <guid><!-- same --></guid>
      <description><%= article.data["summary"] %></description>
      <pubDate><%= article.date.utc.rfc822 %></pubDate>
      <% article.data["tags"].each do |tag| %>
      <category><%= tag %></category>
      <% end %>
    </item>
    <% end %>
  </channel>
</rss>

Let me walk through this mess of nonsense. For the record, none of this was documented in Middleman, and I had to open up my app in a Ruby console and just blindly wander around in the guts of it until I figured it out piece by little piece. I don’t want to do that again, thus, this post.

The first thing that happens in that Middleman records the current time and sticks that in <lastBuildDate>.

Then, Middleman looks up the global sitemap object, which holds the entire context of my website, and queries its resources member, which is everything in my source folder. This includes all the CSS and JavaScript and images and other pages, and I don’t want any of that.

This is where Middleman’s second-class blog engine hurts me. When the blog extension is active, I have access to my entire blog as a Ruby object, so I can manipulate the blog HTML templates nicely. But it is not active during the compilation of this RSS file, so I have to fake it.

.select { |r| r.is_a? Middleman::Blog::BlogArticle }

uses the fact that, since the blog engine was actively running when loading all my text, and is still around but dormant now, to find only the resource objects that are articles in a blog. This ignores all the other stuff I listed earlier.

.select { |b| b.path.start_with? "blog" }

I have two Middleman blogs on this website. The other one is my Elder Scrolls fanfiction, and is not part of this file (I have an identical file just for those). So I only keep the blog articles that have a blog/... path.

I then organize them by date, and reverse the list, so the newest is first and the oldest is last. Once done, I begin iterating through each of them.

The text in between .each do |article| and end gets duplicated for each article in the pared-down and sorted collection, so each blog post gets an <item> stack. In that stack, I have access to the article variable and all its associated data. I can query my frontmatter with data[...] and get information directly with .title and .date, and that’s how all the parts get filled in.

Once this is done, I have a completely built RSS XML file, ready to deploy!

Discovery

Just having a text file on your server that has RSS content in it isn’t enough. There’s no canonical filename (I used feed.rss so nginx would know to serve it with the right MIME type, but it could also be rss.xml or anything you want, really) so browsers and RSS readers don’t know how to find it like they do index.html and favicon.ico.

In the <head> section of my HTML templates, I inserted a new <link> element:

<html>
  <head>
    <link
      rel="alternate"
      href="//myrrlyn.net/blog/feed.rss"
      type="application/rss+xml"
      title="Insufficiently Magical"
    />
  </head>
</html>

This <link> is a universally-recognized element, so your web browser will see it and (at least on desktop) light up the RSS feed icon and let you subscribe to it. This link, //myrrlyn.net/blog/feed.rss, is also the path you would put into a feed reader app in order to get notifications every time I publish something.

On your site, you would change it to be the URL of your RSS file. It can live anywhere (I also have a copy of my main blog’s RSS at //myrrlyn.net/feed.rss and my TES work at //myrrlyn.net/oeuvre/feed.rss), as long as your HTML pages know where it is.

Conclusion

It took me a while to get here, but now I have a free, neutral, personal means of advertising my work to people who are interested to match my free, neutral, personal means of hosting it at all. RSS feeds are a philosophically important alternative to Twitter feeds and reddit aggregations and Facebook timelines, just as personal websites are philosophically important alternatives to profiles on those same services.

It’s harder to set up a website and get it seen and distributed than it is to set up a social-media profile and write posts there, but in my opinion it’s well worth the effort.

And you can always post your own links in your social media. I certainly do and will.

References

I’ll link the W3Schools page, because it’s a concise list of all the RSS tags, but keep in mind that it’s bare-bones and they’re a bad site.

  • W3Schools RSS
  • Mozilla Developer Network
  • RSS Validator – once you’ve uploaded your RSS file to your site, aim this website at it and it will tell you all kinds of little problems, both errors and warnings. It’s how I learned all the details about the element requirements!
  • RSS Specification
  • xul.fr – the first site where I found the <link rel="alternate" ... /> syntax that makes the RSS file useful. Very thankful to them for it.

  1. Wherever possible, use anyone else but them. They’re a bad business and a scam, and the Mozilla docs are better for everything webdev related. Except W3S has better discoverability for their RSS docs than MDN does, so I found them first. Oh well.

  2. The W3S documentation is not at all comprehensive, and it wasn’t until I tried validating my manifests against a compliant service that I found a whole lot of errors about which W3S did not bother to inform me. But it got me to a point where I had something to validate, at least.

  3. For loose definitions of the word easy.