Distill for R Markdown: Blog Post Workflow

Overview

Distill supports a variety of workflows for blog post authoring. For a blog authored by a single individual, the following simple conventions are typically all that’s required:

Author post R Markdown documents within subdirectories of the _posts directory.
Mark posts under development as drafts until they are ready to be published.

For more sophisticated requirements, the following workflows are also supported:

Authoring posts within their own standalone Git repositories.
Importing posts staged elsewhere on the web (e.g. published to RPubs, located in a Git repository, or published within another Distill blog).

Distill also includes tools for verifying that imported posts are available under an appropriate Creative Commons license and ensuring that posts syndicated from one blog to another give credit within the Google search index to the originating blog. These tools are described below in the importing posts section.

Managing drafts

There are two distinct ways to manage work in progress on a Distill post:

Using the draft option to prevent the article from being listed in the article index and in the sitemap; and
Working on the post within a Git branch and merging/publishing it via pull request.

Draft option

If you want to work on a post for a period of time without having it be added to the listing page, add draft: true to the post’s metadata. For example:

---
title: "The Sharpe Ratio"
description: |
  In this post we present a classic finance use case using the
  PerformanceAnalytics, quantmod, and dygraphs packages. 
  We'll demonstrate importing stock data, building a portfolio,
  and then calculating the Sharpe Ratio. 
draft: true
---

When you are ready to publish the post, either remove the draft option or set it to false, then re-Knit the post or build the website using render_site().

Git branches

If the source code for you website is managed within a Git repository then another way to organize work on a new post is to create a Git branch.

Once you’ve done enough work on the post within your branch, you can create a pull request and then ask others for feedback on the post before it is merged.

This workflow has some nice benefits:

You can use the commenting and feedback system of your Git host (e.g. GitHub) to discuss the post in a structured fashion with clear linkages back to the source code. Depending on the service used to publish your website you might get a preview associated with the PR (e.g. with Netlify and GitHub).
You can manage contributions to your blog from others that don’t have commit access to the blog (anyone can create a pull request for a public repository).
You can squash and merge to have a clean commit history on master.

The mechanics of creating branches and pull requests are beyond the scope of this article. Follow the links above or reference other online resources to learn more.

Renaming posts

Posts are stored within the _posts sub-directory of your site, and have a directory name that reflects the date that you created the post along with the post’s title slug. For example:

_posts/2016-11-08-sharpe-ratio

Note that the date prefix is not strictly required, but is done by default as a convenience so that posts appear in chronological order within the filesystem.

If you work on a post over the course of a few days and/or if you change your post’s title after you begin working on it, you may want to rename the post directory. You can use the rename_post_dir() function to update the date and/or title slug reflected in the directory name. For example:

# rename to reflect the title and date in the post YAML front-matter
rename_post_dir("_posts/2016-11-08-sharpe-ratio")

You can also specify an explicit slug and/or date_prefix if you prefer not to use values derived from the post’s YAML.

Note that you should be sure to rebuild your site after renaming a post so that it’s updated URL is reflected in the index page and RSS feed.

Importing posts

You may prefer a workflow where posts are worked on separately from your website/blog and then imported when they are ready to be published. This might be the case for a few different reasons:

You want posts to be contained within their own Git repositories
It’s more convenient for contributors to work on posts independently and then be incorporated into your blog. Example in the wild: RStudio AI blog.
You want to re-publish posts originally published on another blog.

Below we will cover how to create a standalone post, how to import posts, synchronize to subsequent updates of those posts, as well as address related copyright and search indexing concerns.

Creating a post

You want to have a public GitHub repository that contains the html output of a distill post. Here’s a possible workflow using RStudio IDE and usethis, and not taking into account your using private data (that’d you need to add to .gitignore)

Create a new RStudio project,
In that project from the new file menu choose “New file”, “R Markdown”,“From template”: distill article;
usethis::use_git();
usethis::use_github();
Edit the Rmd, knit, commit.

Note that with this workflow your name won’t be guessed based on previous posts.

Importing a post

You can import a post using the import_post() function, passing the URL where the post is published to. For example:

import_post("https://rpubs.com/jjallaire/visualizing-asset-returns")
import_post("https://example.com/visualizing-asset-returns.html")

You can also import a post from a GitHub repository. For example:

import_post("https://github.com/jjallaire/distill-article")

Note that importing a post does not require the original R Markdown document used to author the post—you only need access to the published HTML of a post to import it.

Post dates

When you import a post, the date for the post will be automatically set to the current day (the idea being that whenever the post was authored, the day you import it is the day it’s been published to your blog). You can modify this behavior using the date parameter. For example:

import_post("https://github.com/jjallaire/distill-article", 
            date = as.Date("2017-07-12"))

Post slugs

The “slug” for a post determines the URL for the post within your blog. By default, slugs are automatically computed using the date and title of the post. For example, the slug for the example just above would have been 2017-07-12-distill-article.

You can also override the automatically generated slug entirely using the slug parameter. For example:

import_post("https://github.com/jjallaire/distill-article", 
            slug = "an-article-about-distill")

Updating posts

If a post that you have imported is subsequently modified and you want to synchronize to the changes, you can use the update_post() post function. For example:

update_post("_posts/2018-07-09-distill-article")

You can also just use the post slug (rather than including the _posts directory prefix):

update_post("2018-07-09-distill-article")

Creative Commons

The Distill tools for importing posts make it very easy to aggregate posts published elsewhere on the web into your blog. Note however that you need to ensure that you have appropriate permission to re-publish these posts!

To facilitate this, Distill will scan any imported post for Creative Commons copyright metadata, and in the case that none is found confirm that you still want to import the post. Consequently, you should ask that authors writing posts for your blog always include the creative_commons field in their post metadata. For example:

---
title: "Distill for R Markdown"
description: | 
  Scientific and technical writing, native to the web
output: distill::distill_article
creative_commons: CC BY
---

Learn more about the creative_commons field in the documentation on post medatata.

If you know that you have permission to import and republish a post you can suppress this prompt by passing check_license = FALSE to the import_post() function.

Canonical URLs

When you specify a creative_commons license for a blog post you make it easier for others to re-publish your post, giving you a broader audience for your work. In this case you may also want to ensure that the original URL where your post appeared be the one that appears in search engine results.

To facilitate this, Distill automatically specifies a canonical url for posts published within a blog. That way, when a post is imported into another blog, search indexes still point back to the original article.

If you prefer to disable this behavior, you can add canonical: false to your blog’s configuration in _site.yml. For example:

name: "reproducible-finance-with-r"
title: "Reproducible Finance with R"
description: |
  Exploring reproducible finance with the R statistical 
  computing environment.
base_url: https://beta.rstudioconnect.com/content/11424/
collections:
  posts:
    canonical: false

While canonical URLs are provided automatically for blog posts, they aren’t for standalone articles. You can explicitly provide a canonical URL for a standalone article using the canonical_url metadata field:

---
title: "Distill for R Markdown"
description: | 
  Scientific and technical writing, native to the web
date: May 4, 2018
author:
  - name: Nora Jones 
    url: https://example.com/norajones
canonical_url: https://rstudio.github.io/distill
---

Blog Post Workflow