This article was originally posted on my blog (original link). I’m reposting here in the hopes that it helps more people get started with Gatsby!
Migrating a blog to Gatsby
Abstract: Gatsby is a great tool for building a blog. In part 1 I did the more basic task of migrating an existing React site to Gatsby. This time I migrated my blog, which was a lot more involved and required a lot more Gatsby-specific knowledge.
Here’s the gist of what I’m going to cover:
- Preparing an existing blog for migration
- Configuring Gatsby to handle markdown
- Querying your markdown files using GraphQL
- Adding custom data to the generated GraphQL schema
- Turning all your markdown files into static pages
Let’s jump in.
Preparing your existing blog for migration
NOTE: If you don’t already have a blog or want to create one from scratch there’s a tutorial for exactly that right here.
Let’s move some files around. Gatsby gives you a good amount of flexibility when it comes to file structure, but for consistency with the docs I’m going to use the suggested file structure for migrating my blog. How you handle this step will depend on what you’re migrating from. I am migrating from Hexo, which is very similar to Jekyll in how it structures files.
Clean up your source repo
For the first step, clear everything other than your actual post content out of the repo. For me, this meant everything that wasn’t under the source/ directory (that’s a Hexo convention). One way to do this is to take everything not relevant to the upcoming Gatsby blog and move it into its own directory that doesn’t interfere with anything. I chose to create hexo.bak/, where all my old blog files would live (except for the content).
You could also simply delete everything other than your raw content. It’s up to you. But once you’re done with this cleanup you should have made a decision on where to hold your content, and moved everything else away or removed it.
Here’s what that looks like for me:
For the rest of this post I’ll ignore the hexo.bak/ directory because it’s not relevant to Gatsby.
Set up Gatsby
You need to copy all the standard Gatsby boilerplate into your directory. There are many ways you could do this but I’ll go over what I did.
To get all the Gatsby files you can use the Gatsby CLI.
However you get Gatsby initialized in your repository root, afterwards you should have a file structure that looks something like this:
Now run the Gatsby dev server to make sure everything works:
NOTE: If you open up package.json you can see what the develop script is doing.
Boom 💥! The default site is up.
Rendering a list of posts
Let’s customize that landing page to render a list of posts. You will also probably want to customize the header and overall layout.
Customizing the layout
This is pretty simple. Just modify the primary layout file that was generated:
You can also customize the styles in src/layouts/index.css. Stylus, Sass, Less, etc. are also supported if you add the appropriate plugin. Here’s the list (there’s a page on the website too, but the source is more up to date).
Sidenote: You can also create your own plugin to do whatever you want, which I talked about in part 1.
Customizing the landing page
Also straightforward, just edit:
This file is where we’ll actually render out the list of posts. So where the hell does that data come from??
Querying data with GraphQL
Now we’re getting into the meat of Gatsby and one of the areas where it really shines: data sources. You can pull in data from anywhere to be rendered in your blog, but for our use case the only data source will be the file system (aka the markdown files stored on your hard drive).
But first, let’s check out GraphiQL. It’s an excellent playground for testing out GraphQL queries in any GraphQL project. Gatsby ships with it enabled by default, thank goodness. GraphQL can actually be oddly opaque without this excellent tool.
Visit http://localhost:8000/___graphql in the browser and you’ll be greeted with this lovely dev tool:
I recommend getting to know this tool if you’re not already familiar. You will be coming back to this often to find the right query to pull data for your pages.
Querying the file system
If you play around with GraphiQL you’ll notice there’s not that much there. Let’s fix that. We need to teach Gatsby how to query the file system. Luckily this is so common it’s been done for you. Install the file system source plugin:
Now modify gatsby-config.js to both use the plugin and tell it what directory to source files from. Add this to the plugins array:
As you can see, on my system I keep all my markdown files under content/_posts/, which is reflected in the path option for the plugin.
Now restart the dev server and open GraphiQL up again. You should have access to the allFile root type. Try running this query:
This will list all the files in the directory you specified to the plugin. You can query all sorts of information about the files. Just investigate the fields available to you under node in GraphiQL.
Pro tip: Hit Ctrl+Space to trigger autocomplete and bring up the list of all available fields.
Handling Markdown
Being able to query files is a big win, and if you have a directory of HTML files this is all you will need. But if you want to render markdown files as HTML you will need another plugin. Let’s add that now:
As before, add it to the plugins field in gatsby-config.js:
This particular plugin can also take its own plugins via the plugins option. I’ve left it empty but this is where you can add things like syntax highlighting or auto-linking of headers. Here’s the current list:
https://www.npmjs.com/search?q=gatsby-remark
Save and restart your dev server, then go into GraphiQL and try out the new allMarkdownRemark field:
This query gives you the full HTML for all your markdown files. If you are using frontmatter you can also access that here. I’m assuming you have a title field in your frontmatter:
Now you have access to the full HTML of your posts as well as the titles. With this we have enough information to render a list of posts on the front page.
Getting GraphQL data into your components
Gatsby has the concept of the pageQuery. For every page you create you can specify a pageQuery that will pass data into the default export of that page.
This is a simplified example, but there are a few things going on that might not be intuitive.
- In the render method we first check for errors, and return early if any are found
- If no errors are found we render a link for each item in the array: this.props.data.allMarkdownRemark.edges
- We export a pageQuery variable that is constructed using the magic graphql global
The error handling is pretty straightforward, if a bit verbose, as long as you know what GraphQL responses look like. In case you didn’t know, if you get an error in a GraphQL query the response will contain an errors array. We check for this array and handle it accordingly.
Now let’s look specifically at where we render a link for each blog post:
Notice that the data shape is exactly what we specified in the GraphQL query. This may seem like a lot of nesting just to get at an array of data, but GraphQL emphasizes clarity over conciseness. You’ll notice that if you run your GraphQL query in GraphiQL the data will have the exact shape described above.
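To make that shape concrete, here is a sketch in plain JavaScript (no React, so it runs in Node as-is). The sampleProps object is made-up data mimicking what the page would receive, listPostTitles is a hypothetical stand-in for the render method, and the errors prop follows the error check described above. Treat all of them as assumptions rather than Gatsby’s exact contract.

```javascript
// Made-up props, shaped like the result of the allMarkdownRemark query.
const sampleProps = {
  data: {
    allMarkdownRemark: {
      edges: [
        { node: { html: "<p>Hello</p>", frontmatter: { title: "First post" } } },
        { node: { html: "<p>World</p>", frontmatter: { title: "Second post" } } },
      ],
    },
  },
};

// Stand-in for the render method: returns one title per post, or returns
// early with the error messages if the query reported errors.
function listPostTitles(props) {
  if (props.errors && props.errors.length) {
    return props.errors.map(({ message }) => `Error: ${message}`);
  }
  return props.data.allMarkdownRemark.edges.map(
    ({ node }) => node.frontmatter.title
  );
}

console.log(listPostTitles(sampleProps)); // [ 'First post', 'Second post' ]
```

In the real component each title would be wrapped in a link instead of being returned as a string, but the path through data.allMarkdownRemark.edges is the same.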
And that brings us finally to the page query:
This is how you get data from Gatsby into your React components. Make sure you don’t misspell pageQuery, otherwise you won’t get what you want.
Also note that graphql is just some magic global variable. Your linter will probably complain about it being undefined and you will just have to ignore it. Personally I think it would be clearer if graphql was imported from Gatsby, but the project is still young so the API could change at some point ¯\_(ツ)_/¯
Linking to blog posts
But the links don’t link anywhere… where’s that href?
Let’s remedy that. Import the Link component and swap it for the simple <a> tag that was in there before:
But what does it link to? What is the URL of each blog post?
That’s an open question because it depends on your data and how you structured it before. For example, if you included the intended URL in the frontmatter of each post it’s a simple matter of updating your query to include that:
Many existing Gatsby examples use path within each markdown file’s frontmatter to designate the URL. For example:
In this case node.frontmatter.path would be used to construct URLs. If this is the case for you then you’re probably all set for the index page.
But what if the URL for each post is NOT in the frontmatter?
This was exactly my situation. The URL was actually derived from the title of the post so I had to figure out how to augment the GraphQL fields with my own data. Namely the URL of the post derived from the post title.
Adding custom data to the GraphQL schema
If I have a post named “Isn’t this a fun title” then I want the URL to be “isnt-this-a-fun-title”. Notice that spaces turn into hyphens and special characters are removed. This is simple enough to do in JavaScript, but it felt wrong to do it on the fly when rendering components. This is data so I wanted to be able to query it through GraphQL.
Enter setFieldsOnGraphQLNodeType.
Aside: Gatsby is super extensible. It’s the primary reason I switched from Hexo which worked well enough for my use case.
In order to extend this particular part of Gatsby you need to create a gatsby-node.js file. This file lets you work with all of Gatsby’s plugin hooks that are run in Node. The GraphQL server is run in Node, so this is where we add custom fields. Example:
Source code for gatsby-node.js here.
If you’ve worked with GraphQL before this should look very familiar. In fact, as you can see the string type is imported directly from GraphQL and not from Gatsby.
You check the type of node and if it’s a type you’re interested in you resolve with some fields. Fields in GraphQL require a type and a way to resolve the value.
I’ve omitted the implementation of getURL here, but you can see the source code here (NOTE: in the source it’s called getSlug instead of getURL).
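For reference, here is a rough sketch of how the two pieces could fit together. It is not the code linked above: this getURL is a simplified guess at the title-to-URL rules described earlier, createUrlFieldHook is a hypothetical helper name, and reading the title from node.frontmatter.title assumes every post has a title in its frontmatter. The string type is passed in as an argument so the sketch runs without the graphql package installed.

```javascript
// Simplified title-to-URL helper (an assumption, not the linked implementation):
// lowercases, drops special characters, and turns runs of spaces into hyphens.
function getURL(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9\s-]/g, "")
    .trim()
    .replace(/\s+/g, "-");
}

// Hypothetical factory: builds the hook for a given GraphQL string type.
function createUrlFieldHook(stringType) {
  return ({ type }) => {
    // Only extend the node type we care about.
    if (type.name !== "MarkdownRemark") {
      return {};
    }
    return {
      url: {
        type: stringType,
        // Assumes the markdown node carries its frontmatter title.
        resolve: (node) => getURL(node.frontmatter.title),
      },
    };
  };
}

// In gatsby-node.js you would wire it up roughly like this:
//   const { GraphQLString } = require("graphql");
//   exports.setFieldsOnGraphQLNodeType = createUrlFieldHook(GraphQLString);

console.log(getURL("Isn’t this a fun title")); // isnt-this-a-fun-title
```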
You can use this technique to add any field you want to your GraphQL schema. Now you should be all set to render Link components that actually point somewhere interesting 👍.
Generating pages from markdown files
This is where it all comes together. If you finished the last section you would have ended up with a bunch of links that point to the correct URL but when you tried visiting the URL there was nothing there 😕. This is because Gatsby hasn’t yet generated any additional pages. It’s still just rendering whatever is in your src/pages/ directory.
By default, Gatsby will create a static HTML page for everything under src/pages/. At this point we’ve discussed src/pages/index.js extensively. It will be the index.html page of your site, and thus your landing page.
For any stand-alone pages, simply create a corresponding JavaScript file in the pages/ directory and you are good to go. For example, src/pages/about.js would generate an about.html page. Simple.
But almost everyone will want to generate some pages based on data, not on the files in the pages directory. Gatsby lets us do this.
Generating custom pages
The key here is again to hook into one of Gatsby’s many plugin hooks. In this case, createPages. In the same gatsby-node.js file as before:
At the most basic level this method of page creation is quite simple: Grab the createPage function from the API and call it with some props.
- path is required. This is the path that your page will have as a generated HTML file. It’s the URL of your final page.
- component is also required. It’s the file containing the React component that will be used to render this particular page.
- context is optional but I’ve included it here because it will be important soon. This lets you pass data down to the React component specified in the component option as well as the pageQuery (if any).
The API is actually pretty simple: To generate a new page call createPage with some props. So in pseudocode:
I’ve included the pseudocode to highlight the fact that nothing too magical is going on here. We just need to call createPage for every post we want to create. The implementation is a bit more verbose, but that’s still all it’s doing.
So in order to make this work we also need to be able to query GraphQL just like we do in the page query. Gatsby lets us do exactly that by giving us access to the graphql object and letting us return a promise so that we can do async work.
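A minimal sketch of that implementation, under a few assumptions: the template lives at src/templates/post.js (a made-up path), the url field is the custom field added in the previous section and is already usable as a page path, and createPage is read from boundActionCreators, which is where Gatsby v1 exposes it (v2 and later rename that object to actions).

```javascript
const path = require("path");

const createPages = ({ graphql, boundActionCreators }) => {
  const { createPage } = boundActionCreators;
  // Hypothetical template location; point this at your own post template.
  const postTemplate = path.resolve("src/templates/post.js");

  // Returning the promise tells Gatsby to wait for the async work to finish.
  return graphql(`
    {
      allMarkdownRemark {
        edges {
          node {
            id
            url
          }
        }
      }
    }
  `).then((result) => {
    if (result.errors) {
      return Promise.reject(result.errors);
    }
    // One createPage call per markdown post.
    result.data.allMarkdownRemark.edges.forEach(({ node }) => {
      createPage({
        path: node.url,
        component: postTemplate,
        // Passed to the template's pageQuery as $id.
        context: { id: node.id },
      });
    });
  });
};

exports.createPages = createPages;
```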
Notice that the query is very similar to the pageQuery in index.js but it’s not identical. This time we actually want the id because it will allow the post template to use the ID to query one single blog post.
Rendering individual posts
If you’ve made it to this point rendering individual posts is quite straightforward. You need to:
- Create the postTemplate file referenced in createPages above
- Export your template component as the default export
- Add a pageQuery that will fetch the blog post to render
Here it is in all its glory:
If you’re not used to GraphQL syntax the pageQuery might be a little intimidating, but it’s all standard GraphQL so if you take the time to learn GraphQL on its own you will be able to use that knowledge here. I.e. it is not Gatsby-specific.
The important thing to note here is that $id is passed in via context in gatsby-node.js. That’s how the post data and processed HTML string make their way into props. Then it’s just a matter of rendering as you would with any other component.
Where to go from here
There’s a lot more you can do with Gatsby and it’s not always obvious how to proceed, but you have the full power of JavaScript at your disposal. So as long as you don’t mind reading a bit of source code to figure out how something works there’s no limit to what you can implement.
Here are some ideas:
- Add previous and next buttons to each post
- Create a remark plugin to add custom block types
- Aggregate tags from your frontmatter and generate pages for all posts of a specific tag
Some of these—such as pagination—are implemented on my blog. You can find the source code here:
Closing thoughts
In my opinion Gatsby provides a few killer features:
- Extensible through a powerful plugin API.
- Supports arbitrary data sources that can be easily queried using GraphQL.
- Splits your code automatically so you don’t have to worry about bundle size increasing as a function of the number of pages you render.
It’s not a perfect project (looking at you, global graphql object) and it’s still under heavy development, so you may run into bugs, but in my view the pros heavily outweigh the cons. It’s a best-in-class static site generator and well worth the adoption time if you want to customize your blog.
If anything was unclear or you have more questions feel free to ask me on Twitter.