13 February 2015

Introduction

Today, I migrated all the posts from highonphp.com to this site. Since Jekyll works entirely from static files, I needed a way to export the posts from WordPress and convert them to flat files.

The export was the easy part. In WordPress, go to Tools -> Export and choose All Posts. When you click Export, your browser downloads an .xml file. The file uses a fairly simple schema (WordPress eXtended RSS, or WXR) that can be used to import the posts into another WordPress instance or, in our case, into Jekyll.

Starting the Migration

To accomplish this, I wanted to write a really simple Ruby script, and while I was at it, play around with Nokogiri some more. Here are the gems needed for the conversion to work smoothly:

source 'https://rubygems.org'
gem 'nokogiri'
gem 'json'
gem 'reverse_markdown'

The following script is very simple: pass it the name of the exported XML file as a command-line argument and let it run.

require 'reverse_markdown'
require 'nokogiri'
require 'fileutils'
require 'date'
require 'json'

# Make sure an export file was supplied
abort "Usage: ruby convert.rb <wordpress-export.xml>" if ARGV[0].nil?

# Turn a post title into a URL-friendly slug
def slug(string)
  string.downcase.strip.gsub(' ', '-').gsub(/[^\w-]/, '')
end

# Build the file body: YAML front matter followed by the Markdown content
def to_jekyll(p)
  post = []
  post.push("---")
  post.push("layout: post")
  post.push("permalink: #{p[:permalink]}")
  # to_json returns a double-quoted string with quotes and backslashes escaped,
  # which YAML also accepts, so titles containing quotes stay valid front matter
  post.push("title: #{p[:title].to_json}")
  post.push("category: #{p[:category]}")
  post.push("tags: #{p[:tags].join(" ")}")
  post.push("---")
  post.push(p[:content])
  post.join("\n")
end

# Open and parse the XML export
doc = Nokogiri::XML(File.open(ARGV[0]))

# Each <item> under <channel> is one post
doc.xpath('//channel/item').each do |post|
  p = {}
  p[:title] = post.xpath('title').text
  p[:slug] = slug(p[:title])

  # Jekyll post filenames start with the date as YYYY-MM-DD
  date = Date.parse(post.xpath('wp:post_date').text).strftime("%F")
  # Replace the invalid -0001-11-30 date some exported posts carry
  date = "2011-11-01" if date.eql? "-0001-11-30"
  p[:date] = date

  # Drop the scheme and host from the old URL, keeping only the path
  p[:permalink] = post.xpath('link').text.gsub(/https?\:\/\/\w+.*?\//, '/')
  p[:content] = ReverseMarkdown.convert post.xpath('content:encoded').text
  p[:tags] = post.xpath('category[@domain="post_tag"]').map { |tag| tag.attr('nicename') }
  p[:category] = post.xpath('category[@domain="category"]').map { |category| category.attr('nicename') }
  p[:status] = post.xpath('wp:status').text

  # Published posts go to _posts; every other status goes to _drafts
  type = p[:status].eql?("publish") ? "posts" : "drafts"
  FileUtils.mkdir_p("_#{type}")

  filename = "_#{type}/#{p[:date]}-#{p[:slug]}.md"
  puts "Creating \"#{filename}\""
  File.open(filename, 'w') { |file| file.write(to_jekyll(p)) }
end

Each WordPress post is stored in an <item> element. Within each item you'll find all of the post's attributes, such as tags, categories, status, and content. We copy the important ones into a hash and use that hash to generate the post file.
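To make the <item> structure more concrete, here is a minimal sketch that reads a trimmed-down, made-up export item into a hash. It uses Ruby's bundled REXML instead of Nokogiri only so the example runs without installing gems; the sample XML, the `read_items` name, and the hash keys are assumptions for illustration, not part of the script above. Note how WXR tells tags and categories apart: both are `<category>` elements, distinguished by their `domain` attribute, which is what the script's `category[@domain=...]` XPath filters rely on.

```ruby
require 'rexml/document'

# Hypothetical, heavily trimmed WXR export containing a single post.
SAMPLE_WXR = <<~XML
  <rss version="2.0"
       xmlns:content="http://purl.org/rss/1.0/modules/content/"
       xmlns:wp="http://wordpress.org/export/1.2/">
    <channel>
      <item>
        <title>Hello World</title>
        <link>http://www.highonphp.com/hello-world/</link>
        <content:encoded><![CDATA[<p>First <strong>post</strong></p>]]></content:encoded>
        <wp:post_date>2012-03-04 10:20:30</wp:post_date>
        <wp:status>publish</wp:status>
        <category domain="category" nicename="php">PHP</category>
        <category domain="post_tag" nicename="nginx">nginx</category>
        <category domain="post_tag" nicename="ruby">Ruby</category>
      </item>
    </channel>
  </rss>
XML

# Collect the interesting fields of every <item> into a plain hash.
def read_items(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//item').map do |item|
    post = { tags: [], categories: [] }
    item.elements.each do |child|
      # expanded_name keeps the namespace prefix, e.g. "wp:status"
      case child.expanded_name
      when 'title'           then post[:title]  = child.text
      when 'link'            then post[:link]   = child.text
      when 'content:encoded' then post[:html]   = child.text
      when 'wp:post_date'    then post[:date]   = child.text
      when 'wp:status'       then post[:status] = child.text
      when 'category'
        # The domain attribute decides whether this is a tag or a category
        key = child.attributes['domain'] == 'post_tag' ? :tags : :categories
        post[key] << child.attributes['nicename']
      end
    end
    post
  end
end

p read_items(SAMPLE_WXR).first
```

For the sample above, the resulting hash has `:tags` of `["nginx", "ruby"]`, `:categories` of `["php"]`, and the still-HTML body under `:html`, which is the part the real script hands to ReverseMarkdown.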

I wrote all of my posts with a WordPress Markdown plugin, but unfortunately it converted them back to HTML on the back end. To get the content back into Markdown, I used reverse_markdown, which did a great job.

To generate the filenames, the script takes each post's publish date, converts it to the YYYY-MM-DD prefix Jekyll expects, and appends a slug created from the title. Because we also have each post's status (publish, draft, etc.), we can easily drop the new Jekyll post into either _posts or _drafts.
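The naming rule can be sketched on its own. `jekyll_path` is a hypothetical helper, not part of the script; it reuses the script's slug logic and `strftime('%F')`, and the titles and dates below are invented for illustration.

```ruby
require 'date'

# Same slug rules as convert.rb: lowercase, spaces to dashes,
# then drop anything that is not a word character or a dash.
def slug(title)
  title.downcase.strip.gsub(' ', '-').gsub(/[^\w-]/, '')
end

# Hypothetical helper: build the Jekyll path for one exported post.
# Published posts go to _posts; every other status goes to _drafts.
def jekyll_path(title, post_date, status)
  date = Date.parse(post_date).strftime('%F') # %F is YYYY-MM-DD
  folder = status == 'publish' ? '_posts' : '_drafts'
  "#{folder}/#{date}-#{slug(title)}.md"
end

puts jekyll_path('Hello, World!', '2012-03-04 10:20:30', 'publish')
# prints _posts/2012-03-04-hello-world.md
puts jekyll_path('Work in Progress', '2014-12-01 08:00:00', 'draft')
# prints _drafts/2014-12-01-work-in-progress.md
```

One quirk of this slug rule: punctuation surrounded by spaces leaves a double dash behind, so "PHP & Nginx" becomes `php--nginx`.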

Permalinking and Nginx

One last thing to note is permalinking. I wanted every http://www.highonphp.com/links URL to redirect to http://www.mikemackintosh.com/links without an issue. The WordPress export includes each post's link, which we put into the front matter of the Jekyll post as its permalink so the URLs stay the same. On the server side, I had to redirect the old domains to the new one with the request URI (path and query string) intact. You can easily do that with nginx:

server {
    # Old domains: bare, www. and any other subdomain
    server_name highonphp.com www.highonphp.com *.highonphp.com
                highonphp.net www.highonphp.net *.highonphp.net
                highonruby.com www.highonruby.com *.highonruby.com
                splug.io www.splug.io *.splug.io
                bakeryphp.com www.bakeryphp.com *.bakeryphp.com
                bakeryframework.com www.bakeryframework.com *.bakeryframework.com;

    location / {
        # $request_uri already holds the path and query string; the trailing "?"
        # stops nginx from appending the original arguments a second time
        rewrite ^ http://www.mikemackintosh.com$request_uri? permanent;
    }
}
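The front matter and the redirect line up because both keep the same path. The following Ruby sketch models the two halves: `permalink_for` reuses the regex from convert.rb, while `redirect_target` is a hypothetical stand-in for what the nginx rule does, built on the standard library's `URI::HTTP#request_uri`. The example post URL is made up.

```ruby
require 'uri'

# New base URL, taken from the nginx rewrite rule.
NEW_BASE = 'http://www.mikemackintosh.com'

# Same regex convert.rb uses: drop the scheme and host of the old
# <link>, keeping only the path for the Jekyll permalink.
def permalink_for(old_link)
  old_link.gsub(/https?\:\/\/\w+.*?\//, '/')
end

# Rough model of the nginx rule: keep the request URI (path plus query
# string, like $request_uri) and swap in the new scheme and host.
def redirect_target(old_url)
  NEW_BASE + URI.parse(old_url).request_uri
end

old = 'http://www.highonphp.com/2013/05/hello-world/'
puts permalink_for(old)
# prints /2013/05/hello-world/
puts redirect_target(old + '?ref=feed')
# prints http://www.mikemackintosh.com/2013/05/hello-world/?ref=feed
```

Since the redirected path equals the permalink written into the front matter, an old link lands on the same post on the new domain.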

Instead of installing a WordPress plugin, we get to convert the posts on our own terms.

Tagged under wordpress, jekyll, convert, xml, export, nokogiri, ruby, and others
Mike Mackintosh

This post was written by Mike Mackintosh, a decorated security professional.



