Sunday, July 8, 2012

The JATS-To-Mediawiki Project

Lately I've been spending a bit of time working on a project that was started by Daniel Mietchen, to develop a tool that will convert JATS documents (like the open-access biomedical journal articles on PubMed Central) into Mediawiki format, so that they can be uploaded onto a wiki. This is just one sub-task of his Encyclopedia of Original Research (EOR) project.

The essence of the EOR is distilled, I think, by a quote from a 2009 post by John Wilbanks: "Science is already a wiki if you look at it a certain way. It’s just a really, really inefficient one." He was referring to the way that science progresses through a massive collaborative effort, where researchers amend, revise, and extend the work of those who came before them. So the EOR, which is based on Mediawiki (the same wiki software that runs Wikipedia), is designed to make that process much more efficient. Note that Wikipedia itself would be inappropriate for this system, because of its "No original research" policy.

You can read more about the rationale and scope of the EOR and JATS-To-Mediawiki in a paper proposal that we wrote for JATSCon 2012.

The JATS-To-Mediawiki project is hosted on GitHub, and is still in a pre-alpha stage. Jeremy Morse and Konrad Foerstner are collaborators, and Jeremy has done almost all of the work on the XSLT stylesheet, which is the heart of the project.

Here are some step-by-step instructions for trying it out. It's assumed you have access to a Unix box, and a Mediawiki installation somewhere that you can use for testing. If you want to use my test wiki installation at chrisbaloney.com, go ahead.

First, download and extract the GitHub project files, and set an environment variable to point to that directory:

  wget --no-check-certificate \
    https://github.com/konrad/JATS-to-Mediawiki/zipball/master
  unzip master   # this creates konrad-JATS-to-Mediawiki-...
  export JTM=`pwd`/konrad-JATS-to-Mediawiki-...
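
The directory that unzip creates ends in a suffix that varies from download to download. As a rough sketch, a glob can pick it up so that you don't have to type the suffix by hand. The function name find_jtm_dir is made up for this example and is not part of the project:

```shell
# Hypothetical helper (not part of JATS-to-Mediawiki): print the absolute
# path of the first konrad-JATS-to-Mediawiki-* directory found under the
# given base directory (default: the current directory).
find_jtm_dir() {
  base=${1:-.}
  for d in "$base"/konrad-JATS-to-Mediawiki-*; do
    if [ -d "$d" ]; then
      (cd "$d" && pwd)
      return 0
    fi
  done
  echo "no konrad-JATS-to-Mediawiki-* directory under $base" >&2
  return 1
}

# Set JTM only if the directory was actually found.
if JTM=$(find_jtm_dir "$PWD"); then
  export JTM
fi
```

If more than one matching directory exists (for example, after downloading twice), this takes whichever one the glob lists first, so it is worth checking the result with echo $JTM.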

Download the sample files, using the fetch-samples.sh script:

  $JTM/scripts/fetch-samples.sh

This creates a directory called "samples", and downloads seven articles from the PMC Open Access Subset. The XML files are the only ones that we are interested in (so far). They all have an ".nxml" extension, and you can find them easily:

  $ find . -name '*.nxml'
  ./samples/PMC1762412/pone.0000133.nxml
  ./samples/PMC2270912/pone.0001908.nxml
  ./samples/PMC2467486/pone.0002804.nxml
  ./samples/PMC3003633/1475-2859-9-89.nxml
  ./samples/PMC3040697/1741-7015-9-17.nxml
  ./samples/PMC3192425/ZooKeys-119-037.nxml
  ./samples/PMC3231133/sensors-10-06861.nxml

Pick any one you want, or download your own sample, and convert it with something like:

  cd samples
  xsltproc --novalid $JTM/jats-to-mediawiki.xsl PMC1762412/pone.0000133.nxml \
    > PMC1762412.mw.xml
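
To convert all of the samples in one go, the same command can be wrapped in a loop. This is only a sketch: the function names mw_name and convert_all are invented here, and it assumes that you are in the samples directory, that $JTM is set as above, and that each PMC directory holds a single .nxml file.

```shell
# Derive the output name used above: the PMC directory name plus ".mw.xml".
# Example: ./PMC1762412/pone.0000133.nxml  ->  PMC1762412.mw.xml
mw_name() {
  printf '%s.mw.xml\n' "$(basename "$(dirname "$1")")"
}

# Convert every .nxml file below the current directory, writing each
# result into the current directory. A failed conversion is reported on
# stderr and the loop carries on with the next file.
convert_all() {
  find . -name '*.nxml' | while IFS= read -r nxml; do
    out=$(mw_name "$nxml")
    xsltproc --novalid "$JTM/jats-to-mediawiki.xsl" "$nxml" > "$out" \
      || echo "conversion failed: $nxml" >&2
  done
}

# Usage, from inside the samples directory:
#   convert_all
```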

Next, in your browser, go to the import page of your test wiki; for example, http://chrisbaloney.com/wiki/index.php/Special:Import. Click "Choose file", select the ...mw.xml file you just created, and then click "Upload file". The import should finish by showing a link to the new wiki page; in this case, The Sound Generated by Mid-Ocean Ridge Black Smoker Hydrothermal Vents.
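
Before uploading, it can help to check quickly that the conversion produced something importable. Special:Import expects Mediawiki's XML export format, which has a mediawiki root element containing one or more page elements, each with a title. The sketch below is a crude text check rather than real XML validation, and check_import_file is a made-up name:

```shell
# Hypothetical sanity check: succeed only if the file is non-empty and
# contains the elements that Special:Import looks for, then print the
# page title(s) so you know which wiki page to expect.
check_import_file() {
  f=$1
  [ -s "$f" ] || { echo "$f: missing or empty" >&2; return 1; }
  grep -q '<mediawiki' "$f" || { echo "$f: no <mediawiki> element" >&2; return 1; }
  grep -q '<page' "$f" || { echo "$f: no <page> element" >&2; return 1; }
  # Assumes each <title>...</title> pair sits on a single line.
  sed -n 's:.*<title>\(.*\)</title>.*:\1:p' "$f"
}

# Usage:
#   check_import_file PMC1762412.mw.xml
```

An empty or truncated file usually means that xsltproc stopped with an error, so its messages on the terminal are the first place to look.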

As I mentioned, the project is still at an early stage, and there are still many things to do.
