Friday, May 4, 2012

Fixing Wikipedia links to GeneReviews

Yesterday I was finishing up the final bit of homework for a course I'm taking, and came across a problem with a Wikipedia page. I think I fixed it, but it was not particularly easy, and I'd like to write about it here.

The problem was with one of the Wikipedia infobox templates, and the way that it generates links to the NCBI GeneReviews site.

The immediate problem that I discovered was on the page Hyperkalemic periodic paralysis. On that page, there is an infobox on the right that has a link to GeneReviews, but the link is broken. It goes to

http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=gene&part=NBK1496

This URL unfortunately mixes an old-style NCBI bookshelf URL with a new-format ID number. Any of the following URLs would be correct:

http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=gene&part=hyper-pp
http://www.ncbi.nlm.nih.gov/books/n/gene/hyper-pp/
http://www.ncbi.nlm.nih.gov/books/NBK1496/
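The difference between these URL forms is mechanical enough to sketch in code. Below is a minimal Python sketch that rewrites a broken old-style br.fcgi URL into the /books/NBK.../ form whenever its part parameter holds an NBK number. The function name fix_genereviews_url is hypothetical and is not part of any NCBI or Wikipedia tool. The sketch assumes only the URL shapes listed above. It leaves human-readable IDs such as "hyper-pp" alone, since those already work in the old-style URL and can't be mapped to an NBK number without a lookup.

```python
import re
from urllib.parse import urlsplit, parse_qs

# New-style GeneReviews IDs look like "NBK" followed by digits, e.g. "NBK1496".
NBK_PATTERN = re.compile(r"NBK\d+")

BOOKS_BASE = "http://www.ncbi.nlm.nih.gov/books"


def fix_genereviews_url(url: str) -> str:
    """Rewrite an old-style br.fcgi URL whose 'part' is an NBK number.

    Hypothetical helper for illustration. Any other URL is returned
    unchanged, including old-style URLs with a human-readable part,
    because those are still valid.
    """
    parts = urlsplit(url)
    if not parts.path.endswith("/bookshelf/br.fcgi"):
        return url
    query = parse_qs(parts.query)
    part = query.get("part", [""])[0]
    if NBK_PATTERN.fullmatch(part):
        # The NBK number is the ID used in the new-style /books/NBKnnnn/ URL.
        return f"{BOOKS_BASE}/{part}/"
    return url


if __name__ == "__main__":
    broken = "http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=gene&part=NBK1496"
    print(fix_genereviews_url(broken))
```

Run on the broken link from the infobox, this should print the third URL in the list above.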

This Wikipedia page uses one of Wikipedia's many infobox templates to generate this box, namely Template:Infobox_disease. As its documentation shows, this template provides fields that editors of Wikipedia pages can easily fill in. When the page is rendered, the template code is executed to generate the nice tabular display, with links, from the values in those fields.

This particular infobox has an accompanying documentation page, a test cases page, and a sandbox. The template itself is protected (it can't be edited by the average user) because it is used on hundreds or thousands of pages. So the sandbox provides a place where someone (me) can try out changes before submitting an "edit request" to one of the admins.

The infobox template code is pretty intimidating-looking, but I stared at it long enough that it eventually started to make sense to me.

The immediate problem stems from the fact that there are two types of IDs for these GeneReviews resources: one "human-readable" (e.g. "hyper-pp") and the other alphanumeric (e.g. "NBK1496"). The problem I discovered arose because someone entered the alphanumeric ID when she should have entered the human-readable one. In other words, the template fields were given like this:

GeneReviewsID = NBK1496 |
GeneReviewsName = Hyperkalemic Periodic Paralysis Type 1 |

but they should have been given like this:

GeneReviewsID = hyper-pp |
GeneReviewsName = Hyperkalemic Periodic Paralysis Type 1 |
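This kind of mistake, an NBK number sitting in GeneReviewsID, is easy to spot mechanically. Here is a rough Python sketch of such a check. The helper names parse_fields and misused_genereviews_id are hypothetical. The sketch assumes one field per line in the "Name = value |" form shown above. Real wikitext, with nested templates or multi-line values, would need a proper parser.

```python
import re
from typing import Dict, Optional

# New-style GeneReviews IDs look like "NBK" followed by digits.
NBK_PATTERN = re.compile(r"NBK\d+")


def parse_fields(wikitext: str) -> Dict[str, str]:
    """Very rough parse of 'Name = value |' lines (hypothetical helper).

    Assumes one field per line; leading or trailing pipes are stripped.
    """
    fields = {}
    for line in wikitext.splitlines():
        line = line.strip().strip("|").strip()
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        fields[name.strip()] = value.strip()
    return fields


def misused_genereviews_id(wikitext: str) -> Optional[str]:
    """Return the NBK number if it was entered as GeneReviewsID, else None."""
    value = parse_fields(wikitext).get("GeneReviewsID", "")
    return value if NBK_PATTERN.fullmatch(value) else None


if __name__ == "__main__":
    page = "GeneReviewsID = NBK1496 |\nGeneReviewsName = Hyperkalemic Periodic Paralysis Type 1 |"
    print(misused_genereviews_id(page))
```

A check like this only flags the problem. It can't supply the matching human-readable ID, which is the same limitation the next paragraphs run into.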

I'm sure that there are lots of places where this template is used correctly, with the human-readable ID, and it wouldn't be possible for me to find all of them and change the human-readable ID values to alphanumeric. So it's important that the human-readable ID values continue to work.

On the other hand, when we redesigned the NCBI bookshelf, we made this human-readable ID very hard for anyone to find, so it is understandable that the editor of this page used the NBK number; it is really the only visible ID. So it's important that these new ID numbers work, too.

Therefore, the only fix I could think of was to define a new field, which I called GeneReviewsNBK, and to "deprecate" the old one. The old field should continue to be supported and should still generate correct URLs, but people should be encouraged to use the new NBK number going forward.

The logic to generate the new links for this infobox item turned out to be a bit complicated. It does different things depending on whether the old-style GeneReviewsID or the new-style GeneReviewsNBK was given (GeneReviewsNBK wins if both are present), and the link text also depends on whether or not the label GeneReviewsName was given. Here it is:

| label10     = [[GeneReviews]]
| data10      =
  {{#if: {{{GeneReviewsNBK|}}}
    | {{#if: {{{GeneReviewsName|}}}
        | [http://www.ncbi.nlm.nih.gov/books/{{{GeneReviewsNBK}}}/ {{{GeneReviewsName}}}]
        | [http://www.ncbi.nlm.nih.gov/books/{{{GeneReviewsNBK}}}/ {{{GeneReviewsNBK}}}]
      }}
    | {{#if: {{{GeneReviewsID|}}}
        | {{#if: {{{GeneReviewsName|}}}
            | [http://www.ncbi.nlm.nih.gov/books/n/gene/{{{GeneReviewsID}}}/ {{{GeneReviewsName}}}]
            | [http://www.ncbi.nlm.nih.gov/books/n/gene/{{{GeneReviewsID}}}/ {{{GeneReviewsID}}}]
          }}
      }}
  }}

The syntax for this is described in the MediaWiki Help:Extension:ParserFunctions page.
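To make the branching easier to follow, here is the same decision logic sketched as a Python function. This is only an illustration and not how MediaWiki evaluates templates. The name genereviews_link is hypothetical. The sketch assumes, as with {{{X|}}} inside {{#if:}}, that a missing or blank parameter counts as empty.

```python
from typing import Dict

BOOKS_BASE = "http://www.ncbi.nlm.nih.gov/books"


def genereviews_link(params: Dict[str, str]) -> str:
    """Mirror the nested {{#if:}} logic of the data10 field (a sketch).

    Missing or whitespace-only parameters count as empty.
    Returns MediaWiki external-link markup, or "" if no ID was given.
    """
    nbk = params.get("GeneReviewsNBK", "").strip()
    gr_id = params.get("GeneReviewsID", "").strip()
    name = params.get("GeneReviewsName", "").strip()

    if nbk:
        # New-style field is checked first: /books/NBKnnnn/
        url = f"{BOOKS_BASE}/{nbk}/"
        text = name or nbk
    elif gr_id:
        # Deprecated human-readable ID: /books/n/gene/<id>/
        url = f"{BOOKS_BASE}/n/gene/{gr_id}/"
        text = name or gr_id
    else:
        # Neither ID given: the field renders as nothing.
        return ""
    return f"[{url} {text}]"
```

The two fallbacks (name or nbk, name or gr_id) correspond to the inner {{#if: {{{GeneReviewsName|}}} ...}} tests, and checking nbk first is what makes the new field take precedence over the deprecated one.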

I put this code into the sandbox, created a test page in my user area to try it out, and eventually got it working. Then I submitted an edit request for the protected template.

Assuming my edit request is implemented, I'll then go back and fix the original problem page to use the new NBK number.

2 comments:

  1. Update: My changes were accepted as-written -- fast, too! http://en.wikipedia.org/w/index.php?title=Template%3AInfobox_disease&diff=490767171&oldid=444761945

  2. Nice work, Chris! Z. must be sleeping well for you to have the brain cells needed to do this!

