Monday, June 13, 2011

Wikis in science and science education

I'm writing this blog post as an open letter to the faculty of Johns Hopkins, where I'm enrolled in the MS in Bioinformatics program. It's about wikis.

Lately I've stumbled upon a few presentations and websites, from independent directions, that highlight the growing importance of wikis in science. Here are a few of my encounters.

TaxPub, ZooKeys, and Species-ID

At JATS-Con last year, one of the best talks was given by Terry Catapano, on TaxPub. TaxPub is a file format for journal articles. TaxPub is interesting and novel because it allows very domain-specific, structured data to be included with the main article content. In this case, the domain is taxonomic information -- data related to the names and descriptions of species.

ZooKeys is a new, peer-reviewed, open-access journal from Pensoft. ZooKeys uses the TaxPub format, and it's licensed under the Creative Commons Attribution license (CC-BY). This means that anyone is allowed to copy and repurpose the content, for anything whatsoever, as long as he or she gives credit to the authors.

What's special about ZooKeys is that, at the same time an article is published in the journal, it is also used to produce a wiki page on the Species-ID wiki. One of the first articles produced with this new workflow describes the process better than I could. Here is the article in ZooKeys, and here's the corresponding wiki page. This allows the content to be updated as new information is found. At the same time, referencing the original journal article is always possible.

Note that the Species-ID wiki is powered by MediaWiki, the same software behind Wikipedia. This is important, because if a scientist learns to edit Wikipedia, then he or she could also contribute to Species-ID -- and vice-versa, of course.

Encyclopedia of original research

The next encounter was with a project proposal recently put forward by Daniel Mietchen for a wiki of scientific articles. You can read more about this idea, and the philosophy behind it, in a couple more of his blog posts here and here.

His idea is to seed the wiki with a large set of journal articles that are either in the public domain, or have been published with a license at least as permissive as CC-BY. PMC (where I work) serves a set of such articles: the Open Access Subset. These articles can be downloaded from our FTP site in their original XML format. The piece that's still missing is a JATS-to-MediaWiki converter, which would be fairly easy to write.
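To give a feel for what such a converter involves, here is a minimal Python sketch. It is only an illustration, not an existing tool: it assumes a tiny subset of JATS (sec, title, p, italic, and bold elements inside body) and ignores everything else a real article contains, such as front matter, references, figures, and tables. The function names and the sample article are made up for the example.

```python
import xml.etree.ElementTree as ET

# Inline JATS elements mapped to the wikitext quote markers that wrap them.
# Only italic and bold are handled; other inline tags keep their text only.
INLINE_MARKUP = {"italic": "''", "bold": "'''"}


def inline_text(elem):
    """Flatten an element's mixed content, converting italic/bold to wikitext."""
    parts = [elem.text or ""]
    for child in elem:
        marker = INLINE_MARKUP.get(child.tag, "")
        parts.append(marker + inline_text(child) + marker)
        parts.append(child.tail or "")
    # Collapse runs of whitespace left over from XML indentation.
    return " ".join("".join(parts).split())


def convert_section(sec, level, out):
    """Emit a wikitext heading for a <sec>, then its paragraphs and subsections."""
    for child in sec:
        if child.tag == "title":
            bars = "=" * level
            out.append(f"{bars} {inline_text(child)} {bars}")
        elif child.tag == "p":
            out.append(inline_text(child))
        elif child.tag == "sec":
            convert_section(child, level + 1, out)


def jats_to_wikitext(xml_string):
    """Convert the <body> of a (simplified) JATS article to wikitext."""
    root = ET.fromstring(xml_string)
    out = []
    body = root.find("body")
    if body is not None:
        for child in body:
            if child.tag == "sec":
                convert_section(child, 2, out)  # top-level sections become == ... ==
            elif child.tag == "p":
                out.append(inline_text(child))
    return "\n\n".join(out)


# Hypothetical sample article, invented for this example.
SAMPLE = """<article><body>
<sec><title>Introduction</title>
<p>The genus <italic>Aus</italic> is <bold>new</bold>.</p>
<sec><title>Methods</title><p>We looked.</p></sec>
</sec></body></article>"""

print(jats_to_wikitext(SAMPLE))
```

A real converter would also need to handle citations, cross-references, tables, and TaxPub's taxonomic elements, which is where most of the work would be.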

Pfam, Rfam, and Wikipedia

Last week I had the pleasure of attending a seminar given by Rob Finn, hosted by the NCBI Computational Biology Branch. The seminar was about Pfam (on Wikipedia), Rfam (on Wikipedia), and Wikipedia. Pfam is a database of protein families, and Rfam is a database of RNA families.

What is super cool and unique about these databases is that they both leverage the power of Wikipedia to enhance the value of the results delivered to their users. For example, the page in Pfam for the linker histone (histone H1) contains vast amounts of data (which I don't pretend to understand), which is accessible from the various menu options both across the top and down the left side of the page. But most prominently, on the summary page, they pull out the data from Wikipedia and display it within the Pfam site itself.

Rob Finn explained that, as part of their curation of the protein data, they also help to maintain and control the quality of the relevant Wikipedia entries. So it's a very synergistic relationship. The scientists themselves, and the database curators, collaborate to improve the quality of the Wikipedia content, and then Wikipedia enhances the site.

JHU AAP Online Bioinformatics program

These are all exciting trends, and of course they just scratch the surface of the science-wiki landscape. Now I want to contrast this with my experience so far in the Bioinformatics program at Johns Hopkins. I am sorry to say it, but they are woefully behind. I want to stress that my purpose in writing this post is not to bash Johns Hopkins. I think the program is great, and I have had very good experiences in my classes there so far. I have enormous respect for my advisor, and the courses that I've taken have been well designed. And the bottom line is that I am learning a hell of a lot of biology. No, the purpose of this post is to encourage the administrators and faculty there to embrace the idea of preparing future scientists to collaborate in the wonderful medium of wikis. Pointing out current inadequacies is part of that.

Last semester, I proposed to my instructor and my advisor that it would be nice if at least one assignment for each class was to make a substantive contribution to Wikipedia. The assignments I was given were mostly writing assignments on particular topics. We had one big "presentation" assignment due at the end of the semester. Each student was given a different topic to research and describe. But when the class ended, that effort was locked away in the class's archives, never to be read by anyone again. Isn't that a shame? Here is part of what I wrote to my advisor:
You help to run a Bioinformatics program, [which is about merging the technologies] of biology and computers. Wikis are emerging as a very powerful, rich, and important medium for collaboration and providing scientific results to the public.

So I think the classes in the Bioinformatics program should all encourage students to contribute to Wikipedia. Wouldn't it be nice if at least one assignment from each class were to make a substantive contribution?

My professor from last semester said that she was interested in the idea, and would try it out this semester. I don't know if she did or not -- I still have to follow up with her and ask.

But, of course, this semester I moved on to a different class. And the whole school moved on to a new software platform for their online courses. Perhaps you've heard of it: Blackboard. So far my impression of it is ... ahem ... not good.

There are many problems with Blackboard, but I'll only focus on the problems with the integrated wiki. They are legion. This past week, I finished participating in our first group assignment, in which we were asked to collaborate on writing the answers to various questions about two research papers. I proposed that we use a wiki for this, and suggested that we could either use Blackboard's integrated wiki, or we could use a site that lets you create a free wiki on the MediaWiki platform -- Wikia. I pointed out that the main advantage of using Wikia would be that it would introduce everyone to the MediaWiki format -- which could be a springboard into editing Wikipedia.

Not surprisingly, my teammates chose to use the Blackboard wiki. I don't blame them at all -- because on the face of it, it would seem that Wikia is too risky. But, the problems with the Blackboard wiki started to become apparent right away. Here is a list of a few of the problems we encountered:

1. It is impossible to compare two versions of a page, to see what has changed. This is a fundamental feature of a wiki. Especially in a long article, it is absolutely required that an author be able to compare the versions, to see at a glance what has changed, so that he or she can focus just on the changes. To be fair, BB has a very limited (see the next item) ability to do this, but JHU's installation seems to be defective. It is missing a "diff.css" file, which, presumably, adds the all-important highlighting of the changes.

2. The "limited" ability I referred to above is (please be warned that the following might cause your head to explode) that you can only compare revisions corresponding to edits that you yourself have made. In other words, if I want to see my own changes ("my contributions") that's fine -- I can do that. But if I want to see what somebody else changed on a page that I myself authored -- that's impossible.

3. The wiki uses a crap browser-based wysiwyg editor that produces absolutely nightmare-inducing markup. All of my teammates found themselves frustrated by the random bizarre formatting quirks of the editor. You could tweak things to look correct in the edit view, but as soon as you clicked "submit", the formatting would be all screwed up again. It does have a "view - source" view, but if you make changes in that view, to clean things up, and then switch back to wysiwyg view, all your work is lost.
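To make the first item concrete: comparing two revisions of a page is not an exotic feature. As a rough sketch (using Python's standard difflib module, and in no way a description of how Blackboard or MediaWiki implement it), a line-by-line diff of two versions takes only a few lines. The function name and the two sample revisions are made up for the example.

```python
import difflib


def diff_revisions(old_text, new_text, old_label="previous", new_label="current"):
    """Return a unified diff of two page revisions, one change per line."""
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm="",  # the lines have no trailing newlines, so the headers shouldn't either
    )
    return "\n".join(diff)


# Hypothetical revisions of a shared answer page.
rev1 = "Q1: The promoter is upstream.\nQ2: TBD"
rev2 = "Q1: The promoter is upstream.\nQ2: RNA polymerase II"

print(diff_revisions(rev1, rev2))
```

Lines starting with "-" were removed and lines starting with "+" were added, so a reader can focus on just the changes, no matter who made them.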

But the biggest problem is -- that it is not MediaWiki. As I mentioned above, if we're going to spend the time to learn to use a wiki, then why not learn something that can really add to our value as scientific collaborators?

My hope is that, for future group assignments in my current class, my group and the others will all decide to collaborate on the new wiki I've just set up on Wikia, the Molecular Biology, AAP, Summer 2011 wiki. I expect there might be some resistance because it is "off-site", and not part of the JHU system. If so, those would be unjustified complaints. We're living in a deeply interconnected, collaborative world now, and more and more businesses are moving their entire infrastructures into the cloud. I think as a general principle, we should make use of the best tools wherever we find them. If this objection is a show-stopper, then the next-best thing would be for JHU to set up an installation of MediaWiki itself. It is very easy to do!

Another recommendation I have is that all professors who teach courses be required to attain a minimum level of Wikipedia literacy. They should be required to register their Wikipedia usernames (mine is Klortho) so that anyone can go and see their list of contributions -- and this could become a part of their CV (as it should be). We all rely so heavily on Wikipedia these days that it is a crime that we don't do more, as individuals and institutions, to support it.

Daniel Mietchen pointed out to me that the journal RNA Biology actually requires a Wikipedia article (search for "A short guide to creating your first Wikipedia article") along with any new manuscript submitted to its "RNA Families Track".
