I'm writing this blog post as an open letter to the faculty of Johns Hopkins, where I'm enrolled in the
MS in Bioinformatics program. It's about wikis.
Lately I've stumbled upon a few presentations and websites, from independent directions, that highlight the growing importance of wikis in science. Here are a few of my encounters.
TaxPub, ZooKeys, and Species-ID
At JATS-Con last year, one of the best talks was given by Terry Catapano, on TaxPub. TaxPub is a file format for journal articles. It is interesting and novel because it allows very domain-specific, structured data to be included with the main article content. In this case, the domain is taxonomic information -- data related to the names and descriptions of species.
ZooKeys is a new, peer-reviewed, open-access journal from Pensoft. ZooKeys uses the TaxPub format, and it's licensed under the Creative Commons Attribution license (CC-BY). This means that anyone is allowed to copy and repurpose the content, for anything whatsoever, as long as he or she gives credit to the authors.
What's special about ZooKeys is that, at the same time an article is published in the journal, it is also used to produce a wiki page on the Species-ID wiki. One of the first articles produced with this new workflow describes the process better than I could. Here is the article in ZooKeys, and here's the corresponding wiki page. This allows the content to be updated as new information is found. At the same time, referencing the original journal article is always possible.
Note that the Species-ID wiki is powered by MediaWiki, the same software behind Wikipedia. This is important, because if a scientist learns to edit Wikipedia, then he or she could also contribute to Species-ID -- and vice versa, of course.
Encyclopedia of original research
The next encounter was with a project proposal recently put forward by Daniel Mietchen for a wiki of scientific articles. You can read more about this idea, and the philosophy behind it, in a couple more of his blog posts here and here.
His idea is to seed the wiki with a large set of journal articles that are either in the public domain, or have been published with a license at least as permissive as
CC-BY.
PMC (where I work) serves a set of such articles: the
Open Access Subset. These articles can be downloaded from our FTP site in their original XML format. The piece that's still missing is a JATS-to-MediaWiki converter, which would be fairly easy to write.
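To give a feel for what such a converter involves, here is a minimal sketch in Python. It is only an illustration, not an existing tool: the function names and the sample article are made up for this post, and it handles just a handful of JATS elements (`article-title`, `sec`, `title`, `p`, `italic`, `bold`). A real converter would also need to deal with references, figures, tables, and everything else a full article contains.

```python
import xml.etree.ElementTree as ET

# Inline JATS elements mapped to the wikitext delimiters that wrap them.
INLINE_MARKUP = {"italic": "''", "bold": "'''"}


def inline_to_wikitext(elem):
    """Flatten an element's mixed content, converting <italic> and <bold>."""
    parts = [elem.text or ""]
    for child in elem:
        wrap = INLINE_MARKUP.get(child.tag, "")
        parts.append(wrap + inline_to_wikitext(child) + wrap)
        parts.append(child.tail or "")
    # Collapse the whitespace that XML pretty-printing introduces.
    return " ".join("".join(parts).split())


def section_to_wikitext(sec, level):
    """Render titles, paragraphs, and nested <sec> elements as wikitext lines."""
    lines = []
    for child in sec:
        if child.tag == "title":
            bars = "=" * level
            lines.append(f"{bars} {inline_to_wikitext(child)} {bars}")
        elif child.tag == "p":
            lines.append(inline_to_wikitext(child))
        elif child.tag == "sec":
            lines.extend(section_to_wikitext(child, level + 1))
    return lines


def jats_to_wikitext(xml_text):
    """Return (page_title, wikitext) for a small subset of JATS."""
    root = ET.fromstring(xml_text)
    title = root.find("./front/article-meta/title-group/article-title")
    # The article title becomes the plain-text wiki page name.
    page_title = " ".join("".join(title.itertext()).split()) if title is not None else ""
    body = root.find("./body")
    # Top-level sections start at level 2, the usual convention in wiki pages.
    lines = section_to_wikitext(body, level=1) if body is not None else []
    return page_title, "\n\n".join(lines)


# Hypothetical sample article, invented for this example.
SAMPLE = """<article>
  <front><article-meta><title-group>
    <article-title>A new species of <italic>Aus</italic></article-title>
  </title-group></article-meta></front>
  <body>
    <sec><title>Introduction</title>
      <p>The genus <italic>Aus</italic> is <bold>poorly</bold> known.</p>
      <sec><title>Methods</title><p>Specimens were examined.</p></sec>
    </sec>
  </body>
</article>"""

page_title, wikitext = jats_to_wikitext(SAMPLE)
print(page_title)
print(wikitext)
```

The recursion mirrors the nesting of `sec` elements, so each level of section depth simply adds one more `=` to the heading.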
Pfam, Rfam, and Wikipedia
Last week I had the pleasure of attending
a seminar given by Rob Finn, hosted by the
NCBI Computational Biology Branch. The seminar was about
Pfam (on
Wikipedia),
Rfam (on
Wikipedia), and Wikipedia. Pfam is a database of protein families, and Rfam is a database of RNA families.
What is super cool and unique about these databases is that they both leverage the power of Wikipedia to enhance the value of the results delivered to their users. For example, the page in Pfam for the
linker histone (histone H1) contains vast amounts of data (which I don't pretend to understand), accessible from the various menu options both across the top and down the left side of the page. But most prominently, on the summary page, they pull out the data from Wikipedia and display it within the Pfam site itself.
Rob Finn explained that, as part of their curation of the protein data, they also help to maintain and control the quality of the relevant Wikipedia entries. So it's a very synergistic relationship. The scientists themselves, and the database curators, collaborate to improve the quality of the Wikipedia content, and then Wikipedia enhances the site.
JHU AAP Online Bioinformatics program
These are all exciting trends, and of course they just scratch the surface of the science-wiki landscape. Now I want to contrast this with my experience so far in the
Bioinformatics program at Johns Hopkins. I am sorry to say it, but they are woefully behind. I want to stress that my purpose in writing this post is not to bash Johns Hopkins. I think the program is great, and I have had very good experiences in my classes there so far. I have enormous respect for my advisor, and the courses that I've taken have been well designed. And the bottom line is that I am learning a hell of a lot of biology. No, the purpose of this post is to encourage the administrators and faculty there to embrace the idea of preparing future scientists to collaborate in the wonderful medium of wikis. Pointing out current inadequacies is part of that.
Last semester, I proposed to my instructor and my advisor that it would be nice if at least one assignment for each class was to make a substantive contribution to Wikipedia. The assignments I was given were mostly writing assignments on particular topics. We had one big "presentation" assignment due at the end of the semester. Each student was given a different topic to research and describe. But when the class ended, that effort was locked away in the class's archives, never to be read by anyone again. Isn't that a shame? Here is part of what I wrote to my advisor:
You help to run a Bioinformatics program, [which is about merging the technologies] of biology and computers. Wikis are emerging as a very powerful, rich, and important medium for collaboration and providing scientific results to the public.
So I think the classes in the Bioinformatics program should all encourage students to contribute to Wikipedia. Wouldn't it be nice if at least one assignment from each class were to make a substantive contribution?
My professor from last semester said that she was interested in the idea, and would try it out this semester. I don't know if she did or not -- I still have to follow up with her and ask.
But, of course, this semester I moved on to a different class. And the whole school moved on to a new software platform for their online courses. Perhaps you've heard of it:
Blackboard. So far my impression of it is ... ahem ... not good.
There are many problems with Blackboard, but I'll only focus on the problems with the integrated wiki. They are legion. This past week, I finished participating in our first group assignment, in which we were asked to collaborate on writing the answers to various questions about two research papers. I proposed that we use a wiki for this, and suggested that we could either use Blackboard's integrated wiki, or we could use a site that lets you create a free wiki on the MediaWiki platform --
Wikia. I pointed out that the main advantage of using Wikia would be that it would introduce everyone to the MediaWiki format -- which could be a springboard into editing Wikipedia.
Not surprisingly, my teammates chose to use the Blackboard wiki. I don't blame them at all -- because on the face of it, it would seem that Wikia is too risky. But the problems with the Blackboard wiki started to become apparent right away. Here is a list of a few of the problems we encountered:
1. It is impossible to compare two versions of a page, to see what has changed. This is a
fundamental feature of a wiki. Especially in a long article, it is absolutely required that an author be able to compare the versions, to see at a glance what has changed, so that he or she can focus just on the changes. To be fair, BB has a very limited (see the next item) ability to do this, but JHU's installation seems to be defective. It is missing a "diff.css" file, which, presumably, adds the all-important highlighting of the changes.
2. The "limited" ability I referred to above is (please be warned that the following might cause your head to explode) that you can only compare revisions corresponding to edits that
you yourself have made. In other words, if I want to see my own changes ("my contributions") that's fine -- I can do that. But if I want to see what somebody else changed on a page that I myself authored -- that's impossible.
3. The wiki uses a crap browser-based WYSIWYG editor that produces absolutely nightmare-inducing markup. All of my teammates found themselves frustrated by the random bizarre formatting quirks of the editor. You could tweak things to look correct in the edit view, but as soon as you clicked "submit", the formatting would be all screwed up again. It does have a "view - source" view, but if you make changes in that view, to clean things up, and then switch back to WYSIWYG view, all your work is lost.
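To make items 1 and 2 concrete, here is a minimal sketch of what comparing two revisions means, using Python's standard `difflib` module. This is not how Blackboard or MediaWiki implement their diffs; the function name, the labels, and the two sample revisions are invented for illustration. The point is simply that a comparison needs only the two texts, no matter who wrote them.

```python
import difflib


def compare_revisions(old_text, new_text, old_label="previous", new_label="current"):
    """Return a unified diff of two page revisions, one string per diff line."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )


# Two hypothetical revisions of the same page, written by different people.
rev_1 = "Histone H1 binds linker DNA.\nIt is found in eukaryotes."
rev_2 = "Histone H1 binds linker DNA.\nIt is found in most eukaryotes."

diff = compare_revisions(rev_1, rev_2, "teammate A", "teammate B")
for line in diff:
    print(line)
```

Lines starting with `-` were removed and lines starting with `+` were added, so a reader can focus on just the changes.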
But the biggest problem is that it is not MediaWiki. As I mentioned above, if we're going to spend the time to learn to use a wiki, then why not learn something that can really add to our value as scientific collaborators?
My hope is that, in future group assignments in my current class, my group and the others all decide to collaborate on the new wiki I've just set up on Wikia, the
Molecular Biology, AAP, Summer 2011 wiki. I expect there might be some resistance because it is "off-site", and not part of the JHU system. If so, those would be unjustified complaints. We're living in a deeply interconnected, collaborative world now, and more and more businesses are moving their entire infrastructures into the cloud. I think, as a general principle, we should make use of the best tools wherever we find them. If this objection is a show-stopper, then the next-best thing would be for JHU to set up an installation of MediaWiki itself. It is very easy to do!
Another recommendation I have is that all professors who teach courses be required to attain a minimum level of Wikipedia literacy. They should be required to register their Wikipedia usernames (mine is
Klortho) so that anyone can go and see their
list of contributions -- and this could become a part of their CV (as it should be). We all
rely so heavily on Wikipedia these days that it is a crime that we don't do more, as individuals and institutions, to support it.
UPDATE:
Daniel Mietchen pointed out to me that the journal
RNA Biology actually
requires a Wikipedia article (search for "A short guide to creating your first Wikipedia article") along with any new manuscript submitted to the "RNA Families Track".