bims: tutorial


About these notes

These notes are a lengthy tutorial for selectors. For the impatient folks, we have the instructions.

Issues are composed with a system called ernad. These notes explain how to use it.

Basic concepts

report: A report is a series of documents on a particular topic, as determined by the work of a selector. Each report has an identifier that starts with “bims-” followed by six or more letters or numbers.
issue: An issue is a set of new documents added to PubMed. New issues appear each week. Each issue is identified by the date on which it becomes available. Each issue contains all documents added to PubMed during the week before its release that were published less than two years ago.
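The rule for what goes into an issue can be sketched as a date filter. This is a minimal Python sketch, not ernad's code: the `Paper` record and its `added` and `published` fields are hypothetical, "the week before release" is assumed to mean the seven days before the release date, and "less than two years ago" is assumed to be measured back from the release date.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Paper:
    # Hypothetical record; ernad's real data model is not described here.
    pmid: str
    added: date      # date the record was added to PubMed
    published: date  # publication date of the paper


def years_before(day: date, years: int) -> date:
    """Same calendar day `years` earlier; 29 February falls back to the 28th."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)


def issue_papers(papers: list[Paper], release: date) -> list[Paper]:
    """Keep papers added in the 7 days before `release` and published < 2 years ago."""
    week_start = release - timedelta(days=7)
    cutoff = years_before(release, 2)
    return [
        p for p in papers
        if week_start <= p.added < release and p.published > cutoff
    ]
```

For a release on 10 March 2024, a paper added on 5 March and published in January 2024 is kept, while one added the same day but published in 2021 is dropped.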

Bims Issues

Bims releases issues on Sundays after midnight UTC. When a new issue is ready, you get an alerting mail from ernad@biomed.news. Please whitelist this address so that these emails do not land in your spam folder.

Any particular issue may be delayed by technical problems, but the delay should only be a few hours. If you have not received an alerting mail by Monday, contact Thomas immediately. This most likely points to an email-delivery problem, which we have to take seriously.

Login

You log in at http://ernad.biomed.news. Use your report code as the login name. If you do not have a report code and password, contact us.

Issue selection

After login, you may see a message that you do not have an issue waiting. Most likely, this is because the issue is not yet ready. If we create a report for you during the week, it will not have an issue until the following Sunday. We do not release past issues to new selectors.

Generally, it is a bad idea to have more than one issue pending. Assume you have two issues pending, an old one and a new one. The presorting of the new issue then only uses the information that was available before the old issue. For the time being, the system does not update the new issue when you finish working on the old one. That is, the selections you make in the old issue are not fed into the learning for the new issue.

Paper selection

Paper selection is your only mandatory task. Ernad will help you.

The paper selection screens take a tabular form. There are two cells in each row. The left cell is the paper cell. It has the description of a paper in PubMed. The right cell is the selection cell. It contains a checkbox. When you have not taken any action on a paper, the selection cell has a black background. You use the checkbox to select or deselect the paper. If your browser supports Javascript, you can click anywhere in the selection cell, so you do not have to move your mouse to the exact location of the checkbox. Again, if your browser supports Javascript, the selection cell turns green when a paper is selected and red when it is deselected.

Each paper is prefaced by a horizontal rule, known as the separator rule. By default, the separator rule is black.

Preselecting

Since there are about 25,000 to 30,000 papers in every issue, it is not practicable for you to look at all of them.

For the first issue, preselection is done through a process called seeding. This process only uses the papers you have given us as sample papers. The more example papers you give us, the better. Even so, the result of seeding is generally not good.

After the first issue, ernad uses machine learning. Machine learning processes all the papers in the issue and tries to guess which of them are most likely to be included in your report. It puts these papers at the top of the table.
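The effect of presorting on the table can be sketched as a sort on estimated relevance. In this minimal Python sketch the probabilities are simply given as input; how ernad's machine learning computes them is not described here, and the function name `presort` is hypothetical.

```python
def presort(scores: dict[str, float]) -> list[str]:
    """Order paper identifiers so that the most probably relevant come first.

    `scores` maps a paper identifier to an estimated probability of relevance.
    Papers with equal scores keep their input order, because Python's sort
    is stable even with reverse=True.
    """
    return sorted(scores, key=lambda pid: scores[pid], reverse=True)
```

With scores of 0.1, 0.9 and 0.5 for papers a, b and c, the table order becomes b, c, a.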

When presorting takes place, the separator rule takes colors that range from green (#00FF00) to red (#FF0000). The color indicates the probability, as estimated by the presorting, that the paper is relevant to the topic of the report. You can find out the exact value by hovering the mouse over the selection cell.
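One plausible way to turn such a probability into a color between #FF0000 and #00FF00 is a linear blend of the red and green channels. This is an assumption for illustration only: the text does not say how ernad maps probabilities to colors, and the sketch takes green to mean "probably relevant".

```python
def separator_color(p: float) -> str:
    """Map a relevance probability in [0, 1] to a hex color string.

    Assumes a linear blend: p = 0 gives red (#FF0000), p = 1 gives green (#00FF00).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    red = round(255 * (1.0 - p))
    green = round(255 * p)
    return f"#{red:02X}{green:02X}00"
```

Under this assumption a paper at probability 0.5 gets an olive rule, #808000.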

Once ernad has learned about your topic, you can be fairly confident that all the papers that are relevant to your report are at the top.

Pickup screen and filtering screens

Paper selection happens in two screens. The first is called the “pickup” screen. The second is called the “filtering” screen. It is very important that you understand the difference between the two.

The pickup screen

The pickup screen shows you all the papers in the issue. If you don’t see a relevant paper by scrolling, use your browser’s search function. If a paper is not strictly relevant to your report but somewhat relevant, you may wish to select it in the pickup screen.

It is very important to understand that machine learning treats the papers you select in the pickup screen as relevant. You can pick up papers that you will not include in the report, just to guide machine learning. So try to pick up all papers you feel are close to the report’s topic. This is particularly important if you run a “thin” report, meaning one with few relevant papers. When you are starting a thin report, be generous at this stage to show machine learning what to look for.

If you want to take a restrictive view of your topic, exclude the papers in the filtering screen.

The filtering screen

This screen is used to exclude on-topic papers from your public report. A paper may be on the topic, but you may want to exclude it because

it’s got nothing really new;
it’s not a research paper, but maybe a survey;
you think it contains crackpot ideas;
you don’t like the authors.

To summarize, the proper use of the filtering screen is to exclude papers for any reasons other than the topic.

The filtering screen can also be “abused” to filter out papers that you selected in the pickup screen to guide machine learning, but that you find not all that relevant after all. When you start a thin report, it is perfectly reasonable to abuse the filtering screen in this way.

The difference between the pickup and filtering screens

We are sorry to dwell on this, but we cannot stress it enough. Let’s give you another example.

Assume that you see a paper, and then a week or so later you see something that is essentially the same paper. The authors were just crafty enough to get the same stuff published twice. You included the first version of the paper in your report last week. You do not want to include it again, because you think your report readers will not appreciate the repetition. You still have to select the second version in the pickup screen. Why? Consider what happens if you don’t. The two versions of the paper have different PubMed identifiers, but as far as their features are concerned, they are virtually identical. If you accept one paper but reject the second, the machine learning will be confused. Confused machine learning is not likely to do a good job for you.
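Why conflicting decisions hurt can be shown with a deliberately simple learner that estimates relevance as the share of selected papers among earlier papers with exactly the same features. This is a toy stand-in, not ernad's learner, and the feature sets below are hypothetical word sets.

```python
from collections import defaultdict


def relevance_estimates(labelled):
    """Estimate P(selected | features) by counting identical feature sets.

    `labelled` is a list of (features, selected) pairs, where `features`
    is a frozenset of terms and `selected` is True or False.
    """
    counts = defaultdict(lambda: [0, 0])  # features -> [times selected, times seen]
    for features, selected in labelled:
        counts[features][0] += int(selected)
        counts[features][1] += 1
    return {features: hits / seen for features, (hits, seen) in counts.items()}


# Hypothetical features of the original paper and its near-copy.
first = frozenset({"term_a", "term_b", "term_c"})
second = frozenset({"term_a", "term_b", "term_c"})

# Selecting both versions in the pickup screen gives a clear signal.
consistent = relevance_estimates([(first, True), (second, True)])
# Selecting one and rejecting the other cancels the signal out.
conflicting = relevance_estimates([(first, True), (second, False)])
```

Here `consistent[first]` is 1.0, while `conflicting[first]` is 0.5: the learner can no longer tell whether papers like this belong in the report.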

Keyboard interface

The selection screens feature a special Javascript-powered interface. It reassigns the arrow keys. Arrow down will bring the next document

Scream and sausage

The representation of a paper in the paper cell of the selection screens imitates its appearance in the HTML version of the final report issue. There are two exceptions.

First, on the right of the first line, a pair of blue exclamation marks “!” surrounds the number of the paper in the current issue. This feature is called the scream. The scream gives you an idea of how far down the issue you are. You can also use it to find the paper with a browser search, because it is unlikely that an exclamation mark is followed by a number anywhere else.
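Why the scream makes a good search target can be checked with a regular expression. In this Python sketch the page text is a hypothetical stand-in for what the browser shows. Note that searching for the full scream, closing exclamation mark included, finds paper 1 without also matching paper 12.

```python
import re

# A scream: an exclamation mark, the paper's number, another exclamation mark.
SCREAM = re.compile(r"!(\d+)!")


def scream_numbers(page_text: str) -> list[int]:
    """Return the paper numbers of all screams found in the page text."""
    return [int(number) for number in SCREAM.findall(page_text)]


def find_paper(page_text: str, number: int) -> int:
    """Return the offset of the scream for paper `number`, or -1 if absent."""
    return page_text.find(f"!{number}!")


page = "First title !1!\nTwelfth title !12!\n"  # hypothetical page text
```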

Second, on the right next to the paper’s DOI, a pair of blue vertical bars “|” surrounds one, two, or even three characters in the range [0-9a-z]. This is the paper’s sausage. Technically, the sausage is a hexatrigesimal coding of the position of a paper in the issue. In each issue, each paper has its own sausage, so it acts as a very short identifier of the paper. If you work on several reports, you may discover a paper that is relevant for the second report while you are working on the first. Write down the sausage of that paper. When you come to work on the same issue for the second report, you can find that paper very quickly using a browser search.
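Hexatrigesimal means base 36, with the digits 0-9 followed by a-z. A minimal Python sketch of the encoding follows; whether ernad counts positions from 0 or from 1 is not stated, so the meaning of the `position` argument is an assumption. Three characters cover positions up to 36 × 36 × 36 - 1 = 46,655, which is why a sausage needs at most three characters for an issue of 25,000 to 30,000 papers.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_sausage(position: int) -> str:
    """Encode a non-negative position in base 36 using digits 0-9 and a-z."""
    if position < 0:
        raise ValueError("position must be non-negative")
    if position == 0:
        return "0"
    chars = []
    while position:
        position, remainder = divmod(position, 36)
        chars.append(DIGITS[remainder])
    return "".join(reversed(chars))


def from_sausage(sausage: str) -> int:
    """Decode a sausage back to a position; int() accepts base 36 directly."""
    return int(sausage, 36)
```

For example, `to_sausage(29999)` gives `"n5b"`, and `from_sausage("n5b")` gives back 29999.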

Optional tasks

Paper selection is the only mandatory task. Here we discuss the optional tasks.

The sorting screen

The sorting screen allows you to change the order of the papers in your issue. You can use the up “↑” and down “↓” arrows in the interface to move an individual paper up and down. You can also use the text boxes to enter a series of numbers, positive or negative, and then press one of the two buttons at the bottom of the input boxes. “↥” sorts the papers by these numbers in ascending order. “↧” sorts them in descending order. Boxes that you leave empty count as zero.
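The numeric sort can be sketched as follows. This is a minimal Python sketch, not ernad's implementation: the inputs are hypothetical (the current order of paper identifiers and the text typed into each box), and keeping tied papers in their current order is an assumption.

```python
def sort_by_boxes(order: list[str], boxes: dict[str, str],
                  descending: bool = False) -> list[str]:
    """Reorder papers by the numbers typed into their boxes.

    `order` is the current list of paper identifiers; `boxes` maps an
    identifier to the text in its box. Empty or missing boxes count as zero.
    Ties keep their current order because Python's sort is stable.
    """
    def box_value(paper: str) -> float:
        text = boxes.get(paper, "").strip()
        return float(text) if text else 0.0

    return sorted(order, key=box_value, reverse=descending)
```

In this sketch “↥” corresponds to `descending=False` and “↧” to `descending=True`. With papers a, b, c, d and the boxes for a and c holding 3 and -1, ascending order gives c, b, d, a, and descending order gives a, b, d, c.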

The preview screens

These let you preview the issue in both its text and HTML versions. If you spot an error, just close your browser and contact Thomas. Since you have already made your selections, Thomas can finish the issue for you once he has fixed the error.