bims: NGI Zero Discovery application 2021
Abstract: Can you explain the whole project and its expected outcome(s).
“Bims: Biomed News” and “NEP: New Economics Papers” are expertise sharing systems. They support discovery without any search. And that’s why they are very hard to understand.
Experts on a subject need to know about the latest documents published on that subject on a regular basis, say once a week. Repeated searches are a masochist’s pleasure. Bims and NEP allow experts to maintain reports on a subject. They select the relevant documents every week. Bespoke software called ernad processes the data about both the selected and the non-selected documents. It then ranks next week’s documents by likelihood of relevance. Our experts find that this is more flexible, more precise, and more fun than searches. Yes, that is wonderful for these experts. But the wider societal benefits only arrive when the reports by these experts are disseminated. That’s not much of a problem in NEP, which uses the wider array of RePEc services. But it is an issue for Bims. Individual experts have tweeted report issues. That’s not enough. I want a non-proprietary system based on email. This is what this proposal intends to fund.
Have you been involved with projects or organisations relevant to this project before? And if so, can you tell us a bit about your contributions?
In 1993, I published the first online economics working paper on my gopher server. In 1997, I led the creation of the RePEc digital library. RePEc enabled the working paper culture in economics to transition to digital distribution. In 1998, I created NEP. In the early 2000s, it became clear that the growth of new items in RePEc would mean that NEP experts would be overloaded with work. In 2004, I pioneered machine learning in the digital library space with the ernad software written for NEP. In 2017, I finally found a biomedical expert to direct me in a project to run ernad on PubMed data. Thus Bims was born. Bims’ data is openly available in bulk via anonymous rsync.
Explain what the requested budget will be used for? Does the project have other funding sources, both past and present?
The budget will be spent on me working on the project for about nine months full-time. I know it’s ridiculously little, but I would rather take little than nothing. I’m accustomed to living in poverty.
Since other things will come up, I expect to finish in about 12 to 15 months. The result will be software based on (1) periodically appearing data stocks in XML, (2) a non-periodically appearing set of mappings from document handles to reports in JSON, and (3) subscriber data in a relational database. Subscribers will have a web-based interface to subscribe to or unsubscribe from reports. Each email to a subscriber will be customized. Subscribers who receive several reports will not see the same document mentioned a second time if it was included in a different report sent earlier. This is to encourage subscriptions to many, potentially highly specialized, reports. The system will be written in Python, JavaScript and XSLT. It will be able to handle any type of incoming XML. The XSLT customizations for NEP and Bims will be included as examples.
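To make the "no repeats across reports" idea concrete, here is a minimal sketch in Python. It is an illustration only, not the planned implementation. The JSON layout (document handle mapped to a list of report codes), the handle strings, the report names, and the names Subscriber, handles_by_report and build_issue are all assumptions made for this example. The real system would keep subscriber state in the relational database rather than in memory.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Subscriber:
    email: str
    reports: set                            # report codes subscribed to
    seen: set = field(default_factory=set)  # handles already mailed


def handles_by_report(mapping_json):
    """Invert an assumed {handle: [report, ...]} JSON mapping
    into {report: [handle, ...]}, keeping the input order."""
    by_report = {}
    for handle, reports in json.loads(mapping_json).items():
        for report in reports:
            by_report.setdefault(report, []).append(handle)
    return by_report


def build_issue(subscriber, report, handles):
    """Return the handles to mail for one report issue, dropping any
    handle this subscriber already received through another report."""
    if report not in subscriber.reports:
        return []
    # dict.fromkeys removes duplicates inside the issue, preserving order
    fresh = [h for h in dict.fromkeys(handles) if h not in subscriber.seen]
    subscriber.seen.update(fresh)
    return fresh


# Hypothetical data: three documents assigned to two reports.
mapping = json.dumps({
    "pmid:1": ["autophagy"],
    "pmid:2": ["autophagy", "mitochondria"],
    "pmid:3": ["mitochondria"],
})
issues = handles_by_report(mapping)
alice = Subscriber("alice@example.org", {"autophagy", "mitochondria"})
first = build_issue(alice, "autophagy", issues["autophagy"])
second = build_issue(alice, "mitochondria", issues["mitochondria"])
```

In this sketch the document shared by both reports appears only in the first email the subscriber receives. The order in which report issues are sent therefore decides which report "gets" a shared document, which is a design choice the real system would have to settle.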
As for other funding: An initial version of ernad was written in 2004 using leftover funds from the JISC-funded WoPEc project. But none of that code is left in the contemporary versions that power NEP and Bims. NEP has on occasion featured sponsored advertising. Bims has never had any funding. Getting some external financial support, regardless of the amount, will boost Bims’ credibility. As it stands, it’s quite literally incredible.
Compare your own project with existing or historical efforts.
The precedent for Bims is NEP. There is no precedent for NEP. When I introduced machine learning to NEP in 2004 it was the first time that machine learning was used on bibliographic data. Right now, the closest there is to Bims is the LitSuggest system that the National Library of Medicine in the US have brought out this year. But LitSuggest features no dissemination, which is what this proposal is about.
There are email list systems, such as Mailman and Sympa. I have been using Mailman for NEP for over 15 years. NEP does not use most of the features of Mailman, and many of these unused features, such as owner addresses, are in fact a nuisance. I suspect this is one of the reasons why smaller organizations rely on companies like ConstantContact, MailChimp and suchlike. Basically, this proposal aims to come up with a replacement for the software built and used by these companies.
What are significant technical challenges you expect to solve during the project, if any?
I suppose that for most people running their own email server is a challenge. The difficulty in building software that runs on top of email is working around the need to configure the mailer. For me, though, it’s not rocket science. I’ve been running my own email and Mailman mailing list systems for over twenty years. I will use Django and see what I can use from the Mailman3 code. I will try to incorporate the software that I write into Mailman3. But at this stage, I can’t confirm that I will be able to do that with a reasonable workload.
Describe the ecosystem of the project, and how you will engage with relevant actors and promote the outcomes?
NEP and Bims are enabling blocks for reform of scholarly communication. Scholarly communication suffers from the stranglehold of publishers. They extract obscenely large rents from intermediating between academics. Given that power structure, swift change is impossible. NEP and Bims aim at disintermediation. For Bims that effect will become rather powerful as a preprint culture is developing in the biomedical sciences. Up until now, the recruitment of experts for Bims has been very slow. I believe that setting up the email system as proposed here will be a turning point in the development of Bims. Once we have more researchers on board we will be in a more convincing position to attract other experts, such as journalists, sufferers of chronic diseases and support organisations for rare diseases. Research will get to readers much faster than before, and the oligopoly currently enjoyed by highly visible outlets will come under pressure.
Having said that, this proposal is for an email system. I will keep the system configurable. It will be able to run on various collections. How we will get others to use its code is something that I’m not sure of at this point.
I hope all this was not too boring a read!