bims-librar Biomed News
on Biomedical librarianship
Issue of 2020–10–04
seventeen papers selected by
Thomas Krichel, Open Library Society



  1. J Clin Epidemiol. 2020 Sep 29. pii: S0895-4356(20)31110-0. [Epub ahead of print]
       OBJECTIVE: To assess the feasibility of a modified workflow that uses machine learning and crowdsourcing to identify studies for potential inclusion in a systematic review.
    STUDY DESIGN AND SETTING: This was a sub-study to a larger randomised study; the main study sought to assess the performance of single screening search results versus dual screening. This sub-study assessed the performance in identifying relevant RCTs for a published Cochrane review of a modified version of Cochrane's Screen4Me workflow which uses crowdsourcing and machine learning. We included participants who had signed up for the main study but who were not eligible to be randomised to the two main arms of that study. The records were put through the modified workflow where a machine learning classifier divided the dataset into "Not RCTs" and "Possible RCTs". The records deemed "Possible RCTs" were then loaded into a task created on the Cochrane Crowd platform and participants classified those records as either "Potentially relevant" or "Not relevant" to the review. Using a pre-specified agreement algorithm we calculated the performance of the crowd in correctly identifying the studies that were included in the review (sensitivity) and correctly rejecting those that were not included (specificity).
    RESULTS: The RCT machine learning classifier did not reject any of the included studies. In terms of the crowd, 112 participants were included in this sub-study. Of these, 81 completed the training module and went on to screen records in the live task. Applying the Cochrane Crowd agreement algorithm, the crowd achieved 100% sensitivity and 80.71% specificity.
    CONCLUSIONS: Using a crowd to screen search results for systematic reviews can be an accurate method as long as the agreement algorithm in place is robust.
    TRIAL REGISTRATION: Open Science Framework: https://osf.io/3jyqt.
    Keywords:  accuracy; agreement algorithm; crowdsourcing; human computation; literature screening; machine learning; systematic reviews
    DOI:  https://doi.org/10.1016/j.jclinepi.2020.09.024
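    Note on item 1: sensitivity and specificity, as defined in the abstract above, can be computed from screening counts. The following is a minimal sketch only; the function names and the counts are hypothetical illustrations and are not taken from the study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of included studies that the screeners correctly kept."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of non-included records that the screeners correctly rejected."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, for illustration only (not the study's data).
example_sensitivity = sensitivity(true_pos=10, false_neg=0)
example_specificity = specificity(true_neg=80, false_pos=20)
print(f"sensitivity={example_sensitivity:.2%}, specificity={example_specificity:.2%}")
```

    A sensitivity of 100% means no included study was rejected; specificity then describes how much irrelevant material still reaches the review team.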
  2. Res Synth Methods. 2020 Sep 28.
       INTRODUCTION AND AIM: NICE guideline surveillance determines whether previously published guidelines need updating. The surveillance process must balance time constraints with methodological rigour. It includes a rapid review to identify new evidence to contradict, reinforce or clarify guideline recommendations. Despite this approach, the screening burden can still be high. Applying additional search techniques may increase the precision of the database searches.
    METHODS: A retrospective analysis was conducted on five surveillance reviews with less than 2% of the studies included after screening. Modified searches were run in MEDLINE, Embase and PsycINFO (where appropriate) to test the impact of additional search techniques: focused subject headings, subheadings, frequency operators and title only searches. Modified searches were compared to original search results to determine: the retrieval of included studies, the precision of the search and the number needed to read. Studies not retrieved by the modified search were checked to determine if the surveillance decision would have been affected.
    RESULTS: The additional search techniques tested indicated that a combination of focused subject headings and frequency operators could improve the precision of surveillance searches. The modified search retrieved all the original studies included in the surveillance review for three of the reviews tested. Some of the original included studies were not retrieved for two reviews but the missing studies would not have affected the surveillance decision.
    CONCLUSIONS: Combining focused subject headings and frequency operators is a viable option for improving the precision of surveillance searches without compromising recall and without impacting the surveillance decision.
    Keywords:  databases, bibliographic; information storage and retrieval; search precision; search recall; search strategy
    DOI:  https://doi.org/10.1002/jrsm.1461
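    Note on item 2: the three measures used to compare the original and modified searches (retrieval of included studies, precision, and number needed to read) can be sketched as below. This assumes the common definitions, with number needed to read taken as the reciprocal of precision; the function names and counts are hypothetical and not from the article.

```python
def recall(included_retrieved: int, included_total: int) -> float:
    """Fraction of the included studies that the search retrieved."""
    return included_retrieved / included_total


def precision(included_retrieved: int, records_retrieved: int) -> float:
    """Fraction of retrieved records that were included after screening."""
    return included_retrieved / records_retrieved


def number_needed_to_read(included_retrieved: int, records_retrieved: int) -> float:
    """Records screened per included study found (reciprocal of precision)."""
    return records_retrieved / included_retrieved


# Hypothetical counts, for illustration only.
print(precision(10, 500))              # 2% of retrieved records are included
print(number_needed_to_read(10, 500))  # 50 records read per included study
```

    A more precise search lowers the number needed to read; the trade-off to watch is whether recall drops at the same time.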
  3. Heliyon. 2020 Sep;6(9): e04662
      This study aims to identify the steps by which users of academic libraries search for information and interact with the libraries' web interfaces. The study draws on models from the disciplines of human-computer interaction (HCI) and information-seeking behaviour (ISB) to create and investigate a Unified Model. Interpretive case studies were conducted at two universities, one in the UK and one in Kuwait. Qualitative data were collected through observations of postgraduate students and analysed using content analysis. The findings revealed seven steps taken in searching for information and interacting with academic libraries' web interfaces, but these steps overlap, as users can move from one step to another based on the difficulties they encounter and the options they have.
    Keywords:  Academic library; Case study; Content analysis; Human-computer interaction; Information retrieval; Information technology; Information-seeking behaviours; Interaction design; User interface
    DOI:  https://doi.org/10.1016/j.heliyon.2020.e04662
  4. Int J Environ Res Public Health. 2020 Sep 24. pii: E6988. [Epub ahead of print]17(19):
      Italy was the first European country to be affected by COVID-19, facing an unprecedented situation. The reaction required drastic solutions and highly restrictive measures, which severely tested the trust of the Italian people. Nevertheless, the effectiveness of the introduced measures was not only linked to political decisions, but also to the choice of the Italian people to trust and rely on institutions, accepting such necessary measures. In this context, the role of information sources was fundamental, since they strongly influence public opinion. The central focus of this research was to assess the information seeking behavior (ISB) of the Italian citizens, to understand how they related to information and how their specific use of information influenced public opinion. By making use of a survey addressed to 4260 Italian citizens, we identified extraordinarily virtuous behavior in the population: people strongly modified their ISB in order to address the most reliable sources. In particular, we found a very high reliance on scientists, which is particularly striking, if compared to the past. Moreover, starting from the survey results, we used social simulation to estimate the evolution of public opinion. Comparing the ISB during and before COVID-19, we discovered that the shift in the ISB, during the pandemic, may have actually positively influenced public opinion, facilitating the acceptance of the costly restrictions introduced.
    Keywords:  COVID-19; SARS-CoV-2; coronavirus; fake news; information-seeking behavior; misinformation; misleading information; social simulation; trust
    DOI:  https://doi.org/10.3390/ijerph17196988
  5. J Sch Nurs. 2020 Sep 29. 1059840520957069
      Adolescents are more likely to engage in risky health practices related to COVID-19. Their compliance with infection control measures is a key factor to mitigate the spread of the disease. The purpose of this study was to explore the knowledge, attitudes, and practices toward COVID-19 and their correlates among Jordanian adolescents. An online cross-sectional survey was utilized. A total of 1,054 Jordanian adolescents aged 12-18 completed and returned the survey. Overall, Jordanian adolescents showed a good base of knowledge regarding COVID-19 (regardless of their demographic characteristics) and tended to hold positive attitudes toward the country's curfew and other protective measures. The majority of adolescents reported that television and social media were their main source of information on COVID-19, while few reported receiving such information from their schools. The majority reported practicing effective health protective behaviors to prevent the spread of COVID-19, which was significantly predicted by their knowledge and attitudes toward these measures. However, there was a relatively small, yet clinically significant, percentage of adolescents who showed poor knowledge on COVID-19, had negative attitudes toward protective measures, and reported being engaged in risky practices related to infection spread. Tailored efforts are needed to improve the levels of knowledge, attitudes, and practices among adolescents. Raising awareness and promoting positive attitudes are vital to change adolescents' health practices. Policy makers should ensure that school nurses are available in all schools and working to their full scope. School nurses are the eyes and ears of public health and primary care. They are essential members on pandemic preparedness, reopening and reentry planning teams, and can lead health care in schools and practice in a holistic culturally competent proactive manner to address the needs of students.
    Keywords:  COVID-19; Jordan; KAPs; adolescents; coronavirus; online survey; school nursing
    DOI:  https://doi.org/10.1177/1059840520957069
  6. Am J Orthod Dentofacial Orthop. 2020 Oct;pii: S0889-5406(20)30376-0. [Epub ahead of print]158(4): 612-620
       INTRODUCTION: The evaluation of online information regarding orthodontic temporary anchorage devices (TADs) is lacking despite the increase in their use by orthodontists. This cross-sectional study aimed to investigate the quality of information regarding TADs available on the Internet to the general public.
    METHODS: Two search terms ("orthodontic temporary anchorage device" and "orthodontic miniscrew") were entered separately into a total of 5 search engines. The DISCERN instrument, Journal of the American Medical Association (JAMA) benchmarks, and Health on the Net Foundation Code of Conduct were used to evaluate the quality of information contained within Web sites that satisfied the inclusion and/or exclusion criteria. Web site readability was assessed via the Simple Measure of Gobbledygook and Flesch Reading Ease Score tools. Descriptive statistical analyses and Cohen's kappa intrarater reliability tests were performed.
    RESULTS: Thirty-one Web sites were evaluated. Most were authored by orthodontists (77.4%) and originated from the U.S. (38.7%). The mean (standard deviation [SD]) DISCERN score was 41.87 (8.45) out of 80, with a range of 27-57. Intrarater reliability testing for DISCERN scores was excellent (0.84). Four Web sites achieved all 4 JAMA benchmarks, and 2 achieved none. Referencing of content sources throughout the Web sites scored least via DISCERN (mean 1.49 out of 5 per Web site [SD, 0.77]) and JAMA (19.35% of Web sites). One Web site contained the Health on the Net Foundation Code of Conduct seal. The mean (SD) Simple Measure of Gobbledygook score was 8.75 (1.25), with a range of 6.5-11.3. The mean (SD) Flesch Reading Ease Score was 59.81 (7.17), with a range of 47.6-73.8.
    CONCLUSIONS: The quality of information related to TADs on the Internet is moderate. The usefulness of the information may be further reduced because it was beyond the readability of the average member of the general public. Web site authors should consider the use of additional expertise, quality of information tools, and readability formulas to ensure high-quality and easily readable content.
    DOI:  https://doi.org/10.1016/j.ajodo.2020.02.008
  7. World Neurosurg. 2020 Sep 26. pii: S1878-8750(20)32132-X. [Epub ahead of print]
       BACKGROUND: Patients, including those with chronic pain conditions, increasingly turn to the Internet for health information. To facilitate comprehension, this information ought to be written at, or below, the 8th grade reading level, which is the average American adult's reading level. This study measures the reading level of popular online sources for trigeminal neuralgia.
    METHODS: The top 10 search results from the "trigeminal neuralgia" search term on Google and Bing were selected for inclusion. The Flesch Reading Ease (FRE), Flesch-Kincaid grade level (FKGL), Gunning Fog Index (GFI), Simple Measure of Gobbledygook (SMOG), Coleman-Liau Index (CLI), Automated Readability Index (ARI), and Linsear Write Formula (LWF), were used to assess readability. A one-way ANOVA was utilized to test for statistical differences in average readability scores among the different web pages.
    RESULTS: Across the web pages, the average readability scores were as follows: FRE: 42.1 ± 7.7; FKGL: 10.9 ± 0.9; GFI: 15 ± 1.5; SMOG: 10.9 ± 1.2; CLI: 12.1 ± 1.3; ARI: 11.9 ± 1.4; LWF: 12.4 ± 1.7. Results from a one-way ANOVA demonstrated no statistically significant difference in overall readability scores among the web pages (F(12,78)=0.008; P>0.05).
    CONCLUSIONS: The writing of popular online education materials for trigeminal neuralgia is likely too complex to comprehend. We recommend that this material be revised to be readable at or below the 8th grade reading level. A variety of easily readable online education materials for trigeminal neuralgia can assist these patients in understanding their illness, and potentially improve patient decision-making and outcomes.
    Keywords:  health literacy; online; patient education; readability; trigeminal neuralgia
    DOI:  https://doi.org/10.1016/j.wneu.2020.09.123
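    Note on item 7: two of the readability measures named above, Flesch Reading Ease (FRE) and Flesch-Kincaid grade level (FKGL), are simple functions of word, sentence and syllable counts. The sketch below uses the standard published coefficients; the function names and the example counts are hypothetical, and counting syllables in real text (not shown) needs a dictionary or heuristic that the abstract does not describe.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Approximate US school grade needed to read the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical passage: 100 words, 5 sentences, 150 syllables.
print(round(flesch_reading_ease(100, 5, 150), 3))
print(round(flesch_kincaid_grade(100, 5, 150), 2))
```

    Both formulas penalise long sentences and polysyllabic words, which is why shortening sentences and replacing jargon are the usual routes to a lower grade level.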
  8. BMC Ophthalmol. 2020 Oct 02. 20(1): 391
       BACKGROUND: Age-related macular degeneration (AMD) is a chronic eye condition that leads to permanent vision loss in the central visual field. AMD makes reading challenging and inefficient. People with AMD often find it difficult to access, process and understand written patient education materials (PEMs). To promote health literacy, the demands of written PEMs must match the literacy capacities of the target audience. This study aims to evaluate the readability (grade level) and suitability (appropriateness) of online PEMs designed for people with AMD.
    METHODS: Online PEMs were sourced from websites of national organizations providing patient education materials designed for people with AMD. The Flesch-Kincaid Grade Level formula and the Suitability Assessment of Materials instrument were used to assess the readability and suitability of PEMs. Descriptive statistics were used to compare online PEMs by organization based on national guidelines for readability level (≤ sixth grade) and the recommended suitability score (≥ 70%) for "superior" material.
    RESULTS: One hundred online PEMs were evaluated from websites of 16 professional organizations. The mean readability level was 9.3 (range 5.0-16.6). The mean suitability score was 53% (range 18-78%). Only six (6%) of PEMs achieved the recommended guidelines for readability level and suitability score.
    CONCLUSION: The majority of online PEMs designed for people with AMD were written above the recommended readability level, and below the suggested suitability score. To promote health literacy, the demands of written health information must match the reading capacities of the target audience. Heeding evidence-based guidelines for providing written information to patients with low health literacy and low vision is beneficial for both patients and health care providers. Future research is warranted.
    Keywords:  Age-related macular degeneration; Health literacy; Patient education materials; Readability; Suitability
    DOI:  https://doi.org/10.1186/s12886-020-01664-x
  9. Patient Educ Couns. 2020 Sep 17. pii: S0738-3991(20)30510-3. [Epub ahead of print]
       OBJECTIVE: This study aimed at evaluating the quality and readability of online information about breast cancer written in Chinese.
    METHODS: An Internet search was conducted for "breast cancer" in Chinese using the Baidu search engine. Website quality was evaluated using the DISCERN instrument, and readability was evaluated using the Chinese Readability Index Explorer (CRIE). A higher DISCERN score indicated higher website quality, while a higher CRIE score indicated lower readability of the website content. We also investigated the effects of website producer category, and the associations of search engine ranking with DISCERN and CRIE scores.
    RESULTS: A total of 49 websites were included. The mean overall DISCERN score was 50.27 ± 4.14, and the mean CRIE score was 6.78 ± 0.16. Websites produced by non-profit organizations had the highest overall DISCERN scores, while those produced by private individuals had the lowest CRIE scores. Search engine ranking had no significant correlation with website quality or readability.
    CONCLUSIONS: The quality and readability of breast cancer websites in Chinese were not satisfactory, and they varied among different website producer categories.
    PRACTICE IMPLICATIONS: Website producers should seek to provide more accurate, comprehensive, and easy-to-understand information to better meet the needs of breast cancer patients. In addition, search engines should revise algorithms to promote websites with higher quality and accessibility.
    Keywords:  Breast cancer; Chinese websites; Quality evaluation; Readability; Search engine ranking
    DOI:  https://doi.org/10.1016/j.pec.2020.09.012
  10. J Stroke Cerebrovasc Dis. 2020 Sep 21. pii: S1052-3057(20)30727-8. [Epub ahead of print]29(12): 105309
       BACKGROUND AND OBJECTIVES: Studies using YouTube data for various diseases are rapidly increasing. This study aimed to investigate the educational quality, reliability and accuracy of the YouTube videos concerning repetitive transcranial magnetic stimulation (rTMS) applications in patients with stroke.
    METHODS: This is a descriptive study. A video based search on YouTube was performed on April 18th, 2020 by using keyword 'stroke repetitive transcranial magnetic stimulation'. The videos were queried using the default settings on YouTube and the results were listed according to relevance. Video parameters and sources were recorded. Quality, reliability and accuracy of the videos were determined with Global Quality Score (GQS), Journal of American Medical Association (JAMA) Benchmark Criteria and Modified DISCERN Questionnaire, respectively.
    RESULTS: A total of 21 videos were included in the study. The median number of views for videos was 884 (range: 89-28589) and the median duration was 135 seconds. None of the videos had a negative interaction index. The median value was found to be 3 for all three measurements (GQS, JAMA, and DISCERN). Most of the videos were of intermediate quality (47.6%) and had partial sufficient data (61.9%). In the high-quality group, the number of views, dislikes, the duration of the videos, JAMA and DISCERN scores were higher than the low-quality group (p < 0.05). At the same time, viewing rates of the high-quality group were better than the low and the intermediate-quality group (p < 0.05). There was a significant positive correlation between GQS and number of the views, video duration, number of likes, number of dislikes, viewing rate and modified DISCERN questionnaire scores (p < 0.05).
    CONCLUSION: Our results showed that most of the rated videos were of intermediate quality and had partially sufficient data. It has also been found that high-quality videos have higher viewing rates, more dislikes, longer video durations as well as better reliability and accuracy scores. YouTube videos of higher quality and accuracy are needed to increase awareness of rTMS by stroke patients.
    Keywords:  Stroke; Transcranial Magnetic Stimulation; Video; YouTube
    DOI:  https://doi.org/10.1016/j.jstrokecerebrovasdis.2020.105309
  11. Am J Mens Health. 2020 Sep-Oct;14(5): 1557988320945461
      Information seeking is essential for effective patient-centered decision-making. However, prostate cancer patients report a gap between information needed and information received. The importance of different information sources for treatment decision remains unclear. Thus, using the Comprehensive Model of Health Information (CMIS) framework, we assessed the antecedent factors, information carrier factors, and information-seeking activities in localized prostate cancer patients. Data were collected via semistructured one-on-one interviews and a structured survey. Men with localized prostate cancer were recruited from two urban health-care centers. Following the interview, participants completed a survey about sources that were helpful in learning about prostate cancer treatment and decision-making. The interviews were audio-recorded, transcribed, and subjected to a thematic analysis using NVivo 10. Fifty localized prostate cancer survivors completed the interviews and surveys. Important antecedent factors that were observed were age, marital status, uncertainty, anxiety, caregiver burden, and out-of-pocket expenses. We identified complexity, magnitude, and reliability as information carrier characteristics. Preferred sources for information were health providers, medical websites, and pamphlets from the doctor's office. These sources were also perceived as most helpful for decision-making. Urologists, urological oncologists, and radiation/radiation oncologists were important sources of information and helpful in decision-making. Prostate cancer patients obtained information from multiple sources. Most prostate cancer patients make patient-centered choices by incorporating personal factors and medical information. By considering factors that influence patients' treatment decisions, health-care providers can enhance the patient-centeredness of care. Multiple strategies and interventions are necessary for disseminating valid, reliable, and unbiased information to prostate cancer patients to facilitate informed decisions.
    Keywords:  information seeking; localized prostate cancer; patient interviews; shared decision-making; sources of information
    DOI:  https://doi.org/10.1177/1557988320945461
  12. Ital J Pediatr. 2020 Sep 29. 46(1): 141
       BACKGROUND: People increasingly search online for health information. Particularly, parents of patients often use the Internet as a source for health information. We conducted a survey to investigate the online searching behavior of parents of patients < 18 years, admitted for surgery in an Italian pediatric hospital.
    METHODS: The cross-sectional survey was nested in a prospective cohort study on surgical procedures. Parents of patients undergoing surgical procedures at Bambino Gesù Children's Hospital, Rome, Italy, were enrolled and contacted by phone after the procedure. We recorded socio-demographic data, sex, length of stay following surgery, proximity of residence to the hospital, use of the internet to search for information on the surgery before and after the intervention and effect of information found online.
    RESULTS: The majority (91%) of parents of children undergoing surgical intervention used the internet. Of these, 74.3% of parents searched for information before surgery, and 26.1% searched for information after. Most parents searched for information on the care provider's website. Two thirds of parents reported that information found online had increased their understanding of the child's condition. Multivariate analyses indicated that families living far from the hospital (> 43 km) were more likely to search for health information (OR 2.3; 95% CI 1.34-4.00), as were families of patients undergoing a major surgery (OR = 2.1; 95% CI 1.04-4.11).
    CONCLUSIONS: Parents of children undergoing surgery often search online for information on their child's intervention, in particular those whose child is scheduled for a major surgery and those living far from the hospital. A survey like the present one makes it possible to understand parents' information needs, to better guide them in online information seeking and to better tailor information provided on the care provider's website.
    Keywords:  Children; Information search; Internet; Surgery
    DOI:  https://doi.org/10.1186/s13052-020-00884-7
  13. J Med Internet Res. 2020 Sep 15.
       BACKGROUND: First detected in Wuhan, China in December 2019, the novel coronavirus (i.e., "COVID-19") pandemic stretched the medical system in Wuhan and posed an immense challenge to the state's risk communication efforts. Timely access to quality healthcare information during outbreaks of infectious diseases can be effective to curtail the spread of disease and feelings of anxiety. While existing studies have greatly extended our knowledge about online health information seeking behavior, processes and motivations, rarely have the findings been applied to an outbreak. Moreover, there is relatively little recent research on how people in China are using the Internet for seeking health information in a time of a pandemic.
    OBJECTIVE: The objective of our study was to explore how people in China are using the Internet for seeking health information in a time of a pandemic. Drawing on previous research of online health information seeking, this study asks the following research questions: How was the "#COVID-19 Patient Seeking Help" hashtag being used by patients in Wuhan seeking health information on Weibo at the peak of the outbreak?; What kinds of health information were patients in Wuhan seeking on Weibo at the peak of the outbreak?
    METHODS: Using entity identification and textual analysis on 10908 posts on Weibo, we identified 1496 Coronavirus patients using "#COVID-19 Patient Seeking Help" and explored their online health information seeking behavior.
    RESULTS: The curve of the hashtag posting provided a dynamic picture of public attention to the COVID-19 pandemic. Many patients faced difficulties accessing offline health care services. In general, our findings confirmed that the Internet is used by the Chinese public as an important source of health information. The lockdown policy was found to cut off the patients' social support network, preventing them from seeking help from family members. The ability to seek information and help online was especially essential during the pandemic, particularly for those with young children or elderly family members. A high proportion of female users were seeking health information and help for their parents or for elders at home. The most searched information included accessing medical treatment; managing self-quarantine; and offline to online support.
    CONCLUSIONS: Overall, the findings contribute to our understanding of health information seeking behaviors during an outbreak and highlight the importance of paying attention to the information need of vulnerable groups and the role social media may play.
    DOI:  https://doi.org/10.2196/22910
  14. Sante Publique. 2020 Sep 15. Vol. 32(2): 171-182
       INTRODUCTION: Pregnant women are heavy users of the Internet and this has an impact on their medical follow-up. The purpose of this study is to highlight the ethical issues related to the use of the Internet by women in their medical care.
    METHODS: Through a systematic literature review conducted on PubMed/Medline, Web of Science, CINAHL and Embase between June and July 2019, 10,670 results were obtained, and 79 articles were included after selection. A thematic analysis was conducted on these articles.
    RESULTS: More than 90% of pregnant women use the Internet, particularly to find medical information and social support, mainly on pregnancy and childbirth. This research allows them more equitable access to knowledge and develops their empowerment, which modifies the relationship between caregiver and patient, through the acquisition of greater autonomy for women and the development of experiential knowledge. This access offers a central and active role to pregnant women in their medical care. However, many authors also agree on the possible abuses of this use: misinformation, disproportionate information and the presence of judgment that undermine empowerment, but also digital divide and inequity in understanding information, stigmatization of women, and risks of privacy breaches on data acquired online.
    CONCLUSION: In order to provide pregnant women with the central and active place they seek, the authors recommend involving caregivers in the referral to reliable sites, encouraging them to develop online content, and educating pregnant women in the search for health information on the Internet.
    DOI:  https://doi.org/10.3917/spub.202.0171
  15. Math Biosci Eng. 2020 Jun 08. 17(4): 4098-4114
      With the rapid development of biomedical technology, amounts of data in the field of precision medicine (PM) are growing exponentially. Valuable knowledge is included in scattered data in which meaningful biomedical entities and their semantic relationships are buried. Therefore, it is necessary to develop a knowledge representation model like ontology to formally represent the relationships among diseases, phenotypes, genes, mutations, drugs, etc. and achieve effective integration of heterogeneous data. On the basis of existing work, our study focuses on solving the following issues: (i) selecting the primary entities in the PM domain; (ii) collecting and integrating biomedical vocabularies related to the above entities; (iii) defining and normalizing semantic relationships among these entities. We proposed a semi-automated method which improved the original Ontology Development 101 method to build the Precision Medicine Ontology (PMO), including defining the scope of the PMO according to the definition of PM, collecting terms from different biomedical resources, integrating and normalizing the terms by a combination of machine and manual work, defining the annotation properties, reusing existing ontologies and taxonomies, defining semantic relationships, evaluating PMO and creating the PMO website. Finally, the Precision Medicine Vocabulary (PMV) contains 4.53 million terms collected from 62 biomedical vocabularies, and the PMO includes eleven branches of PM concepts such as disease, chemical and drug, phenotype, gene, mutation, gene product and cell, described by 93 semantic relationships among them. PMO is an open, extensible ontology of PM, all of the terms and relationships in which could be obtained from the PMO website (http://www.phoc.org.cn/pmo/). Compared to existing projects, our work has brought a broader and deeper coverage of mutation, gene and gene product, which enriches the semantic type and vocabulary in the PM domain and benefits all users in terms of medical literature annotation, text mining and knowledge base construction.
    Keywords:  biomedical ontology; controlled vocabulary; precision medicine; semantic web; taxonomy
    DOI:  https://doi.org/10.3934/mbe.2020227
  16. Database (Oxford). 2020 Oct 01. pii: baaa064. [Epub ahead of print]
      It is a growing trend among researchers to make their data publicly available for experimental reproducibility and data reusability. Sharing data with fellow researchers helps in increasing the visibility of the work. On the other hand, there are researchers who are inhibited by the lack of data resources. To overcome this challenge, many repositories and knowledge bases have been established to date to ease data sharing. Further, in the past two decades, there has been an exponential increase in the number of datasets added to these dataset repositories. However, most of these repositories are domain-specific, and none of them can recommend datasets to researchers/users. Naturally, it is challenging for a researcher to keep track of all the relevant repositories for potential use. Thus, a dataset recommender system that recommends datasets to a researcher based on previous publications can enhance their productivity and expedite further research. This work adopts an information retrieval (IR) paradigm for dataset recommendation. We hypothesize that two fundamental differences exist between dataset recommendation and PubMed-style biomedical IR beyond the corpus. First, instead of keywords, the query is the researcher, embodied by his or her publications. Second, to filter the relevant datasets from non-relevant ones, researchers are better represented by a set of interests, as opposed to the entire body of their research. This second approach is implemented using a non-parametric clustering technique. These clusters are used to recommend datasets for each researcher using the cosine similarity between the vector representations of publication clusters and datasets. The maximum normalized discounted cumulative gain at 10 (NDCG@10), precision at 10 (p@10) partial and p@10 strict of 0.89, 0.78 and 0.61, respectively, were obtained using the proposed method after manual evaluation by five researchers. To the best of our knowledge, this is the first study of its kind on content-based dataset recommendation. We hope that this system will further promote data sharing, offset the researchers' workload in identifying the right dataset and increase the reusability of biomedical datasets. Database URL: http://genestudy.org/recommends/#/.
    DOI:  https://doi.org/10.1093/database/baaa064
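    Note on item 16: the abstract ranks datasets by cosine similarity and reports NDCG@10. The sketch below shows the textbook form of both measures, not the authors' implementation. The function names are hypothetical, the ideal ranking is assumed to be the same relevance grades sorted in descending order, and the example vectors and grades are invented for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dcg_at_k(relevances: list[float], k: int) -> float:
    """Discounted cumulative gain over the first k ranked items."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: list[float], k: int = 10) -> float:
    """DCG divided by the DCG of the ideal (descending) ordering of the same grades."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Invented example: one publication-cluster vector scored against one dataset vector,
# then a ranked list of relevance grades as judged by a human evaluator.
print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 1.0, 0.0]))
print(ndcg_at_k([1, 0, 1], k=10))
```

    NDCG rewards placing the most relevant datasets near the top of the list, so a score near 1 means the recommended order is close to the evaluators' ideal order.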