Issue of 2021‒05‒02
on Biomedical librarianship
twenty-one papers selected by
Thomas Krichel
Open Library Society

  1. Medwave. 2021 Mar 30. 21(2): e8144
      The growing body of evidence has led to a growing number of literature reviews. There are different types of reviews, of which systematic reviews are the best known, and each type serves a different purpose. The scoping review is a recent model that aims to answer broad questions by identifying and exposing the available evidence, using a rigorous and reproducible method. In the last two decades, researchers have discussed the most appropriate method to carry out scoping reviews, and recently the Preferred Reporting Items for Systematic Reviews and Meta-Analyses for scoping reviews (PRISMA-ScR) reporting guideline was published. This is the fifth article of a methodological collaborative series of narrative reviews about general topics on biostatistics and clinical epidemiology. This review aims to describe what scoping reviews are, identify their objectives, differentiate them from other types of reviews, and provide considerations on how to carry them out.
    Keywords:   evidence-based medicine; literature review as topic; systematic mapping; scoping reviews as topic
  2. Glob Bioeth. 2021 Apr 05. 32(1): 67-84
      Aim: This study is a systematic review that aims to assess how healthcare professionals manage ethical challenges regarding information within the clinical context.
    Method and Materials: We carried out searches in PubMed, Google Scholar and Embase, using two search strings; searches generated 665 hits. After screening, 47 articles relevant to the study aim were selected for review. Seven articles were identified through snowballing, and 18 others were included following a system update in PubMed, bringing the total number of articles reviewed to 72. We used a Q-sort technique for the analysis of identified articles.
    Findings: This study reveals that healthcare professionals around the world generally employ (to varying degrees) four broad strategies to manage different types of challenges regarding information, which can be categorized as challenges related to confidentiality, communication, professional duty, and decision-making. The strategies employed for managing these challenges include resolution, consultation, stalling, and disclosure/concealment.
    Conclusion: There are a variety of strategies which health professionals can adopt to address challenges regarding information management within the clinical context. This insight complements current efforts aimed at enhancing health professional-patient communication. Very few studies have researched the results of employing these various strategies. Future empirical studies are required to address this.
    Abbreviations: CIOMS: Council for International Organizations of Medical Sciences; WHO: World Health Organization; AMA: American Medical Association; WMA: World Medical Association; PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses; ISCO: International Standard Classification of Occupations; ILO: International Labour Office; SPSS: The Statistical Package for the Social Sciences.
    Keywords:  Ethical challenges; clinical context; empirical literature; information management; systematic review
  3. Database (Oxford). 2021 Apr 29. pii: baab021. [Epub ahead of print]2021
      High-quality metadata annotations for data hosted in large public repositories are essential for research reproducibility and for conducting fast, powerful and scalable meta-analyses. Currently, a majority of sequencing samples in the National Center for Biotechnology Information's Sequence Read Archive (SRA) are missing metadata across several categories. In an effort to improve the metadata coverage of these samples, we leveraged almost 44 million attribute-value pairs from SRA BioSample to train a scalable, recurrent neural network that predicts missing metadata via named entity recognition (NER). The network was first trained to classify short text phrases according to 11 metadata categories and achieved an overall accuracy and area under the receiver operating characteristic curve of 85.2% and 0.977, respectively. We then applied our classifier to predict 11 metadata categories from the longer TITLE attribute of samples, evaluating performance on a set of samples withheld from model training. Prediction accuracies were high when extracting sample Genus/Species (94.85%), Condition/Disease (95.65%) and Strain (82.03%) from TITLEs, with lower accuracies and lack of predictions for other categories highlighting multiple issues with the current metadata annotations in BioSample. These results indicate the utility of recurrent neural networks for NER-based metadata prediction and the potential for models such as the one presented here to increase metadata coverage in BioSample while minimizing the need for manual curation. Database URL:
  4. J Pers Med. 2021 Apr 14. pii: 300. [Epub ahead of print]11(4):
      BACKGROUND: Searching through the COVID-19 research literature to gain actionable clinical insight is a formidable task, even for experts. The usefulness of this corpus in terms of improving patient care is tied to the ability to see the big picture that emerges when the studies are seen in conjunction rather than in isolation. When the answer to a search query requires linking together multiple pieces of information across documents, simple keyword searches are insufficient. To answer such complex information needs, an innovative artificial intelligence (AI) technology named a knowledge graph (KG) could prove to be effective.
    METHODS: We conducted an exploratory literature review of KG applications in the context of COVID-19. The search term used was "covid-19 knowledge graph". In addition to PubMed, the first five pages of search results for Google Scholar and Google were considered for inclusion. Google Scholar was used to include non-peer-reviewed or non-indexed articles such as pre-prints and conference proceedings. Google was used to identify companies or consortiums active in this domain that have not published any literature, peer-reviewed or otherwise.
    RESULTS: Our search yielded 34 results on PubMed and 50 results each on Google and Google Scholar. We found KGs being used for facilitating literature search, drug repurposing, clinical trial mapping, and risk factor analysis.
    CONCLUSIONS: Our synopses of these works make a compelling case for the utility of this nascent field of research.
    Keywords:  COVID-19; drug repurposing; knowledge graph; natural language processing
  5. J Am Med Inform Assoc. 2021 Apr 24. pii: ocab078. [Epub ahead of print]
      OBJECTIVE: Clinical trials are an essential part of the effort to find safe and effective prevention and treatment for COVID-19. Given the rapid growth of COVID-19 clinical trials, there is an urgent need for better clinical trial information retrieval that supports searching by specifying criteria including both eligibility criteria and structured trial information.
    MATERIALS AND METHODS: We built a linked graph for registered COVID-19 clinical trials: the COVID-19 Trial Graph, to facilitate retrieval of clinical trials. Natural language processing (NLP) tools were leveraged to extract and normalize the clinical trial information from both their eligibility criteria free texts and structured information from We linked the extracted data using the COVID-19 Trial Graph and imported it to a graph database, which supports both query and visualization. We evaluated the trial graph using case queries and graph embedding.
    RESULTS: The graph currently (as of 10-05-2020) contains 3,392 registered COVID-19 clinical trials, with 17,480 nodes and 65,236 relationships. Manual evaluation of case queries found high precision and recall scores on retrieving relevant clinical trials when searching on both eligibility criteria and trial-structured information. We observed clustering in clinical trials via graph embedding, which also showed superiority over the baseline (0.8704 vs. 0.8199) in evaluating whether a trial can complete its recruitment successfully.
    CONCLUSIONS: The COVID-19 Trial Graph is a novel representation of clinical trials that allows diverse search queries and provides a graph-based visualization of COVID-19 clinical trials. High-dimensional vectors mapped by graph embedding for clinical trials would be potentially beneficial for many downstream applications, such as trial end recruitment status prediction, and trial similarity comparison. Our methodology also is generalizable to other clinical trials, such as cancer clinical trials.
    Keywords:  clinical trial; covid-19; eligibility criteria; graph representation
  6. Adv Med Educ Pract. 2021 ;12 383-392
      Background: The International Committee of Medical Journal Editors has published clear guidelines on the authorship of scientific papers. It is the research team's responsibility to review and ensure those guidelines are met. Authorship ethics and practices have been examined among healthcare professionals or among particular health science students such as medical students. However, there is limited evidence to assess the knowledge of authorship roles and practices among health science students.
    Methods: We conducted a cross-sectional study to assess the knowledge of authorship guidelines and practices among health science students at King Saud bin Abdulaziz University for Health Sciences in Riyadh, Saudi Arabia. A survey was developed and distributed. It covered several domains, including demographic characteristics, participants' knowledge of and attitude toward authorship practices, knowledge and experience with ghost and guest authorships, and knowledge of institutional authorship policies. Moreover, a score was computed to reflect the respondents' knowledge about authorship practices.
    Results: Among the 321 participants who agreed to take the survey, two-thirds agreed with and supported that multi-authored articles' credit allocation should be based on the most significant contribution and contributions to the manuscript writing. Almost 47% agreed that team relationships would influence authorship allocation. The majority of the participants were not aware of their institutional research and publication policies. Also, around 50% of participants were not aware of guest or ghost authorships. Finally, the knowledge score about authorship credits, allocation, contribution, order, and guidelines was higher among students who were assigned as corresponding authors and those who were aware of their institutional authorship guidelines and policies.
    Conclusion: In conclusion, our findings suggest that health science students may have limited knowledge about authorship guidelines and unethical behaviors involved in a scientific publication. Universities and research centers should make more efforts to raise the awareness of health science students regarding authorship guidelines while ensuring that they comply with those guidelines.
    Keywords:  education; ethics; knowledge; publications; research article
  7. J Med Internet Res. 2021 Apr 18.
      BACKGROUND: Before the advent of an effective vaccine, non-pharmaceutical interventions such as mask wearing, social distancing and lockdown have been the primary measures to combat the COVID-19 pandemic. Such measures are highly effective when there is high population-wide adherence, which requires information on current risks posed by the pandemic alongside a clear exposition of the rules and guidelines in place.
    OBJECTIVE: Here we analyze online news media coverage of COVID-19. We quantify the total volume of COVID-19 articles, their sentiment polarization and leading subtopics, to act as a reference to inform future communication strategies.
    METHODS: We collected 26 million news articles from the front pages of 172 major online news sources in 11 countries (available at Using topic detection we identified COVID-19-related content to quantify the proportion of total coverage the pandemic received in 2020. Sentiment analysis tool Vader was employed to stratify the emotional polarity of COVID-19 reporting. Further topic detection and sentiment analysis was performed on COVID-19 coverage to reveal the leading themes in pandemic reporting and their respective emotional polarizations.
    RESULTS: We find that COVID-19 coverage accounted for approximately 25% of all front-page online news articles between January and October 2020. Sentiment analysis of English-speaking sources reveals that overall COVID-19 coverage is not exclusively negatively polarized, suggesting a wide heterogeneous reporting of the pandemic. Within this heterogeneous coverage, 16% of COVID-19 news articles (or 4% of all English-speaking articles) can be classified as highly negatively polarized, citing issues such as death, fear or crisis.
    CONCLUSIONS: The goal of COVID-19 public health communication is to increase understanding of distancing rules and maximize the impact of governmental policy. The extent to which the quantity and quality of information from different communication channels (e.g. social media, government pages and news) influence public understanding of public health measures remains to be established. Here we conclude that a quarter of all reporting in 2020 covered COVID-19, which is indicative of information overload. In this capacity, our data and analysis form a quantitative basis for informing health communication strategies along traditional news media to minimize the risks of COVID-19 while vaccination is rolled out.
  8. Am J Trop Med Hyg. 2021 Apr 26. pii: tpmd210216. [Epub ahead of print]
      Google health-based Knowledge Panels were designed to provide users with high-quality basic medical information on a specific condition. However, any errors contained within Knowledge Panels could result in the widespread distribution of inaccurate health information. We explored the potential for inaccuracies to exist within Google's health-based Knowledge Panels by focusing on a single well-studied pathogen, Ebola virus (EBOV). We then evaluated the accuracy of those transmission modes listed within the Google Ebola Knowledge Panel and investigated the pervasiveness of any misconceptions associated with inaccurate transmission modes among persons living in Africa. We found that the Google Ebola Knowledge Panel inaccurately listed insect bites or stings as modes of EBOV transmission. Our scoping review found 27 articles and reports that revealed that 9 of 11 countries where misconceptions regarding insect transmission of EBOV have been reported are locations of current (i.e., Democratic Republic of Congo and Guinea) or previous EBOV outbreaks. We found reports that up to 26.6% (155/582) of study respondents in Democratic Republic of Congo believed mosquito bite avoidance would prevent EBOV; in other locations of previous large-scale EBOV outbreaks (e.g., Guinea), up to 61.0% (304/498) of respondents believed insects were involved in EBOV transmission. Our findings highlight the potential for errors to exist within the health information contained in Google's health-based Knowledge Panels. Such errors could perpetuate misconceptions or misinformation, leading to mistrust of health workers and aid agencies and in turn undermining public health education or outbreak response efforts.
  9. J Mov Disord. 2021 May 03.
      Objective: To evaluate the accuracy and quality of Korean videos associated with restless legs syndrome (RLS) on YouTube.
    Methods: A YouTube search was performed on April 1, 2020 using the term "restless legs syndrome" in the Korean language. Two reviewers coded the source, content, and demographics of the included videos. Video quality was assessed using the modified DISCERN (mDISCERN) instrument.
    Results: Among the 80 videos analyzed, 44 (55.0%) were reliable, and 36 (45.0%) were misleading. There was a trend toward a higher number of mean daily views in the misleading videos than in the reliable videos. Most of the misleading videos (72.2%) advocated complementary and alternative medicine as a primary treatment for RLS. Although the reliable videos had higher mDISCERN scores than the misleading videos, the overall quality of the reliable videos was low.
    Conclusion: Many Korean videos regarding RLS on YouTube involve a risk of exposure to misinformation and are of unsatisfactory quality.
    Keywords:  Internet; Korea; Restless legs syndrome; YouTube
  10. Cureus. 2021 Mar 24. 13(3): e14085
      OBJECTIVE: In our study, we aim to evaluate, from the patient's perspective, the quality and reliability of the most relevant and most-watched videos uploaded on YouTube about pancreatic cancer.
    METHOD: Before starting the study, YouTube™ search terms were determined by consensus by two General Surgeons. Then, on 10/10/2020, the terms "pancreatic cancer", "diagnosis of pancreatic cancer" and "treatment of pancreatic cancer" were entered separately in the search bar of YouTube, "relevance" was selected among the filtering options and the most viewed videos were listed. The videos were evaluated with the Global Quality Scale (GQS), the DISCERN scoring system (Quality Criteria for Consumer Health Information), and video power index.
    RESULTS:  Among the 50 videos analysed, 19 videos were uploaded by hospital channels, 17 videos by health channels, seven videos by patients, four videos by blog channels, and three videos by doctors. The mean GQS score of the first researcher was 3.24 ± 0.99 and the mean GQS score of the second researcher was 3.18 ± 0.88 with a significantly high agreement between them (r= 0.628). The mean DISCERN score of the first researcher was 3.48 ± 0.77 and the mean DISCERN score of the second researcher was 3.46 ± 1.09 with a significantly high agreement between them (r= 0.814).
    CONCLUSION:  In our study, the majority of the videos were found to be of moderate quality. Healthcare professionals should be encouraged to upload more videos with useful content. However, we think that the uploaded videos should definitely go through a professional peer-review process before they are published.
    Keywords:  discern; gqs; pancreatic cancer; vpi; youtube
  11. Int J Spine Surg. 2021 Feb;15(1): 179-185
      BACKGROUND: YouTube is a readily accessible, non-peer-reviewed video-based platform serving as a major source of online medical information presently. The aim of the current article is to analyze the comprehensiveness and reliability of the videos related to lumbar spinal fusion available on YouTube.
    METHODS: A YouTube search was conducted to analyze videos on lumbar spinal fusion using the search terms lumbar fusion, spinal fusion, and lumbar interbody fusion. Consequently, 107 videos met the inclusion criteria and were short-listed. Videos were analyzed for video information data, including views, likes and dislikes, views per day, likes per day and likes per view, and reliability and comprehensiveness scores.
    RESULTS: Of the 107 videos included in the study, a majority (75.7%) were found to be poor in comprehensiveness. There was no correlation found between video information data and reliability and comprehensiveness scores.
    CONCLUSIONS: Patients browsing YouTube for additional medical information on lumbar spinal fusion will be presented with large volumes of poor-quality data with a majority of videos lacking important preoperative and postoperative information.
    CLINICAL RELEVANCE: The current study provides both patients and physicians with an opportunity to understand the limitations of online content on lumbar spinal fusion available on YouTube. This knowledge about online medical information may further enhance the quality of patient-physician interaction and understanding.
    Keywords:  Internet; YouTube; online medical information; reliability; spinal fusion; spine surgery
  12. Digit J Ophthalmol. 2021 ;27(1): 6-12
      Purpose: To identify the information sources for patients undergoing laser vision correction.
    Methods: Individuals who underwent corneal refractive surgery at a private practice from December 2017 to August 2018 and agreed to complete an anonymous questionnaire were included. The manifest refraction and surgical method were recorded and correlated with the questionnaire results.
    Results: Data collected from 126 patients (mean age, 32.8 ± 8.6 years; 55.6% women) were analyzed. Of 121 patients, 120 (99.2%) identified the Internet as a source for information on refractive surgery, and 71 of 119 (59.7%) noted that the clinic's website influenced their choice of clinic. Patients with high myopia more commonly used contact lenses and had considered undergoing refractive surgery for a longer time compared with patients with other refractive errors (P < 0.01 and P < 0.01, resp.). Patients with hyperopia were less likely to know their own refractive error (P = 0.02).
    Conclusions: In this patient cohort, the Internet was the main source of information for those undergoing refractive surgery.
  13. Eye Contact Lens. 2021 Apr 27.
      OBJECTIVES: To evaluate the quality, reliability, and educational content of YouTube videos related to soft contact lenses (CL).
    METHODS: An online YouTube search was performed for the term "contact lens" and other common CL-related terms: "contact lens insertion and removal", "contact lens wearing", and "contact lens care". The first 50 videos were evaluated for each term. Videos were evaluated using three checklists (the modified DISCERN criteria, the Journal of the American Medical Association [JAMA] criteria, and Global Quality Score [GQS]). Video popularity was also evaluated using the video power index (VPI). Videos were classified into three groups according to the source of the upload; group 1: universities/occupational organizations, group 2: medical ad/profit-oriented companies, and group 3: independent users.
    RESULTS: From among the 200 videos analyzed, 79 were included. The mean mDISCERN score of the videos was 2.34±1.39, the mean JAMA score was 1.20±0.99, and the mean GQS value was 3.47±1.28. There were positive correlations between the three checklists (P<0.001). Video power index was not correlated with each score. The videos in group 1 (13.9%) had the highest scores whereas videos in group 3 (41.8%) had the lowest scores. There was no significant difference between the video sources according to the VPI.
    CONCLUSION: Although some YouTube videos contain useful information for CL wearers, most videos have poor quality and reliability and contain insufficient information. Eye care providers should be aware of these sources and steer CL users to information sources that provide accurate and reliable information and do not contain misleading information.
  14. Spine J. 2021 Apr 23. pii: S1529-9430(21)00194-7. [Epub ahead of print]
      BACKGROUND CONTEXT: The NASS spine fellowship directory is an established resource that provides applicants with access to important information about different fellowship programs. Additionally, some programs have created websites to provide information about their fellowship program. There has been limited research on the amount and breadth of information provided by these different resources.
    PURPOSE: To assess and compare the scope of information provided by the North American Spine Society (NASS) fellowship directory and individual fellowship program websites.
    STUDY DESIGN/SETTING: Web Content Accessibility Study
    PATIENT SAMPLE: There were no patient data used in this study. All reported data were accessed from public websites and the NASS fellowship directory (August 2022 fellowships).
    OUTCOME MEASURES: Outcome measures were reported as the presence or lack thereof of 22 topics pertaining to the specifics of each individual spine fellowship program on both the NASS fellowship directory and individual fellowship program websites.
    METHODS: The NASS fellowship directory (August 2022 fellowships) and individual program websites were evaluated by two independent reviewers. Program websites were identified via Google search with a systematic protocol. Within each platform, the availability of various data was recorded. Twenty-four different data points were assessed for each program and were categorized into four main categories: general program information, fellow profiles, clinical roles, and non-clinical roles of the fellow. Chi-squared tests were used to compare differences in the availability of specific data provided by the NASS fellowship directory and individual program websites.
    RESULTS: Seventy-four fellowship programs were identified. The NASS fellowship directory was more likely to provide information about the application process, a description of the program, fellow salary, faculty members, case descriptions, and research requirements (p<0.05). The program websites were more likely to provide information about current and previous fellows - including a list of current fellow(s), their education/training, and a list of the previous fellows and their job choice (p<0.05). Program websites were also more likely to discuss rotation schedules, clinic expectations, research opportunities, journal club, institutional meetings, sponsored national meetings, and current/previous research (p<0.05). However, certain information, including specific clinical responsibilities (e.g. rotation schedule, call expectations, clinic expectations) and the profiles of current and previous fellows, were not well represented on either platform.
    CONCLUSIONS: There were significant differences in the type of information provided by the NASS fellowship directory and program websites. Furthermore, there were key pieces of information that were not well represented on either platform.
    Keywords:  COVID-19; Education; North American Spine Society; Spine fellowship; Spine surgery; Web Accessibility
  15. J Vet Med Educ. 2021 Mar 10. e20200075
      Online resources are being increasingly used by veterinary students to complement their learning. However, their use by veterinary students, especially for cardiology learning, remains poorly understood. This article investigates the extent to which clinical veterinary students use online resources to study cardiology and whether this is affected by gender, age, year of study or entry status. This was a questionnaire-based study distributed to clinical veterinary students across eight UK universities, with 213 respondents. The lecturer was the most preferred resource, except for direct entry students and students aged 27 or more, who preferred recommended textbooks. 95.3% of students use search engines to research cardiology topics and 93.4% indicated that they would first search for answers online rather than contacting their instructor. Online video clips were popular, as 71.8% of students accessed them at least once per week for cardiology learning. 89.3% of those students found online videos useful for understanding cardiological concepts. Social media was only rarely used (6.6%) to discuss cardiology information. Nonetheless, most students (64.3%) stated that they would enjoy interacting with course material on an instructor-led social media page. Although most students (62%) do not automatically trust online resources, only 46.9% of students indicated that they verify online cardiology information. Online resources play an important role in complementing traditional resources in cardiology learning, and these findings suggest that some level of academic oversight may be necessary to ensure students use these resources in an appropriate manner.
    Keywords:  cardiology; clinical veterinary education; digital literacy; e-learning; online resources; open educational resources; social media as a teaching tool
  16. Int J Environ Res Public Health. 2021 Apr 11. pii: 4009. [Epub ahead of print]18(8):
      Amid the COVID-19 pandemic, digital health literacy (DHL) has become a significant public health concern. This research aims to assess information seeking behavior, as well as the ability to find relevant information and deal with DHL among university students in Pakistan. An online-based cross-sectional survey, using a web-based interviewing technique, was conducted to collect data on DHL. Simple bivariate and multivariate linear regression was performed to assess the association of key characteristics with DHL. The results show a high DHL related to COVID-19 in 54.3% of students. Most of the Pakistani students demonstrated ~50% DHL in all dimensions, except for reliability. Multivariate findings showed that gender, sense of coherence and importance of information were found to be significantly associated with DHL. However, a negative association was observed with students' satisfaction with information. This led to the conclusion that critical operational and navigations skills are essential to achieve COVID-19 DHL and cope with stress, particularly to promote both personal and community health. Focused interventions and strategies should be designed to enhance DHL amongst university students to combat the pandemic.
    Keywords:  COVID-19; COVID-HL-Q; Pakistan; digital health literacy; eHealth literacy; sense of coherence
  17. Database (Oxford). 2021 Apr 30. pii: baab022. [Epub ahead of print]2021
      To date, research on inflammatory bowel disease (IBD, encompassing Crohn's disease and ulcerative colitis), a chronic complex disorder, has generated a large amount of data scattered across published literature (106 333) listed in PubMed on 14 October 2020, and no dedicated database currently exists that catalogues information on genes associated with IBD. We aimed to manually curate 289 genes that are experimentally validated to be linked with IBD and its known phenotypes. Furthermore, we have developed an integrated platform providing information about different aspects of these genes by incorporating several resources and an extensive text-mined knowledgebase. The curated IBD database (IBDDB) allows the selective display of 34 collated subject-specific concepts (listed as columns) exportable through a user-friendly IBDDB portal. The information embedded in concepts was acquired via text-mining of PubMed (manually cleaned and curated), accompanied by data-mining from varied resources. The user can also explore different biomedical entities and their co-occurrence with other entities (about one million) from 11 curated dictionaries in the indexed PubMed records. This functionality permits the user to generate and cross-examine a new hypothesis that is otherwise not easy to comprehend by just reading the published abstracts and papers. Users can download required information using various file formats and can display information in the form of networks. To our knowledge, no curated database of IBD-related genes is available so far. IBDDB is free for academic users and can be accessed at
  18. Entropy (Basel). 2021 Apr 11. pii: 449. [Epub ahead of print]23(4):
      Traditional information retrieval systems return a ranked list of results to a user's query. This list is often long, and the user cannot explore all the results retrieved. It is also ineffective for a highly ambiguous language such as Arabic. The modern writing style of Arabic excludes the diacritical marking, without which Arabic words become ambiguous. For a search query, the user has to skim over the document to infer if the word has the same meaning they are after, which is a time-consuming task. It is hoped that clustering the retrieved documents will collate documents into clear and meaningful groups. In this paper, we use an enhanced k-means clustering algorithm, which yields a faster clustering time than the regular k-means. The algorithm uses the distance calculated from previous iterations to minimize the number of distance calculations. We propose a system to cluster Arabic search results using the enhanced k-means algorithm, labeling each cluster with the most frequent word in the cluster. This system will help Arabic web users identify each cluster's topic and go directly to the required cluster. Experimentally, the enhanced k-means algorithm reduced the execution time by 60% for the stemmed dataset and 47% for the non-stemmed dataset when compared to the regular k-means, while slightly improving the purity.
    Keywords:  Arabic; clustering algorithms; enhanced k-means; information retrieval; text mining; web search
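    The abstract for item 18 does not spell out the enhanced k-means algorithm, so the following is only a minimal sketch of one common way to reuse distances from the previous iteration. It assumes that each point remembers the distance to its assigned centroid; if the moved centroid is no farther away than before, the point stays put and the distances to the other centroids are skipped. This shortcut is a heuristic, and the function name `enhanced_kmeans` and the `full_scans` counter are illustrative assumptions, not names taken from the paper.

```python
import math


def enhanced_kmeans(points, init_centroids, max_iter=100):
    """Heuristic k-means sketch (an assumption, not the paper's code).

    Returns (labels, centroids, full_scans), where full_scans counts how
    often a point had to be compared against every centroid.
    """
    centroids = [tuple(c) for c in init_centroids]
    k = len(centroids)
    labels = [0] * len(points)
    prev_dist = [math.inf] * len(points)  # distance to own centroid, last pass
    full_scans = 0

    for iteration in range(max_iter):
        for i, p in enumerate(points):
            if iteration > 0:
                # One distance only: to the moved centroid of the old cluster.
                d_same = math.dist(p, centroids[labels[i]])
                if d_same <= prev_dist[i]:
                    prev_dist[i] = d_same
                    continue  # keep the assignment, skip the other centroids
            # Otherwise fall back to the regular k-means assignment step.
            full_scans += 1
            dists = [math.dist(p, c) for c in centroids]
            best = min(range(k), key=dists.__getitem__)
            labels[i] = best
            prev_dist[i] = dists[best]

        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(xs) / len(members) for xs in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[j])  # leave empty cluster as is

        if new_centroids == centroids:
            break  # centroids stopped moving
        centroids = new_centroids

    return labels, centroids, full_scans


if __name__ == "__main__":
    # Toy 2-D data standing in for document vectors (made up for illustration).
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centroids, scans = enhanced_kmeans(pts, [(0, 0), (10, 10)])
    print(labels, centroids, scans)
```

    In a search-result setting the points would be document vectors (for example term-frequency vectors of stemmed or non-stemmed text), and each cluster could then be labelled with its most frequent word, as the abstract describes.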
  19. Microsc Microanal. 2021 Apr 28. 1-17
      This work introduces NexusLIMS, an electron microscopy laboratory information management system designed and implemented by the Office of Data and Informatics and the Materials Science and Engineering Division at NIST for a multi-user electron microscopy co-op facility. NexusLIMS comprises network infrastructure, shared information technology resources, a custom software package to harvest and extract experimental information and construct experimental metadata records, and an intuitive web-based user-facing interface for searching, browsing, and examining research data. These metadata records conform to the Nexus Experiment schema, which is introduced in this work. The NexusLIMS suite of tools requires minimal input and adjustments to user behavior, instead relying on existing organizational procedures and the collection of information from a multitude of sources to construct a complete picture and record of a research experiment. The underlying infrastructure and design considerations for a multi-user data management system are discussed. The core codebase for NexusLIMS is made publicly available alongside this work, and its modular design encourages the adaptation of the presented methods at other research organizations.
    Keywords:  data management; electron microscopy; laboratory information management; metadata extraction; open source software; user facility
  20. JAMIA Open. 2021 Apr;4(2): ooab025
      Objective: We present the Berlin-Tübingen-Oncology corpus (BRONCO), a large and freely available corpus of shuffled sentences from German oncological discharge summaries annotated with diagnosis, treatments, medications, and further attributes including negation and speculation. The aim of BRONCO is to foster reproducible and openly available research on Information Extraction from German medical texts.
    Materials and Methods: BRONCO consists of 200 manually deidentified discharge summaries of cancer patients. Annotation followed a structured and quality-controlled process involving 2 groups of medical experts to ensure consistency, comprehensiveness, and high quality of annotations. We present results of several state-of-the-art techniques for different IE tasks as baselines for subsequent research.
    Results: The annotated corpus consists of 11 434 sentences and 89 942 tokens, annotated with 11 124 annotations for medical entities and 3118 annotations of related attributes. We publish 75% of the corpus as a set of shuffled sentences, and keep 25% as a held-out dataset for unbiased evaluation of future IE tools. On this held-out dataset, our baselines reach, depending on the specific entity type, F1-scores of 0.72-0.90 for named entity recognition, 0.10-0.68 for entity normalization, 0.55 for negation detection, and 0.33 for speculation detection.
    Discussion: Medical corpus annotation is a complex and time-consuming task. This makes sharing of such resources even more important.
    Conclusion: To our knowledge, BRONCO is the first sizable and freely available German medical corpus. Our baseline results show that more research efforts are necessary to lift the quality of information extraction in German medical texts to the level already possible for English.
    Keywords:  German language; corpus annotation; medical information extraction