bims-librar Biomed News
on Biomedical librarianship
Issue of 2020‒05‒31
twenty papers selected by
Thomas Krichel
Open Library Society


  1. Nature. 2020 05;581(7809): 371-374
      
    Keywords:  Information technology; Media; SARS-CoV-2; Vaccines
    DOI:  https://doi.org/10.1038/d41586-020-01452-z
  2. Syst Rev. 2020 May 26. 9(1): 116
      Meta-analysts rely on the availability of data from previously conducted studies. That is, they rely on primary study authors to register their outcome data, either in a study's text or on publicly available websites, and report the results of their work, either again in a study's text or on publicly accessible data repositories. If a primary study author does not register data collection and similarly does not report the data collection results, the meta-analyst is at risk of failing to include the collected data. The purpose of this study is to attempt to locate one type of meta-analytic data: findings from studies that neither registered nor reported the collected outcome data. To do so, we conducted a large-scale search for potential studies and emailed an author query request to more than 600 primary study authors to ask if they had collected eligible outcome data. We received responses from 75 authors (12.3%), three of whom sent eligible findings. The results of our search confirmed our proof of concept (i.e., that authors collect data but fail to register or report it publicly), and the meta-analytic results indicated that excluding the identified studies would change some of our substantive conclusions. Cost analyses indicated, however, a high price to finding the missing studies. We end by reaffirming our calls for greater adoption of primary study pre-registration as well as data archiving in publicly available repositories.
    DOI:  https://doi.org/10.1186/s13643-020-01376-9
  3. Rev Alerg Mex. 2020 Jan-Mar;67(1): 62-72
      Systematic reviews are secondary investigations that compile published results obtained from studies involving human subjects. Meta-analysis is the statistical analysis of the combined results of two or more original studies, which must be selected through a systematic review. In this way, a meta-analysis cannot exist without a systematic review. Systematic reviews arose in response to the exponential increase in information, to provide all health personnel with a study that critically analyzes the results and discriminates those that may be useful in clinical practice. Systematic reviews are one of the fundamental tools in evidence-based medicine, in which two of the main steps refer to both the search and the critical analysis of the studies, which shall support medical decisions on aspects that are mainly related to diagnosis, treatment, or prognosis. On the other hand, systematic reviews have been essential for some time now when developing evidence-based clinical practice guidelines and they can be used to make decisions on health policies. The methodology for performing and interpreting systematic reviews and meta-analyses is described in this article.
    Keywords:  Evidence-based medicine; Meta-analysis; Systematic review
    DOI:  https://doi.org/10.29262/ram.v67i1.733
  4. J Evid Based Med. 2020 May 24.
      OBJECTIVE: In the Middle East and North Africa (MENA), data are produced in languages other than English and available through gray literature sources. We assessed the comprehensiveness of literature search strategies of systematic reviews (SRs) reporting population health primary data on MENA.
    METHODS: Utilizing the registered protocol (PROSPERO CRD42017076736), we conducted a meta-research analysis on a cohort of SRs (systematic PubMed search: from 2008 to 2016) and evaluated their search strategies following AMSTAR recommendations.
    RESULTS: A total of 379 SRs were included. Few SRs (10.3%, n = 39) conducted a comprehensive literature search including at least two databases, reference lists of included primary studies, gray literature sources, and no language restriction. Nevertheless, 90.5% (n = 343) searched at least two databases and 67.0% (n = 254) searched gray literature sources. Authors from MENA searched statistically more for gray literature than authors from Western countries (P = 0.022). Reference lists of the included studies were searched in 40.4% (n = 153) of the SRs. Searching the reference lists was positively associated with searching for gray literature (P < 0.001). Only 38.8% (n = 147) of the SRs had no language restriction or searched in English and in at least one language relevant to MENA, whereas 27.2% (n = 103) did not report this information.
    CONCLUSIONS: Literature searches for SRs reporting population health data on MENA were limited in reporting quality, language restrictions, and lack of reference list searches. This was probably due to lack of adherence to the reporting guidelines. To ensure compilation of optimum evidence, expanding literature searches to reference list search and for additional languages relevant to MENA are required.
    Keywords:  Africa; gray literature; Middle East; research design; systematic reviews
    DOI:  https://doi.org/10.1111/jebm.12394
  5. Health Info Libr J. 2020 May 25.
      BACKGROUND: Examination stress is a prevalent mental health disorder among college students in response to academic life.
    OBJECTIVES: The study aimed to explore cognitive-behavioural therapy (CBT) and study skills training based bibliotherapy as an effective way to help female undergraduates to better cope with examination stress.
    METHODS: A total of 121 students were randomly allocated to an experimental group or control group. Students in the experimental group used self-help materials, as a bibliotherapy intervention, over 16 weeks. Students in the control group received no treatment. The students' examination stress levels were assessed, before and after the intervention using the Revised Test Anxiety scale. Data collected were investigated and analysed using t-tests.
    RESULTS: There was a significant decrease in examination stress scores of students from the experimental group as compared with the control group.
    DISCUSSION: The intervention model efficiently diminished the symptoms of examination stress of undergraduates in practice. Findings can be used as a reference for developing non-clinical techniques to overcome examination anxiety.
    CONCLUSION: Findings have revealed that combined CBT with academic skills improvement based bibliotherapy may be efficient in lowering examination stress for female undergraduates. Librarians can contribute to improving the health of their societies.
    Keywords:  libraries, health science; mental health; students
    DOI:  https://doi.org/10.1111/hir.12312
  6. Int J Med Inform. 2020 May 19. pii: S1386-5056(19)30874-3. [Epub ahead of print]140 104175
      OBJECTIVE: This research examines how YouTube recommends vaccination-related videos.
    MATERIALS AND METHODS: We used a social network analysis to evaluate how YouTube recommends vaccination-related videos to its users.
    RESULTS: More pro-vaccine videos (64.75%) than anti-vaccine (19.98%) videos are on YouTube, with 15.27% of videos being neutral in sentiment. YouTube was more likely to recommend neutral and pro-vaccine videos than anti-vaccine videos. There is a homophily effect in which pro-vaccine videos were more likely to recommend other pro-vaccine videos than anti-vaccine ones, and vice versa.
    DISCUSSION: Compared to our prior study, the number of recommendations for pro-vaccine videos has significantly increased, suggesting that YouTube's demonetization policy for harmful content and other changes to its recommender algorithm might have been effective in reducing the visibility of anti-vaccine videos. However, there are concerns that anti-vaccine videos are less likely to lead users to pro-vaccine videos due to the homophily effect observed in the recommendation network.
    CONCLUSION: The study demonstrates the influence of YouTube's recommender systems on the types of vaccine information users discover on YouTube. We conclude with a general discussion of the importance of algorithmic transparency in how social media platforms like YouTube decide what content to feature and recommend to its users.
    Keywords:  Disinformation; Misinformation; Network analysis; Recommender algorithm; Social media; Vaccination; YouTube
    DOI:  https://doi.org/10.1016/j.ijmedinf.2020.104175
  7. Adv Exp Med Biol. 2020 ;1196 63-72
      Cancer is considered as one of the main challenges of modern healthcare systems. Cancer patients are obliged to cope with the uncertainty of disease progression. Their anxiety regarding said uncertainty is intensified because they need to constantly make decisions concerning the management of their disease. Information and communication are considered important in cancer management. As a result, the research associated with the impact of healthcare information-seeking behavior on numerous cancer management aspects has intensified and grown in astonishing rates. This work concentrates on the interplay of oncological patients' information-seeking behavior regarding their long-term prognosis. Therefore, a conceptual framework is proposed that identifies and associates several clinical, socio-demographic, psychological, and information-seeking behavioral factors that are likely to be linked with patients' health outcomes.
    Keywords:  Cancer patients; Information-seeking behavior; Long-term prognosis
    DOI:  https://doi.org/10.1007/978-3-030-32637-1_6
  8. Rev Recent Clin Trials. 2020 May 24.
      BACKGROUND: In the past, most people sought medical information by consulting health care professionals. Nowadays, many people have started to use online resources to access medical information.
    OBJECTIVE: The study aims to investigate whether YouTube videos on hemorrhoids and hemorrhoid surgery can be a useful e-learning source for the general population, surgical trainees and specialists.
    METHODS: A YouTube search was performed in October 2019 using the keywords "hemorrhoids" and "hemorrhoid surgery", and the videos were divided into 2 groups according to the keywords. Three independent researchers assessed the metadata and classified the videos according to the level of accuracy (hemorrhoid group) and the level of usefulness (hemorrhoid surgery group). Cohen's test and the Kappa (K) value were used to evaluate inter-investigator agreement.
    RESULTS: A total of 200 videos were analyzed, 100 for each keyword. Regarding hemorrhoid group, 43 videos (48.3%) were misleading, 9 were accurate (10.1%), 18 were approximate (20.2%), and 19 were considered a personal experience (21.4%). Regarding hemorrhoid surgery group, around 60% of the videos were lacking clear explanation, while about 16% were inaccurate. Only the remaining 24% were considered useful for teaching.
    CONCLUSION: Around half of the YouTube videos on hemorrhoids are misleading or inaccurate and present a risk of harmful consequences. Credible videos with accurate information need to be uploaded by medical professionals and medical institutions, and some sort of filtering using categories by the staff of YouTube appears to be necessary. Care must be taken to produce clear, high-quality operative clips with generous scientific commentary.
    Keywords:  YouTube; e-learning; health care professionals; hemorrhoids; medical education; social media
    DOI:  https://doi.org/10.2174/1574887115666200525001619
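The inter-investigator agreement measure named in entry 8 can be illustrated with a minimal sketch of Cohen's kappa for two raters, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The abstract does not say how kappa was computed for its three researchers, so the pairwise form, the function name `cohens_kappa` and the ratings below are assumptions for illustration only, not data or code from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical ratings of six videos (not data from the study).
rater_1 = ["misleading", "accurate", "misleading", "approximate", "accurate", "misleading"]
rater_2 = ["misleading", "accurate", "approximate", "approximate", "accurate", "misleading"]
kappa = cohens_kappa(rater_1, rater_2)
```

With more than two raters, one common option is to report kappa for each pair of raters; whether the authors did so is not stated.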
  9. J Pediatr. 2020 Jun;pii: S0022-3476(20)30289-4. [Epub ahead of print]221 215-223.e5
      OBJECTIVE: To assess the role of trust when adolescents search for and appraise online health information.
    STUDY DESIGN: A systematic search of online databases (MEDLINE, EMBASE, PsycINFO, and ERIC) was performed. Google Scholar and reference lists for included studies were manually searched for additional articles. Studies were included if they examined the role of trust when adolescents (in the 13- to 18-year-old age range) searched for and/or appraised online health information. Findings were synthesized using thematic analysis.
    RESULTS: There were 22 studies that met the inclusion criteria. Four key themes were identified: adolescents generally distrust the Internet but use it anyway (subthemes were why adolescents distrust online health information; why adolescents still use online health information), adolescents use heuristics to appraise the trustworthiness of online health information (subthemes were different heuristics used by different adolescents, range of heuristics used by adolescents), adolescents trust websites more than social media or social networking sites, and adolescents' level of trust in online health information guides their actions and responses.
    CONCLUSIONS: Adolescents often distrust health information from the Internet, but continue to use it. Adolescents are aware of the need to evaluate the trustworthiness of online health information; however, their approaches vary in sophistication. As the reach and content of the Internet expands, it is important to equip adolescents with effective eHealth literacy to assess the trustworthiness of online health information.
    Keywords:  digital health literacy; eHealth literacy; health education; information seeking behavior; internet
    DOI:  https://doi.org/10.1016/j.jpeds.2020.02.074
  10. Asian Pac J Cancer Prev. 2020 May 01. pii: 89071. [Epub ahead of print]21(5): 1357-1362
      OBJECTIVE: Cancer survivors have various health care needs and are willing to be proactive with their health maintenance. Online information would be a useful resource to guide cancer survivors and their family members. Therefore, identifying the factors that influence Internet searching behaviors among cancer survivors and their family members is a first step toward providing better health care services for cancer care.
    METHODS: We performed focus group interviews that were based on the Theory of Planned Behavior, with thirty-one participants to explore factors related to Internet search behaviors among cancer survivors and their family members.
    RESULTS: Six themes were identified in the analysis of participant interviews. Attitudes toward searching for health information on the Internet included the themes "Fulfilling unmet needs" and "Confirmation through second opinion." Themes related to social norms included "a required step for sure" and "helping each other." In terms of perceived behavioral control, themes included "difficult to choose because of being 'overwhelmed with information,'" and "complex searching milieu."
    CONCLUSION: It was clear that cancer survivors and their family members had unmet needs for maintaining their health status. They wanted to be informed and actively involved in the decision-making process regarding health management. Consultation and education provided to patients by doctors should not only include information on diet and nutrition but also information on the resulting complications to satisfy their need for reliable health information.
    Keywords:  Health Information; Theory of Planned Behavior; cancer survivors; the Internet
    DOI:  https://doi.org/10.31557/APJCP.2020.21.5.1357
  11. J Med Internet Res. 2020 May 27.
      BACKGROUND: In case of a population-wide infectious disease outbreak, such as the novel coronavirus disease (COVID-19), people's online activities could significantly affect the public concerns and health behaviors due to difficulty in accessing credible information from reliable sources, which in turn causes people to seek necessary information on the web. Therefore, measuring and analyzing online health communication and public sentiment is essential for establishing effective and efficient disease control policies, especially in the early stage of the outbreak.
    OBJECTIVE: This study aimed to investigate the trends of online health communication, analyze the focus of people's anxiety in the early stages of COVID-19, and evaluate the appropriateness of online information.
    METHODS: From NAVER, the most popular Korean web portal, 13,148 questions and 29,040 answers related to COVID-19 from 01/20/2020 to 03/02/2020 were collected. Three main methods were used in this study: 1) the structural topic model was used to examine the topics in the online questions, 2) word network analysis was conducted to analyze the focus of people's anxiety and worry in the questions, and 3) two medical doctors assessed the appropriateness of the answers to the questions, which were primarily related to people's anxiety.
    RESULTS: Fifty topics and six cohesive topic communities were identified from the questions. Among them, topic community No. 4 (suspecting COVID-19 infection after developing a particular symptom) accounted for the largest portion of the questions. As the number of confirmed patients increased, the proportion of topics belonging to topic community No. 4 also increased. Additionally, the prolonged situation led to a slight increase in the proportion of topics related to job issues. People's anxieties and worries were closely related with physical symptoms and self-protection methods. While relatively appropriate to suspect physical symptoms, a high proportion of answers related to self-protection methods were assessed as misinformation or advertisements.
    CONCLUSIONS: Search activity for online information regarding the COVID-19 outbreak has been active. Many of the online questions were related to people's anxieties and worries. A considerable portion of corresponding answers had false information or were advertisements. The study results could contribute reference information to various countries that need to monitor public anxiety and provide appropriate information in the early stage of an infectious disease outbreak, including COVID-19. Our research also contributes to developing methods for measuring public opinion and sentiment in an epidemic situation based on natural language data on the Internet.
    DOI:  https://doi.org/10.2196/19455
  12. Pediatr Crit Care Med. 2020 May 27.
      OBJECTIVE: To describe the impact of a strategy for international collaboration and rapid information dissemination on Twitter among the pediatric critical care community during a global pandemic.
    DESIGN: Analysis of #PedsICU and coronavirus disease 2019 Twitter data in the Symplur Signals Database between February 1, 2020, and May 1, 2020.
    SETTING: Social media platform Twitter.
    PATIENTS: None.
    INTERVENTIONS: Promotion of the joint usage of #PedsICU and #COVID19 throughout the international pediatric critical care community in tweets relevant to the coronavirus disease 2019 pandemic and pediatric critical care.
    MEASUREMENTS AND MAIN RESULTS: We collected data on all tweets containing the hashtag #PedsICU in addition to those containing both #PedsICU and coronavirus disease 2019 hashtags. Tweets including #PedsICU were shared 49,865 times on six continents between February 1, 2020, and May 1, 2020; between February 1 and March 13, only 8% of #PedsICU tweets included a coronavirus disease 2019 hashtag. After a sharp rise during the week of March 14, 2020, coronavirus disease 2019 content has dominated the #PedsICU conversation on Twitter, comprising 69% of both #PedsICU tweets and impressions (p < 0.001). The most commonly used coronavirus disease 2019 hashtag over the study period was #COVID19 (69%). Proportionately, a greater percentage of #PedsICU tweets including the coronavirus disease 2019 hashtag (vs not) had images or videos (45% vs 41%; p < 0.001). In addition, non-physician healthcare providers were the largest group of users (46%) of the combination of #PedsICU and coronavirus disease 2019 hashtags. The most popular tweets shared on Twitter were open-access resources, including links for updated literature, narrative reviews, and educational videos relevant to coronavirus disease 2019 clinical care. Concurrent hashtags and words in tweets containing #PedsICU and coronavirus disease 2019 hashtags spanned several different disciplines and topics in pediatric critical care.
    CONCLUSION: Twitter has been used widely for real-time information sharing and collaboration among the international pediatric critical care community during the coronavirus disease 2019 pandemic. Targeted use of #PedsICU and #COVID19 for engagement on Twitter is a conduit to combat misinformation and optimize reach to pediatric critical care stakeholders across the globe when rapid dissemination is needed.
    DOI:  https://doi.org/10.1097/PCC.0000000000002474
  13. J Cancer Educ. 2020 May 29.
      The incidence of thyroid cancer continues to increase worldwide. The challenge facing the treatment of thyroid cancers is related to the fact that this disease exhibits a broad range of clinical behaviour from indolent tumour to very aggressive malignancies. Therefore, public and patient education about thyroid cancer is becoming a crucial step in facing the challenge imposed by this cancer. Currently, social media channels such as YouTube, a video-sharing website on the Internet, play a significant role in public and patient education. Research on the role of YouTube in public education about cancer is on the rise, including a paper on thyroid cancer published recently in the Journal of Cancer Education. However, researchers conducting studies should use tools that are designed to assess videos, not written information or websites. The DISCERN instrument and JAMA benchmark tools are not intended to evaluate videos such as those of YouTube. An ideal instrument for assessing YouTube should cover several parameters to identify educationally useful videos, including (i) scientific accuracy of video content, (ii) clarity of the message given, (iii) authority (creator), (iv) pedagogy and educational basis and (v) technical design, including quality images and good visuals, production style, quality scripts, clear sounds and no noises in the background. While video images help in reinforcing the words and the message, both picture and sound quality are vital in creating a strong mental impression and engaging the audience. This commentary calls for the development of a standardised protocol that can help researchers to enhance their research publications and ensure that adequate and accurate data have been collected and analysed. Such a move will help us in establishing the right literature in this area and will help researchers conducting reviews and meta-analysis on YouTube videos.
    Keywords:  DISCERN instrument; JAMA benchmark tools; Methodology; Patient education; Thyroid cancer; YouTube videos
    DOI:  https://doi.org/10.1007/s13187-020-01763-9
  14. Cephalalgia. 2020 May 27. 333102420927027
      INTRODUCTION: The most common and multifaceted migraine aura symptoms are visual disturbances. Health information is one of the most popular topics on the internet but the quality and reliability of publicized information is unknown. The aim of this study was to analyze images of migraine aura on Google to determine the frequency of correct presentations of visual aura and distribution of visual aura phenotypes.
    METHODS: Two authors screened the 100 highest indexed migraine aura related images on Google. The content of the images was categorized into elementary visual symptoms.
    RESULTS: Forty out of 100 images were accurate representations of visual migraine aura. Such images included 31 different visual aura phenotypes. The majority had more than one elementary visual symptom (median 2, IQR 1-3), most commonly "bean-like" forms (45%), zigzag lines (40%), and foggy/blurred vision (33%).
    DISCUSSION: Forty percent of images were accurate portrayals of visual migraine aura symptoms, but these presented limited phenotypes. The information derived from the internet photos may hinder the effective recognition of aura symptoms. Thus, there is a need to provide a more comprehensive representation of visual migraine aura symptoms on the internet.
    Keywords:  Google; Scotoma; cortical spreading depression; images; representations; symptoms
    DOI:  https://doi.org/10.1177/0333102420927027
  15. Headache. 2020 May 28.
      BACKGROUND: Although migraine is recognized as one of the most common and disabling diseases in the world, it is nonetheless still underestimated, underdiagnosed, and undertreated. The fact that migraine patients often tend to access the Web to search for headache-related information hinders patient-doctor relationships and one should also bear in mind that, unfortunately, text readability and medical literacy in the overall population may be the reason why patients' understanding of health information is compromised.
    AIM: We aimed to assess the readability of the home pages of the top 10 patient-oriented migraine-related websites and the educational level required to be in a position to broach them.
    METHODS: On April 15, 2018, we conducted a descriptive study on the international version of Google by entering the words "headache" and "migraine." We then analyzed the overall level of readability of texts of the home pages of the top 10 patient-oriented websites, by means of the Simple Measure of Gobbledygook Readability Calculator.
    RESULTS: Entering "headache" on the home pages of the top 10 patient-oriented websites on Google we found that to understand these particular websites with ease, an average grade level of 12.4 (±1.5 standard deviation, SD) and an average 13.3 years of formal education (±1.7 SD) were required. Similarly, typing "migraine" on Google we found an average grade level of 10.8 (±1.2 SD) and an average of 12.5 years of formal education (±1.9 SD) were required. The most frequently viewed websites all failed to meet the USA National Institutes of Health guidelines, which recommend a range between 6th and 7th grade level readability.
    DISCUSSION: The present study shows the low readability level resulting from the top 10 patient-oriented headache/migraine websites and the consequent barrier this creates in the dissemination of headache/migraine-related medical information. Although the actual physicians, both primary care physicians and headache specialists are the principal source of understandable headache-related information, only a minority of people consult these professionals. Given the foregoing, the majority of migraine patients is, therefore, unable to obtain adequate comprehensible health information on the Web. Furthermore, the existing gap between migraine-related website content readability and the unmet need for migraine patients to obtain pertinent and correct information might well contribute to the worldwide neglect of migraine as a major public health problem.
    CONCLUSION: Physician experts in headache and migraine should actively cooperate in planning informative material to establish what information patients need to know, how they should use it, and how readable that material actually is. Readability ought to be established before the final website publication. Plain language ought to be used and written messages should be supplemented with visual content such as simple drawings. We recommend the setting up of a new dynamic, modern, plain-talking, and efficient approach in communication aimed at catching the public's attention with its readability and thus satisfying a migraine and headache web scenario.
    DOI:  https://doi.org/10.1111/head.13818
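The Simple Measure of Gobbledygook (SMOG) used in entry 15 can be sketched in a few lines. The grade formula, 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291, is the standard published one; everything else here is an assumption for illustration: the vowel-run syllable counter is a crude heuristic, the function names are hypothetical, and the calculator the authors used may tokenise text differently. SMOG was designed for samples of about 30 sentences, so scores for very short texts are only indicative.

```python
import math
import re


def count_syllables(word):
    """Rough syllable estimate: number of vowel runs (a heuristic, not a dictionary lookup)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def smog_grade(text):
    """SMOG grade: 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text contains no sentences")
    words = re.findall(r"[A-Za-z]+", text)
    # Polysyllables are words with three or more (estimated) syllables.
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291


# Hypothetical sample sentences, not text from the websites studied.
plain = smog_grade("The cat sat. The dog ran.")
dense = smog_grade("Migraine medication information.")
```

A text with no polysyllabic words scores the formula's floor of 3.1291, while dense medical vocabulary pushes the grade well above the 6th to 7th grade range cited in the abstract.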
  16. J Am Med Inform Assoc. 2020 May 29. pii: ocaa056. [Epub ahead of print]
      Natural language processing (NLP) plays a vital role in modern medical informatics. It converts narrative text or unstructured data into knowledge by analyzing and extracting concepts. A comprehensive lexical system is the foundation to the success of NLP applications and an essential component at the beginning of the NLP pipeline. The SPECIALIST Lexicon and Lexical Tools, distributed by the National Library of Medicine as one of the Unified Medical Language System Knowledge Sources, provides an underlying resource for many NLP applications. This article reports recent developments of 3 key components in the Lexicon. The core NLP operation of Unified Medical Language System concept mapping is used to illustrate the importance of these developments. Our objective is to provide generic, broad coverage and a robust lexical system for NLP applications. A novel multiword approach and other planned developments are proposed.
    Keywords:  NLP tools; lexical tools; lexicon; natural language processing; unified medical language system
    DOI:  https://doi.org/10.1093/jamia/ocaa056
  17. JMIR Med Inform. 2020 Apr 25.
      UNSTRUCTURED: Background: Automatically extracting relations between chemicals and diseases plays an important role in biomedical text mining. Chemical-disease relation (CDR) extraction aims at extracting complex semantic relationships between entities in documents, which contain intra- and inter-sentence relations. Most previous methods do not consider dependency syntactic information across sentences, which is very valuable for the relation extraction task, in particular for extracting the inter-sentence relations accurately. Methods: In this paper, we propose a novel end-to-end neural network based on the graph convolutional network (GCN) and multi-head attention. To improve the performance of inter-sentence relation extraction, we construct the document-level dependency graph to capture the dependency syntactic information across sentences. GCN is applied to capture the feature representation of the document-level dependency graph. The multi-head attention mechanism is employed to learn the relatively important context features from different semantic subspaces. To enhance the input representation, the deep context representation is used in our model instead of traditional word embedding. Results: The experimental results show that our method achieves an F-score of 63.5%, which is superior to other state-of-the-art methods. The GCN model can effectively exploit the cross-sentence dependency information to improve the performance of inter-sentence CDR extraction. Both the deep context representation and multi-head attention are helpful in the CDR extraction task.
    DOI:  https://doi.org/10.2196/17638
  18. BMC Bioinformatics. 2020 May 27. 21(1): 217
      BACKGROUND: Enzymatic and chemical reactions are key for understanding biological processes in cells. Curated databases of chemical reactions exist but these databases struggle to keep up with the exponential growth of the biomedical literature. Conventional text mining pipelines provide tools to automatically extract entities and relationships from the scientific literature, and partially replace expert curation, but such machine learning frameworks often require a large amount of labeled training data and thus lack scalability for both larger document corpora and new relationship types.
    RESULTS: We developed an application of Snorkel, a weakly supervised learning framework, for extracting chemical reaction relationships from biomedical literature abstracts. For this work, we defined a chemical reaction relationship as the transformation of chemical A to chemical B. We built and evaluated our system on small annotated sets of chemical reaction relationships from two corpora: curated bacteria-related abstracts from the MetaCyc database (MetaCyc_Corpus) and a more general set of abstracts annotated with MeSH (Medical Subject Headings) term Bacteria (Bacteria_Corpus; a superset of MetaCyc_Corpus). For the MetaCyc_Corpus, we obtained 84% precision and 41% recall (55% F1 score). Extending to the more general Bacteria_Corpus decreased precision to 62% with only a four-point drop in recall to 37% (46% F1 score). Overall, the Bacteria_Corpus contained two orders of magnitude more candidate chemical reaction relationships (nine million candidates vs 68,000 candidates) and had a larger class imbalance (2.5% positives vs 5% positives) as compared to the MetaCyc_Corpus. In total, we extracted 6871 chemical reaction relationships from nine million candidates in the Bacteria_Corpus.
    CONCLUSIONS: With this work, we built a database of chemical reaction relationships from almost 900,000 scientific abstracts without a large training set of labeled annotations. Further, we showed the generalizability of our initial application built on MetaCyc documents enriched with chemical reactions to a general set of articles related to bacteria.
    Keywords:  Chemical reactions; Curation; Database; Snorkel; Text mining
    DOI:  https://doi.org/10.1186/s12859-020-03542-1
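      The F1 scores quoted in the abstract follow directly from the reported precision and recall, since F1 is their harmonic mean. A short check, using only the figures given above (the function and variable names are illustrative):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs as reported in the abstract.
metacyc_f1 = f1_score(0.84, 0.41)
bacteria_f1 = f1_score(0.62, 0.37)

print(f"MetaCyc_Corpus F1:  {metacyc_f1:.2f}")   # 0.55
print(f"Bacteria_Corpus F1: {bacteria_f1:.2f}")  # 0.46
```

      Both values round to the percentages stated in the abstract (55% and 46%), so the larger drop in F1 comes mainly from the loss of precision on the broader corpus.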
  19. J Am Med Inform Assoc. 2020 May 27. pii: ocaa117. [Epub ahead of print]
      OBJECTIVE: As COVID-19 started its rapid emergence and gradually transformed into an unprecedented pandemic, the need for a knowledge repository for the disease became crucial. To address this issue, a new COVID-19 machine-readable dataset known as the COVID-19 Open Research Dataset (CORD-19) has been released. Based on this, our objective was to build computable co-occurrence network embeddings to assist association detection amongst COVID-19 related biomedical entities.
    MATERIALS AND METHODS: Leveraging a Linked Data version of CORD-19 (i.e., CORD-19-on-FHIR), we first utilized SPARQL to extract co-occurrences among chemicals, diseases, genes, and mutations and build a co-occurrence network. We then trained the representation of the derived co-occurrence network using node2vec with four edge embedding operations (L1, L2, Average, and Hadamard). Six algorithms (Decision Tree, Linear Regression, Support Vector Machine, Random Forest, Naive Bayes, and Multi-layer Perceptron) were applied to evaluate performance on link prediction. An unsupervised learning strategy was also developed incorporating the t-SNE and DBSCAN algorithms for case studies.
    RESULTS: The Random Forest classifier showed the best performance on link prediction across different network embeddings. For edge embeddings generated using the Average operation, Random Forest achieved the optimal average precision of 0.97 and F1 score of 0.90. For unsupervised learning, 63 clusters were formed with a silhouette score of 0.128. Significant associations were detected for five coronavirus infectious diseases in their corresponding subgroups.
    CONCLUSION: In this study, we constructed COVID-19-centered co-occurrence network embeddings. Results indicated that the generated embeddings were able to extract significant associations for COVID-19 and coronavirus infectious diseases.
    Keywords:  Association extraction; COVID-19; Co-occurrence network embeddings; Coronavirus infectious diseases
    DOI:  https://doi.org/10.1093/jamia/ocaa117
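      The four edge embedding operations named in the abstract (L1, L2, Average, Hadamard) are the standard element-wise operators used with node2vec to turn two node vectors into one edge vector. A minimal sketch, assuming those standard definitions; the function names and example vectors are hypothetical and not taken from the paper:

```python
from typing import Callable, Dict, List

Vector = List[float]


def average(u: Vector, v: Vector) -> Vector:
    """Element-wise mean of the two node vectors."""
    return [(a + b) / 2 for a, b in zip(u, v)]


def hadamard(u: Vector, v: Vector) -> Vector:
    """Element-wise product of the two node vectors."""
    return [a * b for a, b in zip(u, v)]


def weighted_l1(u: Vector, v: Vector) -> Vector:
    """Element-wise absolute difference."""
    return [abs(a - b) for a, b in zip(u, v)]


def weighted_l2(u: Vector, v: Vector) -> Vector:
    """Element-wise squared difference."""
    return [(a - b) ** 2 for a, b in zip(u, v)]


EDGE_OPERATORS: Dict[str, Callable[[Vector, Vector], Vector]] = {
    "average": average,
    "hadamard": hadamard,
    "l1": weighted_l1,
    "l2": weighted_l2,
}


def edge_embedding(u: Vector, v: Vector, op: str) -> Vector:
    """Combine two node embeddings into one edge embedding."""
    if len(u) != len(v):
        raise ValueError("node embeddings must have the same dimension")
    return EDGE_OPERATORS[op](u, v)


# Hypothetical 3-dimensional embeddings for two co-occurring entities.
node_a = [1.0, -2.0, 0.5]
node_b = [3.0, 2.0, 0.5]

for name in EDGE_OPERATORS:
    print(name, edge_embedding(node_a, node_b, name))
```

      Each resulting edge vector can then be fed to a classifier as the feature representation of a candidate link, which is how a link prediction evaluation of this kind is typically set up.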
  20. BMC Cancer. 2020 May 24. 20(1): 462
      BACKGROUND: Urothelial cancer (UC) includes carcinomas of the bladder, ureters, and renal pelvis. New treatments and biomarkers of UC emerged in this decade. Identifying the key information in a vast amount of literature can be challenging. In this study, we use text mining to explore UC publications to identify important information that may lead to new research directions.
    METHOD: We used topic modeling to analyze the titles and abstracts of 29,883 articles on UC from PubMed, Web of Science, and Embase in March 2020. We applied latent Dirichlet allocation modeling to extract 15 topics and conducted trend analysis. Gene Ontology term enrichment analysis and Kyoto Encyclopedia of Genes and Genomes pathway analysis were performed to identify UC-related pathways.
    RESULTS: There was a growing trend regarding UC treatment, especially immune checkpoint therapy, but not the staging of UC. The risk factors of UC varied between countries, such as cigarette smoking in the United States and aristolochic acid in Taiwan and China. GMCSF, IL-5, Syndecan-1, ErbB receptor, integrin, c-Met, and TRAIL signaling pathways are the most relevant biological pathways associated with UC.
    CONCLUSIONS: The risk factors of UC may depend on the country, and GMCSF, IL-5, Syndecan-1, ErbB receptor, integrin, c-Met, and TRAIL signaling pathways are the most relevant biological pathways associated with UC. These findings may provide further UC research directions.
    Keywords:  LDA2vec; Research trends; Text mining; Topic modeling; Urothelial carcinoma
    DOI:  https://doi.org/10.1186/s12885-020-06931-0