bims-librar Biomed News
on Biomedical librarianship
Issue of 2020–04–19
seventeen papers selected by
Thomas Krichel, Open Library Society



  1. Br J Nurs. 2020 Apr 09. 29(7): 431-435
      Undertaking a literature search can be a daunting prospect. Breaking the exercise down into smaller steps will make the process more manageable. This article suggests 10 steps that will help readers complete this task, from identifying key concepts to choosing databases for the search and saving the results and search strategy. It discusses each of the steps in a little more detail, with examples and suggestions on where to get help. This structured approach will help readers obtain a more focused set of results and, ultimately, save time and effort.
    Keywords:  Databases; Literature review; Literature search; Reference management software; Research questions; Search strategy
    DOI:  https://doi.org/10.12968/bjon.2020.29.7.431
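Item 1 walks through turning a research question into key concepts and a search strategy. A common way to combine concepts, assumed here rather than taken from the article, is to OR the synonyms within each concept and then AND the concepts together. The hypothetical `build_query` helper below is a minimal Python sketch of that step. Real databases differ in their phrase, truncation and field syntax, so treat the output as a starting point.

```python
def quote(term: str) -> str:
    # Wrap multi-word phrases in double quotes so they are searched as phrases.
    return f'"{term}"' if " " in term else term


def build_query(concepts: list[list[str]]) -> str:
    """Hypothetical helper: OR synonyms within a concept, AND across concepts."""
    groups = []
    for synonyms in concepts:
        if not synonyms:
            continue  # skip concepts that have no search terms yet
        groups.append("(" + " OR ".join(quote(t) for t in synonyms) + ")")
    return " AND ".join(groups)


# Example concepts for an illustrative question about pressure ulcer care.
query = build_query([["pressure ulcer", "bedsore"], ["nursing", "nurse"]])
print(query)  # prints ("pressure ulcer" OR bedsore) AND (nursing OR nurse)
```

If you save the query string alongside the results, you also have a record of the search strategy, which the article lists as its final step.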
  2. Health Commun. 2020 Apr 14. 1-13
      Online health information, as an emerging field in health communication research, has attracted close attention from researchers. To identify major determinants of why individuals seek health information online, we conducted a meta-analysis that systematically accumulates the existing research findings. To that end, by integrating three theories or models for examining information-seeking behavior, we developed a theoretical framework for the current meta-analysis that emphasizes psychosocial, instrumental, contextual, and demographic factors. By analyzing the effect sizes from 44 articles representing 54 empirical samples, we found that the quality, trustworthiness, and utility of online health information were the dominant predictors of seeking it and that instrumental factors were more important than psychological ones in determining whether individuals did so. Moreover, the development of information and communication technology, the sampling method, and the type of information sought significantly moderated pairwise relationships between determinants and seeking behavior, whereas culture did not. Herein, we discuss the theoretical implications of our findings as well as directions for future research.
    DOI:  https://doi.org/10.1080/10410236.2020.1748829
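Item 2 pools effect sizes across 54 samples. The abstract does not say which effect-size metric or pooling model was used. Assuming correlation coefficients and a fixed-effect model, the following minimal sketch shows inverse-variance pooling on Fisher's z scale. `pool_correlations` and the sample data are hypothetical. A random-effects model would also add a between-study variance term to each weight.

```python
import math


def pool_correlations(studies):
    """Fixed-effect pooled correlation using Fisher's z transform.

    studies: iterable of (r, n) pairs. Each z_i = atanh(r_i) has sampling
    variance 1 / (n_i - 3), so its inverse-variance weight is n_i - 3.
    Returns the pooled r and a 95% confidence interval on the r scale.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for r, n in studies:
        if n <= 3:
            raise ValueError("each sample needs n > 3")
        w = n - 3
        weighted_sum += w * math.atanh(r)
        total_weight += w
    z_bar = weighted_sum / total_weight
    se = 1 / math.sqrt(total_weight)
    lo, hi = z_bar - 1.96 * se, z_bar + 1.96 * se
    # Back-transform the pooled z and its interval to the correlation scale.
    return math.tanh(z_bar), (math.tanh(lo), math.tanh(hi))


# Hypothetical (r, n) pairs, not data from the meta-analysis.
r, ci = pool_correlations([(0.42, 310), (0.35, 180), (0.28, 95)])
print(f"pooled r = {r:.3f}, 95% CI {ci[0]:.3f} to {ci[1]:.3f}")
```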
  3. J Med Internet Res. 2020 Apr 13. 22(4): e13369
       BACKGROUND: Despite increasing opportunities for acquiring health information online, discussion of the specific words used in searches has been limited.
    OBJECTIVE: The aim of this study was to clarify the medical information gap between medical professionals and the general public in Japan through health information-seeking activities on the internet.
    METHODS: Search and posting data were analyzed from one of the most popular domestic search engines in Japan (Yahoo! JAPAN Search) and the most popular Japanese community question answering service (Yahoo! Chiebukuro). We compared the frequency of 100 clinical words appearing in the clinical case reports of medical professionals (clinical frequency) with their frequency in Yahoo! JAPAN Search (search frequency) logs and questions posted to Yahoo! Chiebukuro (question frequency). The Spearman correlation coefficient was used to quantify association patterns among the three information sources. Additionally, user information (gender and age) in the search frequency associated with each registered user was extracted.
    RESULTS: Significant correlations were observed between clinical and search frequencies (r=0.29, P=.003), clinical and question frequencies (r=0.34, P=.001), and search and question frequencies (r=0.57, P<.001). Words with low clinical frequency (eg, "hypothyroidism," "ulcerative colitis") were highly ranked in search frequency. Similarly, "pain," "slight fever," and "numbness" were highly ranked only in question frequency. The weighted average of ages was 34.5 (SD 2.7) years, and the weighted average of gender (man -1, woman +1) was 0.1 (SD 0.1) in search frequency. Some words were specifically extracted from the search frequency of certain age groups, including "abdominal pain" (10-20 years), "plasma cells" and "inflammatory findings" (20-30 years), "DM" (diabetes mellitus; 30-40 years), "abnormal shadow" and "inflammatory findings" (40-50 years), "hypertension" and "abnormal shadow" (50-60 years), and "lung cancer" and "gastric cancer" (60-70 years).
    CONCLUSIONS: Search and question frequencies showed similar tendencies, whereas search and clinical frequencies showed discrepancy. Low-clinical frequency words related to diseases such as "hypothyroidism" and "ulcerative colitis" had high search frequencies, whereas those related to symptoms such as "pain," "slight fever," and "numbness" had high question frequencies. Moreover, high search frequency words included designated intractable diseases such as "ulcerative colitis," which has an incidence of less than 0.1% in the Japanese population. Therefore, it is generally worthwhile to pay attention not only to major diseases but also to minor diseases that users frequently seek information on, and more words will need to be analyzed in the future. Some characteristic words for certain age groups were observed (eg, 20-40 years: "cancer"; 40-60 years: diagnoses and diseases identified in health examinations; 60-70 years: diseases with late adulthood onset and "death"). Overall, this analysis demonstrates that medical professionals as information providers should be aware of clinical frequency, and medical information gaps between professionals and the general public should be bridged.
    Keywords:  community question answering service; health knowledge; information-seeking behavior; internet; search engine
    DOI:  https://doi.org/10.2196/13369
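Item 3 quantifies agreement between word-frequency lists with the Spearman correlation coefficient. Spearman's rho is the Pearson correlation of the ranks, with tied values given their average rank. The following pure-Python sketch shows that computation. The word counts in the example are invented. A statistics library (for example SciPy's `spearmanr`) would also return a P value, which this sketch omits.

```python
def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Invented counts for five words in two sources (not the study's data).
clinical = [120, 45, 8, 3, 60]
search = [900, 300, 4000, 2500, 1200]
print(round(spearman(clinical, search), 3))  # prints -0.6
```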
  4. Yearb Med Inform. 2020 Apr 17.
       BACKGROUND: As Director of the US National Library of Medicine (NLM) for 30 years, Dr. Donald A. B. Lindberg was instrumental in bringing biomedical research and healthcare worldwide into the age of genomic and translational medicine through the informatics systems developed by the NLM. Lindberg opened free access and worldwide public dissemination of all the NLM's biomedical literature and databases, thus helping transform not only biomedical research like the Human Genome Project and its successors, but also the practices of medicine and healthcare internationally. Guiding, leading, and teaching-by-example at national, regional, and global levels of biomedical and healthcare informatics, Lindberg helped coalesce a dynamic discipline that provides a foundation for the human understanding which promotes the future health of our world.
    OBJECTIVES: To provide historical insight into the scientific, technological, and practical clinical accomplishments of Donald Lindberg, and to describe how this led to contributions in the worldwide interdisciplinary evolution of informatics, and its impact on the biosciences and practices of medicine, nursing, and other healthcare-related disciplines.
    METHODS: Review and comment on the publications, scientific contributions, and leadership of Donald Lindberg in the evolution of biomedical and health informatics, which anticipated the vision, scholarship, and research of the field and reflected the deeply ethical humanism he exhibited throughout his life. These were essential in producing the informatics systems, such as the Unified Medical Language System (UMLS), MEDLINE, PubMed, PubMed Central, and ClinicalTrials.gov, which, together with NLM training programs and conferences, made possible the interactions among researchers and practitioners leading to the past quarter-century of rapid and dramatic advances in biomedical scientific inquiry and clinical discoveries, openly shared across the globe.
    CONCLUSION: Dr. Lindberg was a uniquely talented physician and pioneering researcher in biomedical and health informatics. As the main leader in developing and funding innovative informatics research for more than 30 years as Director of the National Library of Medicine, he helped bring together the most creative interdisciplinary researchers to bridge the worlds of biomedical research, education, and clinical practice. Lindberg's emphasis on open access to the biomedical literature through publicly shared computer-mediated methods of search and inquiry is seen as an example of ethical scientific openness.
    DOI:  https://doi.org/10.1055/s-0040-1701972
  5. Soc Sci Res. 2020 Mar. pii: S0049-089X(18)30839-1. [Epub ahead of print] 87: 102395
      Researchers often explore health (care) beliefs as a function of individual characteristics; yet, few consider the role of context in shaping both beliefs and the behaviors that are informed by them. As a sociopolitical construct, ethnoraciality provides a concerning source of bias in studies of health (care) beliefs because it takes both individual and contextual forms. This study examines whether the ethnoracial context of the residential area where sexual minorities live is associated with a particular health (care) belief - sources of trustworthy health information - and considers how ethnoracial group membership status differentiates these ecological associations drawing on mediation and moderation models. Using data from the 2010 Social Justice Sexuality Project, our analysis shows that sexual minorities who live with high concentrations of Latinos and Whites are less likely to rely exclusively on medical professionals for trustworthy health information than those who live with high concentrations of Blacks. Moreover, exclusive reliance on medical professionals for health information among Black and Latino sexual minorities is stronger in co-ethnic communities (predominantly Black and Latino areas, respectively). The analysis also documents status and contextual differentials and status-context contingencies of reliance on the Internet, social networks, and multiple agents ("triangulation") as sources of health information. Findings suggest that place-based co-ethnic networks may facilitate disease prevention among Black and Latino sexual minorities by improving the quality of their relationships with sick role gatekeepers and breaking down the silos of the medical complex. The study concludes by considering the value of a place-based approach to alleviating health disparities among sexual minorities vis-à-vis the health care system.
    Keywords:  Community sociology; Health care; Medical sociology; Neighborhood effects; Race/ethnicity; Sexual minorities; Social psychology
    DOI:  https://doi.org/10.1016/j.ssresearch.2019.102395
  6. J Health Commun. 2020 Apr 14. 1-12
      Previous tailoring research has traditionally studied effects of system-initiated message content to match individual characteristics. Recently scholars have explored how tailoring health information to individual modality preferences and processing styles can increase message effectiveness. Using a web-based experiment among a representative sample of Internet users (N = 392; 25-86 years), this study investigated the underlying mechanisms that might explain the effects of mode tailoring on website attitudes and recall of online health information. Results from structural equation modeling showed that mode tailoring - enabling users to self-customize a health website's presentation mode (via textual, visual, audiovisual information) - increased users' perceived active control, which in turn contributed to higher perceived relevance and website engagement, and reduced cognitive load. Positive indirect effects of mode tailoring (vs. no tailoring) through these mechanisms were found for both website attitude and information recall. The findings suggest that perceived active control is the key driver of mode tailoring effects. Mode tailoring can be a promising and novel strategy to maximize the effectiveness of tailored health communications. The authors discuss the implications for theory and design of digital health information.
    DOI:  https://doi.org/10.1080/10810730.2020.1743797
  7. Support Care Cancer. 2020 Apr 15.
       PURPOSE: Our objective was to evaluate health information seeking behaviors in yCRC (young onset colorectal cancer, diagnosed ≤ 50 years) and aCRC (average-age onset colorectal cancer, diagnosed ≥ 50 years).
    METHODS: We administered an international, Internet-based survey to ask individuals diagnosed with CRC how they seek health information, including sources sought and utilization behaviors. We also asked participants their preferences for digital technologies.
    RESULTS: In total 1125 individuals including 455 with yCRC (68.6% female) and 670 with aCRC (53.5% female) participated. There were similar frequencies of seeking among participants with yCRC and aCRC across all sources except for the Internet. Healthcare providers were the most frequently sought source with similar proportions of participants indicating their response as "always" (yCRC, 43.7% vs. aCRC, 43.2%, p = 0.91). We also observed differences in utilization behaviors with more participants with yCRC using the Internet first when seeking information (yCRC 31.6% vs. aCRC 24.3%, p < 0.05) and those with aCRC seeking healthcare providers first (aCRC 61.9% vs. yCRC 45.5%, p < 0.05). With respect to digital technologies, we found a higher proportion of yCRC participants owning smartphones and indicating use of apps related to health/wellness and cancer.
    CONCLUSION: Individuals with yCRC and aCRC similarly sought the same resources for health information on CRC. However, they differed with respect to utilization behaviors, particularly a greater reliance on digital technologies among individuals with yCRC. These have implications for informing age-specific resources and information to support patients.
    Keywords:  Colorectal cancer; Health information; Survey
    DOI:  https://doi.org/10.1007/s00520-020-05446-5
  8. Health Commun. 2020 Apr 14. 1-7
      Opioid abuse is a severe public health threat. Recent evidence points to a disturbing increase in the illicit use of fentanyl, a potent synthetic opioid, with abuse often involving illicitly produced opioids mixed with heroin. Public health experts have emphasized that there is an urgent need for new, effective harm-reduction strategies and technologies. We asked whether Internet search engines could contribute toward this goal. Using state-level data from the USA, we provide evidence for a cross-sectional and longitudinal statistical relationship between opioid-related overdose deaths and the number of Google searches using the term "fentanyl." This finding points to the relevance of Internet search engines: Users - who may be non-addicted vulnerable individuals, addicts, addicts' friends and family members, or physicians - do in fact search for fentanyl online. We argue that during such searches, an info box including a warning (i.e., awareness material to educate users about the risks) and a help message (i.e., references to professional help) can be presented to target users and possibly prevent both unintentional and suicidal overdoses. Even if this info box only helps some users, the high number of daily Google searches renders this a promising public health intervention to supplement other opioid harm-reduction strategies.
    DOI:  https://doi.org/10.1080/10410236.2020.1748820
  9. Healthcare (Basel). 2020 Apr 09. pii: E92. [Epub ahead of print] 8(2)
      Korean immigrants in the United States (U.S.) are known for their preference for, and dependence on, co-ethnic doctors due to various barriers to the U.S. healthcare system. Recent immigrants tend to face more barriers than their non-recent counterparts. However, there is little information on how they find their doctors in the U.S. This study used a self-administered survey of Korean immigrants aged 18 and above who lived in the New York-New Jersey Metropolitan area in 2013-2014 (n = 440). Descriptive analysis was conducted to understand the most common information sources and the number of sources based on the duration of stay in the U.S. Compared with their non-recent counterparts, recent Korean immigrants were more often female, without a family doctor, and uninsured, and were younger and more educated. Regardless of the duration of stay in the U.S., family members and friends were the most frequently sought-after sources for Korean immigrants in their search for doctors. In addition to family members and friends, non-recent Korean immigrants also used other methods (e.g., Korean business directories), whereas recent immigrants used both U.S. and Korean websites. Recent Korean immigrants were more likely than non-recent Korean immigrants to use multiple sources, often including a Korean website. Our study suggests policy implications for improving recent immigrants' timely access to health information.
    Keywords:  Korean immigrants; co-ethnic doctors; health information seeking; information sources; recent immigrants; searching for a doctor
    DOI:  https://doi.org/10.3390/healthcare8020092
  10. Med Mal Infect. 2020 Apr 11. pii: S0399-077X(20)30090-1. [Epub ahead of print]
    OBJECTIVES: To identify patterns of use, perceived benefits, and barriers to searching for health information online and via social media among people living with HIV (PLHIV).
    METHODS: Online multicenter observational survey (October 15-19, 2018).
    RESULTS: Study participation was accepted by 838/1,377 PLHIV followed in 46 centers, of whom 325 (39%) responded online: 181 (56%) had already used the Internet to search for health information, 88/181 (49%) on HIV infection and 78 (43%) on nutrition. Those who had searched online were characterized by a higher educational level (OR=1.82 ±0.50; p=0.028) and more often consulted other specialists (OR=3.14 ±1.26; p=0.004). A subset of 87/180 (48%) PLHIV had changed the way they looked after their health based on their online research; these people were more often in material/social deprivation (p=0.02) and diabetic (p=0.02). A small subset of 19/180 (11%) had already asked or answered a question on a forum; these people tended to be women (p=0.03) in material/social deprivation (p=0.009). Of the respondents, 296/322 (92%) trusted their physician, whereas only 206 (64%) trusted information sourced on medical websites. Although 238/323 (74%) expected their physicians to recommend websites if asked, only 23/323 (7%) had actually been given this guidance.
    CONCLUSION: More than half of PLHIV surveyed had already searched for health information on the Internet, and one in two had changed their behavior based on the online search. PLHIV did not see the Internet as an alternative to physicians but they wanted their physicians to guide them on how to find quality health information to better self-manage their condition.
    Keywords:  HIV; e-health; internet; social networks
    DOI:  https://doi.org/10.1016/j.medmal.2020.04.004
  11. J Med Internet Res. 2020 Apr 17. 22(4): e16768
       BACKGROUND: The internet allows patients to easily look for health information. However, how Chinese patients with breast cancer use the internet has rarely been investigated, and there is a scarcity of information about the influence of internet use on survival.
    OBJECTIVE: This observational study aimed to investigate the details of online medical information searching by Chinese patients with breast cancer and to determine whether internet use has any survival benefits.
    METHODS: Patients who were diagnosed with invasive breast cancer at Peking Union Medical College Hospital between January 2014 and December 2015 were enrolled. We obtained information on their internet-searching behavior and gathered data from the patients' medical and follow-up records. The associations between internet use and other clinicopathological factors were analyzed. A Cox proportional-hazards model and the Kaplan-Meier method were used for disease-free survival (DFS) analyses.
    RESULTS: A total of 973 patients with invasive breast cancer who underwent definitive surgery took part in the study. Among them, 477 cases (49.0%) performed web-based breast cancer information searching before the initial treatment. A multivariate logistic regression analysis suggested that web-based breast cancer information searching was significantly associated with younger age (odds ratio [OR] 0.95, 95% CI 0.94-0.97, P<.001), higher education level (OR 1.37, 95% CI 1.01-1.86, P=.04), and breast conserving surgery (OR 1.35, 95% CI 1.04-1.77, P=.03). Baidu (73.4%, 350/477) and WeChat (66.7%, 318/477) were the two most popular online information sources for breast cancer; however, only 44.9% (214/477) felt satisfied with the online information. In contrast to the nonweb searching group, the web-using patients who were satisfied with online information showed significantly improved DFS (hazard ratio 0.26; 95% CI 0.08-0.88, P=.03).
    CONCLUSIONS: The patients who were most likely to search the internet for breast cancer information were younger and well-educated, and they were more likely to have breast conserving therapy. Web-using patients who were satisfied with the internet information showed significantly improved DFS. Patients should browse credible websites offering accurate and updated information, and website developers should provide high-quality and easy-to-understand information to better meet the needs of patients with breast cancer.
    Keywords:  breast cancer; breast conserving therapy; disease-free survival; internet; online information; satisfaction level
    DOI:  https://doi.org/10.2196/16768
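Item 11 analyzes disease-free survival with the Kaplan-Meier method and a Cox model. The Kaplan-Meier estimator works through each time at which an event occurs and multiplies the running survival estimate by the fraction of at-risk patients who remained event-free. A minimal Python sketch follows. The follow-up data are invented, and the sketch omits confidence intervals. The Cox model needs iterative fitting and is better left to a survival-analysis library.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient
    events: 1 if the event (e.g. recurrence) was observed at that time,
            0 if the patient was censored (event-free at last follow-up)
    Returns a list of (time, survival) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            # Multiply by the fraction of at-risk patients who stayed event-free.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Events and censorings at t both leave the risk set afterwards.
        at_risk -= n_leaving
    return curve


# Invented follow-up data (months); not from the study.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

For this toy data, the curve steps down to 0.8, 0.6 and 0.3 at times 2, 3 and 5. The patient censored at time 8 does not change the estimate.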
  12. Laryngoscope. 2020 Apr 13.
       OBJECTIVES/HYPOTHESIS: The incidence of human papillomavirus-positive (HPV+) oropharyngeal cancer is rising, but public knowledge about this diagnosis remains low. This study aimed to investigate the quality and readability of online information about HPV+ oropharyngeal cancer.
    STUDY DESIGN: Cross-sectional website analysis.
    METHODS: This study conducted a total of 12 web searches across Google, Yahoo, and Bing to identify websites related to HPV+ oropharyngeal cancer. The QUality Evaluation Scoring Tool (QUEST) was used to measure quality based on seven website criteria. The Flesch Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL) were used to measure readability, with scores estimating the education level a reader would require to understand a piece of text. Readability improves as FRES increases and FKGL decreases.
    RESULTS: Twenty-seven unique web pages were evaluated. The mean USA reading grade level as measured by FKGL was 10.42 (standard deviation = 1.54). There was an inverse relationship between quality and readability, with a significant positive correlation between QUEST score and FKGL (r = 0.343, P = .040) and a significant negative correlation between QUEST score and FRES (r = -0.537, P = .002).
    CONCLUSIONS: With a mean USA reading grade level more than four grades above the American Medical Association's recommendation and results indicating that readability suffers as quality improves, these findings suggest that the currently available online information about HPV+ oropharyngeal cancer is insufficient. Improved patient education practices and resources about this diagnosis are needed.
    LEVEL OF EVIDENCE: NA
    Keywords:  Oropharyngeal cancer; consumer health information; human papillomavirus; patient education; quality; readability
    DOI:  https://doi.org/10.1002/lary.28670
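Item 12 rates readability with the Flesch Reading Ease Score and the Flesch-Kincaid Grade Level. Both scores are linear in two quantities, average words per sentence and average syllables per word:

- FRES = 206.835 - 1.015 × (words/sentences) - 84.6 × (syllables/words)
- FKGL = 0.39 × (words/sentences) + 11.8 × (syllables/words) - 15.59

The sketch below applies those formulas. Its sentence splitter and vowel-group syllable counter are crude heuristics assumed for illustration; abbreviations such as "e.g." will be miscounted as sentence breaks. Its scores will therefore differ from those of dedicated readability tools.

```python
import re


def count_syllables(word):
    # Crude heuristic (an assumption, not a dictionary lookup): count runs of
    # vowels, and treat a final "e" as silent unless the word ends in "le".
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def readability(text):
    """Return (FRES, FKGL) using the standard Flesch formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)  # average words per sentence
    spw = syllables / len(words)       # average syllables per word
    fres = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fres, fkgl


fres, fkgl = readability("The cat sat. The dog ran.")
print(round(fres, 2), round(fkgl, 2))
```

With six one-syllable words in two sentences, this gives FRES of about 119.19 and FKGL of about -2.62. Longer sentences and more polysyllabic words push FRES down and FKGL up.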
  13. Database (Oxford). 2020 Jan 01. pii: baaa024. [Epub ahead of print] 2020
      Gathering information from the scientific literature is essential for biomedical research, as much knowledge is conveyed through publications. However, the large and rapidly increasing publication rate makes it impractical for researchers to quickly identify all and only those documents related to their interest. As such, automated biomedical document classification attracts much interest. Such classification is critical in the curation of biological databases, because biocurators must scan through a vast number of articles to identify pertinent information within documents most relevant to the database. This is a slow, labor-intensive process that can benefit from effective automation.
    We present a document classification scheme aiming to identify papers containing information relevant to a specific topic, among a large collection of articles, for supporting the biocuration classification task. Our framework is based on a meta-classification scheme we have introduced before; here we incorporate into it features gathered from figure captions, in addition to those obtained from titles and abstracts. We trained and tested our classifier over a large imbalanced dataset, originally curated by the Gene Expression Database (GXD). GXD collects all the gene expression information in the Mouse Genome Informatics (MGI) resource. As part of the MGI literature classification pipeline, GXD curators identify MGI-selected papers that are relevant for GXD. The dataset consists of ~60 000 documents (5469 labeled as relevant; 52 866 as irrelevant), gathered throughout 2012-2016, in which each document is represented by the text of its title, abstract and figure captions. Our classifier attains precision 0.698, recall 0.784, f-measure 0.738 and Matthews correlation coefficient 0.711, demonstrating that the proposed framework effectively addresses the high imbalance in the GXD classification task. Moreover, our classifier's performance is significantly improved by utilizing information from image captions compared to using titles and abstracts alone; this observation clearly demonstrates that image captions provide substantial information for supporting biomedical document classification and curation.
    DOI:  https://doi.org/10.1093/database/baaa024
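Item 13 reports precision, recall, F-measure and the Matthews correlation coefficient (MCC) on a highly imbalanced dataset. All four are derived from the confusion matrix. MCC is often preferred when one class dominates because it uses all four cells of the matrix, including true negatives. The sketch below uses the standard definitions. The counts in the second example are hypothetical, since the abstract does not report the confusion matrix. As a consistency check, the harmonic mean of the reported precision (0.698) and recall (0.784) is about 0.7385, which matches the reported F-measure of 0.738.

```python
import math


def f1_score(precision, recall):
    # F-measure (F1) is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def confusion_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and MCC from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = f1_score(precision, recall)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom
    return precision, recall, f1, mcc


# Precision and recall as reported in item 13.
print(round(f1_score(0.698, 0.784), 4))  # prints 0.7385
# Hypothetical counts for illustration only.
print(confusion_metrics(tp=8, fp=2, fn=2, tn=88))
```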
  14. Front Psychol. 2020. 11: 566
      Infants register and react to informational uncertainty in the environment. They also form expectations about the probability of future events and update those expectations as the environment changes. A novel line of research has started to investigate infants' and toddlers' behavior under uncertainty. By combining these research areas, the present research investigated 12- and 24-month-old infants' searching behaviors under varying degrees of informational uncertainty. An object was hidden in one of three possible locations, and probabilistic information about the hiding location was manipulated across trials. Infants' delay in initiating a search for the hidden object increased linearly with the level of informational uncertainty. Infants' successful searching also varied according to probabilistic information. The findings suggest that infants modulate their behaviors based on probabilistic information. We discuss the possibility that infants' behavioral reaction to the environmental uncertainty constitutes the basis for the development of subjective uncertainty.
    Keywords:  infant uncertainty; latency; probabilistic information; searching; subjective uncertainty
    DOI:  https://doi.org/10.3389/fpsyg.2020.00566
  15. J Med Internet Res. 2020 Apr 15. 22(4): e16148
       BACKGROUND: People often search the internet to obtain health-related information not only for themselves but also for family members and, in particular, their children. However, for a minority of parents, such searches may become excessive and distressing. Little is known about excessive web-based searching by parents for information regarding their children's health.
    OBJECTIVE: This study aimed to develop and validate an instrument designed to assess parents' web-based health information searching behavior, the Children's Health Internet Research, Parental Inventory (CHIRPI).
    METHODS: A pilot survey was used to establish the instrument (21 items). CHIRPI was validated online in a second sample (372/384, 96.9% mothers; mean age 32.7 years, SD 5.8). Item analyses, an exploratory factor analysis (EFA), and correlations with parents' perception of their children's health-related vulnerability (Child Vulnerability Scale, CVS), parental health anxiety (modified short Health Anxiety Inventory, mSHAI), and parental cyberchondria (Cyberchondria Severity Scale, CSS-15) were calculated. A subset of participants (n=73) provided retest data after 4 weeks. CHIRPI scores (total scores and subscale scores) of parents with a chronically ill child and parents who perceived their child to be vulnerable (CVS+; CVS>10) were compared with 2×2 analyses of variance (ANOVAs) with the factors Child's Health Status (chronically ill vs healthy) and perceived vulnerability (CVS+ vs CVS-).
    RESULTS: CHIRPI's internal consistency was standardized alpha=.89. The EFA identified three subscales: Symptom Focus (standardized alpha=.87), Implementing Advice (standardized alpha=.74), and Distress (standardized alpha=.89). The retest reliability of CHIRPI was rtt=0.78. CHIRPI correlated strongly with CSS-15 (r=0.66) and moderately with mSHAI (r=0.39). The ANOVAs comparing the CHIRPI total score and the subscale scores for parents having a chronically ill child and parents perceiving their child as vulnerable revealed main effects for perceiving one's child as vulnerable but not for having a chronically ill child. No interactions were found. This pattern was observed for the CHIRPI total score (η²=0.053) and each subscale (Symptom Focus η²=0.012; Distress η²=0.113; and Implementing Advice η²=0.018).
    CONCLUSIONS: The psychometric properties of CHIRPI are excellent. Correlations with mSHAI and CSS-15 indicate its validity. CHIRPI appears to be differentially sensitive to excessive searches owing to parents perceiving their child's health to be vulnerable rather than to higher informational needs of parents with chronically ill children. Therefore, it may help to identify parents who search excessively for web-based health information. CHIRPI (and, in particular, the Distress subscale) seems to capture a pattern of factors related to parents' anxious health-related cognitions, emotions, and behaviors, which they also apply to their children.
    Keywords:  children; health behavior; health knowledge, attitudes, practice; hypochondriasis; internet; parents; questionnaire
    DOI:  https://doi.org/10.2196/16148
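Item 15 reports internal consistency as standardized alpha. Standardized Cronbach's alpha depends only on the number of items k and their mean inter-item correlation r_bar: alpha = k * r_bar / (1 + (k - 1) * r_bar). The following Python sketch computes it. The item scores are hypothetical. The sketch assumes that every item was answered by the same respondents in the same order and that no item has zero variance.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    # Pearson correlation of two equal-length score lists.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def alpha_from_mean_r(k, r_bar):
    # Standardized alpha from item count and mean inter-item correlation.
    return k * r_bar / (1 + (k - 1) * r_bar)


def standardized_alpha(items):
    """items: k score lists, one per item, same respondents in the same order."""
    pairs = list(combinations(items, 2))
    r_bar = sum(pearson(a, b) for a, b in pairs) / len(pairs)
    return alpha_from_mean_r(len(items), r_bar)


# Hypothetical 5-point responses from six respondents to three items.
items = [
    [4, 3, 5, 2, 4, 1],
    [5, 3, 4, 2, 4, 2],
    [4, 2, 5, 1, 3, 2],
]
print(round(standardized_alpha(items), 2))
```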
  16. J Chem Inf Model. 2020 Apr 14.
      Nanomaterials of varying compositions and morphologies are of interest for many applications from catalysis to optics, but the synthesis of nanomaterials and their scale-up are most often time-consuming and Edisonian processes. Information gleaned from scientific literature can help inform and accelerate nanomaterials development, but again, searching the literature and digesting the information are time-consuming manual processes for researchers. To help address these challenges, we developed scientific article-processing tools that extract and structure information from the text and figures of nanomaterials articles, thereby enabling the creation of a personalized knowledgebase for nanomaterials synthesis that can be mined to help inform further nanomaterials development. Starting with a corpus of ca. 35k nanomaterials-related articles, we developed models to classify articles according to the nanomaterial composition and morphology, extract synthesis protocols from within the articles' text, and extract, normalize, and categorize chemical terms within synthesis protocols. We demonstrate the efficiency of the proposed pipeline on an expert-labeled set of nanomaterials synthesis articles, achieving 100% accuracy on composition prediction, 95% accuracy on morphology prediction, 0.99 AUC on protocol identification, and up to 0.87 F1-score on chemical entity recognition. In addition to processing articles' text, microscopy images of nanomaterials within articles are also automatically identified and analyzed to determine nanomaterials' morphologies and size distributions. To enable users to easily explore the database, we developed a complementary browser-based visualization tool that provides flexibility in comparing across subsets of articles of interest.
    We use these tools and information to identify trends in nanomaterials synthesis, such as the correlation of certain reagents with various nanomaterial morphologies, which is useful in guiding hypotheses and reducing the potential parameter space during experimental design.
    DOI:  https://doi.org/10.1021/acs.jcim.0c00199
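Item 16 reports an AUC of 0.99 for protocol identification. The area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The brute-force sketch below computes AUC that way from hypothetical classifier scores. It is not the authors' pipeline. For large datasets, a sort-based rank computation is faster.

```python
def auc(scores, labels):
    """ROC AUC as P(score of a random positive > score of a random negative).

    labels: 1 for positive, 0 for negative. Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for paragraphs that do (1) or do not (0) describe a protocol.
scores = [0.9, 0.4, 0.6, 0.1]
labels = [1, 1, 0, 0]
print(auc(scores, labels))  # prints 0.75
```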