bims-librar 2021-01-10 papers

bims-librar

Biomed News

on Biomedical librarianship

Issue of 2021–01–10
seventeen papers selected by
Thomas Krichel, Open Library Society

Human visual search follows a suboptimal Bayesian strategy revealed by a spatiotemporal computational model and experiment.
A Trilingual Medical Compendium from Medieval Oxford, Now in the Collection of the State Library Victoria.
A data-sharing scheme that supports multi-keyword search for electronic medical records.
Automatic Identification of Information Quality Metrics in Health News Stories.
COVID-19: time to flatten the infodemic curve.
Trust in Health Information Sources among Underserved and Vulnerable Populations in the U.S.
Network graph representation of COVID-19 scientific publications to aid knowledge discovery.
Extracting and modeling geographic information from scientific articles.
Communicating Scientific Uncertainty in an Age of COVID-19: An Investigation into the Use of Preprints by Digital Media Outlets.
Veterinary allergy information has lower health readability than human allergy information: a comparative analysis of allergy education materials for pets and people.
Assessing the readability of patient-targeted online information on musculoskeletal radiology procedures.
"Down the Rabbit Hole" of Vaccine Misinformation on YouTube: Network Exposure Study.
Digital health literacy and online information seeking in times of COVID-19. A cross-sectional survey among university students in Germany.
The Believability of Exercise Blogs Among Young Adults.
Caught in the net: Characterizing how testicular cancer patients use the internet as an information source.
The use of internet platforms for oral health information and associated factors among adolescents from Jakarta: a cross sectional study.
Automated Extraction of Information From Texts of Scientific Publications: Insights Into HIV Treatment Strategies.

Commun Biol. 2021 Jan 04. 4(1): 34

Human visual search follows a suboptimal Bayesian strategy revealed by a spatiotemporal computational model and experiment.

Yunhui Zhou, Yuguo Yu.

There is conflicting evidence regarding whether humans can make spatially optimal eye movements during visual search. Some studies have shown that humans can optimally integrate information across fixations and determine the next fixation location; however, these models have generally ignored the control of fixation duration and memory limitation, and the model results do not agree well with the details of human eye movement metrics. Here, we measured the temporal course of the human visibility map and performed a visual search experiment. We further built a continuous-time eye movement model that considers saccadic inaccuracy, saccadic bias, and memory constraints. We show that this model agrees better with the spatial and temporal properties of human eye movements and predict that humans have a memory capacity of around eight previous fixations. The model results reveal that humans employ a suboptimal eye movement strategy to find a target, which may minimize costs while still achieving sufficiently high search performance.

DOI: https://doi.org/10.1038/s42003-020-01485-0

Bull Hist Med. 2020;94(3): 459-486

A Trilingual Medical Compendium from Medieval Oxford, Now in the Collection of the State Library Victoria.

Linda Ehrsam Voigts, Anna Welch.

A previously unstudied trilingual medieval medical manuscript, ca. 1400, RARES 091 M31, has been in the State Library Victoria, Melbourne, since 1862. The texts in this codex reveal the pedagogical and personal interests of a compiler from the world of Oxford colleges, halls, and libraries in the late fourteenth century. It contains academic medical texts as well as writings of a personal nature (charms, verses, prayers) in Latin, French, and Middle English. It appears to have been associated with Henry Beaumond (d. 1415), whose name appears in the codex. Beaumond was a physician with a problematic association with Exeter College, Oxford University. A good deal of information survives about Beaumond and his books, as well as his association with the influential cleric at New College, Oxford, Walter Awde (d. after 1404), who is also named in the manuscript. This study provides images and a full physical description of the manuscript.

DOI: https://doi.org/10.1353/bhm.2020.0072

PLoS One. 2021;16(1): e0244979

A data-sharing scheme that supports multi-keyword search for electronic medical records.

Shufen Niu, Wenke Liu, Song Han, Lizhi Fang.

As cloud storage technology develops, data sharing of cloud-based electronic medical records (EMRs) has become a hot topic in the academia and healthcare sectors. To solve the problem of secure search and sharing of EMR in cloud platforms, an EMR data-sharing scheme supporting multi-keyword search is proposed. The proposed scheme combines searchable encryption and proxy re-encryption technologies to perform keyword search and achieve secure sharing of encrypted EMR. At the same time, the scheme uses a traceable pseudo identity to protect the patient's private information. Our scheme is proven secure based on the modified Bilinear Diffie-Hellman assumption and Quotient Decisional Bilinear Diffie-Hellman assumption under the random oracle model. The performance of our scheme is evaluated through theoretical analysis and numerical simulation.
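
The idea of multi-keyword search over protected records can be illustrated with a much simpler construction than the one in the paper. The sketch below is a toy symmetric-key index built from HMAC tokens. It is not the authors' scheme, which relies on bilinear pairings and proxy re-encryption, and every name in it (`keyword_token`, `build_index`, `search`, the record identifiers) is hypothetical.

```python
import hashlib
import hmac


def keyword_token(key, keyword):
    """Derive a search token for one keyword; the server never sees the keyword itself."""
    return hmac.new(key, keyword.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def build_index(key, records):
    """Map each record id to the set of tokens for its keywords."""
    return {rid: {keyword_token(key, k) for k in kws} for rid, kws in records.items()}


def search(index, tokens):
    """Return ids of records whose token set contains every queried token."""
    wanted = set(tokens)
    return sorted(rid for rid, toks in index.items() if wanted <= toks)


key = b"demo-key"  # assumption: a secret shared by data owner and authorised user
index = build_index(key, {"emr-1": ["diabetes", "insulin"], "emr-2": ["diabetes"]})
query = [keyword_token(key, "diabetes"), keyword_token(key, "insulin")]
print(search(index, query))
```

A conjunctive query matches only records that carry all requested tokens, which is the behaviour "multi-keyword search" refers to; sharing with a second user without revealing the key is the part that the paper's proxy re-encryption addresses and this toy does not.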

DOI: https://doi.org/10.1371/journal.pone.0244979

Front Public Health. 2020;8: 515347

Automatic Identification of Information Quality Metrics in Health News Stories.

Majed Al-Jefri, Roger Evans, Joon Lee, Pietro Ghezzi.

Objective: Many online and printed media publish health news of questionable trustworthiness and it may be difficult for laypersons to determine the information quality of such articles. The purpose of this work was to propose a methodology for the automatic assessment of the quality of health-related news stories using natural language processing and machine learning. Materials and Methods: We used a database from the website HealthNewsReview.org that aims to improve the public dialogue about health care. HealthNewsReview.org developed a set of criteria to critically analyze health care interventions' claims. In this work, we attempt to automate the evaluation process by identifying the indicators of those criteria using natural language processing-based machine learning on a corpus of more than 1,300 news stories. We explored features ranging from simple n-grams to more advanced linguistic features and optimized the feature selection for each task. Additionally, we experimented with the use of the pre-trained natural language model BERT. Results: For some criteria, such as mention of costs, benefits, harms, and "disease-mongering," the evaluation results were promising with an F1 measure reaching 81.94%, while for others the results were less satisfactory due to the dataset size, the need for external knowledge, or the subjectivity in the evaluation process. Conclusion: The criteria used are more challenging than those addressed by previous work, and our aim was to investigate how much more difficult the machine learning task was, and how and why it varied between criteria. For some criteria, the obtained results were promising; however, automated evaluation of the other criteria may not yet replace the manual evaluation process where human experts interpret text senses and make use of external knowledge in their assessment.
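
The "simple n-grams" mentioned in the methods are counts of consecutive word sequences. A minimal sketch of that feature type, assuming whitespace tokenisation and lowercasing (the abstract does not say how the authors tokenised); `word_ngrams` is an illustrative name.

```python
from collections import Counter


def word_ngrams(text, n):
    """Count word n-grams (as tuples) in lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


story = "the drug reduces risk but the drug costs more"
bigrams = word_ngrams(story, 2)
print(bigrams.most_common(3))
```

Counts like these can be fed to a text classifier as a sparse feature vector, one dimension per distinct n-gram.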

Keywords:  health information quality assessment; machine learning; natural language processing; online health information; text classification

DOI: https://doi.org/10.3389/fpubh.2020.515347

Clin Exp Med. 2021 Jan 08.

COVID-19: time to flatten the infodemic curve.

Anastasios Tentolouris, Ioannis Ntanasis-Stathopoulos, Panayotis K Vlachakis, Diamantis I Tsilimigras, Maria Gavriatopoulou, Meletios A Dimopoulos.

Thousands of articles have been published regarding the coronavirus disease of 2019 (COVID-19). Most of them are not original research articles but reviews and editorials, and therefore, the absence of evidence-based guidelines has been evident. In parallel, the quality of manuscripts is questionable since the number of preprints has increased due to the need for fast publication of COVID-19-related articles. Furthermore, the number of retracted articles during the pandemic is exceptionally high. Media have an important role in the distribution of incorrect information; nevertheless, individual people and policy makers are also responsible. As misinformation thrives in crisis periods, well-designed studies are needed to flatten the infodemic curve regarding prevention, diagnosis, and long-term complications of COVID-19.

Keywords:  COVID-19; Infodemic; Misinformation; SARS-CoV-2; Social media

DOI: https://doi.org/10.1007/s10238-020-00680-x

J Health Care Poor Underserved. 2020;31(3): 1471-1487

Trust in Health Information Sources among Underserved and Vulnerable Populations in the U.S.

Christopher W Wheldon, Katherine T Carroll, Richard P Moser.

The purpose of this study was to examine trust in health information sources among underserved and vulnerable populations. Data (N=8,759) were from the Health Information National Trends Survey. Differences were assessed across the following subgroups: ethnoracial minorities, immigrants, rural residents, people with limited English proficiency, and sexual minorities. Trust was highest for doctors, followed by government, family/friends, charities, and religious organizations. In adjusted regression models, trusting health information from charitable and religious organizations was higher in ethnoracial minorities and immigrants. Individuals with limited English proficiency also had higher trust in religious organizations compared with those fluent in English. Trusting health information from doctors was lower among individuals with limited English proficiency. There was evidence in support of additive and multiplicative intersectional frameworks for understanding trust in vulnerable and underserved populations; however, the extent to which differences in trust explain disparities in health behaviors and outcomes should be examined.

DOI: https://doi.org/10.1353/hpu.2020.0106

BMJ Health Care Inform. 2021 Jan. 28(1): e100254. [Epub ahead of print]

Network graph representation of COVID-19 scientific publications to aid knowledge discovery.

George Cernile, Trevor Heritage, Neil J Sebire, Ben Gordon, Taralyn Schwering, Shana Kazemlou, Yulia Borecki.

INTRODUCTION: Numerous scientific journal articles related to COVID-19 have been rapidly published, making navigation and understanding of relationships difficult.
METHODS: A graph network was constructed from the publicly available COVID-19 Open Research Dataset (CORD-19) of COVID-19-related publications using an engine leveraging medical knowledge bases to identify discrete medical concepts and an open-source tool (Gephi) to visualise the network.
RESULTS: The network shows connections between diseases, medications and procedures identified from the title and abstract of 195 958 COVID-19-related publications (CORD-19 Dataset). Connections between terms with few publications, those unconnected to the main network and those irrelevant were not displayed. Nodes were coloured by knowledge base and the size of the node related to the number of publications containing the term. The data set and visualisations were made publicly accessible via a webtool.
CONCLUSION: Knowledge management approaches (text mining and graph networks) can effectively allow rapid navigation and exploration of entity inter-relationships to improve understanding of diseases such as COVID-19.
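
The network described in the results can be sketched as a concept co-occurrence graph: one node per medical concept, sized by the number of publications mentioning it, with an edge whenever two concepts appear in the same title or abstract. The sketch below assumes the concepts have already been extracted per publication; the function name, the `min_publications` threshold and the sample data are illustrative and are not taken from the paper or from Gephi.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(doc_concepts, min_publications=2):
    """Return (nodes, edges): node -> publication count, (a, b) -> co-occurrence count."""
    node_size = Counter()
    edge_weight = Counter()
    for concepts in doc_concepts:
        unique = sorted(set(concepts))  # count each concept once per publication
        node_size.update(unique)
        edge_weight.update(combinations(unique, 2))
    # Drop rarely mentioned terms, loosely mirroring "terms with few publications"
    kept = {c for c, n in node_size.items() if n >= min_publications}
    nodes = {c: node_size[c] for c in kept}
    edges = {p: w for p, w in edge_weight.items() if p[0] in kept and p[1] in kept}
    return nodes, edges


docs = [
    ["covid-19", "pneumonia", "remdesivir"],
    ["covid-19", "pneumonia"],
    ["covid-19", "intubation"],
]
nodes, edges = build_cooccurrence_graph(docs)
print(nodes, edges)
```

The resulting dictionaries can be written out as node and edge tables for a visualisation tool.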

Keywords:  BMJ health informatics; health care; information science; medical informatics

DOI: https://doi.org/10.1136/bmjhci-2020-100254

PLoS One. 2021;16(1): e0244918

Extracting and modeling geographic information from scientific articles.

Elise Acheson, Ross S Purves.

Scientific articles often contain relevant geographic information such as where field work was performed or where patients were treated. Most often, this information appears in the full-text article contents as a description in natural language including place names, with no accompanying machine-readable geographic metadata. Automatically extracting this geographic information could help conduct meta-analyses, find geographical research gaps, and retrieve articles using spatial search criteria. Research on this problem is still in its infancy, with many works manually processing corpora for locations and few cross-domain studies. In this paper, we develop a fully automatic pipeline to extract and represent relevant locations from scientific articles, applying it to two varied corpora. We obtain good performance, with full pipeline precision of 0.84 for an environmental corpus, and 0.78 for a biomedical corpus. Our results can be visualized as simple global maps, allowing human annotators to both explore corpus patterns in space and triage results for downstream analysis. Future work should not only focus on improving individual pipeline components, but also be informed by user needs derived from the potential spatial analysis and exploration of such corpora.
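
As a rough illustration of the extraction step and of the precision figures quoted above, the sketch below matches place names from a small lookup list and scores the result against hand-labelled locations. It is a toy under stated assumptions: the gazetteer, the function names and the example sentence are hypothetical, and the authors' pipeline is certainly more elaborate.

```python
import re

GAZETTEER = {"zurich", "nairobi", "lake victoria"}  # hypothetical toy gazetteer


def extract_places(text, gazetteer=GAZETTEER):
    """Return gazetteer names that occur as whole words in the text."""
    lowered = text.lower()
    return {
        name for name in gazetteer
        if re.search(r"\b" + re.escape(name) + r"\b", lowered)
    }


def precision(extracted, relevant):
    """Share of extracted locations that are truly relevant."""
    extracted = set(extracted)
    if not extracted:
        return 0.0
    return len(extracted & set(relevant)) / len(extracted)


found = extract_places("Samples were collected near Lake Victoria and analysed in Zurich.")
print(found, precision(found, {"lake victoria"}))
```

In this example Zurich is a real place in the text but not the study site, so it counts against precision; separating mentioned places from relevant ones is the harder part of the problem.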

DOI: https://doi.org/10.1371/journal.pone.0244918

Health Commun. 2021 Jan 03. 1-13

Communicating Scientific Uncertainty in an Age of COVID-19: An Investigation into the Use of Preprints by Digital Media Outlets.

Alice Fleerackers, Michelle Riedlinger, Laura Moorhead, Rukhsana Ahmed, Juan Pablo Alperin.

In this article, we investigate the surge in use of COVID-19-related preprints by media outlets. Journalists are a main source of reliable public health information during crises and, until recently, journalists have been reluctant to cover preprints because of the associated scientific uncertainty. Yet, uploads of COVID-19 preprints and their uptake by online media have outstripped that of preprints about any other topic. Using an innovative approach combining altmetrics methods with content analysis, we identified a diversity of outlets covering COVID-19-related preprints during the early months of the pandemic, including specialist medical news outlets, traditional news media outlets, and aggregators. We found a ubiquity of hyperlinks as citations and a multiplicity of framing devices for highlighting the scientific uncertainty associated with COVID-19 preprints. These devices were rarely used consistently (e.g., mentioning that the study was a preprint, unreviewed, preliminary, and/or in need of verification). Less than half of the stories we analyzed contained framing devices emphasizing uncertainty. Outlets in our sample were much less likely to identify the research they mentioned as preprint research, compared to identifying it as simply "research." This work has significant implications for public health communication within the changing media landscape. While current best practices in public health risk communication promote identifying and promoting trustworthy sources of information, the uptake of preprint research by online media presents new challenges. At the same time, it provides new opportunities for fostering greater awareness of the scientific uncertainty associated with health research findings.

DOI: https://doi.org/10.1080/10410236.2020.1864892

Vet Dermatol. 2021 Jan 05.

Veterinary allergy information has lower health readability than human allergy information: a comparative analysis of allergy education materials for pets and people.

Kathy Chu Tater.

BACKGROUND: Pet owners frequently consult online sources of veterinary health information. However, there are limited data on the readability of these resources and whether the readability is appropriate for pet owner education levels.
OBJECTIVES: To evaluate the education level of the US pet-owning population, and determine the readability of pet allergy information and compare the readability of online pet allergy information with online human allergy information.
ANIMALS/SUBJECTS: A subpopulation of 4,933 adults, representative of a population of 208,525,282, answering National Health and Nutrition Examination Survey (NHANES) demographic and pet questions. Allergy information in 54 articles (28 veterinary, 26 human) from six health websites (three veterinary, three human).
METHODS AND MATERIALS: An analysis was performed on 10,294 NHANES questionnaire responses to identify the subpopulation of 4,933 pet-owning adults. Flesch Reading Ease Scores and Flesch-Kincaid Grade Level Scores were calculated on the pet and human allergy information to evaluate readability.
RESULTS: The age-adjusted prevalence of high school graduation was higher for adults with pets (85.8 ± 1.33%) compared to adults without pets (78.5 ± 1.5%, P < 0.0001). Allergy information on veterinary websites was more difficult to read (P = 0.0052) and written at a higher grade level (P = 0.0047) than that on human health websites. The average veterinary health information readability score was 45.9 ± 8.7 ("difficult to read") and written at an 11th grade level or above (range: 8th grade-college level).
CONCLUSIONS: Allergy information on veterinary websites was less readable than allergy information on human health websites. Online veterinary information may be written at a reading level that is inappropriate for pet owners.
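
Both readability measures named in the methods are functions of average sentence length and average syllables per word. A minimal sketch using the standard published coefficients; counts are passed in directly because syllable counting varies between tools, and the function names are illustrative rather than taken from the study.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical passage: 100 words, 5 sentences, 150 syllables
print(round(flesch_reading_ease(100, 5, 150), 1))   # 59.6
print(round(flesch_kincaid_grade(100, 5, 150), 1))  # 9.9
```

Longer sentences and more syllables per word lower the ease score and raise the grade level, which is why the two measures tend to move in opposite directions.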

DOI: https://doi.org/10.1111/vde.12934

Skeletal Radiol. 2021 Jan 03.

Assessing the readability of patient-targeted online information on musculoskeletal radiology procedures.

Phuong T Duong, Matthew P Moy, F Joseph Simeone, Connie Y Chang, Tony T Wong.

OBJECTIVE: To assess the readability of patient-targeted online information on musculoskeletal radiology procedures.
METHODS: Eleven common musculoskeletal radiology procedures were queried in three online search engines (Google, Yahoo!, Bing). All unique patient-targeted websites were identified (n = 384) from the first three pages of search results. The reading grade level of each website was calculated using 6 separate validated metrics for readability assessment. Analysis of word and sentence complexity was also performed. Results were compared between academic vs. non-academic websites and between websites found on different pages of the search results. Statistics were performed using a t test.
RESULTS: The mean reading grade level across all procedures was 10th-14th grade. Webpages for nerve block were written at a higher reading grade level on non-academic websites (p = 0.025). There was no difference in reading grade levels between academic and non-academic sources for all other procedures. There was no difference in reading grade levels between websites found on the first page of search results compared with the second and third pages. Across all websites, 16-22% of the words used had 3+ syllables and 31-43% of the words used had 6+ characters (complex words); 13-24% of the sentences used had 22+ words (complex sentences).
CONCLUSION: Patient-targeted online information on musculoskeletal radiology procedures is written at the 10th-14th grade reading level, which is well beyond the AMA and NIH recommendation. Readability can be lowered by decreasing text complexity through limitation of high-syllable words and reduction in word and sentence length.
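
Two of the complexity measures reported in the results, the share of words with 6+ characters and the share of sentences with 22+ words, are easy to compute. The sketch below assumes a simple regular-expression tokeniser and sentence splitter, which may differ from the tools the authors used; syllable counts are left out because they need a heuristic or a dictionary. The function name is illustrative.

```python
import re

WORD_RE = re.compile(r"[A-Za-z']+")


def complexity_shares(text, long_word_chars=6, long_sentence_words=22):
    """Return (share of long words, share of long sentences) for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = WORD_RE.findall(text)
    long_words = sum(1 for w in words if len(w) >= long_word_chars)
    long_sentences = sum(
        1 for s in sentences if len(WORD_RE.findall(s)) >= long_sentence_words
    )
    word_share = long_words / len(words) if words else 0.0
    sentence_share = long_sentences / len(sentences) if sentences else 0.0
    return word_share, sentence_share


sample = "The needle enters the joint. Contrast is injected."
print(complexity_shares(sample))  # (0.5, 0.0)
```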

Keywords:  Health literacy; Musculoskeletal procedures; Patient education; Readability

DOI: https://doi.org/10.1007/s00256-020-03562-1

J Med Internet Res. 2021 Jan 05. 23(1): e23262

"Down the Rabbit Hole" of Vaccine Misinformation on YouTube: Network Exposure Study.

Lu Tang, Kayo Fujimoto, Muhammad Tuan Amith, Rachel Cunningham, Rebecca A Costantini, Felicia York, Grace Xiong, Julie A Boom, Cui Tao.

BACKGROUND: Social media platforms such as YouTube are hotbeds for the spread of misinformation about vaccines.
OBJECTIVE: The aim of this study was to explore how individuals are exposed to antivaccine misinformation on YouTube based on whether they start their viewing from a keyword-based search or from antivaccine seed videos.
METHODS: Four networks of videos based on YouTube recommendations were collected in November 2019. Two search networks were created from provaccine and antivaccine keywords to resemble goal-oriented browsing. Two seed networks were constructed from conspiracy and antivaccine expert seed videos to resemble direct navigation. Video contents and network structures were analyzed using the network exposure model.
RESULTS: Viewers are more likely to encounter antivaccine videos through direct navigation starting from an antivaccine video than through goal-oriented browsing. In the two seed networks, provaccine videos, antivaccine videos, and videos containing health misinformation were all found to be more likely to lead to more antivaccine videos.
CONCLUSIONS: YouTube has boosted the search rankings of provaccine videos to combat the influence of antivaccine information. However, when viewers are directed to antivaccine videos on YouTube from another site, the recommendation algorithm is still likely to expose them to additional antivaccine information.
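
One common way to express network exposure is the share of a node's outgoing ties that point to nodes with a given attribute. The sketch below applies that unweighted form to a recommendation network; the abstract does not state which weighting the authors used, so this is an assumption, and the video identifiers and labels are made up.

```python
def network_exposure(recommendations, is_antivaccine):
    """For each video, the fraction of its recommended videos labelled antivaccine."""
    exposure = {}
    for video, recommended in recommendations.items():
        if not recommended:
            exposure[video] = 0.0  # no outgoing recommendations, no exposure
            continue
        hits = sum(1 for r in recommended if is_antivaccine.get(r, False))
        exposure[video] = hits / len(recommended)
    return exposure


recs = {"seed": ["v1", "v2"], "v1": ["v2"], "v2": []}
labels = {"seed": True, "v1": True, "v2": False}
print(network_exposure(recs, labels))
```

A viewer starting at `seed` here has an exposure of 0.5, since one of the two recommended videos is labelled antivaccine.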

Keywords:  YouTube; infodemic; infodemiology; misinformation; network analysis; vaccine

DOI: https://doi.org/10.2196/23262

J Med Internet Res. 2020 Dec 08.

Digital health literacy and online information seeking in times of COVID-19. A cross-sectional survey among university students in Germany.

Kevin Dadaczynski, Orkan Okan, Melanie Messer, Angela Y M Leung, Rafaela Rosário, Emily Darlington, Katharina Rathmann.

BACKGROUND: Digital communication technologies play an important role in governments' and public health authorities' health communication strategies during the COVID-19 pandemic. The internet and social media have become important sources of health-related information on the coronavirus and on protective behaviours. In addition, the COVID-19 infodemic spreads faster than the coronavirus itself, which interferes with governmental health-related communication efforts. This puts national public health containment strategies in jeopardy. Therefore, digital health literacy is a key competence to navigate coronavirus-related information and service environments.
OBJECTIVE: This study aimed to investigate university students' digital health literacy and online information seeking behaviours during the early stages of the coronavirus pandemic in Germany.
METHODS: A cross-sectional study among N=14,916 university students aged ≥18 from 130 universities across all sixteen federal states of Germany was conducted using an online survey. Along with sociodemographic characteristics (sex, age, subjective social status) measures included five subscales from the Digital Health Literacy Instrument (DHLI), which was adapted to the specific coronavirus context. Online information seeking behaviour was investigated by examining the online sources used by university students and the topics that students search for in connection with the coronavirus. Data were analysed using univariate and bivariate analyses.
RESULTS: Across digital health literacy dimensions, the greatest difficulties could be found for assessing the reliability of health-related information (42.3%) and the ability to determine whether the information was written with commercial interest (38.9%). Moreover, respondents also indicated that they most frequently have problems finding the information they are looking for (30.4%). When stratified according to sociodemographic characteristics, significant differences were found with female university students reporting a lower DHLI for the dimensions of 'information searching' and of 'evaluating reliability'. Search engines, news portals and public bodies' websites were most often used by the respondents as sources to search for information on COVID-19 and related issues. Female students were found to use social media and health portals more frequently, while male students used Wikipedia and other online encyclopaedias as well as YouTube more often. The use of social media was associated with a low ability to critically evaluate information, while opposite differences were observed for the use of public websites.
CONCLUSIONS: Although digital health literacy is, in summary, well developed in university students, a significant proportion of students still face difficulties with certain abilities to deal with information. There is need to strengthen the digital health literacy capacities of university students using tailored interventions. Improving the quality of health-related information on the internet is also key.

DOI: https://doi.org/10.2196/24097

J Sport Exerc Psychol. 2021 Jan 07. pii: jsep.2020-0177. [Epub ahead of print] 1-8

The Believability of Exercise Blogs Among Young Adults.

Elaine M Ori, Tanya R Berry, Lira Yun.

It is unknown how lifelong digital media users such as young adult women perceive exercise information found online. A total of 141 women aged 18-30 years and residing in Canada were randomized to read either a factually incorrect or a factually correct blog article. Participants completed Go/No-Go tasks to measure automatically activated believability and evaluations and questionnaires to explicitly measure believability, affective evaluations, and intentions to exercise. Participants did not show evidence of automatically activated believability of the content found in either blog article. However, participants reading the factually correct article reported significantly greater explicit disbelief than those reading the factually incorrect article, though this did not predict intentions. Being factually correct may not be an important component of message believability. Exercise professionals need to remain aware of the content of popular online sources of information in an effort to curb misinformation.

Keywords:  exercise media; intentions to exercise; social media; women

DOI: https://doi.org/10.1123/jsep.2020-0177

Can Urol Assoc J. 2021 Jan 04.

Caught in the net: Characterizing how testicular cancer patients use the internet as an information source.

Sarah Yeo, Bernhard Eigl, Sherry Chan, Christian Kollmannsberger, Paris-Ann Ingledew.

INTRODUCTION: Over 70% of Canadians who use the internet search for healthcare information online. This is especially true regarding the young adult population. Testicular cancer is the most commonly diagnosed cancer in men aged 15-29. This study characterizes how testicular cancer patients access healthcare information online, and how this influences their clinical encounters and treatment decisions.
METHODS: From June 2018 to January 2019, a survey consisting of 24 open and close-ended questions was distributed to testicular cancer patients at a tertiary cancer center. Survey results were evaluated using mixed methods analysis.
RESULTS: Fifty-nine surveys were distributed, and 44 responses were received. All respondents used the internet regularly and 82% used the internet as a source of information regarding their cancer. The majority followed top hits from Google when selecting websites to view. Frequent topics searched included treatment details and survivorship concerns. Eighty-nine percent of users found online information easy to understand and 94% found it increased their understanding. The internet did not influence the clinical consultation for 47% of users or the treatment decision for 53%.
CONCLUSIONS: Most testicular cancer patients in this study are regular internet users and use the internet to search for testicular cancer information. Healthcare providers should recognize this and can play important roles in discussing online findings with patients to assess their background knowledge and expectations, as well as providing guidance on selecting credible online resources. The results of this study can be used to improve patient-physician communication and education.

DOI: https://doi.org/10.5489/cuaj.6870

BMC Oral Health. 2021 Jan 07. 21(1): 22

The use of internet platforms for oral health information and associated factors among adolescents from Jakarta: a cross sectional study.

Diah Ayu Maharani, Maha El Tantawi, Marsha Griselda Yoseph, Anton Rahardjo.

BACKGROUND: The growth of the internet has increased its use to obtain health information including oral health information (OHI). This study assessed Indonesian adolescents' use of different internet platforms to obtain OHI and factors associated with this use.
METHODS: A cross-sectional study surveyed middle school students in five regions in Jakarta in 2019. Participants completed a questionnaire that assessed demographics, oral health practices (toothbrushing and dental visits), the presence of dental pain, using internet platform to obtain OHI and type of information searched for. Multinomial logistic regression was used to assess the association between using the internet for OHI (Google, Social Media (SM), both or none) and the independent factors: demographics, oral health practice, dental pain and whether participants search for causes, symptoms, prevention or treatment of oral diseases (ODs).
RESULTS: Most of the 521 participants were female (55.7%) with mean age = 13.4 years. Almost all of them (93.7%) searched the internet for OHI through Google (40.7%) or Google with SM (36.1%). Searching for OHI over SM was significantly associated with toothbrushing (OR = 4.12, 95% CI = 1.43, 11.89) and fewer dental visits (OR = 0.16, 95% CI = 0.05, 0.60). Searching Google for OHI was significantly associated with looking for information about causes (OR = 3.69, 95% CI = 1.33, 10.26) and treatment (OR = 6.17, 95% CI = 2.23, 17.03) of ODs.
CONCLUSIONS: Most adolescents used Google to seek OHI. Oral health practices and types of OHI searched for differed by internet platform. Dental health professionals should consider using internet-based interventions to promote oral health to this age group.
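
Odds ratios from a logistic regression are exponentiated coefficients, and their confidence intervals are usually symmetric on the log scale. The sketch below recovers the log-odds coefficient and its standard error from a reported OR and 95% CI, assuming a Wald interval with z = 1.96 (the abstract does not state the interval type), and checks that the OR sits near the geometric midpoint of its interval. Function names are illustrative.

```python
import math


def log_or_and_se(odds_ratio, ci_lower, ci_upper, z=1.96):
    """Recover the coefficient (log OR) and its standard error from a reported CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)
    return beta, se


def ci_midpoint_on_or_scale(ci_lower, ci_upper):
    """Geometric mean of the CI bounds; equals the OR if the CI is log-symmetric."""
    return math.sqrt(ci_lower * ci_upper)


# Toothbrushing and searching over social media: OR = 4.12, 95% CI 1.43 to 11.89
beta, se = log_or_and_se(4.12, 1.43, 11.89)
print(round(beta, 2), round(se, 2))
print(round(ci_midpoint_on_or_scale(1.43, 11.89), 2))
```

The wide interval corresponds to a standard error of roughly 0.54 on the log scale, a reminder that the estimate, although significant, is imprecise.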

Keywords:  Adolescents; Indonesia; Oral health information; Social media

DOI: https://doi.org/10.1186/s12903-020-01387-x

Front Genet. 2020;11: 618862

Automated Extraction of Information From Texts of Scientific Publications: Insights Into HIV Treatment Strategies.

Nadezhda Biziukova, Olga Tarasova, Sergey Ivanov, Vladimir Poroikov.

Text analysis can help to identify named entities (NEs) of small molecules, proteins, and genes. Such data are very important for the analysis of molecular mechanisms of disease progression and development of new strategies for the treatment of various diseases and pathological conditions. The texts of publications are a primary source of information; because information is obtained from them directly rather than through databases, they are especially important for collecting data of the highest quality. In our study, we aimed at the development and testing of an approach to named entity recognition in the abstracts of publications. More specifically, we have developed and tested an algorithm based on conditional random fields, which provides recognition of NEs of (i) genes and proteins and (ii) chemicals. Careful selection of abstracts strictly related to the subject of interest makes it possible to extract the NEs strongly associated with the subject. To test the applicability of our approach, we have applied it to the extraction of (i) potential HIV inhibitors and (ii) a set of proteins and genes potentially responsible for viremic control in HIV-positive patients. The computational experiments provide estimates of the accuracy of recognition of chemical NEs and of proteins (genes). The precision of chemical NE recognition is over 0.91, recall is 0.86, and the F1-score (harmonic mean of precision and recall) is 0.89; the precision of recognition of protein and gene names is over 0.86, recall is 0.83, and the F1-score is above 0.85. Evaluation of the algorithm on two case studies related to HIV treatment confirms our suggestion about the possibility of extracting the NEs strongly relevant to (i) HIV inhibitors and (ii) a group of patients, i.e., the group of HIV-positive individuals with an ability to maintain an undetectable HIV-1 viral load over time in the absence of antiretroviral therapy. Analysis of the results obtained provides insights into the function of proteins that can be responsible for viremic control. Our study demonstrated the applicability of the developed approach for the extraction of useful data on HIV treatment.
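
The F1-score quoted in the abstract is the harmonic mean of precision and recall. A minimal sketch of the calculation (`f1_score` is an illustrative name, not the authors' code):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Chemical NEs: precision reported as "over 0.91", recall 0.86
print(round(f1_score(0.91, 0.86), 3))
```

Using 0.91 as a lower bound for precision gives roughly 0.88, which is consistent with the reported 0.89 once precision is somewhat above 0.91.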

Keywords:  HIV; NER; data mining; named entity recognition; text mining; viremic control; virus-host interactions

DOI:  https://doi.org/10.3389/fgene.2020.618862