Retrieve Pubmed publications using the RefManageR package

programming
Author

Michael Luu

Published

April 13, 2023

For anyone working in academia, it may be useful to keep a running tabulation of all the publications you have co-authored. The RefManageR package includes a function, ReadPubMed(), that queries the PubMed API and returns the publications matching a PubMed search query.

Below I create a PubMed query q that searches for publications listing Luu, Michael as an author with a Cedars-Sinai affiliation. I then pass this query to ReadPubMed() and save the results as pm. The output is a BibEntry object, which can then be coerced into a tibble.

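library(tidyverse)  # as_tibble(), glimpse(), and the wrangling/plotting verbs used below

# PubMed search syntax: field tags in square brackets, combined with AND
q <- '(Luu, Michael[Author]) AND (Cedars-Sinai[Affiliation])'

# retmax raises the cap on the number of records returned (E-utilities default is 20)
pm <- RefManageR::ReadPubMed(q, retmax = 999)

out <- pm |> as_tibble()

glimpse(out)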
Rows: 68
Columns: 15
$ bibtype    <chr> "Article", "Article", "Article", "Article", "Article", "Art…
$ title      <chr> "Concurrent prognostic utility of lymph node count and lymp…
$ author     <chr> "John M Masterson and Michael Luu and Aurash Naser-Tavakoli…
$ year       <chr> "2023", "2022", "2023", "2022", "2022", "2022", "2022", "20…
$ month      <chr> "Jan", "Nov", "Apr", "Aug", "Oct", "Jul", "Aug", "Jul", "Ma…
$ journal    <chr> "Prostate cancer and prostatic diseases", "NPJ breast cance…
$ eprint     <chr> "36600045", "36402796", "36385470", "36054029", "35997126",…
$ doi        <chr> "10.1038/s41391-022-00635-1", "10.1038/s41523-022-00489-9",…
$ language   <chr> "eng", "eng", "eng", "eng", "eng", "eng", "eng", "eng", "en…
$ issn       <chr> "1476-5608", "2374-4677", "1531-4995", "1531-4995", "1097-0…
$ abstract   <chr> "BACKGROUND: While both the number (+LN) and density (LND) …
$ eprinttype <chr> "pubmed", "pubmed", "pubmed", "pubmed", "pubmed", "pubmed",…
$ volume     <chr> NA, "8", "133", NA, "128", "113", "208", "114", "165", "40"…
$ number     <chr> NA, "1", "4", NA, "20", "4", "2", "7", "2", "4", "1", "1", …
$ pages      <chr> NA, "123", "E25", NA, "3610-3619", "787-795", "301-308", "1…
out
# A tibble: 68 × 15
   bibtype title author year  month journal eprint doi   language issn  abstract
   <chr>   <chr> <chr>  <chr> <chr> <chr>   <chr>  <chr> <chr>    <chr> <chr>   
 1 Article Conc… John … 2023  Jan   Prosta… 36600… 10.1… eng      1476… "BACKGR…
 2 Article Toxi… N Lyn… 2022  Nov   NPJ br… 36402… 10.1… eng      2374… "Adjuva…
 3 Article In R… Eric … 2023  Apr   The La… 36385… 10.1… eng      1531…  <NA>   
 4 Article Pred… Eric … 2022  Aug   The La… 36054… 10.1… eng      1531… "BACKGR…
 5 Article Disp… Yi-Te… 2022  Oct   Cancer  35997… 10.1… eng      1097… "BACKGR…
 6 Article Noda… Diana… 2022  Jul   Intern… 35395… 10.1… eng      1879… "PURPOS…
 7 Article Vari… Timot… 2022  Aug   The Jo… 35377… 10.1… eng      1527… "PURPOS…
 8 Article Quan… Antho… 2022  Jul   Journa… 35311… 10.1… eng      1460… "BACKGR…
 9 Article Path… Eric … 2022  May   Gyneco… 35216… 10.1… eng      1095… "PURPOS…
10 Article Pred… Paige… 2022  Apr   Urolog… 35067… 10.1… eng      1873… "BACKGR…
# ℹ 58 more rows
# ℹ 4 more variables: eprinttype <chr>, volume <chr>, number <chr>, pages <chr>
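The query above hard-codes one author and one affiliation. To run the same search for a colleague, a small wrapper can assemble the string. The sketch below uses a hypothetical helper, build_pubmed_query(), which is not part of RefManageR; it only reproduces the [Author] and [Affiliation] field-tag syntax already used in q.

```r
# Hypothetical helper (not part of RefManageR): builds a PubMed query string
# from an author name and an optional affiliation using PubMed field tags.
build_pubmed_query <- function(author, affiliation = NULL) {
  q <- sprintf('(%s[Author])', author)
  if (!is.null(affiliation)) {
    # restrict to records whose affiliation field matches
    q <- sprintf('%s AND (%s[Affiliation])', q, affiliation)
  }
  q
}

q <- build_pubmed_query('Luu, Michael', 'Cedars-Sinai')
q
#> [1] "(Luu, Michael[Author]) AND (Cedars-Sinai[Affiliation])"
```

The returned string is identical to the hand-written q, so it can be passed to RefManageR::ReadPubMed() in exactly the same way.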

Now that this information is in a tibble, we can visualize how often each co-author appears across these publications using a word cloud.

library(ggwordcloud)  # geom_text_wordcloud_area()

plot_data <- out |>
  select(author) |>
  # authors are separated by " and "; give each author its own row
  # (unlike a fixed set of 20 columns, this handles any number of authors)
  separate_longer_delim(author, delim = ' and ') |>
  filter(!is.na(author), author != 'Michael Luu') |>
  # number of publications shared with each co-author, most frequent first
  count(author, sort = TRUE)

# the word cloud layout is randomized, so fix the seed for reproducibility
set.seed(1)
ggplot(plot_data, aes(label = author, size = n, color = n)) +
  geom_text_wordcloud_area() +
  scale_size_area(max_size = 16) +
  theme_minimal() +
  scale_color_viridis_c()
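The tidyr pipeline splits each author field on " and " and counts the pieces. The same counting logic can be sketched in base R, which makes each step explicit. The author strings below are made-up examples in the same format as out$author, and count_coauthors() is a hypothetical helper, not part of any package.

```r
# Made-up author fields in the same " and "-separated format as out$author
authors <- c(
  'John M Masterson and Michael Luu and Jane Doe',
  'Jane Doe and Michael Luu',
  'Michael Luu and Richard Roe and Jane Doe'
)

# Hypothetical helper: tabulate co-authors, excluding the person `self`
count_coauthors <- function(authors, self) {
  # split each entry on the separator and flatten into one character vector
  all_names <- unlist(strsplit(authors[!is.na(authors)], ' and ', fixed = TRUE))
  # drop the person whose publication list this is
  all_names <- all_names[all_names != self]
  # count occurrences and sort, most frequent co-author first
  sort(table(all_names), decreasing = TRUE)
}

res <- count_coauthors(authors, self = 'Michael Luu')
res
```

Because names are matched as exact strings, the same person listed under two spellings (for example with and without a middle initial) is counted as two different co-authors, both in this sketch and in the tidyr version.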

Session info

sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] ggwordcloud_0.5.0 RefManageR_1.4.0  lubridate_1.9.2   forcats_1.0.0    
 [5] stringr_1.5.0     dplyr_1.1.1       purrr_1.0.1       readr_2.1.4      
 [9] tidyr_1.3.0       tibble_3.2.1      ggplot2_3.4.2     tidyverse_2.0.0  

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0  xfun_0.38         colorspace_2.1-0  vctrs_0.6.1      
 [5] generics_0.1.3    viridisLite_0.4.1 htmltools_0.5.4   yaml_2.3.6       
 [9] utf8_1.2.3        rlang_1.1.0       pillar_1.9.0      glue_1.6.2       
[13] withr_2.5.0       lifecycle_1.0.3   plyr_1.8.8        munsell_0.5.0    
[17] gtable_0.3.3      htmlwidgets_1.6.2 evaluate_0.19     labeling_0.4.2   
[21] knitr_1.41        tzdb_0.3.0        fastmap_1.1.0     curl_5.0.0       
[25] fansi_1.0.4       Rcpp_1.0.10       renv_0.16.0       scales_1.2.1     
[29] backports_1.4.1   jsonlite_1.8.4    farver_2.1.1      hms_1.1.3        
[33] png_0.1-8         digest_0.6.31     stringi_1.7.8     grid_4.2.2       
[37] bibtex_0.5.1      cli_3.6.1         tools_4.2.2       magrittr_2.0.3   
[41] pkgconfig_2.0.3   xml2_1.3.3        timechange_0.2.0  rmarkdown_2.19   
[45] httr_1.4.5        rstudioapi_0.14   R6_2.5.1          compiler_4.2.2   

Citation

BibTeX citation:
@online{luu2023,
  author = {Luu, Michael},
  title = {Retrieve {Pubmed} Publications Using the {RefManageR}
    Package},
  date = {2023-04-13},
  langid = {en}
}
For attribution, please cite this work as:
Luu, Michael. 2023. “Retrieve Pubmed Publications Using the RefManageR Package.” April 13, 2023.