AnnotationBustR, a New R Package and Pre-Print That Extracts Sequence Data From GenBank Annotations

I have recently published a new R package, AnnotationBustR, along with a preprint of our paper, which has been submitted to PeerJ. Sequence data can be difficult to work with: sequences may be concatenated, or a sequence of interest may sit inside a genome record and not be available by itself on GenBank. Additionally, the same gene may be annotated differently among records, making it difficult to extract data from a large number of records. AnnotationBustR was written to make this process easier: users supply a list of accessions and a set of terms they want to extract, and FASTA-formatted files are returned. We provide a vignette tutorial within the R package and on CRAN (see link below) on how to use the software.
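To give a rough idea of that workflow, here is a minimal sketch. It assumes the AnnotationBust function and the rDNAterms search-term table that ship with the package, as I understand them from the package documentation; argument names may differ between versions, so treat the vignette as the authoritative reference. The accessions are illustrative only, and the call needs an internet connection to reach GenBank.

```r
# install.packages("AnnotationBustR")  # install from CRAN if needed
library(AnnotationBustR)

# Illustrative GenBank accessions for records containing ribosomal DNA
ncbi.accessions <- c("FJ706295", "FJ706343", "JN868606")

# Load the rDNA search terms bundled with the package (assumed data set name)
data(rDNAterms)

# Extract the annotated subsequences; FASTA files are written to the
# working directory, with "Example1" prepended to each file name
my.sequences <- AnnotationBust(ncbi.accessions, rDNAterms,
                               DuplicateSpecies = TRUE,
                               Prefix = "Example1")

# Table of accessions recovered per species and locus
my.sequences
```

The search-term table is what handles inconsistent annotations: each locus can be listed under several synonyms, so records that label the same gene differently still end up in the same FASTA file.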

The R package is developed on GitHub and interfaces with GenBank through the R package seqinr. You can find the GitHub repo, CRAN page, and preprint of our paper in the links below:



PeerJ Preprint citation and link: Borstein, S. R., & O’Meara, B. C. (2017). AnnotationBustR: An R package to extract subsequences from GenBank annotations. PeerJ Preprints, e2920v1,

