I have recently published a new R package, AnnotationBustR, along with a preprint of our paper, which has been submitted to PeerJ. Sequence data can be difficult to work with: sequences may be concatenated, or a sequence of interest may sit within a genome and not be available on its own on GenBank. Additionally, the same gene may be annotated differently among records, making it difficult to extract data from many records. AnnotationBustR was written to make this process easier: users supply a list of accessions and a set of terms they want to extract, and FASTA-formatted files are returned. We provide a vignette tutorial on how to use the software, available within the R package or on CRAN (see link below).
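To make the idea concrete, here is a small toy sketch in Python of the general approach described above: map the different annotation names used for a gene onto one locus name, then slice out the matching subsequences and write them as FASTA. This is not AnnotationBustR's code or API. The accessions, sequences, coordinates, and the `extract_loci` function are all made up for illustration, and nothing here contacts GenBank.

```python
from typing import Dict, List, Tuple

# Hypothetical records: accession -> (full sequence, annotated features).
# Each feature is (annotation name, 1-based start, inclusive end).
Record = Tuple[str, List[Tuple[str, int, int]]]

RECORDS: Dict[str, Record] = {
    "ACC0001": ("ATGGCCTTTAAACCCGGG", [("COX1", 1, 9), ("cytochrome b", 10, 18)]),
    "ACC0002": ("TTTGGGATGAAACCC", [("COI", 4, 12), ("CYTB", 13, 15)]),
}

# Search terms: locus name -> the different annotation names that refer to it.
TERMS: Dict[str, List[str]] = {
    "COI": ["COI", "COX1", "cytochrome c oxidase subunit I"],
    "CYTB": ["CYTB", "cytochrome b"],
}


def extract_loci(records: Dict[str, Record],
                 terms: Dict[str, List[str]]) -> Dict[str, str]:
    """Return {locus: FASTA text} for every locus found in the records."""
    # Reverse lookup so any synonym (case-insensitive) resolves to its locus.
    lookup = {syn.lower(): locus for locus, syns in terms.items() for syn in syns}
    entries: Dict[str, List[str]] = {locus: [] for locus in terms}
    for accession, (sequence, features) in records.items():
        for name, start, end in features:
            locus = lookup.get(name.lower())
            if locus is None:
                continue  # annotation is not one of the requested terms
            # Convert 1-based inclusive coordinates to a Python slice.
            subsequence = sequence[start - 1:end]
            entries[locus].append(f">{accession}\n{subsequence}")
    return {locus: "\n".join(found) + "\n"
            for locus, found in entries.items() if found}


fasta_by_locus = extract_loci(RECORDS, TERMS)
for locus, fasta in fasta_by_locus.items():
    print(f"--- {locus}.fasta ---")
    print(fasta, end="")
```

In this toy example, both "COX1" and "COI" end up in the same COI output, which is the kind of annotation inconsistency the package is meant to handle across real GenBank records.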
The R package is developed on GitHub and interfaces with GenBank through the R package seqinr. You can find the GitHub repo, CRAN page, and preprint of our paper in the links below:
PeerJ Preprint citation and link: Borstein, S. R., & O’Meara, B. C. (2017). AnnotationBustR: An R package to extract subsequences from GenBank annotations. PeerJ Preprints, e2920v1, https://doi.org/10.7287/peerj.preprints.2920v1.