The Dfam database of repetitive DNA families.

Title: The Dfam database of repetitive DNA families.
Publication Type: Journal Article
Year of Publication: 2016
Authors: Hubley R, Finn RD, Clements J, Eddy SR, Jones TA, Bao W, Smit AFA, Wheeler TJ
Journal: Nucleic Acids Res
Volume: 44
Issue: D1
Pagination: D81-9
Date Published: 2016 Jan 04
ISSN: 1362-4962
Keywords: Animals; Databases, Nucleic Acid; DNA; DNA Transposable Elements; Genome; Humans; Internet; Markov Chains; Mice; Molecular Sequence Annotation; Repetitive Sequences, Nucleic Acid; Sequence Alignment
Abstract: Repetitive DNA, especially that due to transposable elements (TEs), makes up a large fraction of many genomes. Dfam is an open access database of families of repetitive DNA elements, in which each family is represented by a multiple sequence alignment and a profile hidden Markov model (HMM). The initial release of Dfam, featured in the 2013 NAR Database Issue, contained 1143 families of repetitive elements found in humans, and was used to produce more than 100 Mb of additional annotation of TE-derived regions in the human genome, with improved speed. Here, we describe recent advances, most notably expansion to 4150 total families including a comprehensive set of known repeat families from four new organisms (mouse, zebrafish, fly and nematode). We describe improvements to coverage, and to our methods for identifying and reducing false annotation. We also describe updates to the website interface. The Dfam website has moved to http://dfam.org. Seed alignments, profile HMMs, hit lists and other underlying data are available for download.
DOI: 10.1093/nar/gkv1272
Alternate Journal: Nucleic Acids Res.
PubMed ID: 26612867
PubMed Central ID: PMC4702899
Grant List:
R01 HG002939 / HG / NHGRI NIH HHS / United States
P41LM006252-1 / LM / NLM NIH HHS / United States
R01HG002939 / HG / NHGRI NIH HHS / United States
/ / Howard Hughes Medical Institute / United States