Repbase Reports |
---|
2002, Volume 2, Issue 3 |
March 31, 2002 |
Copyright © 2001-2016 - Genetic Information Research Institute |
ISSN# 1534-830X |
Page 1 |
G4_DM |
|||
---|---|---|---|
G4_DM is a non-LTR retrotransposon - a consensus sequence. |
|||
Submitted: 31-Mar-2002 |
Accepted: 31-Mar-2002 |
||
Key Words: Non-LTR retrotransposon; ORF1; ORF2; DNA binding protein; AP endonuclease; reverse transcriptase; RNase H; JOCKEY clade; G4_DM |
|||
Source: consensus |
Organism: Drosophila melanogaster |
Taxonomy: Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta; Pterygota; Diptera; Brachycera; Muscomorpha; Ephydroidea; Drosophilidae; Drosophila |
|
[1] |
Authors: Kapitonov,V.V. and Jurka,J. |
||
Title: G4_DM, an ancient family of non-LTR retrotransposons from the Jockey clade. |
|||
Journal: Repbase Reports 2:(3) p. 1 (2002) |
|||
Abstract: G4_DM belongs to the JOCKEY clade of non-LTR retrotransposons. Its copies have a poly(A) 3' tail and are flanked by ~10-bp direct repeats generated upon integration into the genome. G4_DM forms a separate family of retrotransposons that were recently (several million years ago) active in the Drosophila melanogaster genome. There are ~30 copies of G4_DM in the sequenced genome; they are ~5% divergent from the consensus sequence and ~10% divergent from each other. The consensus sequence, which is an approximation of the G4_DM element that was active ~3 Myr ago, contains two ORFs (positions 2-1078 and 1143-3725) that encode the 359-aa G4p1 and 861-aa G4p2 proteins, respectively. G4p1 is a putative DNA/RNA-binding protein; its ~200-aa N-terminal and ~100-aa C-terminal regions are missing because of a lack of sequence data.
G4p1: EVQRKKNSLDNSSSTSANKFALLSDGLPDKTGNKYNKNEDLEMVNEDSATDSAKPPPIILSDVNDISEML AYLNSKIKRELFYYKTQRYGHVRVMVKSIEEFRKLVKTLNNDCVQYHTYQLKDDRAFRVVIKNLHFSTNL DEIKSDEESKGHVVRNISNLKSRATKTPLNMFYVDIEPNNKNRDNVKHIGNAIVNIEPPRKNNEIVQCYR CQEFGHTKSYCTKTYRCVKSSSRHPSNICPKNTEQPAKCANCYEEHAASYKGCRIYQELLSKKISYQSKI PEXQXRPEXKXFRNPAKFAPPNKPTYTQQSNDYQSYAQIAAGNSKTNTSLERIEQLLEKQSELTNNLLNM IMLLVNLCK
G4p2 is composed of endonuclease, reverse transcriptase, and RNase H domains. Its N-terminus, about 50 aa, is missing because of a lack of sequence data.
G4p2: FIKTNEIDIMLISETHFTSKPYIMVVGYDIIRADHRSFXLDLLIRRLKLDGLKFQIMDSIRENAMQAATV TIKCMHADVSVTAIYLPPRFALKEADFKNFFQKLGPQFILRGDFNDKHPWWGSRLTNPKGSELYKCIVNN SITTFSTGKPTYWPTNSRKIPNLKDFVAYFGIPESHMRIMESFDLSSDHSLIIVTYSTVAHILTKPYKVI SANTDINAFKSYLETDKIDHAVELLTEQDKVSYICTKLPARNSQSNQLYLSAEIRQQIQHKRNLRKRWQE TLYPADKRSYNKAASDLKKLLSTLRNESLAEYLRNLDPHSCNHEHNLWRATKYLKRPAKRNTVVRNCNGE WCRSDDEQAKAFAQHLHSVFQPNDIDNPQTEREVDNFLESPCQMSLPIRKISINEVSSEIKWLNSKKAPG SDKIDGITLKILPPKCVRFLTFIFNAMLRVDHFPSQWKCAEIIMILKPNKAENEVTSYRPISLLSIFSKV FEKILLKRMLPILDEFAIIPEHQFGFRRGHGTPEQCHRIINEILSAFESKKYCTATFLDVQQAFDRVWHD GLLYKIKKWLPAPYFLLLKSYLTNRHFYVQQKNEYSPLHFIKAGVPQGSVLGPVLYTLYTADMPVTNTCT VATYADDTAILATSSSKEEASQLLQAELRLIESWFLLWKIKVNALKSAQITFALRRGDCPEVSFNGSAIP QSNCIKYLGLHLDRRLTWKNHIKAKRQQLNQKSLKMTWLLGRKSATTLENKVRLYKAILKPVWTYGIQLW GTASNSNIEILQRYQSKILRQIVNAPFYISNASIHKDLGIPYVKEEIAKHSKKYIDRLRTHENNLALSLV NNNNNVRRLKRFHVLDLPDRY
|
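The ORF coordinates and protein lengths quoted in the abstract can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the original record: it assumes the positions are 1-based and inclusive, and that each span contains only complete codons with no stop codon. The helper name `orf_codon_count` is hypothetical.

```python
def orf_codon_count(start: int, end: int) -> int:
    """Return the number of complete codons in an ORF span.

    Assumption: `start` and `end` are 1-based, inclusive nucleotide
    positions, so the span length is end - start + 1.
    """
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError(f"span of {length_nt} nt is not a multiple of 3")
    return length_nt // 3


# ORF spans and protein lengths as quoted in the abstract.
reported = {
    "G4p1": (2, 1078, 359),
    "G4p2": (1143, 3725, 861),
}

for name, (start, end, reported_aa) in reported.items():
    codons = orf_codon_count(start, end)
    status = "consistent" if codons == reported_aa else "MISMATCH"
    print(f"{name}: {start}-{end} -> {codons} codons, reported {reported_aa} aa ({status})")
```

Under these assumptions both spans divide evenly into codons and match the reported 359-aa and 861-aa lengths.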
|||
Derived: [1] (Consensus) |
|||
|||
References: |