Restriction–Modification Systems with Specificity GGATC, GATGC and GATGG. Part 1. Evolution and Ecology

Abstract

This work examines the evolution of proteins from restriction–modification systems that contain an endonuclease domain of the RE_AlwI family and either two DNA methyltransferases, each with a domain of the MethyltransfD12 family, or one DNA methyltransferase with two domains of this family. All such systems were found to recognize one of three DNA sequences: GGATC, GATGC, or GATGG. By sequence similarity, the restriction endonucleases of these systems fall into three clades that correspond unambiguously to the three specificities. The DNA methyltransferase domains fall into two groups by sequence similarity, and the two domains of each system belong to different groups. Within each group, the domains form three clades according to their specificity. Evidence was found of multiple interspecific horizontal transfers of whole systems, as well as of gene transfer between systems, including transfer of one of the DNA methyltransferases accompanied by a change in specificity. Evolutionary relationships between the DNA methyltransferases of these systems and other DNA methyltransferases, including orphan DNA methyltransferases, were also revealed.
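
The three recognition sequences named in the abstract can be checked for strand asymmetry with a short script. The following is only an illustrative sketch, not the authors' method: the function names `reverse_complement` and `count_sites` are hypothetical, and the example input sequence is made up. A site that differs from its own reverse complement is asymmetric, so a scan of one strand of a genome has to look for both the site and its reverse complement.

```python
# Illustrative sketch only; helper names are hypothetical, not from the article.
# The three sites are the ones listed in the abstract.
SITES = ("GGATC", "GATGC", "GATGG")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_sites(genome: str, site: str) -> int:
    """Count occurrences of `site` on either strand, overlaps included."""
    genome = genome.upper()
    # A set avoids double counting if a site equals its reverse complement.
    patterns = {site, reverse_complement(site)}
    return sum(
        genome.startswith(pattern, i)
        for pattern in patterns
        for i in range(len(genome) - len(pattern) + 1)
    )


if __name__ == "__main__":
    for site in SITES:
        rc = reverse_complement(site)
        kind = "palindromic" if site == rc else "asymmetric"
        print(site, rc, kind)
```

For instance, `count_sites("AAGGATCCTT", "GGATC")` counts one occurrence on the given strand and one on the complementary strand of this made-up sequence.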

About the authors

S. A. Spirin

Lomonosov Moscow State University; Higher School of Economics; Scientific Research Institute of System Development

Author for correspondence.
Email: sas@belozersky.msu.ru
Russian Federation, Moscow; Moscow; Moscow

I. S. Rusinov

Lomonosov Moscow State University

Russian Federation, Moscow

O. L. Makarikova

Moscow Institute of Physics and Technology

Russian Federation, Moscow

A. V. Alexeevsky

Lomonosov Moscow State University; Scientific Research Institute of System Development

Russian Federation, Moscow; Moscow

A. S. Karyagina

Lomonosov Moscow State University; Gamaleya National Research Center for Epidemiology and Microbiology; All-Russia Research Institute of Agricultural Biotechnology

Russian Federation, Moscow; Moscow; Moscow

Supplementary files

1. JATS XML
2. Appendix 1
3. Appendix 2
4. Appendix 3
5. Appendix 4
6. Fig. 1. Phylogenetic trees of restriction endonucleases (ERs) and DNA methyltransferases (MTases) from three R-M systems with two MTases and three systems with one (fused) MTase. Each name is followed by the recognition sequence according to REBASE. a - Tree built from ER sequences. An asterisk marks enzymes from systems with a fused MTase. b - Tree built from MTase domain sequences. Domains from separate MTases carry the REBASE names of those MTases; domains of fused MTases carry the protein name with the suffix N (N-terminal domain) or C (C-terminal domain). The letters A and B denote the two groups of MTases
7. Fig. 2. Tree of restriction endonucleases (ERs) from R-M systems with GATGC specificity, with the gene order marked. Cluster representatives were selected at 70% sequence identity. The letter F at the beginning of a name means that the MTases of the system are fused; gene order is labelled with three letters, as in the table. Numbers on the branches denote bootstrap support
8. Fig. 3. Phylogenetic trees of MTases of the MethyltransfD12 family from R-M systems with specificities GGATC, GATGC and GATGG and of MTases close to them. a - Group A MTase tree, b - Group B MTase tree. Branches with bootstrap support below 40% have been removed. Monophyletic groups of MTases with the same specificity are collapsed and shown as diamonds. For each such group the recognition site is given, followed in parentheses by the composition of the systems occurring in the group, where M stands for MTase, 2M for two MTases, R for restriction endonuclease (ER), and MR for a fused bifunctional protein with both methyltransferase and endonuclease domains. If the systems include ERs, the identifiers of the catalytic domain families found in these ERs are given in separate brackets; Unk means that no Pfam profile detects a domain in the ER sequence. The number of systems in which the indicated site was experimentally confirmed by PacBio technology is given after the composition. The group B tree includes M.EcoT4Dam, an MTase whose GATC specificity has not been confirmed by PacBio data
9. Fig. 4. Phylogenetic tree of restriction endonucleases (ERs) close to the ERs with specificities GGATC, GATGC, and GATGG. Clades consisting of ERs of the same specificity are shown as diamonds. The recognition sequence is followed by the number of ERs in the clade that belong to systems whose MTases have this specificity confirmed by PacBio

Copyright (c) 2025 Russian Academy of Sciences