How stable is gene-to-protein translation in a relatively identical medium? I.e. if we abstract away all the issues with RNA and somehow neutralize any interfering products from elsewhere, will a gene sequence always produce the same protein, and always produce it, whenever encountered at a specific place? Or is there something deeper where changes to the logic in some other, unrelated part of the DNA could directly affect the way this gene is expressed (i.e. not through their protein interfering with this one)?
Or maybe I don’t understand enough to even formulate the right question here. Or perhaps this subject simply hasn’t been researched and analyzed enough to give an answer to the above yet?
If the answer is simple, are there any known ratios and reliability rates?
There’s no particular hidden question; I’m not asking about designer babies or gengineered foodstuffs or anything like that. I’m academically curious about the fundamentals of DNA and genetic expression (and any comparison between this and programming, which I understand better, would be very nice), but hopelessly out of my depth and under-informed, to the point where I can’t even understand research papers or the ones they cite or the ones that those cite, and the only things I understand properly are by-order-of-historical-discovery-style textbooks (like traditional physics textbooks) that teach things that were obsolete long before my parents were born.
The dreaded answer: ‘Well, it depends...’
The genetic code (the relationship between base triplets in the reading frame of a messenger RNA and the amino acids that come out of the ribosome that RNA gets threaded through) is at least as ancient as the most recent common ancestor of all life and is almost universal. There are living systems that use slightly different codon assignments though – animal and fungal mitochondria, for example, have a varied lot of substitutions, and ciliate microbes have one substitution as well. If you were to move genes back and forth between those systems, you would need to recode the affected codons or the protein would come out wrong.
If you avoid or compensate for those weird systems, you can move reading frames wherever you want and they will produce the same primary protein sequence. The interesting part is getting that sequence to be made and making sure it works in its new context.
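Since a programming comparison was asked for: the genetic code really does behave like a lookup table, and the variant codes are the same table with a few entries patched. Here is a rough Python sketch of that idea. The function and variable names are made up for illustration, and the four reassignments shown are the well-known vertebrate mitochondrial ones (UGA reads as tryptophan, AUA as methionine, AGA and AGG as stop).

```python
from itertools import product

# Standard genetic code, written compactly: codons enumerated in TCAG order
# (DNA letters, so T stands in for the U of the mRNA); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(codon): aa
            for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Vertebrate mitochondria: the same table with four entries reassigned.
VERTEBRATE_MITO = {**STANDARD, "TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}


def translate(reading_frame, table=STANDARD):
    """Read codon by codon from the first base until a stop codon or the end."""
    protein = []
    for i in range(0, len(reading_frame) - 2, 3):
        amino_acid = table[reading_frame[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


frame = "ATGTGAATAAGATAA"  # toy sequence, chosen to hit the reassigned codons
print(translate(frame))                   # M   (TGA is a stop in the standard code)
print(translate(frame, VERTEBRATE_MITO))  # MWM (TGA -> W, ATA -> M, AGA -> stop)
```

Same input, same function, different table, different protein. That is the whole problem with moving a reading frame into or out of one of the odd systems without recoding it.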
At the protein level, some proteins require the proper context or cofactors or small molecules to fold properly. For example, a protein that depends on disulfide bonds to hold itself in the correct shape will not fold properly if it is expressed in the cytoplasm of a bacterium or in the cytosol of a eukaryotic cell – it has to contain the destination tag that causes it to be sent into the membrane-bound spaces of the ER compartment, which are kept as an oxidizing environment where such bonds can form.
At the translation level, eukaryotes and eubacteria have developed divergent methods for bringing the RNA and ribosome together, and for keeping the RNA stable. Eukaryotes automatically add a chemically modified ‘cap’ to the front end of all messenger RNAs they make, and require the presence of a particular sequence after the reading frame that allows everything after that point to get chopped off and replaced with a bunch of As (which the ubiquitous RNA-destroying enzymes mostly ignore). Proteins coat the poly-A and interact with cap-binding proteins, and this complex massively increases the rate at which the capped end gets fed into the ribosome. Eubacteria, on the other hand, do neither of these things; a ribosome will bind at any point on the RNA where a particular ribosome-binding sequence appears, which allows them to have multiple reading frames in the same RNA molecule under identical genetic control. If you don’t put in all the proper elements for the host you will still get protein, but less than you could have.
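To make the bacterial half of that concrete, here is a toy Python model of a ribosome picking start points on a multi-gene message. It is only a sketch under loud assumptions: it matches the Shine-Dalgarno consensus `AGGAGG` exactly (real sites are fuzzier), and the 4 to 10 base spacing window and the function name are illustrative choices, not measured rules.

```python
import re

SHINE_DALGARNO = "AGGAGG"  # consensus motif; real sites vary around it


def find_translation_starts(mrna, min_gap=4, max_gap=10):
    """Return the index of each AUG that starts min_gap..max_gap bases after a motif."""
    starts = []
    for site in re.finditer(SHINE_DALGARNO, mrna):
        window_start = site.end() + min_gap
        window = mrna[window_start:site.end() + max_gap + 3]
        offset = window.find("AUG")
        if offset != -1:
            starts.append(window_start + offset)
    return starts


# Toy two-gene message: each reading frame has its own binding site upstream.
mrna = ("CCC" + "AGGAGG" + "AAAAAA" + "AUGAAAUAA"
        + "CCC" + "AGGAGG" + "UUUUUUU" + "AUGGGGUAG")
print(find_translation_starts(mrna))  # [15, 40]
```

One message, two independent entry points. A eukaryotic ribosome loading from the capped end would, in this cartoon, only ever find the first one.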
There are also introns to consider. Eukaryotes cut these intervening sequences out of the reading frames, but different eukaryotes have slightly different machinery for recognizing them, so what is properly spliced out in one organism might not be in another. And if you put one into a bacterium it won’t be spliced at all. We solve this by only ever moving the processed, intron-free reading frame between different systems. (It turns out that the presence of introns actually increases the rate of export of the RNA from the nucleus to the rest of the cell for translation, because export is coupled to splicing, but that is not necessary; it just speeds things up.)
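In programming terms, splicing is a preprocessing step that deletes marked spans before the text is interpreted. A minimal Python sketch follows, assuming the intron coordinates are already known (in a real cell the machinery has to find them from sequence signals, which is exactly where organisms differ). The function name and the toy sequence are made up.

```python
def splice(pre_mrna, introns):
    """Remove the given (start, end) half-open intervals and join what is left."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])
        position = end
    exons.append(pre_mrna[position:])
    return "".join(exons)


gene = "ATGGCC" + "GTAAGTTTTCAG" + "GAATAA"  # exon + toy intron + exon
print(splice(gene, [(6, 18)]))  # ATGGCCGAATAA
```

Shipping only the output of `splice` is the equivalent of distributing the preprocessed source, so the new host never needs to run the preprocessor.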
Everything said so far is actually quite easy to deal with; you just need to make sure that your favorite reading frame has the right basic elements around it for its new context. The big thing is making sure that your gene is actually expressed in its new context. You need to put in the right promoter elements upstream that bind the proteins necessary to both tag the DNA as to-be-read and anchor an RNA polymerase to actually make the transcript. In my lab we mostly just use existing promoters from other genes in yeast, but one of my labmates has actually used synthetic promoters with artificial activators around the stripped-down core of a natural promoter element to make a synthetic genetic oscillator. In animal cells people love using viral promoters because they are very strong and smaller than normal animal promoters, which can get rather large and are sometimes fragmented (especially into the introns), but normal promoters can be used too. The milieu of the cell will dictate whether a certain promoter element is recognized and expressed – whether the cell is making the right proteins that bind to and activate it, etc. We actually use that sometimes, putting a reading frame downstream of, say, one of the GAL gene promoters, which only turn on when you feed yeast galactose. There’s lots of post-transcriptional regulation of RNA stability too.
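If it helps to think of it as code, a promoter is roughly a guard condition evaluated against the state of the cell, not a property of the reading frame behind it. The Python sketch below is a deliberately crude cartoon of that: the class, the function and the factor labels are all hypothetical, and real promoters integrate many inputs with graded rather than on/off output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Promoter:
    name: str
    required_factors: frozenset  # proteins that must be present to switch it on


def is_transcribed(promoter, factors_in_cell):
    """A promoter fires only if the cell supplies every factor it needs."""
    return promoter.required_factors <= set(factors_in_cell)


# Hypothetical labels, not real gene or protein names.
galactose_promoter = Promoter("GAL-style promoter",
                              frozenset({"galactose-activated factor"}))
always_on_promoter = Promoter("constitutive promoter", frozenset())

print(is_transcribed(galactose_promoter, set()))                           # False
print(is_transcribed(galactose_promoter, {"galactose-activated factor"}))  # True
print(is_transcribed(always_on_promoter, set()))                           # True
```

The same reading frame behind the same promoter gives different results in different cells because the second argument changes, not the first.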
You also have to be careful about where you put things relative to other genes and other chromosomal features. If you put a transgene too close to a eukaryotic centromere (the attachment point for the fibers that pull chromosomes apart during cell division) it will not be expressed, because the centromere condenses and silences DNA around it for quite a distance. If you stick two small promoter elements driving genes right next to each other, they can interact and wind up affecting each other’s expression (I’ve been having problems with this in my yeast). If you stick two genes very close to each other in series, reading in the same direction on the DNA (second promoter right after the first reading frame), you need a good ‘terminator’ element between them that makes the RNA polymerase fall off before it reads over the promoter of the second gene. Otherwise the second gene’s expression can be somewhat suppressed, because reading through its promoter keeps knocking off the proteins necessary to launch another polymerase down it.
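The read-through problem in particular is the kind of thing you could imagine a linter catching. Here is a small Python sketch of such a check. The part kinds, the names and the function are invented for illustration, and it only models genes laid out left to right in the same direction; it knows nothing about centromeres or promoter crosstalk.

```python
def check_layout(parts):
    """Warn when a promoter follows a reading frame with no terminator between them.

    `parts` is a left-to-right list of (kind, name) tuples for genes that all
    read in the same direction; kind is "promoter", "orf" or "terminator".
    """
    warnings = []
    unterminated_orf = None
    for kind, name in parts:
        if kind == "orf":
            unterminated_orf = name
        elif kind == "terminator":
            unterminated_orf = None
        elif kind == "promoter" and unterminated_orf is not None:
            warnings.append(
                f"{name}: transcription from {unterminated_orf} may read through")
    return warnings


risky = [("promoter", "P1"), ("orf", "geneA"),
         ("promoter", "P2"), ("orf", "geneB"), ("terminator", "T2")]
safe = [("promoter", "P1"), ("orf", "geneA"), ("terminator", "T1"),
        ("promoter", "P2"), ("orf", "geneB"), ("terminator", "T2")]

print(check_layout(risky))  # ['P2: transcription from geneA may read through']
print(check_layout(safe))   # []
```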
On top of all these things, there are all kinds of dirty tricks that are rare but do exist. One example is a gene in yeast where the 3-base reading frame suddenly stutters one base over halfway through the gene. This happens because of a ‘pseudoknot’ structure the RNA folds up into, combined with a very rare codon that takes a long time to translate, which lets the RNA slip one base over within the ribosome to a more common, faster-translating codon before it actually reads the slow one. Any individual gene probably doesn’t use such a dirty trick, but they are around, and if one exists you can bet some virus somewhere uses it – they do horrifying things with their nucleic acids to pack in overlapping genes or genes that make different things in different circumstances.
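For a feel of what a one-base stutter does to the output, here is a toy Python version. It is a sketch only: the sequence is invented, the slip position is passed in by hand instead of emerging from RNA structure and slow codons, and the names are made up. The codon table is the standard one, written in TCAG order with DNA letters.

```python
from itertools import product

# Standard genetic code in TCAG order (DNA letters); "*" marks a stop codon.
CODE = dict(zip(("".join(c) for c in product("TCAG", repeat=3)),
                "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))


def translate(reading_frame, slip_at=None):
    """Translate from base 0; if slip_at is a codon boundary, skip one base there."""
    protein = []
    i = 0
    while i + 3 <= len(reading_frame):
        if i == slip_at:
            i += 1  # the ribosome slips one base forward: a +1 frameshift
            continue
        amino_acid = CODE[reading_frame[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
        i += 3
    return "".join(protein)


message = "ATGGCTTAGGCATAA"  # toy sequence
print(translate(message))             # MA   (hits the TAG stop in the original frame)
print(translate(message, slip_at=6))  # MARH (slips past it and keeps reading)
```

Everything downstream of the slip is read in a different frame, so a single skipped base changes every later amino acid and where the protein ends.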
edit: that ‘dirty trick’ is sort of a special case of a more widespread phenomenon, where the relative concentrations of the different tRNA adapter molecules that constitute the actual mechanics of the genetic code affect how quickly different proteins are translated. Even in different organisms with the same code, two synonymous codons might have very different levels of their tRNA adapters in the cell, so one could be translated a lot faster than the other in a given organism. Sometimes we codon-optimize genes for particular organisms so that the gene is translated more efficiently, but that can get expensive and sometimes it has bad side effects: our lab did that with the firefly luciferase gene that makes a luminescent protein, and it turned out that when it was translated extremely fast, parts of the protein that normally folded independently, one at a time, interacted and folded together, screwing up its function.
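Codon optimization, at its crudest, is a find-and-replace that keeps the protein the same while swapping in the codons the host reads fastest. A Python sketch of that naive version follows. The tRNA levels are made-up numbers for an imaginary host, not measurements, and the function names are hypothetical; real tools weigh many more things, and as the luciferase story shows, picking the fastest codon everywhere is not automatically the right call.

```python
from itertools import product

# Standard genetic code in TCAG order (DNA letters); "*" marks a stop codon.
CODE = dict(zip(("".join(c) for c in product("TCAG", repeat=3)),
                "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))


def recode(reading_frame, trna_level):
    """Swap each codon for the synonymous codon with the highest tRNA level."""
    recoded = []
    for i in range(0, len(reading_frame) - 2, 3):
        amino_acid = CODE[reading_frame[i:i + 3]]
        synonyms = [codon for codon, aa in CODE.items() if aa == amino_acid]
        recoded.append(max(synonyms, key=lambda codon: trna_level.get(codon, 0.0)))
    return "".join(recoded)


def protein_of(reading_frame):
    """Translate every codon, keeping '*' so stop codons can be compared too."""
    return "".join(CODE[reading_frame[i:i + 3]]
                   for i in range(0, len(reading_frame) - 2, 3))


# Made-up relative tRNA levels for an imaginary host, NOT measured values.
toy_levels = {"ATG": 1.0, "CTG": 0.9, "TTA": 0.1,
              "CGT": 0.8, "AGA": 0.05, "TAA": 1.0}

original = "ATGTTAAGATAA"
optimized = recode(original, toy_levels)
print(optimized)                                      # ATGCTGCGTTAA
print(protein_of(original) == protein_of(optimized))  # True
```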
In the end, what we always wind up doing is simplifying things. If you need to move things between very different contexts, you strip a gene down to its uninterrupted reading frame and move that around with promoter elements and translation-enhancing elements appropriate to its new context. You make sure you have nice terminators and a little space between things. And you try to insert things into locations that are known to work rather than at random. Artificial regulation of a gene, as opposed to using a natural promoter, often leads to coarsely controlled expression because it hasn’t been optimized with all the subtle tricks, but you can almost always get it to work.
There are always surprises though. Otherwise it wouldn’t be research.
That was an awesome breakdown of things, thank you!
I’ve learned way more from this than from all my previous reading, without even including the data about what I didn’t know I don’t know and other meta.
Any time. Feel free to message with other questions too.
Just for fun, here’s a couple of good-enough animations of various eukaryotic systems. Shows nothing of the constant jiggering back and forth of the molecules and makes it look far too directed, but it gives an idea of many of the things going on.
https://www.youtube.com/watch?v=yqESR7E4b_8