Yeah, you break the big strand up randomly and sequence the starts and ends of each of the little fragments. Then you have to piece all the little bits together, which is called either assembly (if you don’t have anything else to work off of) or alignment (if you’re just comparing your sequenced data to an existing reference genome). Both are computationally nontrivial, though alignment at least is well solved these days through tools like bwa (for DNA) or STAR (for RNA-seq). For something like a wastewater survey you do “metagenomics”, since the sample contains many bacterial/viral genomes at once.
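To make the "break it up, then align it back" idea concrete, here is a toy sketch in Python. It is not how bwa or STAR work internally: they use indexed data structures and tolerate mismatches, whereas this just does exact string search. The function names `shotgun_reads` and `align_exact` and the tiny reference string are made up for illustration.

```python
import random

def shotgun_reads(genome, n_reads, read_len, seed=0):
    """Cut n_reads random fixed-length fragments out of genome (toy shotgun step)."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        reads.append(genome[start:start + read_len])
    return reads

def align_exact(read, reference):
    """Return every 0-based position where read matches reference exactly."""
    hits = []
    pos = reference.find(read)
    while pos != -1:
        hits.append(pos)
        pos = reference.find(read, pos + 1)  # allow overlapping matches
    return hits

# Hypothetical 29-base "reference genome", purely for illustration.
reference = "ACGTTGCATGCCGATAGCTTAGGCATCGA"
reads = shotgun_reads(reference, n_reads=5, read_len=8)

for read in reads:
    # Each read came from the reference, so it has at least one hit.
    print(read, align_exact(read, reference))
```

Even in this tiny case a very short read such as "GCAT" lands in two places (positions 5 and 22), which is the kind of ambiguity a real aligner has to score and report.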
This process wasn’t really enough to get a complete picture of the genome. The reads were too short, so some parts of the genome were hard to resolve: anything highly repetitive can’t be assembled accurately since it all looks the same. The recent “T2T” genome is really the first complete human genome we have, despite arriving about two decades after the Human Genome Project finished. But the earlier reference was good enough for most things. Actually the very start and end of the chromosomes, the telomeres, were some of the hardest parts to sequence, hence the name “telomere-to-telomere” for the new build. Older builds have long runs of “N”s (thousands of them, and millions on some chromosomes) at the start and end of the chromosomes, denoting unknown sequence (any of A, C, G or T) in and around the telomeres.
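Here is a small Python sketch of why short reads fail in repeats. The flanking sequences and the helper name `find_all` are invented for the example; the only real biology in it is that the human telomere repeat unit is TTAGGG. A read that fits entirely inside the repeat matches in several places, while a read long enough to reach unique sequence on both sides matches in exactly one.

```python
def find_all(read, reference):
    """Return every 0-based position where read occurs in reference."""
    hits = []
    pos = reference.find(read)
    while pos != -1:
        hits.append(pos)
        pos = reference.find(read, pos + 1)
    return hits

unit = "TTAGGG"  # human telomeric repeat unit
# Hypothetical region: unique flank, five repeat copies, unique flank.
reference = "ACGTAC" + unit * 5 + "GATTACA"

short_read = unit * 2                     # lies entirely inside the repeat
long_read = "TAC" + unit * 5 + "GAT"      # spans the repeat plus both flanks

short_hits = find_all(short_read, reference)
long_hits = find_all(long_read, reference)

print(short_hits)  # [6, 12, 18, 24]: four equally good placements
print(long_hits)   # [3]: a single unambiguous placement
```

With only the short read you also can’t tell how many copies of the repeat there are, which is why those regions ended up as runs of “N” in older builds.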