Thanks for the explanation and tips! I used your procedure and ended up with the same 131MB file. Interestingly, I did not need to remove the “—” entries. I have been exchanging emails with BGI, and they indicated that files could have significantly different numbers of entries (but I am surprised at >3x!). Is there any chance your sequencing had greater than 4x coverage? My VCF file is queued up and should be available in a few months, which should help clarify what I am seeing.
I don’t know. How do I find out?
I think the VCF would tell you if you had it. Another possibility is that a lower quality threshold was used for calling SNPs, but that seems unlikely.
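In case it helps once a VCF is available, here is a rough sketch of one way to get a feel for coverage from it. It assumes the file records read depth in the standard `DP` key of the INFO column, which not every pipeline writes, and the function name `mean_depth` is just something made up for this example. Since a VCF usually lists only variant sites, the result is an approximation of coverage at called positions, not a true genome-wide figure.

```python
import gzip
import re


def mean_depth(vcf_path):
    """Average the INFO DP value over all records that carry one.

    Returns None if no record has a DP entry.
    """
    # Handle both plain-text and gzip-compressed VCF files.
    opener = gzip.open if vcf_path.endswith(".gz") else open
    total = 0
    count = 0
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 8:
                continue  # not a complete record
            # INFO is the 8th column: semicolon-separated key=value pairs.
            match = re.search(r"(?:^|;)DP=(\d+)", fields[7])
            if match:
                total += int(match.group(1))
                count += 1
    return total / count if count else None
```

Calling `mean_depth("my_genome.vcf.gz")` (a placeholder file name) would give the average depth over the records that report it. A value well above 4 would suggest the sequencing had more coverage than advertised.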