It appears that if you try:
import msprime
ts = msprime.sim_mutations(msprime.sim_ancestry(20, ploidy=1, sequence_length=1000, random_seed=42), rate=1e-3, random_seed=42)
print(ts.num_sites)
print(ts.as_vcf(individuals=[0]))
All 5 sites in the tree sequence are exported, and the REF and ALT alleles are taken over all the samples, even though only one haploid genome is actually present in the VCF. This seems like the right behaviour to me: if the user wants to exclude individuals and the sites that are variable only in them, they should simplify first. But I think this behaviour should be documented (possibly in https://tskit.dev/tskit/docs/stable/export.html?)
Also, it is relatively common to want to export a VCF of just the site positions, with no genotype data, so I think we should probably allow ts.write_vcf(file, individuals=[])?
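To illustrate what a positions-only export might look like, here is a rough sketch in plain Python. It is not the tskit API: write_sites_only_vcf is a hypothetical helper, and the contig ID "1", the VCFv4.2 header and the PASS filter value are all assumptions made for the example. It writes only the eight fixed VCF columns, with no FORMAT or sample columns.

```python
import io


def write_sites_only_vcf(sites, sequence_length, out, contig_id="1"):
    """Hypothetical helper (not part of tskit): write a VCF with no samples.

    `sites` is an iterable of (position, alleles) pairs, where alleles[0]
    is treated as REF and any remaining non-None entries as ALT.
    """
    out.write("##fileformat=VCFv4.2\n")
    out.write(f"##contig=<ID={contig_id},length={int(sequence_length)}>\n")
    # Only the 8 fixed columns: no FORMAT column and no sample columns.
    out.write("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n")
    for position, alleles in sites:
        ref = alleles[0]
        # Drop None entries, which some callers use to mark missing data.
        alts = [a for a in alleles[1:] if a is not None]
        alt = ",".join(alts) if alts else "."
        # Assumption: positions are rounded to the nearest integer as given,
        # with no 0-based to 1-based shift applied.
        pos = str(int(round(position)))
        out.write("\t".join([contig_id, pos, ".", ref, alt, ".", "PASS", "."]) + "\n")


# Made-up sites, purely for illustration.
example_sites = [(12, ("A", "T")), (340, ("G", "C", "A"))]
buffer = io.StringIO()
write_sites_only_vcf(example_sites, 1000, buffer)
vcf_text = buffer.getvalue()
print(vcf_text)
```

With a real tree sequence, the sites argument could presumably be built as ((v.site.position, v.alleles) for v in ts.variants()), in which case REF and ALT would again be taken over all the samples, as in the behaviour described above.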