FragGeneScanPlus mistranslates protein fragments predicted in negative frames: it does not convert the nucleotide sequence to its reverse complement before translating, so the forward-strand bases are translated as they are.
The small FASTA file attached to this issue is wrongly translated to:
ELNLNILSFNTNWVRT*VSTPGSTF*LTCLNVKNCFVQWTFHSLIFDKSFR**GIRMRANIFDSEVIFIDKKNTNFFPFNFDGFRLIFANVTLLTN*DPTHTLRPH
The expected translation is:
MRTQGMSWISVCQQGDVSEDEPKAVEVEGKKIGVFFVDENYFAIENVCPHAYALLTEGFIEDQTVECPLHEAIFDIQTGELKSGPGCRNLCTYPVRVEGQDIQIQL
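To make the difference concrete without needing Biopython, here is a minimal sketch in plain Python. The helper names `reverse_complement` and `translate` are my own and are not taken from FragGeneScanPlus. The codon table is the standard one, which assigns the same amino acids as table 11 (the two differ only in their start codons). The fragment is the last 24 nucleotides of the ORF region in the test sequence, read on the forward strand.

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G).
# Table 11 uses the same amino acid assignments.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Forward-strand bases covering the start of the negative-strand gene
fragment = "CCAACTCATACCTTGCGTCCTCAT"

# Translating the forward bases directly gives the tail of the wrong output
print(translate(fragment))                      # PTHTLRPH

# Reverse-complementing first gives the head of the expected protein
print(translate(reverse_complement(fragment)))  # MRTQGMSW
```

The first result matches the end of the wrong translation above, and the second matches the start of the expected one.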
Full test example:
Translating the following sequence with length 551:
TGTTCTGCTTCCTTGTACATGTGAGGACTAGAGTTGAATTTGAATATCCTGTCCTTCAACACGAACTGGGTACGTACATAGGTTTCTACACCCGGGTCCACTTTTTAGCTCACCTGTCTGAATGTCAAAAATTGCTTCGTGCAATGGACATTCCACAGTTTGATCTTCGATAAATCCTTCCGTTAATAAGGCATACGCATGAGGGCAAACATTTTCGATAGCGAAGTAATTTTCATCGACAAAAAAAACACCAATTTTTTTCCCTTCAACTTCGACGGCTTTCGGCTCATCTTCGCTAACGTCACCCTGCTGACAAACTGAGATCCAACTCATACCTTGCGTCCTCATTTTGTTTTATATACAAAACATAATTTGATTTTCAAAACACAAGCTAAGCATAATCCTCTTGATTAATTTTTGTCAAAGTAAAAATAAACATTAAAATCAATTGATTAATAAATTTTAAATAATTTGTTACGTTTCAAGTCAGAAACAATGTTTTAAATATAAAAATTGTTTTATGTAATCTTTATAATTACAATAGTTCTAAA
Performing 6-frame translation:
+1: CSASLYM*GLELNLNILSFNTNWVRT*VSTPGSTF*LTCLNVKNCFVQWTFHSLIFDKSFR**GIRMRANIFDSEVIFIDKKNTNFFPFNFDGFRLIFANVTLLTN*DPTHTLRPHFVLYTKHNLIFKTQAKHNPLD*FLSK*K*TLKSID**ILNNLLRFKSETMF*I*KLFYVIFIITIVL
+2: VLLPCTCED*S*I*ISCPSTRTGYVHRFLHPGPLFSSPV*MSKIASCNGHSTV*SSINPSVNKAYA*GQTFSIAK*FSSTKKTPIFFPSTSTAFGSSSLTSPC*QTEIQLIPCVLILFYIQNII*FSKHKLSIILLINFCQSKNKH*NQLINKF*IICYVSSQKQCFKYKNCFM*SL*LQ*F*
+3: FCFLVHVRTRVEFEYPVLQHELGTYIGFYTRVHFLAHLSECQKLLRAMDIPQFDLR*ILPLIRHTHEGKHFR*RSNFHRQKKHQFFSLQLRRLSAHLR*RHPADKLRSNSYLASSFCFIYKT*FDFQNTS*A*SS*LIFVKVKINIKIN*LINFK*FVTFQVRNNVLNIKIVLCNLYNYNSSK
-1: FRTIVIIKIT*NNFYI*NIVSDLKRNKLFKIY*SIDFNVYFYFDKN*SRGLCLACVLKIKLCFVYKTK*GRKV*VGSQFVSRVTLAKMSRKPSKLKGKKLVFFLSMKITSLSKMFALMRMPY*RKDLSKIKLWNVHCTKQFLTFRQVS*KVDPGVETYVRTQFVLKDRIFKFNSSPHMYKEAE
-2: LELL*L*RLHKTIFIFKTLFLT*NVTNYLKFINQLILMFIFTLTKINQEDYA*LVF*KSNYVLYIKQNEDARYELDLSLSAG*R*RR*AESRRS*REKNWCFFCR*KLLRYRKCLPSCVCLINGRIYRRSNCGMSIARSNF*HSDR*AKKWTRV*KPMYVPSSC*RTGYSNSTLVLTCTRKQN
-3: *NYCNYKDYIKQFLYLKHCF*LET*QII*NLLIN*F*CLFLL*QKLIKRIMLSLCFENQIMFCI*NKMRTQGMSWISVCQQGDVSEDEPKAVEVEGKKIGVFFVDENYFAIENVCPHAYALLTEGFIEDQTVECPLHEAIFDIQTGELKSGPGCRNLCTYPVRVEGQDIQIQL*SSHVQGSRT
Solution FragGeneScanPlus:
ELNLNILSFNTNWVRT*VSTPGSTF*LTCLNVKNCFVQWTFHSLIFDKSFR**GIRMRANIFDSEVIFIDKKNTNFFPFNFDGFRLIFANVTLLTN*DPTHTLRPH
Correct solution (using reverse complement of ORF):
MRTQGMSWISVCQQGDVSEDEPKAVEVEGKKIGVFFVDENYFAIENVCPHAYALLTEGFIEDQTVECPLHEAIFDIQTGELKSGPGCRNLCTYPVRVEGQDIQIQL
Script that generated the output above (requires Biopython and the attached test_fasta.txt in the working directory):
#!/usr/bin/env python3
from Bio import SeqIO

# Read the single record from the attached FASTA file
seq = SeqIO.read('test_fasta.txt', 'fasta').seq

print('Translating the following sequence with length {}:'.format(len(seq)))
print('\n{}\n\n'.format(seq))

print('Performing 6-frame translation:\n')
for s, strand in ((seq, 1), (seq.reverse_complement(), -1)):
    for frame in range(3):
        print('{:+2d}: {}'.format((frame + 1) * strand, s[frame:].translate(table=11)))
print()

# Known ORF on the negative strand (visible in the 6-frame translation on -3)
orf_start = 28
orf_stop = 348
orf = seq[(orf_start + 2):orf_stop]

# Translating the forward-strand slice directly reproduces the FragGeneScanPlus output
print('Solution FragGeneScanPlus:\n{}\n'.format(orf.translate(table=11)))

# Reverse-complementing the slice first gives the expected protein
orf = orf.reverse_complement()
print('Correct solution (using reverse complement of ORF):\n{}\n'.format(orf.translate(table=11)))