12-29-2008, 08:24 AM
I want to take a string (a protein sequence) and insert a comma after every K or R, so that "K*" becomes "K,*" and "R*" becomes "R,*" (where * is any residue). The one exception is when a P immediately follows the K or R: in that case nothing is inserted. This mimics how the enzyme trypsin cleaves. I'm trying to do this in Python, but I'm not very good at regular expressions.
I could iterate through the sequences, but I think regex will be much faster.
For instance, this input:
MVLTIYPDELVQIVSDKIASNRGKITLNQLWDISGKPFDLSDKKVKQFVLSCVILKKDI
should become:
MVLTIYPDELVQIVSDK,IASNR,GK,ITLNQLWDISGKPFDLSDK,K,VK,QFVLSCVILK,K,DI
Any help is appreciated. I will keep reading and post back if I figure it out.
Thanks.
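One possible approach, offered as a minimal sketch rather than a definitive answer: Python's `re.sub` with a lookahead can express "K or R, followed by something that is not P". The function name `mark_trypsin_sites` is made up for illustration. The sketch assumes the sequence is uppercase single-letter residues and that a K or R at the very end of the sequence should not get a trailing comma (the "K*" wording suggests a residue must follow).

```python
import re

# Match K or R only when it is followed by a character that is not P.
# The lookahead (?=[^P]) checks the next character without consuming it,
# so consecutive sites such as "KK" are each handled.
TRYPSIN_SITE = re.compile(r"([KR])(?=[^P])")

def mark_trypsin_sites(sequence):
    """Return the sequence with a comma after each K/R not followed by P.

    Hypothetical helper name; assumes uppercase one-letter residue codes.
    """
    return TRYPSIN_SITE.sub(r"\1,", sequence)

seq = "MVLTIYPDELVQIVSDKIASNRGKITLNQLWDISGKPFDLSDKKVKQFVLSCVILKKDI"
marked = mark_trypsin_sites(seq)
print(marked)

# Splitting on the commas gives the individual fragments.
peptides = marked.split(",")
print(peptides)
```

If a comma after a terminal K or R is wanted, the negative lookahead `(?!P)` could be swapped in for `(?=[^P])`, since it also matches at the end of the string.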