Any regular expressions gurus here?
#1
I want to take a string (a protein sequence) and replace all the instances of "K*" and "R*" (where * is any residue) with "K,*" and "R,*" respectively. In other words, every time a K or an R appears, I want to place a comma after it. The exception is when a P follows the K or R: in that case nothing is inserted. This is how the enzyme trypsin acts. I'm trying to do this in Python, but I'm not very good at regular expressions.

I could iterate through the sequences, but I think regex will be much faster.

For instance:

MVLTIYPDELVQIVSDKIASNRGKITLNQLWDISGKPFDLSDKKVKQFVLSCVILKKDI

MVLTIYPDELVQIVSDK,IASNR,GK,ITLNQLWDISGKPFDLSDK,K,VK,QFVLSCVILK,K,DI

Any help is appreciated. I will keep reading and post back if I figure it out.

Thanks.
Reply
#2
How about this one?
I'm not sure how Python does regexps. This should be suitable for Perl-alikes.

s/([KR])[^P]/\1,/


If I got the memory part of the replacement expression wrong, you can do it in two passes with

s/K[^P]/K,/

s/R[^P]/R,/
Reply
#3
TheTominator wrote:
How about this one?
I'm not sure how Python does regexps. This should be suitable for Perl-alikes.

s/([KR])[^P]/\1,/

...

s/([KR])([^P])/\1,\2/

will not strip off the non-P following a K-or-R match
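
For anyone following along in Python, here is a minimal sketch of the difference between the two substitutions, using `re.sub` (which replaces every non-overlapping match by default). The test string `AKGRPKKV` and the variable names are made up for illustration and are not from the thread.

```python
import re

seq = "AKGRPKKV"  # hypothetical test string, not from the thread

# One capture group: the residue after K/R is matched but never put back,
# so it disappears from the result.
lossy = re.sub(r'([KR])[^P]', r'\1,', seq)

# Two capture groups: the following residue is restored via \2.
kept = re.sub(r'([KR])([^P])', r'\1,\2', seq)

print(lossy)  # AK,RPK,V
print(kept)   # AK,GRPK,KV
```

Note that even the two-group version leaves the second K of `KK` without a comma, because that K was already consumed as the "non-P" residue of the previous match.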
Reply
#4
TheCaber wrote:
TheTominator wrote:
How about this one?
I'm not sure how Python does regexps. This should be suitable for Perl-alikes.

s/([KR])[^P]/\1,/

...

s/([KR])([^P])/\1,\2/

will not strip off the non-P following a K-or-R match
Oh yeah. Good catch.

I always do programming in two stages.

Stage 1: Bugging
Stage 2: Debugging
Reply
#5
Thanks.

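This is the Python code that did it. I could not do it in one pass: re.sub already replaces every non-overlapping match (Python has no separate global flag), but [^P] consumes the residue after the K or R, so the second of two adjacent sites (as in KK) is skipped. A second pass picks those up.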

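import re

def trypsinize(proteins):
    # Insert a comma after K or R unless the next residue is P.
    # [^P] consumes the following residue, so adjacent sites such as KK
    # need a second pass; [^,P] skips sites already marked in the first pass.
    p = re.compile(r'([KR])([^P])')
    q = re.compile(r'([KR])([^,P])')

    output = []
    for protein in proteins:
        name = protein[0]
        sequence = protein[1]
        a = p.sub(r'\1,\2', sequence)
        b = q.sub(r'\1,\2', a)
        output.append([name, b.split(',')])
    return output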
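
A possible one-pass alternative, offered as a sketch rather than the method used above: a zero-width lookahead checks the next residue without consuming it, so adjacent sites like KK are both matched in a single `re.sub` call. The names `CLEAVAGE_SITE` and `trypsinize_one_pass` are hypothetical; the example sequence is the one from the first post.

```python
import re

# Lookahead (?=[^P]) requires a non-P residue after K/R but does not
# consume it, so the next match can start on that residue.
CLEAVAGE_SITE = re.compile(r'([KR])(?=[^P])')

def trypsinize_one_pass(sequence):
    # Mark each cleavage site with a comma, then split into peptides.
    return CLEAVAGE_SITE.sub(r'\1,', sequence).split(',')

example = "MVLTIYPDELVQIVSDKIASNRGKITLNQLWDISGKPFDLSDKKVKQFVLSCVILKKDI"
peptides = trypsinize_one_pass(example)
print(",".join(peptides))
```

Because the lookahead needs a following character, a K or R at the very end of the sequence gets no trailing comma, which matches the behavior of the two-pass version.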
Reply

