

File 2 (test2.txt):

P1  KVTSIQAWV  2
P2  KVTSIQCWV  2.5
P3  KVTSIQDWV  4.5
P4  MTYDRVVAI  5

Now the user wants to extract all sequences in file 2 that match those in file 1 while differing from them by one amino acid. The output should contain the peptides and values from the second file. Let us do it in R with a package called "fuzzyjoin".

The code is as follows (test1.txt = 1st file, test2.txt = 2nd file above):

=========================

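library(fuzzyjoin)

# Read both tab-separated files; with header = FALSE the columns are named V1, V2, ...
df1 = read.csv("test1.txt", sep = "\t", stringsAsFactors = FALSE, header = FALSE)
df2 = read.csv("test2.txt", sep = "\t", stringsAsFactors = FALSE, header = FALSE)

# Join on the sequence column (V2), keeping pairs at most 1 edit apart, ignoring case
df3 = stringdist_inner_join(df1, df2, by = c("V2" = "V2"), max_dist = 1, ignore_case = TRUE)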

===========================

output:

=================

> df3
  V1.x      V2.x V1.y      V2.y  V3
1   P4 KVTSIQhWV   P1 KVTSIQAWV 2.0
2   P4 KVTSIQhWV   P2 KVTSIQCWV 2.5
3   P4 KVTSIQhWV   P3 KVTSIQDWV 4.5
4   P5 MtYDRYVAI   P4 MTYDRVVAI 5.0

===================
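Note that stringdist_inner_join() measures distance with the optimal string alignment method by default (method = "osa"), so max_dist = 1 also accepts one inserted, deleted or adjacent-transposed residue, not only one substitution. If only substitutions should count, fuzzyjoin also accepts method = "hamming". For readers without fuzzyjoin, here is a minimal base-R sketch of the substitution-only version. It assumes all peptides have the same length, the helper hamming() is a hypothetical name (not from any package), and df1 is typed in from the rows visible in the output above (the real test1.txt may hold further peptides that had no match).

```r
# Base-R sketch (no fuzzyjoin needed): keep pairs of peptides that differ at
# exactly one position, ignoring case. Assumes peptides of equal length.

# Hypothetical helper: substitution-only (Hamming) distance; Inf if lengths differ
hamming <- function(a, b) {
  a <- toupper(a)
  b <- toupper(b)
  if (nchar(a) != nchar(b)) return(Inf)
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Example data typed in from the rows visible in the df3 output above
df1 <- data.frame(V1 = c("P4", "P5"),
                  V2 = c("KVTSIQhWV", "MtYDRYVAI"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(V1 = c("P1", "P2", "P3", "P4"),
                  V2 = c("KVTSIQAWV", "KVTSIQCWV", "KVTSIQDWV", "MTYDRVVAI"),
                  V3 = c(2, 2.5, 4.5, 5),
                  stringsAsFactors = FALSE)

# Every (file 1 row, file 2 row) combination with its distance
pairs <- expand.grid(i = seq_len(nrow(df1)), j = seq_len(nrow(df2)))
pairs$dist <- mapply(function(i, j) hamming(df1$V2[i], df2$V2[j]),
                     pairs$i, pairs$j)
hits <- pairs[pairs$dist == 1, ]

# Query peptide from file 1 next to the matching peptide and value from file 2
result <- data.frame(query = df1$V2[hits$i], df2[hits$j, ],
                     row.names = NULL, stringsAsFactors = FALSE)
print(result)
```

On this data it returns the same four pairs as df3, because every match there differs by a single substitution.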

The output can now be customized by subsetting df3 to the required columns. Columns with the suffix .x come from file 1, while those with .y (and V3) come from file 2.
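A short sketch of that subsetting step follows. To keep it runnable without fuzzyjoin or the input files, df3 is rebuilt by hand from the printed output above; the new column names id, peptide and value, and the commented-out output filename, are hypothetical choices.

```r
# Rebuild df3 by hand from the printed output above
df3 <- data.frame(
  V1.x = c("P4", "P4", "P4", "P5"),
  V2.x = c("KVTSIQhWV", "KVTSIQhWV", "KVTSIQhWV", "MtYDRYVAI"),
  V1.y = c("P1", "P2", "P3", "P4"),
  V2.y = c("KVTSIQAWV", "KVTSIQCWV", "KVTSIQDWV", "MTYDRVVAI"),
  V3   = c(2.0, 2.5, 4.5, 5.0),
  stringsAsFactors = FALSE
)

# Keep only the peptide IDs, sequences and values that came from file 2
out <- df3[, c("V1.y", "V2.y", "V3")]
names(out) <- c("id", "peptide", "value")  # hypothetical column names

# A file-2 peptide can match several file-1 peptides; keep each hit once
out <- unique(out)

# Optionally save as a tab-separated file (hypothetical filename)
# write.table(out, "matches.txt", sep = "\t", quote = FALSE, row.names = FALSE)
print(out)
```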


Apr 21, 2018 - Extract matching peptides/short sequences with mismatches