Let us say you have two files: one with peptides (short sequences) of interest and another with the expected peptides. An experimentally derived sequence may or may not match the calculated sequence. The differences can be as small as a single letter or cover almost the entire sequence, leaving only one or two bases/amino acids in common. Let us say the first file contains the following information:


P1   SLVFLPFnT
P2   KLLLAtKSL    
P3   sIWKHATPV    
P4   KVTSIQhWV
P5   MtYDRYVAI
 
and the second file contains the following information:
 
P1 KVTSIQAWV   2
P2 KVTSIQCWV   2.5
P3 KVTSIQDWV   4.5 
P4 MTYDRVVAI   5
 
Now suppose the user wants to extract all the sequences in file 2 that match
a sequence in file 1 while differing by no more than one amino acid (letter case
is ignored). The output should contain the peptides and values from the second
file. Let us do it in R with a package called "fuzzyjoin".
 
The code is as follows (test1.txt is the first file and test2.txt is the second file above):
 
========================= 
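# Read both tab-separated files; with no header row the columns are named V1, V2, ...
df1 <- read.csv("test1.txt", sep = "\t", stringsAsFactors = FALSE, header = FALSE)
df2 <- read.csv("test2.txt", sep = "\t", stringsAsFactors = FALSE, header = FALSE)
library(fuzzyjoin)
# Keep every pair whose peptides (column V2) are within a string distance of 1, ignoring case
df3 <- stringdist_inner_join(df1, df2, by = c("V2" = "V2"), max_dist = 1, ignore_case = TRUE)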
===========================
 
output:
================= 
> df3
  V1.x      V2.x V1.y      V2.y  V3
1   P4 KVTSIQhWV   P1 KVTSIQAWV 2.0
2   P4 KVTSIQhWV   P2 KVTSIQCWV 2.5
3   P4 KVTSIQhWV   P3 KVTSIQDWV 4.5
4   P5 MtYDRYVAI   P4 MTYDRVVAI 5.0 
===================
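
To see why exactly these four rows come back, it can help to look at the pairwise distances directly. The sketch below is only an illustration and not what fuzzyjoin does internally: it uses base R's adist(), which computes a generalized Levenshtein distance, as a stand-in for the metric used by the join (the two may disagree on some inputs, for example swapped neighbouring letters). The vectors file1 and file2 are simply the peptides typed in from the two example files.

```r
# Peptides copied by hand from the two example files above
file1 <- c(P1 = "SLVFLPFnT", P2 = "KLLLAtKSL", P3 = "sIWKHATPV",
           P4 = "KVTSIQhWV", P5 = "MtYDRYVAI")
file2 <- c(P1 = "KVTSIQAWV", P2 = "KVTSIQCWV", P3 = "KVTSIQDWV",
           P4 = "MTYDRVVAI")

# Levenshtein distance between every file1/file2 pair, ignoring letter case
d <- adist(file1, file2, ignore.case = TRUE)
dimnames(d) <- list(file1 = names(file1), file2 = names(file2))

# Pairs that are at most one edit apart (row = file1 entry, col = file2 entry)
hits <- which(d <= 1, arr.ind = TRUE)

print(d)
print(hits)
```

Only P4 from the first file (against P1, P2 and P3 of the second) and P5 (against P4 of the second) fall within a distance of 1, which agrees with the rows in df3.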

The output can now be customized by subsetting df3 to keep only the required columns.
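
As a minimal sketch of that subsetting step: the task asked for the peptides and values from the second file, which are columns V2.y and V3 of df3. So that the snippet runs on its own, df3 is rebuilt here by hand from the output shown above; in practice you would use the df3 returned by the join. The new column names "peptide" and "value" are illustrative choices, not something the original files define.

```r
# df3 rebuilt by hand from the join output shown above (stand-in for the real result)
df3 <- data.frame(
  V1.x = c("P4", "P4", "P4", "P5"),
  V2.x = c("KVTSIQhWV", "KVTSIQhWV", "KVTSIQhWV", "MtYDRYVAI"),
  V1.y = c("P1", "P2", "P3", "P4"),
  V2.y = c("KVTSIQAWV", "KVTSIQCWV", "KVTSIQDWV", "MTYDRVVAI"),
  V3   = c(2, 2.5, 4.5, 5),
  stringsAsFactors = FALSE
)

# Keep only the file-2 peptide and its value, then give them readable names
result <- df3[, c("V2.y", "V3")]
names(result) <- c("peptide", "value")

print(result)
```

If the matching peptide from the first file is also of interest, adding "V2.x" to the selected columns keeps it alongside.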