Well, I had a request to identify common and not-common elements between two frames in python (pandas). Not-common means not shared between either frames. In pandas, outer join terminology is confusing for SQL folks. Outer join means union in Pandas, in SQL, outer join means symmetric difference. Since no outer join (in terms of SQL) in pandas, we have a round about way. Let us go that way.
Data frame1:
gene1 1.5183
gene2 1.49
gene3 1.4991
gene4 1.4104
gene5 0.964
gene6 59.19
gene7 118.4
Data frame 2:
gene1 1.5052
gene2 1.505
gene3 1.5037
gene8 1.7976
gene9 1.3445
gene10 117.68
gene6 60.3
gene7 116.99
Now user wants common elements between data frame 1 and data frame 2. This is easy in pandas. User also wants elements unique to each dataframe. This is tricky.
Expected output of common elements:
gene1 1.5183 1.5052
gene2 1.49 1.505
gene3 1.4991 1.5037
gene6 59.19 60.3
gene7 118.4 116.99
Expected outcome of unique elements:
gene4 1.4104 –
gene5 0.964 –
gene8 – 1.7976
gene9 – 1.3445
gene10 – 117.68
Code:
----------------------
import pandas as pd
test1=pd.read_csv("test1.txt", sep="\t", header=None)
test1.columns=["a","b"]
test2=pd.read_csv("test2.txt", sep="\t", header=None)
test2.columns=["a","b"]
pd.merge(test1, test2, on='a', how='inner') # prints common elements
a b_x b_y
0 gene1 1.5183 1.5052
1 gene2 1.4900 1.5050
2 gene3 1.4991 1.5037
3 gene6 59.1900 60.3000
4 gene7 118.4000 116.9900
outer_common=pd.merge(test1, test2, on='a', how='outer', indicator=True)
outer_common_nc=outer_common[outer_common['_merge']!="both"]
outer_common_nc.iloc[:,0:3] # prints unique elements.
a b_x b_y
3 gene4 1.4104 NaN
4 gene5 0.9640 NaN
7 gene8 NaN 1.7976
8 gene9 NaN 1.3445
9 gene10 NaN 117.6800
=============================
indicator in merge function gives the element presence in each data frame. First we created a frame with union (outer join in pandas). Then we filtered the data frame for elements that are not in both frames. After that we removed unwanted _merge column. Ofcourse, one can always replace NaN with whatever user wants.