Biologist's bioinformatics notes

The tidyverse provides a wide variety of functions, and they keep evolving. One of the more interesting ones is "uncount", from the tidyr package. Uncount duplicates each row as many times as the value in a given column. Let us take an example with a simple data frame.

====================================================================

> library(tidyverse)
> df=data.frame("alphabet"=c("A","A","B","B"))

====================================================================

The data frame has a single column named alphabet, containing two "A" and two "B" values. Now let us print the data frame.

===================================================================

> df
  alphabet
1        A
2        A
3        B
4        B

====================================================================

A common task is to count how many times A and B occur, i.e. their frequency. This is easy and supported by multiple functions in R. Let us do it with dplyr.

==================================================================

> df1=df %>%
+     group_by(alphabet) %>%
+     count()
> df1
# A tibble: 2 x 2
# Groups:   alphabet [2]
  alphabet     n
  <chr>    <int>
1 A            2
2 B            2

==================================================================
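
As a small sketch of the "multiple functions" mentioned above, the same frequencies can be obtained with the one-step form of count() in dplyr, or with table() in base R. The variable names counts and tab are only for illustration. Note that count(df, alphabet) returns an ungrouped result, unlike the group_by() version above.

```r
library(dplyr)

# Same toy data frame as in the post
df <- data.frame(alphabet = c("A", "A", "B", "B"), stringsAsFactors = FALSE)

# count() with a column name groups, tallies and ungroups in one step
counts <- count(df, alphabet)

# Base R: table() returns the frequencies as a named table
tab <- table(df$alphabet)
```

Either form gives the count of 2 for each letter. The dplyr version returns a data frame with a column n, which is what uncount needs later.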

Sometimes we need to go the other way: rebuild the original data frame by repeating each row as many times as the value in column n. Let us do this.

========================================

> df1 %>%
+     uncount(n)
# A tibble: 4 x 1
# Groups:   alphabet [2]
  alphabet
  <chr>
1 A
2 A
3 B
4 B

======================================
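
To make the expansion more concrete, here is a minimal sketch of two optional arguments of uncount (.remove and .id), together with a rough base R equivalent built on rep(). The frequency table is created directly here, so it is ungrouped, and the names counts, expanded_id, expanded_base and the "copy" column are only for illustration.

```r
library(tidyr)

# Frequency table like df1 in the post, built directly (ungrouped)
counts <- data.frame(alphabet = c("A", "B"), n = c(2L, 2L),
                     stringsAsFactors = FALSE)

# Default behaviour: repeat each row n times and drop the n column
expanded <- uncount(counts, n)

# .remove = FALSE keeps the n column; .id adds an index for each copy of a row
expanded_id <- uncount(counts, n, .remove = FALSE, .id = "copy")

# Rough base R equivalent: repeat each row index n times, then subset
idx <- rep(seq_len(nrow(counts)), times = counts$n)
expanded_base <- counts[idx, "alphabet", drop = FALSE]
```

The base R version shows what uncount is doing underneath: each row index is repeated according to its weight, and the rows are then selected in that order.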

I have used this function elsewhere in the blog.

Aug 1, 2020 - Uncount function in R