R supports lookarounds (as of v 3.4.4). A lookaround lets you match a pattern only when it is preceded or followed by some other text, without including that other text in the match. For example, say you want to extract the last names of everyone whose first name is John. If you simply grep for John, you get the full names of all the Johns, plus anyone who has John as a last name, and you still have to strip John off the front of each result. Lookarounds address that problem: the pattern finds all the Johns but returns only their last names. Kind of neat. The syntax has several variants and can be confusing at first. R supports both lookahead and lookbehind assertions (together called lookarounds).
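As a minimal sketch of the John example, the snippet below uses a lookbehind with base R's `regexpr` and `regmatches`. The `people` vector and the variable names are made up for illustration and are not part of the original example.

```r
# Hypothetical sample data (an assumption for illustration only)
people <- c("John Smith", "Mary John", "John Doe", "Alice Brown")

# (?<=^John ) is a lookbehind: the match must be preceded by "John " at the
# start of the string, but "John " itself is not part of the match.
hits <- regexpr("(?<=^John )\\w+", people, perl = TRUE)

# regmatches() keeps only the elements that matched
last_names <- regmatches(people, hits)
print(last_names)
```

"Mary John" is skipped because John is not at the start of that string, so only the last names of people whose first name is John come back.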
Let us say we have a sentence that mixes English words with non-English characters (such as Greek letters). The task is to break it into separate strings (strsplit in R). Let us do this in R. We do the following:
- Create the Greek letters (beta, gamma, mu and phi)
- Put them in a string
- Break the string into separate words using lookarounds in R
==========================================
> beta <- intToUtf8(0x03B2)
> gamma=intToUtf8(0x03B3)
> mu=intToUtf8(0x03BC)
> phi=intToUtf8(0x03D5)
> print(c(beta,gamma,mu,phi))
[1] "β" "γ" "μ" "ϕ"
> test=paste0(beta,"gene",gamma,"protein", beta, "me",beta,phi)
> test
[1] "βgeneγproteinβmeβϕ"
> library(stringr)
> strsplit(test,"(?<=\\W)(?=\\w)|(?<=\\w)(?=\\W)|(?<=\\W)(?=\\W)", perl = TRUE)
[[1]]
[1] "β" "gene" "γ" "protein" "β" "me" "β" "ϕ"
===========================================
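To make the split pattern easier to read, here is a sketch that builds the same regular expression from its three alternatives. The variable names are hypothetical. The comments assume that with `perl = TRUE` the classes `\w` and `\W` treat only ASCII letters, digits and underscore as word characters, so each Greek letter counts as a non-word character; this is consistent with the output shown above.

```r
# Rebuild the test string from Unicode code points
beta  <- intToUtf8(0x03B2)
gamma <- intToUtf8(0x03B3)
phi   <- intToUtf8(0x03D5)
test  <- paste0(beta, "gene", gamma, "protein", beta, "me", beta, phi)

# Each alternative is a zero-width position: a lookbehind for what comes
# before the split point, followed by a lookahead for what comes after it.
nonword_then_word    <- "(?<=\\W)(?=\\w)"  # Greek letter, then an ASCII word
word_then_nonword    <- "(?<=\\w)(?=\\W)"  # ASCII word, then a Greek letter
nonword_then_nonword <- "(?<=\\W)(?=\\W)"  # two adjacent Greek letters

pattern <- paste(nonword_then_word, word_then_nonword,
                 nonword_then_nonword, sep = "|")

tokens <- strsplit(test, pattern, perl = TRUE)[[1]]
print(tokens)
```

Because every alternative is zero-width, no characters are consumed by the split, so each Greek letter and each English word comes out as its own element.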