Earlier today I needed to compare two data frames for an analysis I was doing. In particular, I needed to identify the rows in data frame A that were not present in data frame B. I have used several different methods for this task in the past, but recently I have been using the anti_join function from the dplyr package.
I have provided an example below using the sleep dataset.
I want to find all the rows of sleep.A that are not present in sleep.B, based on the columns group and ID. The group and ID combinations present in rows 20, 2, and 4 of sleep.A are not present in sleep.B. Using anti_join we can confirm this as shown below.
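Here is a minimal sketch of that comparison. How sleep.A and sleep.B were built is an assumption on my part: I take sleep.A to be the full built-in sleep dataset and sleep.B to be the same data with rows 2, 4, and 20 dropped. The name missing.rows is just an illustrative variable.

```r
library(dplyr)

# Assumed setup: sleep.A is the full built-in sleep dataset,
# sleep.B is the same data with rows 2, 4, and 20 removed.
sleep.A <- sleep
sleep.B <- sleep[-c(2, 4, 20), ]

# Keep only the rows of sleep.A whose (group, ID) combination
# has no match in sleep.B.
missing.rows <- anti_join(sleep.A, sleep.B, by = c("group", "ID"))
print(missing.rows)
```

Under that setup the result should contain exactly three rows, one for each dropped group and ID combination. The order in which they come back may differ between dplyr versions, so it is safer not to rely on it.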