In some cases you may want to include all the rows in one or both data sets in the output, even if there is not a matching row in the other. The all.x and all.y arguments let you do this. In the rows without matches, merge puts NAs in the columns with no matching data.
This is a generic function but currently the only substantive method for it works on data frames. The default method converts x and y to data frames and calls the method for data frames.
merge(x, y, by = intersect(names(x), names(y)), by.x = by, by.y = by, all = F, all.x = all, all.y = all, suffixes = c(".x", ".y"))
# create 2 data frames, one with information on authors # and one concerning books. Use merge to relate the # names of the books written to attributes of the authors. # Note that some authors have no books listed and some have # several books.authors <- data.frame( FirstName = c("Lorne","Loren","Robin", "Robin","Billy"), LastName = c("Green","Jaye","Green", "Howe","Jaye"), Age = c(82,40,45,2,40), Income=c(1200000,40000,25000,0,27500), Home=c("California","Washington","Washington", "Alberta", "Washington")) books <- data.frame( AuthorFirstName=c("Lorne","Loren","Loren", "Loren","Robin","Rich"), AuthorLastName=c("Green","Jaye","Jaye","Jaye", "Green","Calaway"), Book=c("Bonanza","Midwifery", "Gardening", "Perennials", "Who_dun_it?","Splus"))
# Look at all cases in which the author is in both the # authors and books datasets. Match author by both first # and last names -- these have different labels in the 2 # datasets but are in the first 2 columns of both.
merge(authors, books, by = 1:2)
# Produces the following output: FirstName LastName Age Income Home 1 Lorne Green 82 1200000 California 2 Loren Jaye 40 40000 Washington 3 Loren Jaye 40 40000 Washington 4 Loren Jaye 40 40000 Washington 5 Robin Green 45 25000 Washington Book 1 Bonanza 2 Midwifery 3 Gardening 4 Perennials 5 Who_dun_it?
# Next, make sure all authors in the authors dataset are # listed, even if there is no book listed for them. Using # by.x and by.y may be a more reliable way to handle # cases in which the datasets have different # names for the columns to match by.
merge(authors, books, by.x = c("FirstName","LastName"), by.y = c("AuthorFirstName","AuthorLastName"), all.x = T)
# Produces the following: FirstName LastName Age Income Home 1 Billy Jaye 40 27500 Washington 2 Lorne Green 82 1200000 California 3 Loren Jaye 40 40000 Washington 4 Loren Jaye 40 40000 Washington 5 Loren Jaye 40 40000 Washington 6 Robin Green 45 25000 Washington 7 Robin Howe 2 0 Alberta Book 1 NA 2 Bonanza 3 Midwifery 4 Gardening 5 Perennials 6 Who_dun_it? 7 NA
# Use the state.x77 dataset to relate the income of the author # to the median income of his or her home state (we have no # information on Alberta, a Canadian province). Note the use # of "row.names" where the "column" to match on is not a variable # in the dataset but is the names of the rows. Both datasets have # a column called "Income" which is not a key variable, so supply # the suffixes argument to distinguish between them in the # output (without suffixes they would be labeled "Income.x" # and "Income.y").
state.data <- data.frame(state.x77) merge(authors, state.data[,"Income", drop = F], by.x = "Home", by.y = "row.names", all.x = T, suffixes = c("Author","State"))
# Produces the following: FirstName LastName Age IncomeAuthor Home 1 Robin Howe 2 0 Alberta 2 Lorne Green 82 1200000 California 3 Loren Jaye 40 40000 Washington 4 Robin Green 45 25000 Washington 5 Billy Jaye 40 27500 Washington IncomeState 1 NA 2 5114 3 4864 4 4864 5 4864