Tidy data is a way to organize tabular data in a consistent data structure across packages. A table is tidy if each variable is in its own column and each observation, or case, is in its own row.
Tibbles are a table format provided by the tibble package. They inherit the data frame class, but have improved behaviors:
Subset a new tibble with [, a vector with [[ and $.
No partial matching when subsetting columns.
Display concise views of the data on one screen.
options(tibble.print_max = n, tibble.print_min = m, tibble.width = Inf): Control default display settings.
View() or glimpse(): View the entire data set.
tibble(...): Construct by columns.
tribble(...): Construct by rows.
as_tibble(x, ...): Convert a data frame to a tibble.
enframe(x, name = "name", value = "value"): Convert a named vector to a tibble. Also deframe().
is_tibble(x): Test whether x is a tibble.
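As a minimal sketch of the constructors and subsetting rules listed above (the object names and the columns x and y are illustrative choices, not part of the cheatsheet):

```r
library(tibble)

# Construct by columns: each argument becomes one column.
by_cols <- tibble(x = 1:3, y = c("a", "b", "c"))

# Construct by rows: ~name headers first, then the values row by row.
by_rows <- tribble(
  ~x, ~y,
  1L, "a",
  2L, "b",
  3L, "c"
)

# `[` keeps the tibble class; `[[` and `$` return the underlying vector.
sub_tbl <- by_cols["x"]
sub_vec <- by_cols[["x"]]

# Convert a data frame, and move between a named vector and a tibble.
from_df  <- as_tibble(data.frame(a = 1:2, b = c("u", "v")))
framed   <- enframe(c(first = 1, second = 2))  # columns `name` and `value`
unframed <- deframe(framed)                    # a named vector again

is_tibble(sub_tbl)  # TRUE
is_tibble(sub_vec)  # FALSE
```

Both construction styles should produce the same table here; tribble() tends to be easier to read for small hand-typed data.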
Pivot data to reorganize values into a new layout.
pivot_longer(data, cols, names_to = "name", values_to = "value", values_drop_na = FALSE): “Lengthen” data by collapsing several columns into two.
table4a looks like the following:
# A tibble: 3 × 3
  country     `1999` `2000`
  <chr>        <dbl>  <dbl>
1 Afghanistan    745   2666
2 Brazil       37737  80488
3 China       212258 213766
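Column names move to a new names_to column and values to a new values_to column. The output of pivot_longer(table4a, cols = 2:3, names_to = "year", values_to = "cases") will look like the following:
# A tibble: 6 × 3
  country     year   cases
  <chr>       <chr>  <dbl>
1 Afghanistan 1999     745
2 Afghanistan 2000    2666
3 Brazil      1999   37737
4 Brazil      2000   80488
5 China       1999  212258
6 China       2000  213766
pivot_wider(data, names_from = "name", values_from = "value"): The inverse of pivot_longer(). “Widen” data by expanding two columns into several.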
table2 looks like the following:
# A tibble: 12 × 4
   country      year type            count
   <chr>       <dbl> <chr>           <dbl>
 1 Afghanistan  1999 cases             745
 2 Afghanistan  1999 population   19987071
 3 Afghanistan  2000 cases            2666
 4 Afghanistan  2000 population   20595360
 5 Brazil       1999 cases           37737
 6 Brazil       1999 population  172006362
 7 Brazil       2000 cases           80488
 8 Brazil       2000 population  174504898
 9 China        1999 cases          212258
10 China        1999 population 1272915272
11 China        2000 cases          213766
12 China        2000 population 1280428583
The output of pivot_wider() will look like the following:
# A tibble: 6 × 4
  country      year  cases population
  <chr>       <dbl>  <dbl>      <dbl>
1 Afghanistan  1999    745   19987071
2 Afghanistan  2000   2666   20595360
3 Brazil       1999  37737  172006362
4 Brazil       2000  80488  174504898
5 China        1999 212258 1272915272
6 China        2000 213766 1280428583
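The two pivots above can be sketched with the example tables that ship with tidyr. The output column names "year" and "cases" are choices made for this sketch, and the calls assume a tidyr version that provides pivot_longer() and pivot_wider() (1.0.0 or later).

```r
library(tidyr)

# Lengthen: the two year columns of table4a become rows.
long <- pivot_longer(
  table4a,
  cols = c(`1999`, `2000`),
  names_to = "year",    # the old column names are stored here
  values_to = "cases"   # the old cell values are stored here
)

# Widen: each distinct value of `type` in table2 becomes its own column,
# filled from `count`.
wide <- pivot_wider(table2, names_from = type, values_from = count)

dim(long)  # 6 rows, 3 columns
dim(wide)  # 6 rows, 4 columns
```

Note that names_to produces a character column, so `year` in `long` holds "1999" and "2000" as strings unless it is converted afterwards.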
Use these functions to split or combine cells into individual, isolated values.
unite(data, col, ..., sep = "_", remove = TRUE, na.rm = FALSE): Collapse cells across several columns into a single column.
table5 looks like the following:
# A tibble: 6 × 4
  country     century year  rate
  <chr>       <chr>   <chr> <chr>
1 Afghanistan 19      99    745/19987071
2 Afghanistan 20      00    2666/20595360
3 Brazil      19      99    37737/172006362
4 Brazil      20      00    80488/174504898
5 China       19      99    212258/1272915272
6 China       20      00    213766/1280428583
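The output of unite(table5, century, year, col = "year", sep = "") will look like the following:
# A tibble: 6 × 3
  country     year  rate
  <chr>       <chr> <chr>
1 Afghanistan 1999  745/19987071
2 Afghanistan 2000  2666/20595360
3 Brazil      1999  37737/172006362
4 Brazil      2000  80488/174504898
5 China       1999  212258/1272915272
6 China       2000  213766/1280428583
separate_wider_delim(data, cols, delim, ..., names = NULL, names_sep = NULL, names_repair = "check_unique", too_few, too_many, cols_remove = TRUE): Separate each cell in a column into several columns. Also extract().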
table3 looks like the following:
# A tibble: 6 × 3
  country      year rate
  <chr>       <dbl> <chr>
1 Afghanistan  1999 745/19987071
2 Afghanistan  2000 2666/20595360
3 Brazil       1999 37737/172006362
4 Brazil       2000 80488/174504898
5 China        1999 212258/1272915272
6 China        2000 213766/1280428583
The output of separate_wider_delim() will look like the following:
# A tibble: 6 × 4
  country      year cases  pop
  <chr>       <dbl> <chr>  <chr>
1 Afghanistan  1999 745    19987071
2 Afghanistan  2000 2666   20595360
3 Brazil       1999 37737  172006362
4 Brazil       2000 80488  174504898
5 China        1999 212258 1272915272
6 China        2000 213766 1280428583
separate_longer_delim(data, cols, delim, ...): Separate each cell in a column into several rows.
table3 looks like the following:
# A tibble: 6 × 3
  country      year rate
  <chr>       <dbl> <chr>
1 Afghanistan  1999 745/19987071
2 Afghanistan  2000 2666/20595360
3 Brazil       1999 37737/172006362
4 Brazil       2000 80488/174504898
5 China        1999 212258/1272915272
6 China        2000 213766/1280428583
The output of separate_longer_delim() will look like the following:
# A tibble: 12 × 3
   country      year rate
   <chr>       <dbl> <chr>
 1 Afghanistan  1999 745
 2 Afghanistan  1999 19987071
 3 Afghanistan  2000 2666
 4 Afghanistan  2000 20595360
 5 Brazil       1999 37737
 6 Brazil       1999 172006362
 7 Brazil       2000 80488
 8 Brazil       2000 174504898
 9 China        1999 212258
10 China        1999 1272915272
11 China        2000 213766
12 China        2000 1280428583
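The split and combine steps above can be sketched on tidyr's bundled table5 and table3. The new column names "cases" and "pop" follow the output shown earlier; the separate_*_delim() functions assume tidyr 1.3.0 or later.

```r
library(tidyr)

# Combine: paste `century` and `year` together with no separator.
# The input columns are removed because remove = TRUE is the default.
united <- unite(table5, col = "year", century, year, sep = "")

# Split into columns: "745/19987071" becomes two columns.
wider <- separate_wider_delim(
  table3,
  cols = rate,
  delim = "/",
  names = c("cases", "pop")
)

# Split into rows: each piece of `rate` gets its own row.
longer <- separate_longer_delim(table3, cols = rate, delim = "/")

united$year[1]  # "1999"
nrow(longer)    # 12
```

The pieces stay character vectors after splitting; converting them to numbers is a separate step.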
Create new combinations of variables or identify implicit missing values (combinations of variables not present in the data).
expand(data, ...): Create a new tibble with all possible combinations of the values of the variables listed in ... and drop other variables.
complete(data, ..., fill = list()): Add missing possible combinations of values of variables listed in ... and fill remaining variables with NA.
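A small sketch of the difference between expand() and complete(), using a made-up three-row table (the data and the object names are illustrative only) in which the combination B/2000 is implicitly missing:

```r
library(tidyr)
library(tibble)

observed <- tribble(
  ~country, ~year, ~cases,
  "A",      1999,  1,
  "A",      2000,  2,
  "B",      1999,  3
)

# All 2 x 2 combinations of country and year; `cases` is dropped.
all_combos <- expand(observed, country, year)

# Same combinations, but the other columns are kept and the
# previously implicit row (B, 2000) appears with cases = NA.
completed <- complete(observed, country, year)

# Use `fill` to supply a value other than NA for the new rows.
filled <- complete(observed, country, year, fill = list(cases = 0))
```

expand() is useful for listing what could exist; complete() turns implicit gaps in the data into explicit rows.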
Drop or replace explicit missing values (NA).
drop_na(data, ...): Drop rows containing NAs in the ... columns.
fill(data, ..., .direction = "down"): Fill in NAs in the ... columns using the next or previous value.
replace_na(data, replace): Specify a value to replace NA in selected columns.
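The three functions above can be compared side by side on one small table. The data below is invented for illustration:

```r
library(tidyr)
library(tibble)

scores <- tibble(
  x1 = c("A", "B", "C", "D", "E"),
  x2 = c(1, NA, NA, 3, NA)
)

# Keep only the rows where x2 is not NA.
dropped <- drop_na(scores, x2)

# Carry the last non-missing value downward (the default direction).
filled_down <- fill(scores, x2)

# Replace every NA in x2 with a fixed value, given as a named list.
replaced <- replace_na(scores, list(x2 = 2))

filled_down$x2  # 1 1 1 3 3
replaced$x2     # 1 2 2 3 2
```

fill() depends on row order, so it is worth sorting the data first when the order carries meaning.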
A nested data frame stores individual tables as a list-column of data frames within a larger organizing data frame. List-columns can also be lists of vectors or lists of varying data types. Use a nested data frame to:
Manipulate many sub-tables at once with purrr functions like map(), map2(), or pmap(), or with dplyr rowwise() grouping.
nest(data, ...): Moves groups of cells into a list-column of a data frame. Use alone or with dplyr::group_by().
Group the data frame with group_by() and use nest() to move the groups into a list-column.
Use nest(new_col = c(x, y)) to specify the columns to group using dplyr::select() syntax.
Index list-columns with [[]].
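Both ways of calling nest() described above can be sketched on a small invented table (the column names g, x, and y are placeholders):

```r
library(tidyr)
library(dplyr)
library(tibble)

measurements <- tibble(
  g = c("a", "a", "b"),
  x = 1:3,
  y = c(10, 20, 30)
)

# 1. Group first, then nest: every non-grouping column moves into
#    a list-column, which is named `data` by default.
by_group <- nest(group_by(measurements, g))

# 2. Name the list-column and choose its contents with select() syntax.
by_cols <- nest(measurements, data = c(x, y))

# Index the list-column with [[ ]] to get one sub-table back.
first_table <- by_cols$data[[1]]
nrow(first_table)  # 2, the rows where g == "a"
```

The result of the grouped version stays grouped by g; call ungroup() if later steps should not operate per group.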
tibble::tribble(...): Makes list-columns when needed.
tibble::tibble(...): Saves list input as list-columns.
tibble::enframe(x, name = "name", value = "value"): Convert multi-level list to a tibble with list-cols.
dplyr::mutate(), transmute(), and summarise() will output list-columns if they return a list.
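A sketch of the four routes to a list-column listed above. The column names max, seq, and q are illustrative, and mtcars is the data set bundled with base R:

```r
library(tibble)
library(dplyr)

# tribble(): cells that are longer than one value become a list-column.
t1 <- tribble(
  ~max, ~seq,
  3,    1:3,
  4,    1:4,
  5,    1:5
)

# tibble(): a list passed as a column is stored as a list-column.
t2 <- tibble(max = c(3, 4, 5), seq = list(1:3, 1:4, 1:5))

# enframe(): a named list becomes a name column plus a list-column.
t3 <- enframe(list("3" = 1:3, "4" = 1:4, "5" = 1:5), name = "max", value = "seq")

# summarise(): wrapping the result in list() yields one list element per group.
t4 <- summarise(group_by(mtcars, cyl), q = list(quantile(mpg)))

sapply(t2$seq, length)  # 3 4 5
```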
unnest(data, cols, ..., keep_empty = FALSE): Flatten nested columns back to regular columns. The inverse of nest().
unnest_longer(data, col, values_to = NULL, indices_to = NULL): Turn each element of a list-column into a row.
unnest_wider(data, col): Turn each element of a list-column into a regular column.
hoist(.data, .col, ..., .remove = TRUE): Selectively pull list components out into their own top-level columns. Uses purrr::pluck() syntax for selecting from lists.
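The reshaping functions above can be sketched on two small invented tables: one built with nest(), and one holding a list-column of named lists. All object and column names are hypothetical.

```r
library(tidyr)
library(tibble)

# unnest() reverses nest().
nested <- nest(tibble(g = c("a", "a", "b"), x = 1:3), data = x)
flat   <- unnest(nested, data)

# unnest_longer(): one row per element of the list-column.
readings <- tibble(id = 1:2, v = list(1:2, 3:5))
long     <- unnest_longer(readings, v)

# A list-column whose elements are named lists.
people <- tibble(
  id = 1:2,
  info = list(
    list(name = "Ann", langs = list("R", "Python")),
    list(name = "Bo",  langs = list("Go"))
  )
)

# unnest_wider(): one column per component (`name`, `langs`).
wide <- unnest_wider(people, info)

# hoist(): pull out only selected components, using pluck()-style paths.
hoisted <- hoist(people, info, name = "name", first_lang = list("langs", 1L))

hoisted$first_lang  # "R" "Go"
```

hoist() is the lighter option when only a few fields of a deeply nested column are needed.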
A vectorized function takes a vector, transforms each element in parallel, and returns a vector of the same length. By themselves vectorized functions cannot work with lists, such as list-columns.
dplyr::rowwise(.data, ...): Group data so that each row is one group, and within the groups, elements of list-columns appear directly (accessed with [[), not as lists of length one. When you use rowwise(), dplyr functions will seem to apply functions to list-columns in a vectorized fashion.
Apply a function to a list-column and create a new list-column. For example, dim() returns two values per row, so wrap it with list() to tell mutate() to create a list-column.
Apply a function to a list-column and create a regular column. For example, nrow() returns one integer per row.
Collapse multiple list-columns into a single list-column. For example, append() returns several values for each row, so the new column type must be list.
Apply a function to multiple list-columns. For example, length() returns one integer per row.
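The four rowwise() patterns above can be sketched as follows. The tables and column names are invented for illustration, and the calls use plain function nesting rather than a pipe so that they run on older R versions.

```r
library(dplyr)
library(tidyr)
library(tibble)

# A nested table: `data` is a list-column of small tibbles.
nested <- nest(
  tibble(g = c("a", "a", "b"), x = 1:3, y = c(10, 20, 30)),
  data = c(x, y)
)

# Two list-columns of character vectors.
lists <- tibble(
  id = 1:2,
  a  = list(c("x", "y"), character()),
  b  = list("z", c("p", "q"))
)

# New list-column: dim() returns two values, so wrap it in list().
with_dims <- mutate(rowwise(nested), dims = list(dim(data)))

# New regular column: nrow() returns a single integer per row.
with_n <- mutate(rowwise(nested), n = nrow(data))

# Collapse two list-columns into one list-column.
combined <- mutate(rowwise(lists), ab = list(append(a, b)))

# Apply a function across several list-columns at once.
counted <- mutate(rowwise(lists), n_ab = length(c(a, b)))

with_n$n      # 2 1
counted$n_ab  # 3 2
```

The results remain rowwise data frames; ungroup() returns them to ordinary tibbles before further column-wise work.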
See purrr package for more list functions.
CC BY SA Posit Software, PBC • info@posit.co • posit.co
Learn more at tidyr.tidyverse.org.
Updated: 2024-05.