Table Transformer: obtain a summary stats table for numeric columns
Source: R/table_transformers.R
With any table object, you can produce a summary table that is scoped to the
numeric column values. The output summary table will have a leading column
called ".param." with labels for each of the nine rows, each corresponding
to the following summary statistics:
1. Minimum ("min")
2. 5th Percentile ("p05")
3. 1st Quartile ("q_1")
4. Median ("med")
5. 3rd Quartile ("q_3")
6. 95th Percentile ("p95")
7. Maximum ("max")
8. Interquartile Range ("iqr")
9. Range ("range")
Only numeric columns from the input table will generate columns in the output table. Column names from the input are reused in the output, in their original order.
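To make the nine statistics concrete, here is a minimal base R sketch that computes them for a single numeric vector. The helper name summary_stats_sketch() is hypothetical and not part of pointblank. The removal of missing values and the use of the default quantile() interpolation are assumptions, so the results may differ slightly from what tt_summary_stats() reports.

```r
# Hypothetical helper (not part of pointblank): computes the nine summary
# statistics listed above for one numeric vector using base R only.
summary_stats_sketch <- function(x) {
  # Assumption: missing values are dropped before computing the statistics
  x <- x[!is.na(x)]

  # Assumption: default quantile() interpolation (type = 7)
  q <- stats::quantile(
    x,
    probs = c(0.05, 0.25, 0.50, 0.75, 0.95),
    names = FALSE
  )

  c(
    min   = min(x),
    p05   = q[1],
    q_1   = q[2],
    med   = q[3],
    q_3   = q[4],
    p95   = q[5],
    max   = max(x),
    iqr   = q[4] - q[2],
    range = max(x) - min(x)
  )
}

# A named numeric vector whose names match the ".param." labels
stats <- summary_stats_sketch(c(1, 2, 3, 4, 5))
```

Applying such a helper to each numeric column of a table and binding the results side by side would give a table of the same shape as the one described above.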
Examples
Get summary statistics for the game_revenue dataset that is included in the
pointblank package.
tt_summary_stats(tbl = game_revenue)
#> # A tibble: 9 x 3
#> .param. item_revenue session_duration
#> <chr> <dbl> <dbl>
#> 1 min 0 3.2
#> 2 p05 0.02 8.2
#> 3 q_1 0.09 18.5
#> 4 med 0.38 26.5
#> 5 q_3 1.25 33.8
#> 6 p95 22.0 39.5
#> 7 max 143. 41
#> 8 iqr 1.16 15.3
#> 9 range 143. 37.8
Table transformers work great in conjunction with validation functions. Let's
ensure that the maximum revenue for individual purchases in the game_revenue
table is less than $150.
tt_summary_stats(tbl = game_revenue) %>%
  col_vals_lt(
    columns = item_revenue,
    value = 150,
    segments = .param. ~ "max"
  )
#> # A tibble: 9 x 3
#> .param. item_revenue session_duration
#> <chr> <dbl> <dbl>
#> 1 min 0 3.2
#> 2 p05 0.02 8.2
#> 3 q_1 0.09 18.5
#> 4 med 0.38 26.5
#> 5 q_3 1.25 33.8
#> 6 p95 22.0 39.5
#> 7 max 143. 41
#> 8 iqr 1.16 15.3
#> 9 range 143. 37.8
The table is returned rather than an error, so the validation was successful!
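For contrast, a failing check can be sketched as follows. This assumes that a validation function applied directly to a table stops with an error when the check fails, which is what the statement above implies. The threshold of 100 is chosen only for illustration, since the maximum item_revenue shown above is about 143.

```r
library(pointblank)

# Wrap the validation in tryCatch() so a failure is captured as a value
# instead of halting the script. The value of 100 is illustrative only.
result <- tryCatch(
  {
    tt_summary_stats(tbl = game_revenue) %>%
      col_vals_lt(
        columns = item_revenue,
        value = 100,
        segments = .param. ~ "max"
      )
    "passed"
  },
  error = function(e) "failed"
)

# `result` records whether the check passed or raised an error
print(result)
```

Capturing the outcome this way can be useful in scripts that should log a failed check and continue rather than stop at the first error.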
Let's do another: for in-app purchases in the game_revenue table, check
that the median revenue is somewhere between $8 and $12.
game_revenue %>%
  dplyr::filter(item_type == "iap") %>%
  tt_summary_stats() %>%
  col_vals_between(
    columns = item_revenue,
    left = 8, right = 12,
    segments = .param. ~ "med"
  )
#> # A tibble: 9 x 3
#> .param. item_revenue session_duration
#> <chr> <dbl> <dbl>
#> 1 min 0.4 3.2
#> 2 p05 1.39 5.99
#> 3 q_1 4.49 14.0
#> 4 med 10.5 22.6
#> 5 q_3 20.3 30.6
#> 6 p95 66.0 38.8
#> 7 max 143. 41
#> 8 iqr 15.8 16.7
#> 9 range 143. 37.8
We can get more creative with this transformer. Why not use a transformed
table in a validation plan? While performing validations of the game_revenue
table with an agent, we can include the same revenue check as above by using
tt_summary_stats() in the preconditions argument. This transforms the target
table into a summary table for the validation step. The final step of the
transformation in preconditions is a dplyr::filter() step that isolates the
row of the median statistic.
agent <-
  create_agent(
    tbl = game_revenue,
    tbl_name = "game_revenue",
    label = "`tt_summary_stats()` example.",
    actions = action_levels(
      warn_at = 0.10,
      stop_at = 0.25,
      notify_at = 0.35
    )
  ) %>%
  rows_complete() %>%
  rows_distinct() %>%
  col_vals_between(
    columns = item_revenue,
    left = 8, right = 12,
    preconditions = ~ . %>%
      dplyr::filter(item_type == "iap") %>%
      tt_summary_stats() %>%
      dplyr::filter(.param. == "med")
  ) %>%
  interrogate()
Printing the agent in the console shows the validation report in the Viewer.
Here is an excerpt of the validation report. Take note of the final step
(STEP 3), as it shows the entry that corresponds to the col_vals_between()
validation step that uses the summary stats table as its target.
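Outside of the Viewer, the outcome of an interrogation can also be inspected programmatically. The sketch below builds a reduced agent containing only the median check and queries it with all_passed(). The object name agent_med and the tbl_name and label values are illustrative, and the other two steps are left out only to keep the example short.

```r
library(pointblank)

# A reduced agent with a single validation step: the median revenue check
# on the summary table produced inside `preconditions`.
agent_med <-
  create_agent(
    tbl = game_revenue,
    tbl_name = "game_revenue",
    label = "Median revenue check (sketch)."
  ) %>%
  col_vals_between(
    columns = item_revenue,
    left = 8, right = 12,
    preconditions = ~ . %>%
      dplyr::filter(item_type == "iap") %>%
      tt_summary_stats() %>%
      dplyr::filter(.param. == "med")
  ) %>%
  interrogate()

# A single logical value: TRUE only if every step passed in every test unit
median_ok <- all_passed(agent_med)
```

Because the preconditions reduce the target table to one row, this step has a single test unit, so it either passes or fails as a whole.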
See also
Other Table Transformers: get_tt_param(), tt_string_info(), tt_tbl_colnames(),
tt_tbl_dims(), tt_time_shift(), tt_time_slice()