In an agent-based workflow (i.e., one initiated with create_agent()), after
interrogation with interrogate(), we can extract the row data that didn't
pass row-based validation steps with the get_data_extracts() function.
There is one discrete extract per row-based validation step, and the amount of
data available in a particular extract depends on both the fraction of test
units that didn't pass the validation step and the level of sampling or
explicit collection from that set of units. These extracts can be collected
programmatically through get_data_extracts(), but they may also be
downloaded as CSV files from the HTML report generated by the agent's print
method or through the use of get_agent_report().
The availability of data extracts for each row-based validation step depends
on whether extract_failed is set to TRUE within the interrogate() call
(it is by default). The number of failing rows extracted depends on the
collection parameters in interrogate(), and the default behavior is to
collect up to the first 5000 failing rows.
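As a minimal sketch of how these collection parameters might be used, the following calls interrogate() with two of its arguments, get_first_n and extract_failed. The object names agent_first_3 and agent_no_extracts are illustrative only, and the validation step is chosen to match the example further down this page.

```r
library(pointblank)

# Keep at most the first 3 failing rows for each row-based step
agent_first_3 <-
  create_agent(tbl = small_table) %>%
  col_vals_gt(columns = d, value = 1000) %>%
  interrogate(get_first_n = 3)

# Skip the collection of failing rows entirely
agent_no_extracts <-
  create_agent(tbl = small_table) %>%
  col_vals_gt(columns = d, value = 1000) %>%
  interrogate(extract_failed = FALSE)

# The extract for step 1 should hold no more than 3 rows
nrow(get_data_extracts(agent_first_3, i = 1))
```

Limiting the extract size in this way can be useful when the target table is large and only a handful of failing rows are needed for diagnosis.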
Row-based validation steps are based on those validation functions of the
form col_vals_*() and also include conjointly() and rows_distinct().
Only functions from that combined set of validation functions can yield data
extracts.
Arguments
- agent
  The pointblank agent object
  obj:<ptblank_agent> // required
  A pointblank agent object that is commonly created through the use of the create_agent() function. It should have had interrogate() called on it, such that the validation steps were carried out and any sample rows from non-passing validations could potentially be available in the object.
- i
  A validation step number
  scalar<integer> // default: NULL (optional)
  The validation step number, which is assigned to each validation step by pointblank in the order of definition. If NULL (the default), all data extract tables will be provided in a list object.
Examples
Create a series of two validation steps focused on testing row values for
part of the small_table object. Use interrogate() right after that.
agent <-
  create_agent(
    tbl = small_table %>%
      dplyr::select(a:f),
    label = "`get_data_extracts()`"
  ) %>%
  col_vals_gt(d, value = 1000) %>%
  col_vals_between(
    columns = c,
    left = vars(a), right = vars(d),
    na_pass = TRUE
  ) %>%
  interrogate()
Using get_data_extracts() with its defaults returns a list of tables,
where each table is named after the validation step that has an extract
available.
agent %>% get_data_extracts()
## $`1`
## # A tibble: 6 × 6
## a b c d e f
## <int> <chr> <dbl> <dbl> <lgl> <chr>
## 1 8 3-ldm-038 7 284. TRUE low
## 2 7 1-knw-093 3 843. TRUE high
## 3 3 5-bce-642 9 838. FALSE high
## 4 3 5-bce-642 9 838. FALSE high
## 5 4 2-dmx-010 7 834. TRUE low
## 6 2 7-dmx-010 8 108. FALSE low
##
## $`2`
## # A tibble: 4 × 6
## a b c d e f
## <int> <chr> <dbl> <dbl> <lgl> <chr>
## 1 6 8-kdg-938 3 2343. TRUE high
## 2 8 3-ldm-038 7 284. TRUE low
## 3 7 1-knw-093 3 843. TRUE high
## 4 4 5-boe-639 2 1036. FALSE low
We can get an extract for a specific step by specifying it in the i
argument. Let's get the failing rows from the first validation step (the
col_vals_gt() one).
agent %>% get_data_extracts(i = 1)
## # A tibble: 6 × 6
## a b c d e f
## <int> <chr> <dbl> <dbl> <lgl> <chr>
## 1 8 3-ldm-038 7 284. TRUE low
## 2 7 1-knw-093 3 843. TRUE high
## 3 3 5-bce-642 9 838. FALSE high
## 4 3 5-bce-642 9 838. FALSE high
## 5 4 2-dmx-010 7 834. TRUE low
## 6 2 7-dmx-010 8 108. FALSE low
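Because the default return value is a named list, the extracts can be processed with ordinary list tools. The sketch below is one possible approach rather than a prescribed workflow: it counts the collected failing rows per step and writes each extract to a CSV file. The names n_failed and out_dir, the file naming scheme, and the use of a temporary directory are all assumptions made for illustration.

```r
library(pointblank)

agent <-
  create_agent(tbl = small_table) %>%
  col_vals_gt(columns = d, value = 1000) %>%
  interrogate()

# Named list of extracts; names are the validation step numbers
extracts <- get_data_extracts(agent)

# Number of failing rows collected for each step
n_failed <- vapply(extracts, nrow, integer(1))

# Write each extract to its own CSV file in a temporary directory
out_dir <- tempdir()
for (step in names(extracts)) {
  utils::write.csv(
    extracts[[step]],
    file = file.path(out_dir, paste0("extract_step_", step, ".csv")),
    row.names = FALSE
  )
}
```

Steps that produced no failing rows have no entry in the list, so the loop only touches steps for which an extract is available.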
See also
Other Post-interrogation: all_passed(), get_agent_x_list(), get_sundered_data(), write_testthat_file()