
In an agent-based workflow (i.e., initiating with create_agent()), after interrogation with interrogate(), we can extract the row data that didn't pass row-based validation steps with the get_data_extracts() function. There is one discrete extract per row-based validation step and the amount of data available in a particular extract depends on both the fraction of test units that didn't pass the validation step and the level of sampling or explicit collection from that set of units. These extracts can be collected programmatically through get_data_extracts() but they may also be downloaded as CSV files from the HTML report generated by the agent's print method or through the use of get_agent_report().

The availability of data extracts for each row-based validation step depends on whether extract_failed is set to TRUE within the interrogate() call (it is by default). The number of failing rows extracted depends on the collection parameters in interrogate(), and the default behavior is to collect up to the first 5000 failing rows.
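As a minimal sketch of how the collection settings affect what get_data_extracts() returns, the code below interrogates the same table twice: once keeping only the first two failing rows per step (through the get_first_n argument of interrogate()) and once with extraction turned off. The object names and labels are illustrative only, and the row count assumes the small_table dataset that ships with pointblank.

```r
library(pointblank)

# Keep at most the first two failing rows per validation step
agent_first_2 <-
  create_agent(
    tbl = small_table %>% dplyr::select(a:f),
    label = "Limited extracts"
  ) %>%
  col_vals_gt(d, value = 1000) %>%
  interrogate(get_first_n = 2)

extract_first_2 <- get_data_extracts(agent_first_2, i = 1)
nrow(extract_first_2)

# Turn off the collection of failing rows entirely
agent_no_extracts <-
  create_agent(
    tbl = small_table %>% dplyr::select(a:f),
    label = "No extracts"
  ) %>%
  col_vals_gt(d, value = 1000) %>%
  interrogate(extract_failed = FALSE)

extracts_none <- get_data_extracts(agent_no_extracts)
length(extracts_none)
```

In the first case the extract should hold two rows even though more rows fail the step; in the second, no extracts should be stored at all.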

Row-based validation steps are based on those validation functions of the form col_vals_*() and also include conjointly() and rows_distinct(). Only functions from that combined set of validation functions can yield data extracts.
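To illustrate the distinction, the following sketch mixes a step that is not row-based (col_exists()) with a row-based one (col_vals_gt()) and then inspects which step numbers have an extract. The agent name and label are made up for this example, and small_table is assumed to be the dataset bundled with pointblank.

```r
library(pointblank)

agent_mixed <-
  create_agent(
    tbl = small_table %>% dplyr::select(a:f),
    label = "Mixed step types"
  ) %>%
  col_exists(columns = vars(a)) %>%  # step 1: not row-based, no extract
  col_vals_gt(d, value = 1000) %>%   # step 2: row-based, has failing rows
  interrogate()

# Names of the returned list are the step numbers with extracts
extract_steps <- names(get_data_extracts(agent_mixed))
extract_steps
```

Only step 2 should appear here, since col_exists() does not operate on individual rows.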

Usage

get_data_extracts(agent, i = NULL)

Arguments

agent

The pointblank agent object

obj:<ptblank_agent> // required

A pointblank agent object that is commonly created through the use of the create_agent() function. It should have had interrogate() called on it, such that the validation steps were carried out and any sample rows from non-passing validations could potentially be available in the object.

i

A validation step number

scalar<integer> // default: NULL (optional)

The validation step number, which is assigned to each validation step by pointblank in the order of definition. If NULL (the default), all data extract tables will be provided in a list object.

Value

A list of tables if i is not provided, or a standalone table if i is given.

Examples

Create a series of two validation steps focused on testing row values for part of the small_table object. Use interrogate() right after that.

agent <-
  create_agent(
    tbl = small_table %>%
      dplyr::select(a:f),
    label = "`get_data_extracts()`"
  ) %>%
  col_vals_gt(d, value = 1000) %>%
  col_vals_between(
    columns = c,
    left = vars(a), right = vars(d),
    na_pass = TRUE
  ) %>%
  interrogate()

Using get_data_extracts() with its defaults returns a list of tables, where each table is named after the validation step that has an extract available.

agent %>% get_data_extracts()

## $`1`
## # A tibble: 6 × 6
##       a b             c     d e     f
##   <int> <chr>     <dbl> <dbl> <lgl> <chr>
## 1     8 3-ldm-038     7  284. TRUE  low
## 2     7 1-knw-093     3  843. TRUE  high
## 3     3 5-bce-642     9  838. FALSE high
## 4     3 5-bce-642     9  838. FALSE high
## 5     4 2-dmx-010     7  834. TRUE  low
## 6     2 7-dmx-010     8  108. FALSE low
##
## $`2`
## # A tibble: 4 × 6
##       a b             c     d e     f
##   <int> <chr>     <dbl> <dbl> <lgl> <chr>
## 1     6 8-kdg-938     3 2343. TRUE  high
## 2     8 3-ldm-038     7  284. TRUE  low
## 3     7 1-knw-093     3  843. TRUE  high
## 4     4 5-boe-639     2 1036. FALSE low
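Because the default return value is a plain named list, it can be processed with ordinary list tools. The sketch below rebuilds the agent from this example and counts the extracted rows per step; the names extracts and failing_rows are illustrative, not part of pointblank.

```r
library(pointblank)

agent <-
  create_agent(
    tbl = small_table %>% dplyr::select(a:f),
    label = "`get_data_extracts()`"
  ) %>%
  col_vals_gt(d, value = 1000) %>%
  col_vals_between(
    columns = c,
    left = vars(a), right = vars(d),
    na_pass = TRUE
  ) %>%
  interrogate()

# One table per step that has an extract, named by step number
extracts <- get_data_extracts(agent)

# Number of extracted rows for each of those steps
failing_rows <- vapply(extracts, nrow, integer(1))
failing_rows
```

With the output shown above, this should give 6 rows for step 1 and 4 rows for step 2.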

We can get an extract for a specific step by specifying it in the i argument. Let's get the failing rows from the first validation step (the col_vals_gt() one).

agent %>% get_data_extracts(i = 1)

## # A tibble: 6 × 6
##       a b             c     d e     f
##   <int> <chr>     <dbl> <dbl> <lgl> <chr>
## 1     8 3-ldm-038     7  284. TRUE  low
## 2     7 1-knw-093     3  843. TRUE  high
## 3     3 5-bce-642     9  838. FALSE high
## 4     3 5-bce-642     9  838. FALSE high
## 5     4 2-dmx-010     7  834. TRUE  low
## 6     2 7-dmx-010     8  108. FALSE low
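Not every step has an extract, so code that looks up steps by number may want to guard against a missing one. The helper below is a hypothetical convenience function (it is not part of pointblank) that looks the step up by name in the full list and returns NULL when nothing was stored. It is a sketch that assumes requesting such a step directly through i is not guaranteed to succeed.

```r
library(pointblank)

# Hypothetical helper (not part of pointblank): return the extract for
# step `i`, or NULL when no extract was stored for that step
get_extract_or_null <- function(agent, i) {
  extracts <- get_data_extracts(agent)
  extracts[[as.character(i)]]
}

agent_lookup <-
  create_agent(
    tbl = small_table %>% dplyr::select(a:f),
    label = "Guarded lookup"
  ) %>%
  col_vals_gt(d, value = 1000) %>%        # step 1: row-based
  col_is_character(columns = vars(b)) %>% # step 2: not row-based
  interrogate()

step_1_rows <- get_extract_or_null(agent_lookup, 1)
step_2_rows <- get_extract_or_null(agent_lookup, 2)

nrow(step_1_rows)
is.null(step_2_rows)
```

Looking up the name in the list keeps the helper free of error handling, at the cost of fetching all extracts on each call.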

Function ID

8-2

See also