
The VALID-IV: Data Tests for Conditionals workflow is probably not much of a workflow, really. But maybe you consider programming and control flow a sort of workflow. If that’s the case and you are programming with data, the functions of this workflow might be useful. A set of test_*() functions, with the same arguments as the corresponding expect_*() functions of the VALID-III workflow, is used with data tables, and each call returns a single logical value (TRUE or FALSE). Here’s the complete list of functions with a phrase for what each function tests:

Exactly Like the expect_*() Functions Except You Get a TRUE or FALSE

The interface of each test_*() function is an exact match to the expect_*() counterpart. If you haven’t used either of those but have used the standard validation functions, here’s a quick rundown.

The following arguments from the validation functions (e.g., col_vals_in_set() and many more) have been removed in the corresponding test_*() functions:

  • actions
  • step_id
  • label
  • brief
  • active

Instead of actions we get the threshold argument as a simplified replacement. What’s supplied here is a single failure threshold value. By default this is set to 1, meaning that a single failing test unit will result in an overall failure and the return of FALSE (otherwise, TRUE).
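Since the whole point is a plain TRUE or FALSE, a test_*() call can sit directly inside an if statement. Here is a minimal sketch of that, assuming the small_table dataset that ships with pointblank (printed further down) and the default threshold of 1; the variable name c_is_complete and the messages are purely illustrative.

```r
library(pointblank)

# Column `c` of small_table holds two NA values, so with the default
# threshold of 1 this test returns FALSE.
c_is_complete <- test_col_vals_not_null(small_table, columns = vars(c))

if (c_is_complete) {
  message("No missing values in `c`; carrying on.")
} else {
  message("Missing values found in `c`; deal with them first.")
}
```

The same pattern should work with any of the test_*() functions, since each one returns a single logical value.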

The rules for threshold setting (the same as in action_levels(), warn_on_fail(), and stop_on_fail()) are as follows. A whole number greater than 1 acts as an absolute failure threshold: the result is TRUE as long as the number of failing test units stays below that value. Likewise, fractional values (between 0 and 1) act as a proportional failure threshold, where 0.25 means that 25% or more failing test units results in a FALSE.
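To make the threshold rules a little more concrete, here is a small sketch using small_table (printed further down). In column d, six of the 13 values are not greater than 1000, so about 46% of the test units fail. The variable names are only for illustration, and the threshold values were picked to sit well away from the boundary.

```r
library(pointblank)

# Default threshold of 1: any failing test unit gives FALSE
strict <- test_col_vals_gt(small_table, vars(d), value = 1000)

# Absolute thresholds: 6 failing units is below 10 but not below 3
abs_loose <- test_col_vals_gt(small_table, vars(d), value = 1000, threshold = 10)
abs_tight <- test_col_vals_gt(small_table, vars(d), value = 1000, threshold = 3)

# Proportional thresholds: ~46% failing is below 75% but not below 25%
frac_loose <- test_col_vals_gt(small_table, vars(d), value = 1000, threshold = 0.75)
frac_tight <- test_col_vals_gt(small_table, vars(d), value = 1000, threshold = 0.25)

c(strict, abs_loose, abs_tight, frac_loose, frac_tight)
```

With the data as printed, this should give FALSE, TRUE, FALSE, TRUE, FALSE.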

We can use the preconditions argument in cases where we’d like to transform the input data before evaluation of the test. If you would like to do things to the input table like summarize it, perform filtering, mutate one or more columns, perform table joins, etc., then this is a good way to go about that.
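As a rough illustration of preconditions, the sketch below filters small_table (printed further down) before the test runs. It assumes dplyr is installed and passes the transformation as a plain function, which is one of the forms preconditions accepts; the variable names are illustrative only.

```r
library(pointblank)

# Across the whole table, `a` contains 6, 8, and 7, so this is FALSE
all_rows <- test_col_vals_lt(small_table, vars(a), value = 5)

# Keep only the rows where `e` is FALSE before testing; the remaining
# values of `a` are 2, 4, 3, 3, and 2, so the test now returns TRUE
only_e_false <- test_col_vals_lt(
  small_table, vars(a), value = 5,
  preconditions = function(x) dplyr::filter(x, !e)
)

c(all_rows, only_e_false)
```

The input table itself is left untouched; the transformation only applies to the data seen by that one test.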

Here Are Several Quick Examples

Let’s have some examples before leaving this article. They will all use our small_table:

small_table
## # A tibble: 13 × 8
##    date_time           date           a b             c      d e     f    
##    <dttm>              <date>     <int> <chr>     <dbl>  <dbl> <lgl> <chr>
##  1 2016-01-04 11:00:00 2016-01-04     2 1-bcd-345     3  3423. TRUE  high 
##  2 2016-01-04 00:32:00 2016-01-04     3 5-egh-163     8 10000. TRUE  low  
##  3 2016-01-05 13:32:00 2016-01-05     6 8-kdg-938     3  2343. TRUE  high 
##  4 2016-01-06 17:23:00 2016-01-06     2 5-jdo-903    NA  3892. FALSE mid  
##  5 2016-01-09 12:36:00 2016-01-09     8 3-ldm-038     7   284. TRUE  low  
##  6 2016-01-11 06:15:00 2016-01-11     4 2-dhe-923     4  3291. TRUE  mid  
##  7 2016-01-15 18:46:00 2016-01-15     7 1-knw-093     3   843. TRUE  high 
##  8 2016-01-17 11:27:00 2016-01-17     4 5-boe-639     2  1036. FALSE low  
##  9 2016-01-20 04:30:00 2016-01-20     3 5-bce-642     9   838. FALSE high 
## 10 2016-01-20 04:30:00 2016-01-20     3 5-bce-642     9   838. FALSE high 
## 11 2016-01-26 20:07:00 2016-01-26     4 2-dmx-010     7   834. TRUE  low  
## 12 2016-01-28 02:51:00 2016-01-28     2 7-dmx-010     8   108. FALSE low  
## 13 2016-01-30 11:23:00 2016-01-30     1 3-dka-303    NA  2230. TRUE  high

If you’d like to test your pointblank validation skill, guess whether each of these is TRUE or FALSE before running the code.
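A few calls in this spirit are sketched here. They are written against small_table as printed above and are illustrative stand-ins, not necessarily the article's own examples. Take a guess for each one, then run it to check.

```r
library(pointblank)

# Is column `e` of type logical?
ex_1 <- test_col_is_logical(small_table, vars(e))

# Are all values in `c` between 1 and 10? (Mind the NA values.)
ex_2 <- test_col_vals_between(small_table, vars(c), left = 1, right = 10)

# Same test, but letting NA values pass
ex_3 <- test_col_vals_between(
  small_table, vars(c), left = 1, right = 10, na_pass = TRUE
)

# Are all values in `f` either "low" or "high"?
ex_4 <- test_col_vals_in_set(small_table, vars(f), set = c("low", "high"))

c(ex_1, ex_2, ex_3, ex_4)
```

For the table as printed, the answers should be TRUE, FALSE, TRUE, FALSE: column c has two NA values that fail unless na_pass = TRUE, and column f also contains "mid".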









And there you have it. A nice set of examples revealing their truthy/falsy nature only after closer inspection.