Skip to contents

Planning and Prep

Should you need to understand your data a bit more, use the scan_data() function. It provides a comprehensive report on the data you might be planning on validating. If going down the road of preparing a data quality analysis, call on an agent to carry out that eventual plan with create_agent(). The agent here is to be told which table is the target, and, we devise a stepwise validation plan with validation functions. If collecting and publishing table information is your goal, then begin with the create_informant() function.

Thoroughly scan a table to better understand it
Create a pointblank agent object
Create a pointblank informant object
Perform pointblank validation testing within R Markdown documents
action_levels() warn_on_fail() stop_on_fail()
Set action levels: failure thresholds and functions to invoke
Get a table from a database
Get a table from a local or remote file
Define a store of tables with table-prep formulas: a table store
Obtain a table-prep formula from a table store
Obtain a materialized table via a table store
Draft a starter pointblank validation .R/.Rmd file with a data table

Validation, Expectation, and Test Functions

Validation functions are either used with an agent object or, more simply, just with the table of interest. When used with an agent, each validation works to build up a validation plan (which is executed with the interrogate() function). If one or more validation functions are used directly on data (that is, no agent is involved whatsoever), then the data is checked and passed through if there are no problems. We can fine tune the warn_* and/or stop_* thresholds and so that if the level of failed validation units exceeds those set levels, then we’ll get either a warning or an error. Each validation function is associated with an expectation function (of the form expect_*()). These expectation functions are equivalent in usage and behavior to those in the testthat package. Finally, each validation function has an associated test function (of the form test_*()) that always returns a logical value.

col_vals_lt() expect_col_vals_lt() test_col_vals_lt()
Are column data less than a fixed value or data in another column?
col_vals_lte() expect_col_vals_lte() test_col_vals_lte()
Are column data less than or equal to a fixed value or data in another column?
col_vals_equal() expect_col_vals_equal() test_col_vals_equal()
Are column data equal to a fixed value or data in another column?
col_vals_not_equal() expect_col_vals_not_equal() test_col_vals_not_equal()
Are column data not equal to a fixed value or data in another column?
col_vals_gte() expect_col_vals_gte() test_col_vals_gte()
Are column data greater than or equal to a fixed value or data in another column?
col_vals_gt() expect_col_vals_gt() test_col_vals_gt()
Are column data greater than a fixed value or data in another column?
col_vals_between() expect_col_vals_between() test_col_vals_between()
Do column data lie between two specified values or data in other columns?
col_vals_not_between() expect_col_vals_not_between() test_col_vals_not_between()
Do column data lie outside of two specified values or data in other columns?
col_vals_in_set() expect_col_vals_in_set() test_col_vals_in_set()
Are column data part of a specified set of values?
col_vals_not_in_set() expect_col_vals_not_in_set() test_col_vals_not_in_set()
Are data not part of a specified set of values?
col_vals_make_set() expect_col_vals_make_set() test_col_vals_make_set()
Is a set of values entirely accounted for in a column of values?
col_vals_make_subset() expect_col_vals_make_subset() test_col_vals_make_subset()
Is a set of values a subset of a column of values?
col_vals_increasing() expect_col_vals_increasing() test_col_vals_increasing()
Are column data increasing by row?
col_vals_decreasing() expect_col_vals_decreasing() test_col_vals_decreasing()
Are column data decreasing by row?
col_vals_null() expect_col_vals_null() test_col_vals_null()
Are column data NULL/NA?
col_vals_not_null() expect_col_vals_not_null() test_col_vals_not_null()
Are column data not NULL/NA?
col_vals_regex() expect_col_vals_regex() test_col_vals_regex()
Do strings in column data match a regex pattern?
col_vals_within_spec() expect_col_vals_within_spec() test_col_vals_within_spec()
Do values in column data fit within a specification?
col_vals_expr() expect_col_vals_expr() test_col_vals_expr()
Do column data agree with a predicate expression?
rows_distinct() expect_rows_distinct() test_rows_distinct()
Are row data distinct?
rows_complete() expect_rows_complete() test_rows_complete()
Are row data complete?
col_is_character() expect_col_is_character() test_col_is_character()
Do the columns contain character/string data?
col_is_numeric() expect_col_is_numeric() test_col_is_numeric()
Do the columns contain numeric values?
col_is_integer() expect_col_is_integer() test_col_is_integer()
Do the columns contain integer values?
col_is_logical() expect_col_is_logical() test_col_is_logical()
Do the columns contain logical values?
col_is_date() expect_col_is_date() test_col_is_date()
Do the columns contain R Date objects?
col_is_posix() expect_col_is_posix() test_col_is_posix()
Do the columns contain POSIXct dates?
col_is_factor() expect_col_is_factor() test_col_is_factor()
Do the columns contain R factor objects?
col_exists() expect_col_exists() test_col_exists()
Do one or more columns actually exist?
col_schema_match() expect_col_schema_match() test_col_schema_match()
Do columns in the table (and their types) match a predefined schema?
row_count_match() expect_row_count_match() test_row_count_match()
Does the row count match that of a different table?
col_count_match() expect_col_count_match() test_col_count_match()
Does the column count match that of a different table?
tbl_match() expect_tbl_match() test_tbl_match()
Does the target table match a comparison table?
conjointly() expect_conjointly() test_conjointly()
Perform multiple rowwise validations for joint validity
serially() expect_serially() test_serially()
Run several tests and a final validation in a serial manner
specially() expect_specially() test_specially()
Perform a specialized validation with a user-defined function

Information Functions

We can progressively add information to an informant object by using the collection of info_*() functions. We can add more table-based properties with the info_tabular() function, details about the nature of each column with the info_columns() function, and add sections of our own choosing (and the info that make sense for those sections) with the info_section() function. Snippets of information can be gleaned from the target table by using the info_snippet() function. These bits of information can be incorporated in text defined by the other info_*() functions via { }. Some great snip_*() functions are included to make info-snipping as easy (and useful!) as can be.

Add information that focuses on aspects of the data table as a whole
Add information that focuses on aspects of a data table's columns
Add column information from another data table
Add information that focuses on some key aspect of the data table
Generate a useful text 'snippet' from the target table
A fn for info_snippet(): get a list of column categories
A fn for info_snippet(): get an inline statistical summary
A fn for info_snippet(): get the lowest value from a column
A fn for info_snippet(): get the highest value from a column


Sometimes we want to email a report of a validation because of the importance of the information contained therein. The email_blast() function can be used within the end_fns argument of create_agent(), giving us options to send a customizable message only if specified conditions are met.

Conditionally send email during interrogation
Create an email object from a pointblank agent
Provide simple email message body components: body
Provide simple email message body components: footer


Logging validation failure conditions makes for a good practice during data quality analysis. The log4r_step() function allows for simple generation of log entries and specification of logging destinations.

Enable logging of failure conditions at the validation step level

Agent: Interrogate and Report

If we have an agent object that has a plan (i.e., validation steps), the interrogate() function instructs the agent to interrogate the target table. The agent will go to work and also perform specified side-effect functions at the step level and upon completion (if those are functions are defined). After interrogation, we can get a report through printing, however, we can take advantage of more options by using the get_agent_report() function.

Given an agent that has a validation plan, perform an interrogation
Get a summary report from an agent

Informant: Incorporate and Report

If we have an informant object that has been loaded with information from using the info_*() functions, the incorporate() function works to regenerate snippets and integrate those into the info text. After refreshing the table information, we can get an information report through printing, or, by using get_informant_report() function.

Given an informant object, update and incorporate table snippets
Get a table information report from an informant object


The agent always has a special list called an x-list. Access that by invoking the get_agent_x_list() and you’ll then have a smorgasbord of information about how the validation went down. Table extracts are collected by default for failed rows (up to a limit) and we can access those with get_data_extracts(). Table rows can be sundered into ‘pass’ and ‘fail’ pieces. Access those table fragments with the get_sundered_data() function. Want to know if all validation steps have passed with flying colors? Sometimes that could happen; use the all_passed() function to find out.

Get the agent's x-list
Collect data extracts from a validation step
Sunder the data, splitting it into 'pass' and 'fail' pieces
Did all of the validations fully pass?
Transform a pointblank agent to a testthat test file

Object Ops

We have options for writing an agent or informant to disk with the x_write_disk() function. The on-disk object can be retrieved with the x_read_disk() function. You can export a report as an HTML file with export_report(). A set of functions are also available for setting a data table to an existing object, and, for editing an agent’s validation steps.

Write an agent, informant, multiagent, or table scan to disk
Read an agent, informant, multiagent, or table scan from disk
Export an agent, informant, multiagent, or table scan to HTML
Set a data table to an agent or an informant
Activate one or more of an agent's validation steps
Deactivate one or more of an agent's validation steps
Remove one or more of an agent's validation steps

The Multiagent

The multiagent is a group of agents, all bundled together into a single object. With this grouping, we can generate unified reporting across the component agents with the get_multiagent_report() function. One style of reporting ("long") provides a serial listing of agent reports. The other option ("wide") is useful for tracking the evolution of data quality checks over time since common steps across all interrogations will form individual rows (and each interrogation will form a column).

Create a pointblank multiagent object
Read pointblank agents stored on disk as a multiagent
Get a summary report using multiple agents

pointblank YAML

YAML files can be used in pointblank for two distinct purposes: (1) to define agents and their validation plans, and (2) to define information for tables. The yaml_write() function allows us write agent or informant YAML from the namesake objects. We can read them back from disk by using the yaml_read_agent() and yaml_read_informant() functions. As a nice shortcut, we read agent YAML and interrogate immediately with yaml_agent_interrogate(); in a similar manner, we can read informant YAML and incorporate table information with yaml_informant_incorporate().

Write pointblank objects to YAML files
Read a pointblank YAML file to create an agent object
Read a pointblank YAML file to create an informant object
Get an agent from pointblank YAML and interrogate()
Display pointblank YAML using an agent or a YAML file
Display validation expressions using pointblank YAML
Get an informant from pointblank YAML and incorporate()
Execute all agent and informant YAML tasks

Table Transformers

Table Transformer functions can radically transform a data table and either provide a wholly different table (like a summary table or table properties table) or do some useful filtering in a single step. This can be useful for preparing the target table for validation, creating a temporary table for a few validation steps, or even as something used outside of the pointblank workflows. As a nice bonus these transformer functions will work equally well with data frames, database tables, and Spark tables.

Table Transformer: obtain a summary stats table for numeric columns
Table Transformer: obtain a summary table for string columns
Table Transformer: get the dimensions of a table
Table Transformer: get a table's column names
Table Transformer: shift the times of a table
Table Transformer: slice a table with a slice point on a time column
Get a parameter value from a summary table

Utility and Helper Functions

Generate a table column schema manually or with a reference table
Determine if one or more columns exist in a table
Put the current date into a file name
Put the current datetime into a file name
A specialized version of stopifnot() for pointblank: stop_if_not()
Specify a file for download from GitHub


A small table that is useful for testing
An SQLite version of the small_table dataset
A table containing data pertaining to various specifications
A table with game revenue data
A table with metadata for the game_revenue dataset