If your target table is in a file, stored either locally or remotely, the `file_tbl()` function makes it possible to access it in a single function call. Compatible file types for this function are CSV (`.csv`), TSV (`.tsv`), RDA (`.rda`), and RDS (`.rds`) files. This function generates an in-memory `tbl_df` object, which can be used as a target table for `create_agent()` and `create_informant()`. Another great option is supplying a table-prep formula involving `file_tbl()` to `tbl_store()`. This lets you access tables based on flat files through single names via a table store.
In the remote data use case, we can specify a URL that starts with `http://`, `https://`, etc., and ends with the file containing the data table. If data files are available in a GitHub repository, then we can use the `from_github()` function to specify the name and location of the table data in that repository.
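The extension-based choice of reader can be seen with two small local files. This is a minimal sketch that uses only base R and pointblank. The data frame `df` and the temporary file paths are illustrative, and RDA files are left out.

```r
library(pointblank)

# Illustrative data to round-trip through files
df <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))

# Write a TSV file and an RDS file to temporary paths with those extensions
tsv_path <- tempfile(fileext = ".tsv")
write.table(df, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)

rds_path <- tempfile(fileext = ".rds")
saveRDS(df, rds_path)

# The file extension tells file_tbl() which reader to use internally
tbl_tsv <- file_tbl(file = tsv_path)
tbl_rds <- file_tbl(file = rds_path)

tbl_tsv
tbl_rds
```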
Arguments
- `file`: The complete file path leading to a compatible data table, either in the user system or at an `http://`, `https://`, `ftp://`, or `ftps://` URL. For a file hosted in a GitHub repository, a call to the `from_github()` function can be used here.
- `type`: The file type. This is normally inferred from the file extension. It is `NULL` by default, which means the extension dictates the type of file reading that is performed internally. However, if there is no extension (valid extensions are `.csv`, `.tsv`, `.rda`, and `.rds`), we can provide the type as one of `csv`, `tsv`, `rda`, or `rds`.
- `...`: Options passed to readr's `read_csv()` or `read_tsv()` function. Both functions have the same arguments, and one or the other is used internally based on the file extension or an explicit value given to `type`.
- `keep`: In the case of a downloaded file, should it be stored in the working directory (`keep = TRUE`) or downloaded to a temporary directory? By default, this is `FALSE`.
- `verify`: If `TRUE` (the default), then a check is made that the data object has the `data.frame` class.
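For a file without an extension, `type` selects the reader, and readr options such as `col_types` still pass through `...`. This is a minimal sketch that assumes a hypothetical extensionless temporary file written with base R.

```r
library(pointblank)

# tempfile() with no `fileext` gives a path without an extension
no_ext_path <- tempfile()
write.csv(
  data.frame(id = 1:2, value = c(10.5, 20.5)),
  no_ext_path,
  row.names = FALSE
)

# `type = "csv"` selects readr::read_csv(); `col_types` ("i" = integer,
# "d" = double) is forwarded to that reader through `...`
tbl <- file_tbl(file = no_ext_path, type = "csv", col_types = "id")

tbl
```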
Examples
Producing tables from CSV files
A local CSV file can be obtained as a tbl object by supplying a path to the file and some CSV reading options (the ones used by `readr::read_csv()`) to the `file_tbl()` function. For this example, we could obtain a path to a CSV file in the pointblank package with `system.file()`.
```r
csv_path <-
  system.file(
    "data_files", "small_table.csv",
    package = "pointblank"
  )
```
Then use that path in `file_tbl()`, with the option to specify the column types in that CSV.
```r
tbl <-
  file_tbl(
    file = csv_path,
    col_types = "TDdcddlc"
  )

tbl
## # A tibble: 13 × 8
## date_time date a b c d e f
## <dttm> <date> <dbl> <chr> <dbl> <dbl> <lgl> <chr>
## 1 2016-01-04 11:00:00 2016-01-04 2 1-bcd-… 3 3423. TRUE high
## 2 2016-01-04 00:32:00 2016-01-04 3 5-egh-… 8 10000. TRUE low
## 3 2016-01-05 13:32:00 2016-01-05 6 8-kdg-… 3 2343. TRUE high
## 4 2016-01-06 17:23:00 2016-01-06 2 5-jdo-… NA 3892. FALSE mid
## 5 2016-01-09 12:36:00 2016-01-09 8 3-ldm-… 7 284. TRUE low
## 6 2016-01-11 06:15:00 2016-01-11 4 2-dhe-… 4 3291. TRUE mid
## 7 2016-01-15 18:46:00 2016-01-15 7 1-knw-… 3 843. TRUE high
## 8 2016-01-17 11:27:00 2016-01-17 4 5-boe-… 2 1036. FALSE low
## 9 2016-01-20 04:30:00 2016-01-20 3 5-bce-… 9 838. FALSE high
## 10 2016-01-20 04:30:00 2016-01-20 3 5-bce-… 9 838. FALSE high
## 11 2016-01-26 20:07:00 2016-01-26 4 2-dmx-… 7 834. TRUE low
## 12 2016-01-28 02:51:00 2016-01-28 2 7-dmx-… 8 108. FALSE low
## 13 2016-01-30 11:23:00 2016-01-30 1 3-dka-… NA 2230. TRUE high
```
Now that we have a `tbl` object that is a tibble, it can be passed to `create_agent()` for validation.
```r
agent <- create_agent(tbl = tbl)
```
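The `col_types = "TDdcddlc"` string is readr's compact column specification, with one letter per column in order. A natural first validation is to confirm that the parsed column types match that spec. This sketch uses pointblank's `col_is_*()` family and `all_passed()` and restates `csv_path` so that it runs on its own.

```r
library(pointblank)

csv_path <-
  system.file(
    "data_files", "small_table.csv",
    package = "pointblank"
  )

# Compact spec: T = date-time, D = date, d = double,
# c = character, l = logical (one letter per column, in order)
tbl <- file_tbl(file = csv_path, col_types = "TDdcddlc")

# One column-type check per column, then interrogate
agent <-
  create_agent(tbl = tbl) %>%
  col_is_posix(columns = date_time) %>%
  col_is_date(columns = date) %>%
  col_is_numeric(columns = a) %>%
  col_is_character(columns = b) %>%
  col_is_numeric(columns = c) %>%
  col_is_numeric(columns = d) %>%
  col_is_logical(columns = e) %>%
  col_is_character(columns = f) %>%
  interrogate()

all_passed(agent)
```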
A different strategy is to provide the data-reading function call directly to `create_agent()`:
```r
agent <-
  create_agent(
    tbl = ~ file_tbl(
      file = system.file(
        "data_files", "small_table.csv",
        package = "pointblank"
      ),
      col_types = "TDdcddlc"
    )
  ) %>%
  col_vals_gt(columns = a, value = 0)
```
All of the file-reading instructions are encapsulated in the `tbl` expression (with the leading `~`), so the agent will always obtain the most recent version of the table (and the logic can be translated to YAML for later use).
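A table-prep formula stores reading instructions, not data. Every agent built from it therefore reads the file as it exists at that moment. The sketch below shows this with a hypothetical `readings.csv` in the session's temporary directory and a hypothetical helper, `check_readings()`. The path is written out inside the formula so that the formula does not depend on other objects.

```r
library(pointblank)

# Hypothetical file in the session's temporary directory
csv_file <- file.path(tempdir(), "readings.csv")

# Reading instructions only; nothing is read until an agent uses them
prep <- ~ file_tbl(file = file.path(tempdir(), "readings.csv"))

# Hypothetical helper: build an agent from `prep`, validate, report
check_readings <- function() {
  create_agent(tbl = prep) %>%
    col_vals_gt(columns = a, value = 0) %>%
    interrogate() %>%
    all_passed()
}

# All values positive
write.csv(data.frame(a = c(1, 2, 3)), csv_file, row.names = FALSE)
first_pass <- check_readings()

# Overwrite the file with a non-positive value; the same formula now
# reads the updated contents
write.csv(data.frame(a = c(1, -2, 3)), csv_file, row.names = FALSE)
second_pass <- check_readings()

first_pass
second_pass
```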
Producing tables from files on GitHub
A CSV can be obtained from a public GitHub repo by using the `from_github()` helper function. Let's create an agent and supply a table-prep formula that gets the same CSV file from the GitHub repository for the pointblank package.
```r
agent <-
  create_agent(
    tbl = ~ file_tbl(
      file = from_github(
        file = "inst/data_files/small_table.csv",
        repo = "rstudio/pointblank"
      ),
      col_types = "TDdcddlc"
    ),
    tbl_name = "small_table",
    label = "`file_tbl()` example."
  ) %>%
  col_vals_gt(columns = a, value = 0) %>%
  interrogate()

agent
```
This interrogated the data obtained from the remote source file, and there's nothing to clean up (by default, the downloaded file goes into a system temp directory).
File access, table creation, and prep via the table store
Using table-prep formulas in a centralized table store can make it easier to work with tables from disparate sources. Here's how to generate a table store with two named entries for table preparations involving the `tbl_store()` and `file_tbl()` functions.
```r
store <-
  tbl_store(
    small_table_file ~ file_tbl(
      file = system.file(
        "data_files", "small_table.csv",
        package = "pointblank"
      ),
      col_types = "TDdcddlc"
    ),
    small_high_file ~ {{ small_table_file }} %>%
      dplyr::filter(f == "high")
  )
```
Now it's easy to access either of these tables via `tbl_get()`. We can reference a table in the store by its name (given to the left of the `~`).
```r
tbl_get(tbl = "small_table_file", store = store)
## # A tibble: 13 × 8
## date_time date a b c d e f
## <dttm> <date> <dbl> <chr> <dbl> <dbl> <lgl> <chr>
## 1 2016-01-04 11:00:00 2016-01-04 2 1-bcd-… 3 3423. TRUE high
## 2 2016-01-04 00:32:00 2016-01-04 3 5-egh-… 8 10000. TRUE low
## 3 2016-01-05 13:32:00 2016-01-05 6 8-kdg-… 3 2343. TRUE high
## 4 2016-01-06 17:23:00 2016-01-06 2 5-jdo-… NA 3892. FALSE mid
## 5 2016-01-09 12:36:00 2016-01-09 8 3-ldm-… 7 284. TRUE low
## 6 2016-01-11 06:15:00 2016-01-11 4 2-dhe-… 4 3291. TRUE mid
## 7 2016-01-15 18:46:00 2016-01-15 7 1-knw-… 3 843. TRUE high
## 8 2016-01-17 11:27:00 2016-01-17 4 5-boe-… 2 1036. FALSE low
## 9 2016-01-20 04:30:00 2016-01-20 3 5-bce-… 9 838. FALSE high
## 10 2016-01-20 04:30:00 2016-01-20 3 5-bce-… 9 838. FALSE high
## 11 2016-01-26 20:07:00 2016-01-26 4 2-dmx-… 7 834. TRUE low
## 12 2016-01-28 02:51:00 2016-01-28 2 7-dmx-… 8 108. FALSE low
## 13 2016-01-30 11:23:00 2016-01-30 1 3-dka-… NA 2230. TRUE high
```
The second table in the table store is a filtered version of the first (only the rows where `f == "high"`). It's just as easily obtainable via `tbl_get()`:
```r
tbl_get(tbl = "small_high_file", store = store)
## # A tibble: 6 × 8
## date_time date a b c d e f
## <dttm> <date> <dbl> <chr> <dbl> <dbl> <lgl> <chr>
## 1 2016-01-04 11:00:00 2016-01-04 2 1-bcd-345 3 3423. TRUE high
## 2 2016-01-05 13:32:00 2016-01-05 6 8-kdg-938 3 2343. TRUE high
## 3 2016-01-15 18:46:00 2016-01-15 7 1-knw-093 3 843. TRUE high
## 4 2016-01-20 04:30:00 2016-01-20 3 5-bce-642 9 838. FALSE high
## 5 2016-01-20 04:30:00 2016-01-20 3 5-bce-642 9 838. FALSE high
## 6 2016-01-30 11:23:00 2016-01-30 1 3-dka-303 NA 2230. TRUE high
```
The table-prep formulas in the `store` object can also be used in functions with a `tbl` argument (like `create_agent()` and `create_informant()`). This is accomplished most easily with the `tbl_source()` function.
```r
agent <-
  create_agent(
    tbl = ~ tbl_source(
      tbl = "small_table_file",
      store = store
    )
  )

informant <-
  create_informant(
    tbl = ~ tbl_source(
      tbl = "small_high_file",
      store = store
    )
  )
```
See also
Other Planning and Prep: `action_levels()`, `create_agent()`, `create_informant()`, `db_tbl()`, `draft_validation()`, `scan_data()`, `tbl_get()`, `tbl_source()`, `tbl_store()`, `validate_rmd()`