1 Access a database

1.1 Connect to a database

The simpliest way to connect to a database. More complex examples will be examined later in the class.

  1. Click on the Connections tab

  2. Click on the New Connection button

  1. Select Postgres Dev

  1. Click OK

1.2 Explore the database using the RStudio IDE

Becoming familiar with the new interface for databases inside the RStudio IDE

  1. Expand the datawarehouse schema

  2. Expand the airport table

  3. Click on the table icon to the right of the airport table

  4. (Optional) Expand and explore the other tables

  5. Click on the disconnect icon to close the connection

1.3 List drivers and DSNs

Learn how to use the odbc package to get DB info from your machine

  1. To get a list of drivers available in the server
library(odbc)

odbcListDrivers()[1:2]
##             name attribute
## 1 AmazonRedshift    Driver
## 2           Hive    Driver
## 3         Impala    Driver
## 4         Oracle    Driver
## 5     PostgreSQL    Driver
## 6     Salesforce    Driver
## 7      SQLServer    Driver
## 8       Teradata    Driver
  1. Click on the ellipsis button located in the Files tab

  1. Type: /etc

  1. Locate and open the odbcinst.ini file

  2. To see a list of DSNs available in the server

odbcListDataSources()
##            name description
## 1  Postgres Dev  PostgreSQL
## 2 Postgres Prod  PostgreSQL
  1. Using the ellipsis button again, navigate to /etc/odbc.ini

1.4 Connect to a database using code

Use the odbc package along with DBI to open a connection to a database

  1. Run the following code to connect
library(DBI)
con <- dbConnect(odbc::odbc(), "Postgres Dev")
  1. Use dbListTables() to retrieve a list of tables
dbListTables(con)
## [1] "airport"     "carrier"     "flight"      "flightscore" "vflight"
  1. Use dbGetQuery() to run a quick query
odbc::dbGetQuery(con, "SELECT * FROM datawarehouse.airport LIMIT 10")
##    airport                 airportname          city state country
## 1      ABE Lehigh Valley International     Allentown    PA     USA
## 2      ABI            Abilene Regional       Abilene    TX     USA
## 3      ABQ   Albuquerque International   Albuquerque    NM     USA
## 4      ABY  Southwest Georgia Regional        Albany    GA     USA
## 5      ACK          Nantucket Memorial     Nantucket    MA     USA
## 6      ACT               Waco Regional          Waco    TX     USA
## 7      ACV                      Arcata Arcata/Eureka    CA     USA
## 8      ACY Atlantic City International Atlantic City    NJ     USA
## 9      ADK                        Adak          Adak    AK     USA
## 10     ADQ                      Kodiak        Kodiak    AK     USA
##         lat       long
## 1  40.65236  -75.44040
## 2  32.41132  -99.68190
## 3  35.04022 -106.60919
## 4  31.53552  -84.19447
## 5  41.25305  -70.06018
## 6  31.61129  -97.23052
## 7  40.97812 -124.10862
## 8  39.45758  -74.57717
## 9  51.87796 -176.64603
## 10 57.74997 -152.49386
  1. Use the SQL chunk
SELECT * FROM datawarehouse.airport LIMIT 10
Table 1.1: Displaying records 1 - 10
airport airportname city state country lat long
ABE Lehigh Valley International Allentown PA USA 40.65236 -75.44040
ABI Abilene Regional Abilene TX USA 32.41132 -99.68190
ABQ Albuquerque International Albuquerque NM USA 35.04022 -106.60919
ABY Southwest Georgia Regional Albany GA USA 31.53552 -84.19447
ACK Nantucket Memorial Nantucket MA USA 41.25305 -70.06018
ACT Waco Regional Waco TX USA 31.61129 -97.23052
ACV Arcata Arcata/Eureka CA USA 40.97812 -124.10862
ACY Atlantic City International Atlantic City NJ USA 39.45758 -74.57717
ADK Adak Adak AK USA 51.87796 -176.64603
ADQ Kodiak Kodiak AK USA 57.74997 -152.49386
  1. Use the output.var option to load results to a variable
SELECT * FROM datawarehouse.airport LIMIT 10
  1. Test the variable
sql_top10
##    airport                 airportname          city state country
## 1      ABE Lehigh Valley International     Allentown    PA     USA
## 2      ABI            Abilene Regional       Abilene    TX     USA
## 3      ABQ   Albuquerque International   Albuquerque    NM     USA
## 4      ABY  Southwest Georgia Regional        Albany    GA     USA
## 5      ACK          Nantucket Memorial     Nantucket    MA     USA
## 6      ACT               Waco Regional          Waco    TX     USA
## 7      ACV                      Arcata Arcata/Eureka    CA     USA
## 8      ACY Atlantic City International Atlantic City    NJ     USA
## 9      ADK                        Adak          Adak    AK     USA
## 10     ADQ                      Kodiak        Kodiak    AK     USA
##         lat       long
## 1  40.65236  -75.44040
## 2  32.41132  -99.68190
## 3  35.04022 -106.60919
## 4  31.53552  -84.19447
## 5  41.25305  -70.06018
## 6  31.61129  -97.23052
## 7  40.97812 -124.10862
## 8  39.45758  -74.57717
## 9  51.87796 -176.64603
## 10 57.74997 -152.49386
  1. Disconnect from the database using dbDisconnect()
dbDisconnect(con)

1.5 RStudio SQL Script

Try out the new SQL Script support in RStudio 1.2

  1. Open the query-example.sql file

  2. Click the Preview button. It is located in the top-right area of the script

  3. In the script, change airport to carrier

  4. Click on Preview again

1.6 Connect to a database without a DSN

A more complex way of connecting to a database, using best practices: http://db.rstudio.com/best-practices/managing-credentials/#prompt-for-credentials

  1. Use the following code to start a new connection that does not use the pre-defined DSN
con <- dbConnect(
  odbc::odbc(),
  Driver = "PostgreSQL",
  Server = "localhost",
  UID    = rstudioapi::askForPassword("Database user"),
  PWD    = rstudioapi::askForPassword("Database password"),
  Port = 5432,
  Database = "postgres"
)
  1. When prompted, type in rstudio_dev for the user, and dev_user as the password

  2. Disconnect from the database using dbDisconnect()

dbDisconnect(con)
## Warning: Connection already closed.

1.7 Secure credentials in a file

Credentials can be saved in a YAML file and then read using the config package: http://db.rstudio.com/best-practices/managing-credentials/#stored-in-a-file-with-config

  1. Open and explore the config.yml file available in your working directory

  2. Load the datawarehouse-dev values to a variable

dw <- config::get("datawarehouse-dev")
  1. Check that the variable loaded propery, by checking the driver value
dw$driver
## [1] "PostgreSQL"
  1. Use info in the config.yml file to connect to the database
con <- dbConnect(odbc::odbc(),
   Driver = dw$driver,
   Server = dw$server,
   UID    = dw$uid,
   PWD    = dw$pwd,
   Port   = dw$port,
   Database = dw$database
)
  1. Disconnect from the database using dbDisconnect()
dbDisconnect(con)

1.8 Environment variables

Use .Renviron file to store credentials

  1. Open and explore the .Renviron file available in your working directory

  2. Confirm that the environment variables are loaded by using Sys.getenv()

Sys.getenv("uid")
## [1] "rstudio_dev"
  1. Pass the credentials using the environment variables
con <- dbConnect(
  odbc::odbc(),
  Driver = "PostgreSQL",
  Server = "localhost",
  UID    = Sys.getenv("uid"),
  PWD    = Sys.getenv("pwd"),
  Port = 5432,
  Database = "postgres"
)
  1. Disconnect from the database using dbDisconnect()
dbDisconnect(con)

1.9 Use options()

Set options() in a separate R script

  1. Open and explore the options.R script available in your working directory

  2. Source the options.R script

source("options.R")
  1. Confirm that the environment variables are loaded by using Sys.getenv()
getOption("database_userid")
## [1] "rstudio_dev"
  1. Pass the credentials using the environment variables
con <- dbConnect(
  odbc::odbc(),
  Driver = "PostgreSQL",
  Server = "localhost",
  UID    = getOption("database_userid"),
  PWD    = getOption("database_password"),
  Port = 5432,
  Database = "postgres"
)
  1. Disconnect from the database using dbDisconnect()
dbDisconnect(con)
## Warning: Connection already closed.