Learning Objectives

Data on the Web

R Package API Wrappers

Use APIs directly with httr2

Authentication

Basic Authentication

  • “Basic” authentication is the simplest scheme: you provide a username and password. The syntax is:

    req_auth_basic(req, username, password)
  • You should already have a username and password set up.

  • I would suggest keeping your password secure using your .Renviron file or via {keyring} (see below).
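The sketch below builds a basic-auth request without sending it. It assumes you stored your credentials in .Renviron under the hypothetical names MY_API_USER and MY_API_PASSWORD. If those variables are not set, it falls back to demo values. The URL https://example.com/protected is a placeholder. It also shows that the header value is only base64-encoded, not encrypted.

```r
library(httr2)

# Read credentials from environment variables (hypothetical names);
# the "demo" fallbacks are placeholders used only when they are unset.
api_user <- Sys.getenv("MY_API_USER", unset = "demo-user")
api_password <- Sys.getenv("MY_API_PASSWORD", unset = "demo-password")

# Build (but do not perform) a request with basic authentication.
# req_auth_basic() adds an "Authorization" header of the form
# "Basic <base64 of username:password>".
auth_req <- request("https://example.com/protected") |>
  req_auth_basic(api_user, api_password)

# base64 is an encoding, not encryption: anyone who sees the header
# can decode it, so only use basic auth over https.
jsonlite::base64_enc(paste0(api_user, ":", api_password))
```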

API keys

  • An API key is a string that you add to your request.

  • To obtain a free key from OMDB and access it in R:

    1. Sign up for a free key: https://www.omdbapi.com/apikey.aspx

    2. Open up your .Renviron file using the usethis package.

      library(usethis)
      edit_r_environ()
    3. Add the key OMDB sent you by email to your .Renviron file. For example, if you name it OMDB_API_KEY, you would write the following in .Renviron:

      OMDB_API_KEY = <your-private-key>

      Where “<your-private-key>” is the private key OMDB sent you by email.

    4. Restart R.

    5. You can now access your private key at any time with:

      movie_key <- Sys.getenv("OMDB_API_KEY")
    6. You typically supply your API key as a query parameter via req_url_query() (see below).

  • Never save or display your private key in a file you might share. You are responsible for all behavior associated with your key. That is why we saved it to .Renviron and only access it through Sys.getenv().

  • It is still not ideal that your key is stored in a plain-text file. You can add a layer of security by using the keyring package: https://github.com/r-lib/keyring

    library(keyring)
    key_set("OMDB_API_KEY_SECURE") ## do this once to save the key
    movie_key_2 <- key_get("OMDB_API_KEY_SECURE") ## do this each time you need the key

    A person with access to your computer who knows R and the keyring package could still get to your key. But it is more secure than placing your key in a plain text file (which is what .Renviron is). There are more secure ways to access keys in R.
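A small pattern that helps keep keys out of scripts is to wrap the Sys.getenv() call in a helper. The helper fails with a clear message when the key is missing, and it never prints the key. The function name get_api_key() is hypothetical. Its default variable name is the OMDB_API_KEY used above.

```r
# Hypothetical helper: fetch an API key from the environment and stop
# with an informative error (without echoing any secret) if it is unset.
get_api_key <- function(var = "OMDB_API_KEY") {
  key <- Sys.getenv(var)
  if (identical(key, "")) {
    stop("Environment variable ", var, " is not set. ",
         "Add it with usethis::edit_r_environ() and restart R.",
         call. = FALSE)
  }
  key
}

# Usage:
# movie_key <- get_api_key()
```

Failing early like this gives a clearer message than letting the API reject a request that has an empty key.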

OAuth

  • See OAuth from {httr2} for details.

  • OAuth (“Open Authorization”) is an open standard for authorization that allows users to securely access resources without giving away their login credentials.

  • The idea is that your software asks the user if it can use the user’s authorization to access the API.

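    • Once the user agrees, the API issues a token that your software uses on the user's behalf. Unless the token is cached (see below), the user is asked again in each new R session.
    • This is commonly used in large APIs, such as those of Google, Twitter, or Facebook.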
  • As an example, we will consider the GitHub API.

  1. “Register an application” with the API provider before you can access the API.

    • You do this by creating a developer account on the API’s website, then registering a new OAuth app.

    • You won’t actually have an app, but API developers use this word for any means where you ask to use a user’s authorization.

    • This typically involves providing a name and a callback URL (typically http://localhost:1410) for your “application”.

  2. The provider will then give you a client_id and a client_secret that you will need to use.

    • Neither of these needs to be protected like a password, since each user provides their own username and password when authenticating.

  3. Obtain a “token URL”, which is sometimes called an “access URL”.

  4. Use oauth_client() to create a client. Pass it the client_id, the client_secret, the token_url, and a name of your choosing.

    client <- oauth_client(
      id = "client_id",
      secret = "client_secret",
      token_url = "token_url",
      name = "personal_app_name"
    )
    • For GitHub:

      client <- oauth_client(
        id = "933ffc6f53e466c58aa1",
        secret = "aa02ef46f93aa51a360f23f30f7640b445118e7f",
        token_url = "https://github.com/login/oauth/access_token",
        name = "gitapp"
      )
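  5. Obtain the provider's “authorization URL” from its documentation (for GitHub, this is https://github.com/login/oauth/authorize).

  6. Feed the authorization URL and the client into req_oauth_auth_code() when you make a request. When the request is performed, httr2 opens a browser so the user can grant access.

    auth_url <- "https://github.com/login/oauth/authorize"
    request("https://api.github.com/user") |>
      req_oauth_auth_code(client = client, auth_url = auth_url) |>
      req_perform() ->
      gout
    gout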
  • There are sometimes other “flows” for OAuth that require different steps. See here for details: https://httr2.r-lib.org/articles/oauth.html

  • When using OAuth in a package, developers often (i) “cache” the token securely (because the generated token should be kept private), and (ii) ask users to register their own app.

  • Caching is easy. Just set cache_disk = TRUE in req_oauth_auth_code().

    • Note that this creates some security risks since the token will be saved on the disk.
    • So you should inform the user if you do this.
  • When you ask users to register their own app, your functions should take client_id and client_secret as arguments that the user provides.
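Putting the last two points together, a package might expose functions like the ones below. The user supplies the credentials of the GitHub OAuth app they registered, and the token is cached on disk. The names github_client() and github_user() and the app name are hypothetical. Nothing is sent until github_user() is called, which requires an interactive session and a real registered app.

```r
library(httr2)

# Hypothetical helper: build a client from credentials the *user* supplies.
github_client <- function(client_id, client_secret) {
  oauth_client(
    id = client_id,
    secret = client_secret,
    token_url = "https://github.com/login/oauth/access_token",
    name = "my-github-app"  # any name you choose
  )
}

# Hypothetical helper: fetch the authenticated user's profile.
# cache_disk = TRUE saves the token to disk, so tell your users you do this.
github_user <- function(client) {
  request("https://api.github.com/user") |>
    req_oauth_auth_code(
      client = client,
      auth_url = "https://github.com/login/oauth/authorize",
      cache_disk = TRUE
    ) |>
    req_perform() |>
    resp_body_json()
}

# Usage (interactive):
# client <- github_client(Sys.getenv("GITHUB_CLIENT_ID"),
#                         Sys.getenv("GITHUB_CLIENT_SECRET"))
# me <- github_user(client)
# me$login
```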

httr2

URL Path

  • Every API has a base URL that you modify.

  • For some APIs, you reach each endpoint by modifying only the URL path.

  • Consider the Wizard World API.

  • The documentation says that the base URL is “https://wizard-world-api.herokuapp.com”.

    baseurl <- "https://wizard-world-api.herokuapp.com"
  • Let’s start a request with this baseurl via request().

    wizreq <- request(baseurl)
  • The documentation says that we modify this URL to obtain the different objects. For example, to obtain a list of all elixirs that occur in Harry Potter, we add “Elixirs” to the end.

    • We can do this to our request with req_url_path_append():
    wizreq <- req_url_path_append(wizreq, "Elixirs")
    wizreq
    ## <httr2_request>
    ## GET https://wizard-world-api.herokuapp.com/Elixirs
    ## Body: empty
  • Let’s look at the HTTP request that would be sent, using req_dry_run():

    req_dry_run(wizreq)
    ## GET /Elixirs HTTP/1.1
    ## Host: wizard-world-api.herokuapp.com
    ## User-Agent: httr2/0.2.3 r-curl/5.1.0 libcurl/7.81.0
    ## Accept: */*
    ## Accept-Encoding: deflate, gzip, br, zstd
  • We then implement this request via req_perform().

    eout <- req_perform(wizreq)
    eout
    ## <httr2_response>
    ## GET https://wizard-world-api.herokuapp.com/Elixirs
    ## Status: 200 OK
    ## Content-Type: application/json
    ## Body: In memory (62284 bytes)
  • We would then clean this output (see Rectangling below). As a preview:

    library(tibble)
    library(tidyr)
    tibble(elixir = resp_body_json(resp = eout)) |>
      unnest_wider(col = elixir)
    ## # A tibble: 145 × 10
    ##    id      name  effect sideEffects characteristics time  difficulty ingredients
    ##    <chr>   <chr> <chr>  <chr>       <chr>           <chr> <chr>      <list>     
    ##  1 0106fb… Ferg… Treat… Potential … <NA>            <NA>  Unknown    <list [3]> 
    ##  2 021b40… Mane… Rapid… <NA>        <NA>            <NA>  Unknown    <NULL>     
    ##  3 024f56… Poti… <NA>   <NA>        <NA>            <NA>  Unknown    <NULL>     
    ##  4 06beea… Rudi… Helps… <NA>        <NA>            <NA>  Unknown    <list [2]> 
    ##  5 078b53… Lung… Most … <NA>        <NA>            <NA>  Unknown    <NULL>     
    ##  6 07944d… Esse… Menta… <NA>        Green in colour <NA>  Advanced   <list [2]> 
    ##  7 0dd8d2… Anti… Cures… <NA>        Green in colour <NA>  Moderate   <list [4]> 
    ##  8 0e2240… Rest… Rever… <NA>        Purple coloured <NA>  Unknown    <NULL>     
    ##  9 0e7228… Skel… Resto… <NA>        Smokes when po… <NA>  Moderate   <list [6]> 
    ## 10 0e7f89… Chee… <NA>   <NA>        Yellow in colo… <NA>  Advanced   <list [1]> 
    ## # ℹ 135 more rows
    ## # ℹ 2 more variables: inventors <list>, manufacturer <chr>
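Because each endpoint is just the base URL plus a path, you can build several requests from one base request. The sketch below does this offline. The endpoint names other than "Elixirs", and the elixir id, are assumptions for illustration. Check the API documentation before relying on them.

```r
library(httr2)

baseurl <- "https://wizard-world-api.herokuapp.com"

# One request per endpoint (endpoint names besides "Elixirs" are assumed).
endpoints <- c("Elixirs", "Spells", "Houses")
reqs <- lapply(endpoints, function(ep) req_url_path_append(request(baseurl), ep))
vapply(reqs, function(r) r$url, character(1))

# Several path components are joined with "/"; the id is a made-up placeholder.
one_elixir <- request(baseurl) |>
  req_url_path_append("Elixirs", "some-elixir-id")
one_elixir$url
```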

Queries

  • Some APIs require you to modify the URL via queries. Queries come after a question mark and take the form http://www.url.com/?query1=arg1&query2=arg2&query3=arg3

  • The OMDB API requires you to provide queries. The documentation says

    Send all data requests to: http://www.omdbapi.com/?apikey=[yourkey]&

    Note that this URL already includes one query (apikey=[yourkey]).

  • You add queries via req_url_query().

    • You provide it with name/value pairs.
  • The documentation for OMDB has a table called “Parameters”, where they list the possible queries.

  • Let’s fetch information on the film The Lighthouse, requesting a short plot in JSON format.

  • This is what the URL looks like without my API key:

    movie_key <- Sys.getenv("OMDB_API_KEY")
    request("http://www.omdbapi.com/") |>
      req_url_query(t = "The Lighthouse",
                    plot = "short",
                    r = "json") ->
      movie_req
    movie_req
    ## <httr2_request>
    ## GET http://www.omdbapi.com/?t=The%20Lighthouse&plot=short&r=json
    ## Body: empty
  • Let’s add our API key and perform the request.

    movie_req |>
      req_url_query(apikey = movie_key) |>
      req_perform() ->
      mout
    mout
    ## <httr2_response>
    ## GET
    ## http://www.omdbapi.com/?t=The%20Lighthouse&plot=short&r=json&apikey=<your-key>
    ## Status: 200 OK
    ## Content-Type: application/json
    ## Body: In memory (994 bytes)
  • resp_body_json() parses the body into a (nested) list:

    resp_body_json(mout) |>
      str()
    ## List of 25
    ##  $ Title     : chr "The Lighthouse"
    ##  $ Year      : chr "2019"
    ##  $ Rated     : chr "R"
    ##  $ Released  : chr "01 Nov 2019"
    ##  $ Runtime   : chr "109 min"
    ##  $ Genre     : chr "Drama, Fantasy, Horror"
    ##  $ Director  : chr "Robert Eggers"
    ##  $ Writer    : chr "Robert Eggers, Max Eggers"
    ##  $ Actors    : chr "Robert Pattinson, Willem Dafoe, Valeriia Karaman"
    ##  $ Plot      : chr "Two lighthouse keepers try to maintain their sanity while living on a remote and mysterious New England island in the 1890s."
    ##  $ Language  : chr "English"
    ##  $ Country   : chr "United States, Canada"
    ##  $ Awards    : chr "Nominated for 1 Oscar. 33 wins & 134 nominations total"
    ##  $ Poster    : chr "https://m.media-amazon.com/images/M/MV5BZmE0MGJhNmYtOWNjYi00Njc5LWE2YjEtMWMxZTVmODUwMmMxXkEyXkFqcGdeQXVyMTkxNjU"| __truncated__
    ##  $ Ratings   :List of 3
    ##   ..$ :List of 2
    ##   .. ..$ Source: chr "Internet Movie Database"
    ##   .. ..$ Value : chr "7.4/10"
    ##   ..$ :List of 2
    ##   .. ..$ Source: chr "Rotten Tomatoes"
    ##   .. ..$ Value : chr "90%"
    ##   ..$ :List of 2
    ##   .. ..$ Source: chr "Metacritic"
    ##   .. ..$ Value : chr "83/100"
    ##  $ Metascore : chr "83"
    ##  $ imdbRating: chr "7.4"
    ##  $ imdbVotes : chr "239,704"
    ##  $ imdbID    : chr "tt7984734"
    ##  $ Type      : chr "movie"
    ##  $ DVD       : chr "18 Oct 2019"
    ##  $ BoxOffice : chr "$10,867,104"
    ##  $ Production: chr "N/A"
    ##  $ Website   : chr "N/A"
    ##  $ Response  : chr "True"
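The same pattern works for any OMDB parameter. The sketch below uses a hypothetical helper, omdb_by_id(), that looks a film up by its IMDb ID with the `i` parameter instead of by title. It reuses the ID tt7984734 from the output above and uses a fake key so that nothing secret is shown. Only the request is built here.

```r
library(httr2)

# Hypothetical helper: look up a film by IMDb ID ("i") with a full plot.
omdb_by_id <- function(imdb_id, key = Sys.getenv("OMDB_API_KEY")) {
  request("http://www.omdbapi.com/") |>
    req_url_query(i = imdb_id, plot = "full", r = "json", apikey = key)
}

lighthouse_req <- omdb_by_id("tt7984734", key = "not-a-real-key")
lighthouse_req$url
```

Because the key becomes part of the URL, printing the request or response also prints the key. Avoid doing that in documents you share.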

Headers

  • Headers supply additional information about the request, such as which response format you will accept (Accept) or your credentials (Authorization).

  • Common headers are described by Wikipedia: https://en.wikipedia.org/wiki/List_of_HTTP_header_fields

  • You supply headers to a request via req_headers().

  • Consider the icanhazdadjoke API. By default it returns an HTML page, but you can use the Accept header to request plain text (text/plain) or JSON (application/json) instead.

    request("https://icanhazdadjoke.com/") |>
      req_headers(Accept = "text/plain") |>
      req_perform() ->
      dadout
    dadout
    resp_body_string(dadout)
  • This is different from a JSON response, which we request with Accept = "application/json":

    request("https://icanhazdadjoke.com/") |>
      req_headers(Accept = "application/json") |>
      req_perform() ->
      dadout2
    resp_body_string(dadout2)
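You can check which headers a request carries before sending it. The sketch below builds two requests to the same endpoint that differ only in their Accept header, then inspects them offline. The commented lines assume the JSON response has a `joke` field. That is an assumption to verify against the API's documentation.

```r
library(httr2)

base_req <- request("https://icanhazdadjoke.com/")

# Same endpoint, different requested formats.
plain_req <- req_headers(base_req, Accept = "text/plain")
json_req  <- req_headers(base_req, Accept = "application/json")

plain_req$headers$Accept
json_req$headers$Accept

# Performing the JSON request (requires internet access):
# joke <- json_req |> req_perform() |> resp_body_json()
# joke$joke
```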

Output

  • Functions that work with the response are all of the form resp_*().

  • The status code describes whether your request was successful.

    • List of codes: https://http.cat/

    • Use resp_status() to get the code for our request:

      resp_status(mout)
      ## [1] 200
  • Response headers provide information about the response. Use resp_headers() to see the headers returned with our response.

    resp_headers(mout)
  • The body contains the data you are probably most interested in. Use the resp_body_*() functions to access the body:

    resp_body_json(mout)
  • Background: The body typically comes in the form of either a JSON or XML data structure.

    • For JSON, you use resp_body_json()
    • For XML you use resp_body_xml()
    • You can see the unparsed output with resp_body_string()
    resp_body_string(mout)
    ## [1] "{\"Title\":\"The Lighthouse\",\"Year\":\"2019\",\"Rated\":\"R\",\"Released\":\"01 Nov 2019\",\"Runtime\":\"109 min\",\"Genre\":\"Drama, Fantasy, Horror\",\"Director\":\"Robert Eggers\",\"Writer\":\"Robert Eggers, Max Eggers\",\"Actors\":\"Robert Pattinson, Willem Dafoe, Valeriia Karaman\",\"Plot\":\"Two lighthouse keepers try to maintain their sanity while living on a remote and mysterious New England island in the 1890s.\",\"Language\":\"English\",\"Country\":\"United States, Canada\",\"Awards\":\"Nominated for 1 Oscar. 33 wins & 134 nominations total\",\"Poster\":\"https://m.media-amazon.com/images/M/MV5BZmE0MGJhNmYtOWNjYi00Njc5LWE2YjEtMWMxZTVmODUwMmMxXkEyXkFqcGdeQXVyMTkxNjUyNQ@@._V1_SX300.jpg\",\"Ratings\":[{\"Source\":\"Internet Movie Database\",\"Value\":\"7.4/10\"},{\"Source\":\"Rotten Tomatoes\",\"Value\":\"90%\"},{\"Source\":\"Metacritic\",\"Value\":\"83/100\"}],\"Metascore\":\"83\",\"imdbRating\":\"7.4\",\"imdbVotes\":\"239,704\",\"imdbID\":\"tt7984734\",\"Type\":\"movie\",\"DVD\":\"18 Oct 2019\",\"BoxOffice\":\"$10,867,104\",\"Production\":\"N/A\",\"Website\":\"N/A\",\"Response\":\"True\"}"
  • resp_header(resp, "content-type") usually tells you the format of the body.

    resp_header(mout, "content-type")
    ## [1] "application/json; charset=utf-8"
    resp_header(eout, "content-type")
    ## [1] "application/json; charset=utf-8"
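To practise the resp_*() functions without calling an API, httr2 provides response(), which builds a fake response object. The body below is a trimmed-down, made-up version of the OMDB output above. The helper parse_body() is hypothetical. It picks a parser based on resp_content_type(), which drops parameters such as charset.

```r
library(httr2)

fake <- response(
  status_code = 200,
  url = "http://www.omdbapi.com/",
  headers = list(`Content-Type` = "application/json; charset=utf-8"),
  body = charToRaw('{"Title":"The Lighthouse","Year":"2019","Response":"True"}')
)

resp_status(fake)           # 200
resp_content_type(fake)     # "application/json"
resp_body_json(fake)$Title  # "The Lighthouse"

# Hypothetical helper: choose a parser based on the content type.
parse_body <- function(resp) {
  type <- resp_content_type(resp)
  if (is.na(type)) stop("Response has no Content-Type header")
  switch(type,
    "application/json" = resp_body_json(resp),
    "text/plain"       = resp_body_string(resp),
    stop("Unhandled content type: ", type)
  )
}
```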

Rectangling
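Here is a minimal rectangling sketch using the same tibble() + unnest_wider() pattern as the Wizard World preview. It runs on a small, made-up JSON array, so no request is needed. It adds unnest_longer() to turn a nested list-column (here, ingredients) into one row per element. `names_sep` prefixes the new column names so they do not clash with the existing `name` column.

```r
library(tibble)
library(tidyr)

# Made-up records shaped like the elixir output: one nested list per record.
json <- '[
  {"name": "Elixir A", "difficulty": "Moderate",
   "ingredients": [{"name": "Newt"}, {"name": "Mandrake"}]},
  {"name": "Elixir B", "difficulty": "Unknown", "ingredients": []}
]'

# Parse without simplifying, which matches resp_body_json()'s default.
records <- jsonlite::fromJSON(json, simplifyVector = FALSE)

# One row per record, then one column per field.
elixirs <- tibble(elixir = records) |>
  unnest_wider(col = elixir)

# One row per ingredient (records with no ingredients are dropped),
# then spread each ingredient's fields into columns.
ingredients <- elixirs |>
  unnest_longer(col = ingredients) |>
  unnest_wider(col = ingredients, names_sep = "_")
```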

NASA Exercise

Consider the NASA API.

  1. Generate an API key and save it as NASA_API_KEY in your .Renviron file.

  2. Read about the APOD API and obtain all images from January 2022.

  3. Clean the output into a data frame, like this:

    ## # A tibble: 31 × 8
    ##    copyright      date  explanation hdurl media_type service_version title url  
    ##    <chr>          <chr> <chr>       <chr> <chr>      <chr>           <chr> <chr>
    ##  1 "Soumyadeep M… 2022… very Full … http… image      v1              The … http…
    ##  2 "\nDani Caxet… 2022… Sometimes … http… image      v1              Quad… http…
    ##  3 "\nJan Hatten… 2022… You couldn… http… image      v1              Come… http…
    ##  4  <NA>          2022… What's hap… http… image      v1              Moon… http…
    ##  5 "\nLuca Vanze… 2022… Does the S… http… image      v1              A Ye… http…
    ##  6 "Tamas Ladany… 2022… That's not… http… image      v1              The … http…
    ##  7 "Point Blue C… 2022… A male Ade… http… image      v1              Ecst… http…
    ##  8 "Cheng Luo"    2022… Named for … http… image      v1              Quad… http…
    ##  9  <NA>          2022… What will … http… image      v1              Hubb… http…
    ## 10  <NA>          2022… Why does C… <NA>  video      v1              Come… http…
    ## # ℹ 21 more rows
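For step 2, one way to start is to build the request with date-range query parameters. This sketch assumes that APOD lives at https://api.nasa.gov/planetary/apod and accepts start_date, end_date, and api_key, so check those names against the APOD documentation. Performing the request and cleaning the output are left to you.

```r
library(httr2)

apod_req <- request("https://api.nasa.gov/planetary/apod") |>
  req_url_query(
    start_date = "2022-01-01",
    end_date   = "2022-01-31",
    api_key    = Sys.getenv("NASA_API_KEY")
  )

# apod_out <- req_perform(apod_req)
# resp_body_json(apod_out) should give one list per day, ready for the
# tibble() + unnest_wider() pattern used with the Wizard World data.
```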