Using cloudyr for data science
When you use the R programming language for your projects, you can get the most out of the data science features of IBM Cloud® Object Storage by using cloudyr.
This tutorial shows you how to integrate data from the IBM Cloud® Platform within your R project. Your project uses IBM Cloud Object Storage for storage, with S3-compatible connectivity.
Before you begin
Make sure that you have the following prerequisites before continuing:
- IBM Cloud Platform account
- An instance of IBM Cloud Object Storage
- `R` installed and configured
- S3-compatible authentication configuration
Create HMAC credentials
Before we begin, we might need to create a set of HMAC credentials as part of a Service Credential, by passing the configuration parameter `{"HMAC":true}` when the credentials are created. For example, use the IBM Cloud CLI as shown here.
ibmcloud resource service-key-create <key-name-without-spaces> Writer --instance-name "<instance name--use quotes if your instance name has spaces>" --parameters '{"HMAC":true}'
To store the results of the generated key, append `> cos_credentials` to the end of the command in the example. For the purposes of this tutorial, you need to find the `cos_hmac_keys` heading with its child keys, `access_key_id` and `secret_access_key`.
cos_hmac_keys:
access_key_id: 7xxxxxxxxxxxxxxa6440da12685eee02
secret_access_key: 8xxxx8ed850cddbece407xxxxxxxxxxxxxx43r2d2586
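Putting these steps together, the redirected command might look like the following sketch; the key name and instance name are placeholders, so substitute your own:

```shell
# Create HMAC credentials and save the output to a local file (placeholder names)
ibmcloud resource service-key-create my-cos-hmac-key Writer \
  --instance-name "My COS Instance" \
  --parameters '{"HMAC":true}' > cos_credentials
```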
While it is best practice to set credentials in environment variables, you can also set your credentials inside your local copy of the R script itself. Environment variables can alternatively be set before you start R by using an `Renviron.site` or `.Renviron` file, which sets environment variables during R startup.
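For example, a `.Renviron` file might look like the following sketch. The endpoint shown is an assumption for a Regional us-south instance (check the endpoints listed for your own instance), and the key values are the placeholders from the credentials above:

```
AWS_ACCESS_KEY_ID=7xxxxxxxxxxxxxxa6440da12685eee02
AWS_SECRET_ACCESS_KEY=8xxxx8ed850cddbece407xxxxxxxxxxxxxx43r2d2586
AWS_S3_ENDPOINT=s3.us-south.cloud-object-storage.appdomain.cloud
AWS_DEFAULT_REGION=
```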
You will need to set the actual values for the `access_key_id` and `secret_access_key` in your code, along with the IBM Cloud Object Storage endpoint for your instance.
Add credentials to your R project
Installing the R language and its suite of applications is beyond the scope of this tutorial, so it is assumed that you have already done so. Before you add any libraries or code to your project, ensure that you have credentials available to connect to IBM Cloud Object Storage. You will need the appropriate region for your bucket and endpoint.
Sys.setenv("AWS_ACCESS_KEY_ID" = "access_key_id",
"AWS_SECRET_ACCESS_KEY" = "secret_access_key",
"AWS_S3_ENDPOINT" = "myendpoint",
"AWS_DEFAULT_REGION" = "")
Add libraries to your R project
We use a cloudyr S3-compatible client to test our credentials by listing your buckets. Additional packages come from CRAN, the Comprehensive R Archive Network, which operates through a series of mirrors. For this example, we use the `aws.s3` package, added to the code that sets or accesses your credentials.
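If the package is not yet installed in your environment, you can first fetch it from CRAN; a minimal sketch:

```r
# One-time setup: install the aws.s3 package from a CRAN mirror
install.packages("aws.s3", repos = "https://cloud.r-project.org")
```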
library("aws.s3")
bucketlist()
Use library methods in your R project
You can learn a lot from working with sample packages. For example, the package for Cosmic Microwave Background Data Analysis presents a conundrum: the project's executables are small enough to compile and run on a personal machine, but working with the source data is constrained by the size of the data.
When you use version 0.3.21 of the `aws.s3` package, it is necessary to add `region=""` to a request to connect to COS.
In addition to PUT, HEAD, and other compatible API operations, we can GET objects, as shown here with the S3-compatible client that we included earlier.
# return object using 'S3 URI' syntax, with progress bar
get_object("s3://mybucketname-only/example.csv", show_progress = TRUE)
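For instance, the same client also offers `put_object()` and `head_object()`. The following sketch assumes the same hypothetical bucket name and a local file named example.csv, and passes the empty region that COS requires:

```r
library("aws.s3")

# Upload a local file to the bucket (PUT); region="" is required for COS
put_object(file = "example.csv", object = "example.csv",
           bucket = "mybucketname-only", region = "")

# Check that the object now exists (HEAD); returns TRUE or FALSE
head_object("example.csv", bucket = "mybucketname-only", region = "")
```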
Add data to your R project
As you might guess, the library discussed earlier has a `save_object()` method that saves an object from your bucket directly to a local file. While there are many ways to load data, we can use cloudSimplifieR to work with an open data set.
library(cloudSimplifieR)
d <- as.data.frame(csvToDataframe("s3://mybucket/example.csv"))
plot(d)
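If you prefer to stay within `aws.s3`, a similar result can be sketched with its `s3read_using()` helper, assuming the same hypothetical bucket and the credentials set earlier:

```r
library("aws.s3")

# Fetch the CSV from COS and parse it with read.csv in one step;
# the empty region is passed through opts for COS compatibility
d <- s3read_using(FUN = read.csv, object = "s3://mybucket/example.csv",
                  opts = list(region = ""))
plot(d)
```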
Next steps
In addition to creating your own projects, you can also use RStudio to analyze data.