Serverless web app and eventing for data retrieval and analytics

This tutorial may incur costs. Use the Cost Estimator to generate a cost estimate based on your projected usage.

In this tutorial, you create an application that automatically collects GitHub traffic statistics for repositories and provides the foundation for traffic analytics. GitHub only provides access to the traffic data for the last 14 days. If you want to analyze statistics over a longer period of time, you need to download and store that data yourself. In this tutorial, you deploy a serverless app in an IBM Cloud Code Engine project. The app manages the metadata for GitHub repositories and provides access to the statistics for data analytics. The traffic data is collected from GitHub either on demand in the app or when triggered by Code Engine events, for example, once a day. The app discussed in this tutorial implements a multi-tenant-ready solution with the initial set of features supporting a single-tenant mode.
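The need to store the data yourself can be illustrated with a minimal Python sketch. It assumes the daily entries have the shape returned by the GitHub traffic API (timestamp, count, uniques). The function name merge_traffic and the in-memory dictionary are hypothetical stand-ins for illustration; the tutorial app keeps its data in Db2, and this is not its actual code.

```python
from typing import Dict, List, Tuple

# Key: (repository, ISO date); value: (total views, unique visitors).
TrafficStore = Dict[Tuple[str, str], Tuple[int, int]]


def merge_traffic(store: TrafficStore, repo: str, daily_views: List[dict]) -> int:
    """Merge one retrieval window into the long-term store.

    Days that are already stored are overwritten with the latest numbers,
    new days are added. Returns the number of newly added days.
    """
    added = 0
    for entry in daily_views:
        day = entry["timestamp"][:10]  # keep only YYYY-MM-DD
        key = (repo, day)
        if key not in store:
            added += 1
        store[key] = (entry["count"], entry["uniques"])
    return added
```

Because consecutive retrievals overlap by up to 13 days, keying the store by repository and date keeps the merge idempotent: running the collection twice on the same day does not duplicate records.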

Architecture diagram of the tutorial

Objectives

  • Deploy a containerized Python database app with multi-tenant support and secured access
  • Integrate App ID as an OpenID Connect-based authentication provider
  • Set up automated, serverless collection of GitHub traffic statistics

Before you begin

This tutorial requires:

  • IBM Cloud CLI,
    • IBM Cloud Code Engine plugin,
    • IBM Cloud® Container Registry plugin,
  • a GitHub account.

You can run the sections requiring a shell in the IBM® Cloud Shell.

You will find instructions to download and install these tools for your operating environment in the Getting started with tutorials guide.

Service and Environment Setup (shell)

In this section, you set up the needed services and prepare the environment. All of this can be accomplished from the shell environment (terminal).

  1. If you are not logged in, use ibmcloud login or ibmcloud login --sso to log in interactively.

  2. Set the resource group and region by running the ibmcloud target command.

    RESOURCE_GROUP_NAME=Default
    REGION=us-south
    ibmcloud target -r $REGION -g $RESOURCE_GROUP_NAME
    
  3. Create a Db2 on Cloud instance with the free (lite) plan and name it ghstatsDB.

    ibmcloud resource service-instance-create ghstatsDB dashdb-for-transactions free $REGION
    
  4. Create an instance of the App ID service. Use ghstatsAppID as the name and the Graduated tier plan.

    ibmcloud resource service-instance-create ghstatsAppID appid graduated-tier $REGION
    
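  5. Add a new namespace to the IBM Cloud® Container Registry. You are going to use it for referencing container images. The command below uses ghstatsYourInitials as a placeholder name; replace YourInitials with your own initials in lowercase, because namespace names must be unique and may not contain uppercase letters. There is one global registry as well as regional registries. Use the global registry.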

    ibmcloud cr region-set global
    NAMESPACE=ghstatsYourInitials
    ibmcloud cr namespace-add $NAMESPACE
    

Code Engine preparation (shell)

With the services provisioned and the general setup done, the next steps are to create the Code Engine project, build a container image for the app, and deploy it.

  1. Create a Code Engine project named ghstats. The command automatically sets it as the current Code Engine context.
    ibmcloud ce project create --name ghstats
    
  2. Create a Code Engine build configuration, i.e., set up the project to build the container image for you. It takes the code from the GitHub repository for this tutorial and stores the image in the registry in the previously created namespace using the registered user information.
    ibmcloud ce build create --name ghstats-build --source https://github.com/IBM-Cloud/github-traffic-stats --context-dir /backend --commit master --image private.icr.io/$NAMESPACE/codeengine-ghstats
    
  3. Notice that the build create command had the side effect of creating a registry access secret that allows the project to write to and read from the IBM Cloud® Container Registry.
    ibmcloud ce registry list
    
  4. Next, run the actual build process.
    ibmcloud ce buildrun submit --build ghstats-build
    
    The output lists further commands that you can run to follow the status and logs of the build as it progresses, for example:
    ibmcloud ce buildrun logs -f -n ghstats-build-run-123456-123456789
    

Deploy the app (shell)

Once the build is ready, you can use the container image to deploy the app and then bind the previously provisioned services.

  1. Deploy the app by creating a Code Engine app named ghstats-app. It pulls the image from the given registry and namespace.

    ibmcloud ce app create --name ghstats-app --image private.icr.io/$NAMESPACE/codeengine-ghstats:latest --registry-secret ce-auto-icr-private-global
    

    Once the app is deployed, you can check that it is available at the URL shown in the output. The app is not configured and hence not usable yet. You can check the deployment status using ibmcloud ce app list or, for details, by executing ibmcloud ce app get --name ghstats-app.

    By default, the minimum scaling is zero (0). It means that Code Engine reduces the running instances to zero if there is no workload on the app. This saves costs, but requires a short app restart when scaling up from zero again. You can avoid this by using the parameter --min 1 when creating or updating the app.

  2. To utilize the provisioned services, you have to bind them to the app. First, bind Db2 on Cloud, then App ID:

    ibmcloud ce application bind --name ghstats-app --service-instance ghstatsDB
    
    ibmcloud ce application bind --name ghstats-app --service-instance ghstatsAppID
    

    Each application bind creates the following resources and relationships:

    1. An IAM Service ID.
    2. An IAM API key that is created in the IAM Service ID.
    3. A resource service key. Service keys are called Service credentials in the IBM Cloud console. Try the following command to display the App ID entry:
    ibmcloud resource service-keys --instance-name ghstatsAppID
    

    Instead of binding the services to the app, you could also use secrets or configmaps. They can be populated from values stored in files or passed in as literals. A sample file for secrets and related instructions are in the GitHub repository for this tutorial.
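How an app might pick up bound credentials can be sketched as follows. The sketch assumes that Code Engine exposes the bindings to the app as a JSON document in an environment variable named CE_SERVICES, keyed by service type, with a list of entries that each carry a credentials object. Treat that layout, the key "appid", and the helper name find_credentials as assumptions for illustration, and check them against the values your deployed app actually receives.

```python
import json
import os
from typing import Optional


def find_credentials(raw: str, service_type: str) -> Optional[dict]:
    """Return the credentials of the first bound instance of a service type.

    `raw` is the JSON text of the (assumed) CE_SERVICES variable. Returns
    None when the variable is empty or the service type is not bound.
    """
    services = json.loads(raw) if raw else {}
    entries = services.get(service_type, [])
    return entries[0].get("credentials") if entries else None


if __name__ == "__main__":
    # "appid" as the service type key is an assumption for illustration.
    creds = find_credentials(os.environ.get("CE_SERVICES", ""), "appid")
    print("App ID credentials found:", creds is not None)
```

When you use secrets or configmaps instead, the same lookup collapses to reading individual environment variables with os.environ.get.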

App ID and GitHub configuration (browser)

The following steps are all performed using your Internet browser. First, you configure App ID to use the Cloud Directory and to work with the app. Thereafter, you create a GitHub access token. It is needed by the app to retrieve the traffic data.

  1. In the IBM Cloud® Resource List, open the overview of your services. Locate the instance of the App ID service in the Services section. Click on its entry to open the details.

  2. In the service dashboard, click on Manage Authentication in the menu on the left side. It brings up a list of the available identity providers, such as Facebook, Google, SAML 2.0 Federation and the Cloud Directory. Switch the Cloud Directory to Enabled and all other providers to Disabled.

    You may want to configure Multi-Factor Authentication (MFA) and advanced password rules. They are not discussed as part of this tutorial.

  3. Click on the Authentication Settings tab in the same dialog. In Add web redirect URLs, enter the URL of your application followed by /redirect_uri, for example https://ghstats-app.56ab78cd90ef.us-south.codeengine.appdomain.cloud/redirect_uri.

    For testing the app locally, the redirect URL is http://127.0.0.1:5000/redirect_uri. You can configure multiple redirect URLs. In order to test the app locally, copy .env.local.template to .env, adapt it and start the app using python3 ghstats.py.

  4. In the menu on the left, expand Cloud Directory and click on Users. It opens the list of users in the Cloud Directory. Click on the Create User button to add yourself as the first user. You are now done configuring the App ID service.

  5. In the browser, visit GitHub.com and go to Settings -> Developer settings -> Personal access tokens. Click on the button Generate new token (classic). Enter GHStats Tutorial for the Note. Thereafter, enable public_repo under the repo category and read:org under admin:org. Now, at the bottom of that page, click on Generate token. The new access token is displayed on the next page. You need it during the following application setup.

    GitHub Access Token
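To illustrate what the access token is used for, here is a minimal Python sketch that prepares a request to the GitHub REST API traffic endpoint for a repository. The function name build_traffic_request is hypothetical, and the sketch only builds the request without sending it. It is not taken from the tutorial app, which may call the API differently.

```python
import urllib.request

API_ROOT = "https://api.github.com"


def build_traffic_request(owner: str, repo: str, token: str,
                          kind: str = "views") -> urllib.request.Request:
    """Build a GET request for the traffic data of one repository.

    `kind` selects page views or clones. The token is passed in the
    Authorization header.
    """
    if kind not in ("views", "clones"):
        raise ValueError("kind must be 'views' or 'clones'")
    url = f"{API_ROOT}/repos/{owner}/{repo}/traffic/{kind}"
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
    return urllib.request.Request(url, headers=headers)
```

Passing the returned object to urllib.request.urlopen would send the request. Keep the token out of source code and logs, for example by reading it from an environment variable.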

Configure and test Python app

After the preparation, you configure and test the app. The app is written in Python using the popular Flask microframework. You can add repositories for statistics collection or remove them. You can access the traffic data in a tabular view or as a line chart.

  1. In a browser, open the URI of the deployed app. You should see a welcome page.

    Welcome Screen

  2. In the browser, add /admin/initialize-app to the URI and access the page. It is used to initialize the application and its data. Click on the button Start initialization. This will take you to a password-protected configuration page. The email address you log in with is taken as identification for the system administrator. Use the email address and password that you configured earlier.

  3. In the configuration page, enter a name (it is used for greetings), your GitHub user name and the access token that you generated before. Click on Initialize. This creates the database tables and inserts some configuration values. Finally, it creates database records for the system administrator and a tenant.

    First Step

  4. Once done, you are taken to the list of managed repositories. You can now add repositories by providing the name of the GitHub account or organization and the name of the repository. After entering the data, click on Add repository. The repository, along with a newly assigned identifier, should appear in the table. You can remove repositories from the system by entering their ID and clicking Delete repository.

    List of repositories

  5. For testing, click on Administration, then Collect statistics. It retrieves the traffic data on demand. Thereafter, click on Repositories and Daily Traffic. It should display the collected data.

    Traffic data

Set up daily data retrieval (shell)

With the app in place and configured, the last part is to initiate daily retrieval of GitHub traffic data. You are going to create a cron subscription. Similar to a cron job, the app subscribes to events on the specified schedule (eventing).

  1. Create the cron subscription ghstats-daily with a daily schedule at 6 am UTC that sends a POST event to the path /collectStats. Replace SECRET_TOKEN_AS_IDENTIFIER with your chosen secret value. The app uses it to verify the sender of the event.

    ibmcloud ce subscription cron create --name ghstats-daily --destination ghstats-app --path /collectStats --schedule '0 6 * * *' --data '{"token":"SECRET_TOKEN_AS_IDENTIFIER"}' --content-type application/json
    
  2. To make the secret token known to the app, update the app. Replace SECRET_TOKEN_AS_IDENTIFIER with the value you picked in the previous step.

    ibmcloud ce app update --name ghstats-app --registry-secret ce-auto-icr-private-global --env EVENT_TOKEN=SECRET_TOKEN_AS_IDENTIFIER
    

    This creates a new app revision. You can check that the events were received and processed by the app when navigating in the app to Administration, then System log.
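The token check on the receiving side can be sketched in Python as shown below. The sketch assumes that the event body arrives as the JSON document given in --data and that the expected value comes from the EVENT_TOKEN environment variable. The function name is_authorized_event is hypothetical, and the tutorial app may implement the check differently.

```python
import hmac
import json


def is_authorized_event(body: bytes, expected_token: str) -> bool:
    """Check that a cron event body carries the expected secret token."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return False
    token = payload.get("token") if isinstance(payload, dict) else None
    if not isinstance(token, str) or not expected_token:
        return False
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(token.encode(), expected_token.encode())
```

A handler for /collectStats would call this with the raw request body and os.environ.get("EVENT_TOKEN", ""), and reject the request when it returns False. Rejecting an empty expected token prevents an unconfigured app from accepting any event.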

    The command above creates a schedule for 6 am UTC daily. To directly check that the eventing works, choose a time a few minutes after your current time, converted to UTC.
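Converting "a few minutes from now" to a UTC cron expression is easy to get wrong by hand. The following Python sketch computes the expression; the function name cron_in_minutes is hypothetical and the input is expected to be a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def cron_in_minutes(now: datetime, minutes: int = 5) -> str:
    """Return a daily cron expression firing `minutes` after `now`, in UTC.

    `now` should be timezone-aware; a naive datetime is interpreted as
    local time by astimezone().
    """
    target = now.astimezone(timezone.utc) + timedelta(minutes=minutes)
    return f"{target.minute} {target.hour} * * *"


if __name__ == "__main__":
    print(cron_in_minutes(datetime.now(timezone.utc)))
```

The printed string can be used as the --schedule value for a test run. Remember to set the schedule back to '0 6 * * *' afterward.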

Conclusions

In this tutorial, you deployed a serverless app in IBM Cloud Code Engine. The app source is taken from a GitHub repository. You instructed Code Engine to build the container image and store it in the IBM Cloud® Container Registry. Next, it was pulled from there and deployed as a container. The app is bound to IBM Cloud services.

The app and the associated eventing allow you to automatically retrieve traffic data for GitHub repositories. Information about those repositories, including the tenant-specific access token, is stored in a SQL database (Db2 on Cloud). That database is used by the Python app to manage users and repositories and to present the traffic statistics. Users can see the traffic statistics in searchable tables or visualized in a simple line chart (see image below). It is also possible to download the list of repositories and the traffic data as CSV files.

Line chart
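As an illustration of the CSV download mentioned above, the following Python sketch serializes traffic rows with the standard csv module. The column names and the function name traffic_to_csv are assumptions for illustration; the files produced by the tutorial app may use a different layout.

```python
import csv
import io
from typing import Iterable, Tuple


def traffic_to_csv(rows: Iterable[Tuple[str, str, int, int]]) -> str:
    """Serialize (repository, date, views, unique visitors) rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["repository", "date", "views", "unique_visitors"])
    writer.writerows(rows)
    return buf.getvalue()
```

Using csv.writer instead of manual string joins ensures that values containing commas or quotes are escaped correctly.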

Security: Rotate service credentials

If you use this solution in production, then you should rotate the service credentials on a regular basis. Many security policies have a requirement to change passwords and credentials every 90 days or with similar frequency.

You can recreate and thereby rotate the credentials for the services bound to the app by unbinding, then binding the services again. When using secrets instead of service bindings, you have even more options: first recreate the service keys, then update the secrets, and as the last step update the app.
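A rotation policy such as the 90-day rule can be checked with a few lines of Python. The sketch below is a hypothetical helper that compares the creation date of a credential with the current date; how you obtain the creation date (for example, from the service key details) is left open.

```python
from datetime import date, timedelta


def rotation_due(created: date, today: date, max_age_days: int = 90) -> bool:
    """Return True when a credential has reached the maximum allowed age."""
    return today - created >= timedelta(days=max_age_days)
```

For example, a credential created on 2024-01-01 is due for rotation on 2024-03-31, which is 90 days later.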

Remove resources

To clean up the resources used for this tutorial, you can delete the related project and services.

  1. Unbind the provisioned services. First display the bindings, then delete them by their service binding names (FIRST and SECOND below stand for the values shown in the get output):
    ibmcloud ce application get --name ghstats-app
    
    ibmcloud ce application unbind --name ghstats-app --binding ghstats-app-ce-service-binding-FIRST
    
    ibmcloud ce application unbind --name ghstats-app --binding ghstats-app-ce-service-binding-SECOND
    
  2. Delete the project and its components.
    ibmcloud ce project delete --name ghstats --hard -f
    
  3. Delete the services:
    ibmcloud resource service-instance-delete -f ghstatsDB
    
    ibmcloud resource service-instance-delete -f ghstatsAppID
    
  4. Delete the Container Registry namespace:
    ibmcloud cr namespace-rm $NAMESPACE -f
    
  5. Delete the GitHub.com token.

Depending on the resource, it might not be deleted immediately but retained (by default for 7 days). Within the retention period, you can either reclaim the resource by deleting it permanently or restore it. See this document on how to use resource reclamation.

Expand the tutorial

Want to add to or change this tutorial? Here are some ideas:

  • Expand the app for multi-tenant support.
  • Use social identity providers.
  • Add a date picker to the statistics page to filter displayed data.
  • Use a custom login page for App ID.

Related Content

Here are links to additional information on the topics covered in this tutorial. The app itself is available in this GitHub repository.

Documentation: