IBM Cloud Docs
Text analysis with Code Engine

This tutorial may incur costs. Use the Cost Estimator to generate a cost estimate based on your projected usage.

In this tutorial, you will learn about IBM Cloud® Code Engine by deploying a text analysis application that uses Natural Language Understanding. You will create a Code Engine project, select it, and deploy Code Engine entities (applications and jobs) to it. You will learn how to bind IBM Cloud services to your Code Engine entities. You will also see the autoscaling capability of Code Engine, where instances are scaled up or down (to zero) based on incoming workload.

IBM Cloud Code Engine is a fully managed, serverless platform that runs your containerized workloads, including web apps, microservices, event-driven functions, or batch jobs. Code Engine even builds container images for you from your source code. Because these workloads are all hosted within the same Kubernetes infrastructure, all of them can seamlessly work together. The Code Engine experience is designed so that you can focus on writing code and not on the infrastructure that is needed to host it.

The platform is designed to address the needs of developers who just want their code to run. Code Engine abstracts the operational burden of building, deploying, and managing workloads in Kubernetes so that developers can focus on what matters most to them: the source code.

Objectives

  • Understand IBM Cloud® Code Engine and how it simplifies the developer experience.
  • Understand how easy it is to deploy and scale an application using Code Engine.
  • Learn how to use jobs to execute run-to-completion workloads.

Architecture
Figure 1. Architecture diagram of the tutorial

  1. Developer creates a Code Engine project and deploys a frontend and a backend Code Engine application.
  2. Developer connects the frontend (UI) app to the backend by modifying the frontend application to set an environment variable value to point to the backend application's endpoint.
  3. Developer provisions the required cloud services and binds them to the backend application and jobs by creating secrets and a configmap.
  4. User uploads one or more text files via the frontend app. The files are stored in Object Storage through the backend application.
  5. User runs a Code Engine job via the backend to analyze text by pushing the text to Natural Language Understanding. The result is then saved to Object Storage and displayed in the frontend app when the user clicks the refresh button.

You can use the Code Engine console to view your progress while working through this tutorial.

Before you begin

This tutorial requires:

  • IBM Cloud CLI - This CLI tool will enable you to interact with IBM Cloud.
    • Code Engine plugin (code-engine/ce) - Plugins extend the capabilities of the IBM Cloud CLI with commands specific to a service. The Code Engine plugin will give you access to Code Engine commands on IBM Cloud.
    • Optional Container Registry plugin (container-registry)

You can find instructions to download and install these tools for your operating environment in the Getting started with tutorials guide. To avoid the installation of these tools, this tutorial will use the Cloud Shell from the IBM Cloud console.

Start a new IBM Cloud Shell

From the IBM Cloud console in your browser click the button in the upper right corner to create a new Cloud Shell.

Create an IBM Cloud Code Engine project

In this section, you will create a Code Engine project. A project is a grouping of Code Engine entities such as applications, jobs, and builds. Projects are used to manage resources and provide access to their entities.

Putting entities into a single project enables you to manage access control more easily. The entities within a project share the same private network, which enables them to talk to each other securely. For more details read the documentation on Code Engine projects.

  1. Navigate to IBM Cloud Code Engine Overview page.

  2. On the left pane, click on Projects and then click Create.

    • Select a location.
    • Provide a project name.
    • Select the resource group where you will create your project and also the cloud services required in the later steps. Resource groups are a way for you to organize your account resources into customizable groupings.
    • Click on Create.
    • Wait until the project status changes to Active.
  3. Switch to the Cloud Shell session that you started earlier and use it in this tutorial when you are asked to run CLI commands.

  4. Create shell variables for the project name and the resource group name:

    PROJECT_NAME=YourProjectName
    RESOURCE_GROUP_NAME=YourResourceGroupName
    
  5. Target the resource group where you created your project.

    ibmcloud target -g $RESOURCE_GROUP_NAME
    
  6. Make the command line tooling point to your project by selecting it.

    ibmcloud code-engine project select --name $PROJECT_NAME
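
The CLI steps above depend on both shell variables being set. As a minimal sketch (the require_vars helper is hypothetical and not part of the IBM Cloud CLI), you could fail early when one of them is empty and only then call the CLI:

```shell
# Hypothetical helper: report every named variable that is unset or empty.
# Returns 0 when all of them are set, 1 otherwise.
require_vars() (
  status=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing variable: $name" >&2
      status=1
    fi
  done
  exit "$status"
)

# Placeholder values from the tutorial; replace them with your own.
PROJECT_NAME=YourProjectName
RESOURCE_GROUP_NAME=YourResourceGroupName

# Call the CLI only when both variables are set and the CLI is installed.
if require_vars PROJECT_NAME RESOURCE_GROUP_NAME && command -v ibmcloud >/dev/null 2>&1; then
  ibmcloud target -g "$RESOURCE_GROUP_NAME"
  ibmcloud code-engine project select --name "$PROJECT_NAME"
fi
```

Quoting the variables keeps names that contain spaces intact when they are passed to the CLI.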
    

Deploy the frontend and backend apps as Code Engine applications

Code Engine applications run your code to serve HTTP requests, automatically scale up and back down to zero, and offer traffic routing to multiple revisions. In this section, you will deploy your frontend and backend applications to the Code Engine project. The frontend web application will allow users to upload text files, while the backend application will write the file to IBM Cloud Object Storage.

We've already built images for the two applications and pushed them to the public Container Registry. You will use these pre-built container images to deploy the respective applications. Creation of your own applications will be covered in a later step.

Deploy a frontend application

  1. To deploy a new Code Engine application, run the following command, providing the application name frontend and passing the pre-built container image to the --image flag.

    ibmcloud code-engine application create --name frontend --image icr.io/solution-tutorials/tutorial-text-analysis-code-engine-frontend
    

    After running this command, you should see some output with a URL to your application. It should look something like: https://frontend.305atabsd0w.us-south.codeengine.appdomain.cloud. Copy or make note of this application URL for the next step. With just these two pieces of data (application name and image name), Code Engine has deployed your application and will handle the complexities of configuring it and managing it for you.

    The application source code used to build the container images is available in a GitHub repo for your reference. If you wish to build the container images from source code and push the images to a private Container Registry, follow the instructions here.

  2. Open the application URL from the previous step in a browser to see an output similar to this:

    Frontend is running

    Run the ibmcloud code-engine application get -n frontend command to see the details of the application. You should see details like the ID, project information, age of the application, the URL to access the application, a Console URL to access your application configuration, Image, Resource allocation, and various revisions, conditions and runtime for your application. Since you only have one revision, you should see that 100% of the traffic is going to the latest revision. You can also check the number of instances and their status.

  3. To troubleshoot and check the logs of your application, run the following command, replacing <INSTANCE_NAME> with the name of one of the instances listed by the ibmcloud code-engine application get -n frontend command.

    If you do not see any running instances, make sure to open the application URL from step 2 again.

    ibmcloud code-engine application logs --instance <INSTANCE_NAME>
    

    If the application is running, you should see backend URL: undefined and App listening on port 8080. Later in the tutorial, you will connect this frontend application to the backend application.

Congratulations! You've just deployed a web application to Code Engine with a single command, without needing to know the intricacies of Kubernetes such as pods, deployments, services, and ingress.

Scale the application

When you created the application with the application create command, you only passed in an image to use and a name for your application. While this is the minimum amount of information to deploy an application, there are a number of other knobs you have control over. Among others, you can set the number of requests that can be processed concurrently per instance, the amount of CPU for the instance of the application, the amount of memory set for the instance of the application, the environment variables for the application, the maximum and minimum number of instances that can be used for this application, and the port where the application listens for requests.

Most of these values have a default set if nothing is provided as an option when creating the application. Because we did not provide a value, Code Engine deployed our application with a default max scale of 10, meaning that it will only scale our application up to 10 instances. The default minimum scale is zero, so that when our application is no longer in use, it will scale itself back down to zero.
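
As a rough illustration of how these limits interact, the sketch below estimates an instance count as the number of in-flight requests divided by the per-instance concurrency, rounded up and clamped between the minimum and maximum scale. This is a simplified model for intuition only, not the actual autoscaler algorithm. The function name is hypothetical, and the per-instance concurrency of 10 is the value this tutorial cites.

```shell
# Simplified model (assumption): instances = ceil(in_flight / concurrency),
# clamped to the range [min_scale, max_scale].
estimate_instances() (
  in_flight=$1
  concurrency=$2
  min_scale=$3
  max_scale=$4
  wanted=$(( (in_flight + concurrency - 1) / concurrency ))
  if [ "$wanted" -lt "$min_scale" ]; then wanted=$min_scale; fi
  if [ "$wanted" -gt "$max_scale" ]; then wanted=$max_scale; fi
  echo "$wanted"
)

estimate_instances 300 10 0 10   # 300 parallel requests, max scale 10
estimate_instances 300 10 0 5    # same load, max scale lowered to 5
estimate_instances 0 10 0 10     # no traffic, so the estimate drops to zero
```

Under this model the three calls print 10, 5, and 0, which mirrors the behavior described in this section: the load saturates the max scale, a lower max scale caps the instance count, and an idle application scales to zero.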

  1. To check the autoscaling capabilities of Code Engine, we can use a load generator to perform requests against our service. The following shell script simulates a basic load of 3000 requests.

    1. Open a local terminal window (shell).

    2. Create a shell variable for the frontend application URL from the step above.

      export APPURL=<frontend-application-url>
      
    3. Run the following script to generate some load. You can repeat it to create more traffic.

      # -I{} keeps xargs from appending each sequence number to the curl command line.
      seq 1 3000 | xargs -I{} -P300 curl -s -o /dev/null "$APPURL"
      
  2. In your Cloud Shell session from the previous sections, run the command below to see the instance (pod) count increase as part of the autoscaling.

    ibmcloud code-engine application get -n frontend
    

    By default, each instance processes at most 10 requests concurrently, which triggers autoscaling under this load. You can change this value with the --concurrency (or -cn) flag of the application update command.

  3. If you don't want to allow as many as 10 instances to be created, you can lower the max scale. While your serverless application can easily scale up, you may depend on a downstream service, such as a SQL database that can only handle a limited number of connections or a rate-limited API. Let's try limiting the number of instances for this frontend application.

    ibmcloud code-engine application update --name frontend --max-scale 5
    
  4. Once load generation is stopped, wait for a few minutes to see the instances terminating, eventually scaling down to zero instances.

  5. In your local window with the load generator command, run the script again to create requests against the app. In the Cloud Shell session, run the ibmcloud code-engine application get -n frontend command to see the instance count increasing to 5.

    Expected Output:

    Name                                        Revision        Running  Status   Restarts  Age
    frontend-00002-deployment-77d5fbfb5d-7zpfl  frontend-00002  3/3      Running  0         70s
    frontend-00002-deployment-77d5fbfb5d-kv6rn  frontend-00002  3/3      Running  0         69s
    frontend-00002-deployment-77d5fbfb5d-mhlwn  frontend-00002  3/3      Running  0         68s
    frontend-00002-deployment-77d5fbfb5d-qkjmd  frontend-00002  3/3      Running  0         67s
    frontend-00002-deployment-77d5fbfb5d-zpr9n  frontend-00002  3/3      Running  0         85s
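
To check the instance count without reading the table by eye, you could count the rows whose status is Running. The sketch below is a minimal example that assumes the input contains only the instance table in the column layout shown above. The count_running helper is hypothetical, and the real application get output contains additional sections that would need to be filtered out first.

```shell
# Count rows whose Status column (4th field) is "Running".
# Reads the instance table from standard input and skips the header row.
count_running() {
  awk 'NR > 1 && $4 == "Running" { n++ } END { print n + 0 }'
}

# Sample rows copied from the expected output above.
count_running <<'EOF'
Name                                        Revision        Running  Status   Restarts  Age
frontend-00002-deployment-77d5fbfb5d-7zpfl  frontend-00002  3/3      Running  0         70s
frontend-00002-deployment-77d5fbfb5d-kv6rn  frontend-00002  3/3      Running  0         69s
frontend-00002-deployment-77d5fbfb5d-mhlwn  frontend-00002  3/3      Running  0         68s
frontend-00002-deployment-77d5fbfb5d-qkjmd  frontend-00002  3/3      Running  0         67s
frontend-00002-deployment-77d5fbfb5d-zpr9n  frontend-00002  3/3      Running  0         85s
EOF
```

For the five sample rows this prints 5, which matches the max scale set in step 3.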
    

Deploy a backend application and test the connection

  1. To deploy a new backend application that stores your text files in IBM Cloud Object Storage, run this command:

    ibmcloud code-engine application create --name backend --image icr.io/solution-tutorials/tutorial-text-analysis-code-engine-backend --cluster-local
    

    The --cluster-local flag will instruct Code Engine to keep the endpoint for this application private, meaning that it will only be available from within the project. This is often used for security purposes. In this case, there is no reason to expose the backend application with a public endpoint, since it will not be accessed from outside of the cluster.

  2. Copy and save the internal endpoint (URL) from the output to use it in the next command. It will look something like this:

    BACKEND_PRIVATE_URL=http://backend.xxxxxx
    

    You can run the ibmcloud code-engine application get -n backend command to check the status and details of the backend application, including its URL.

  3. The frontend application uses an environment variable (BACKEND_URL) to know where the backend application is hosted. You now need to update the frontend application to set this value to point to the backend application's endpoint.

    ibmcloud code-engine application update --name frontend --env BACKEND_URL=$BACKEND_PRIVATE_URL
    

    The --env flag can appear as many times as you would like if you need to set more than one environment variable. This option could have also been used on the ibmcloud code-engine application create command for the frontend application if you knew its value at that time. Learn more by reading the Working with environment variables documentation topic.

  4. Hard refresh the frontend URL in the browser to test the connection to the backend application. You should see a page with an option to upload a text file (.txt) and an error message from the backend application, because the backend is not yet connected to the IBM Cloud services it needs to store and process the text files. Clicking Upload text file should show a similar error message.
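
Because the --env flag can be repeated, a command with several variables can be assembled from a list of KEY=VALUE pairs. The following sketch only prints the resulting command instead of running it. The env_flags helper and the LOG_LEVEL variable are hypothetical and exist purely for illustration; only BACKEND_URL is used by the frontend in this tutorial.

```shell
# Build a list of "--env KEY=VALUE" flags from KEY=VALUE arguments.
# Note: values that contain spaces would need extra quoting.
env_flags() (
  flags=""
  for pair in "$@"; do
    flags="$flags --env $pair"
  done
  echo "${flags# }"
)

BACKEND_PRIVATE_URL=http://backend.xxxxxx   # placeholder from the step above

# LOG_LEVEL is a hypothetical second variable, shown only to illustrate repetition.
echo ibmcloud code-engine application update --name frontend \
  $(env_flags "BACKEND_URL=$BACKEND_PRIVATE_URL" "LOG_LEVEL=debug")
```

Printing the command first makes it easy to review the flags before you run the update for real.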

Connect the backend application to Object Storage service

In this section, you will provision the required Object Storage and Natural Language Understanding services and bind the Object Storage service to the backend application. The backend application will store the text files into the Object Storage, while the Natural Language Understanding will be used later in the tutorial to perform text analysis on the uploaded text files.

With IBM Watson® Natural Language Understanding, developers can analyze semantic features of text input, including categories, concepts, emotion, entities, keywords, metadata, relations, semantic roles, and sentiment.

Provision Object Storage and Natural Language Understanding services

  1. Create an instance of Object Storage

    1. Select the Lite plan, or the Standard plan if you already have an Object Storage service instance in your account.
    2. Set Service name to your-initials-code-engine-cos.
    3. Select the resource group where you created the Code Engine project.
    4. Click on Create.
    5. Capture the service name in a shell variable:
      COS_INSTANCE_NAME=your-initials-code-engine-cos
      
  2. Under Create Bucket click Create Bucket, then under Create a Custom Bucket select Create.

    When you create buckets or add objects, be sure to avoid the use of Personally Identifiable Information (PII). Note: PII is information that can identify any user (natural person) by name, location, or any other means.

    1. Enter Unique bucket name such as <your-initials>-bucket-code-engine.
    2. Select the Location where you created the Code Engine project.
    3. Select Smart Tier Storage class.
    4. Click Create bucket.
    5. Capture the bucket name in a shell variable:
      COS_BUCKETNAME=your-initials-bucket-code-engine
      
  3. On the bucket page:

    1. Click the Configuration tab
    2. The Direct endpoint keeps data within IBM Cloud. Capture the direct endpoint in a shell variable. In the Dallas (us-south) region, it might be:
      COS_ENDPOINT=s3.direct.us-south.cloud-object-storage.appdomain.cloud
      
  4. Create an instance of Natural Language Understanding

    1. Select a location and select Lite plan.
    2. Set Service name to code-engine-nlu and select the resource group where you created the Code Engine project.
    3. Read the license agreement and then check I have read and agree to the following license agreements:.
    4. Click on Create.
    5. Capture the service name in a shell variable:
      NLU_INSTANCE_NAME=code-engine-nlu
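
The direct endpoint captured in step 3 embeds the region name. As a sketch, and assuming that other regions follow the same naming pattern as the us-south example, the value could be derived from the region. The cos_direct_endpoint helper is hypothetical, and the Configuration tab of your bucket remains the authoritative source.

```shell
# Assumption: regional direct endpoints follow the pattern shown for us-south.
cos_direct_endpoint() {
  echo "s3.direct.$1.cloud-object-storage.appdomain.cloud"
}

COS_ENDPOINT=$(cos_direct_endpoint us-south)
echo "$COS_ENDPOINT"
```

For us-south this reproduces the endpoint shown in step 3.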
      

Bind the Object Storage service to the backend application

Now, you will need to pass in the credentials for the IBM Cloud Object Storage instance you just created into your backend application. You will do this by binding the Object Storage service to your application, which automatically adds credentials for a service to the environment variables for your application or job.

  1. Create a binding for Object Storage service with a prefix COS for ease of use in your application. The bind command creates a service credential in the service instance and from that initializes the environment variables of the application with the credentials. Each service binding can be configured to use a custom environment variable prefix by using the --prefix flag.

    ibmcloud code-engine application bind --name backend --service-instance $COS_INSTANCE_NAME --role Writer --prefix COS
    
  2. You will also need to provide the application with your bucket name where you want to store the text files, as well as your COS endpoint. Both of these were defined in an earlier step. The endpoint for us-south for the Smart tier is s3.direct.us-south.cloud-object-storage.appdomain.cloud.

    Define a configmap to hold the bucket name and the endpoint, because this information isn't sensitive. A ConfigMap is a Kubernetes object that lets you decouple configuration artifacts from image content to keep containerized applications portable. You could create this configmap from a file or from key-value pairs. For now, we'll use key-value pairs with the --from-literal flag. Verify that you captured these values earlier and create the configmap:

    echo bucket $COS_BUCKETNAME endpoint $COS_ENDPOINT
    
    ibmcloud code-engine configmap create --name backend-configuration --from-literal=COS_BUCKETNAME=$COS_BUCKETNAME --from-literal=COS_ENDPOINT=$COS_ENDPOINT
    
  3. With the configmap defined, you can now update the backend application by asking Code Engine to set environment variables in the runtime of the application based on the values in the configmap. Update the backend application with the following command:

    ibmcloud code-engine application update --name backend --env-from-configmap backend-configuration
    

    To read values from a secret instead of a configmap, use the --env-from-secret flag. Both secrets and configmaps are "maps", so each environment variable takes its name from the "key" of an entry and its value from the value of that "key".

  4. To verify that the backend application is updated with the binding and the configmap, run the command below and look for the Service Bindings and Environment Variables sections in the output:

    ibmcloud code-engine application get --name backend
    
  5. Go to the frontend UI and upload text files for text analysis. You should see the uploaded files with a Not analyzed tag on them.
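
When the list of settings grows, typing one --from-literal flag per key gets tedious. The sketch below generates the flags from a local file of KEY=VALUE lines and prints the resulting command rather than running it. The literal_flags helper and the temporary settings file are hypothetical, while the flag itself is the --from-literal flag used in step 2.

```shell
# Turn KEY=VALUE lines into --from-literal flags, skipping blank lines and comments.
literal_flags() {
  awk 'NF && $0 !~ /^#/ { printf "%s--from-literal=%s", sep, $0; sep = " " }
       END { print "" }' "$1"
}

# Hypothetical settings file holding the two non-sensitive values.
settings_file=$(mktemp)
cat > "$settings_file" <<'EOF'
# non-sensitive settings for the backend
COS_BUCKETNAME=your-initials-bucket-code-engine
COS_ENDPOINT=s3.direct.us-south.cloud-object-storage.appdomain.cloud
EOF

echo ibmcloud code-engine configmap create --name backend-configuration \
  $(literal_flags "$settings_file")
rm -f "$settings_file"
```

Each key in the file becomes the name of an environment variable once the configmap is referenced with --env-from-configmap.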

Process text files with an automated job

The backend application is now connected to the frontend application. You have provided the required credentials through a service binding and uploaded files for text analysis. To complete the flow, you will create a job. A job holds the workload configuration that is used each time the job runs, in this case to analyze text with the Natural Language Understanding service.

Create a job

Jobs, unlike applications which react to incoming HTTP requests, are meant to be used for running container images that contain an executable designed to run one time and then exit. When you create a job, you can specify workload configuration information that is used each time the job is run. You can create a job from the console or with the CLI.

This job will read text files from IBM Cloud Object Storage, and then analyze them using the Natural Language Understanding Service. It will need to have access to service credentials for both services.

  1. Run the following command to create a job:
    ibmcloud code-engine job create --name backend-job --image icr.io/solution-tutorials/tutorial-text-analysis-code-engine-backend-job --env-from-configmap backend-configuration
    
    You can set the version of the Natural Language Understanding service using the --env flag. For details on versioning, see the Natural Language Understanding documentation.

Bind the IBM Cloud services to job

  1. Create a binding for the Object Storage service with the prefix COS_JOB, which the job uses to read the uploaded files and to store the results:
    ibmcloud code-engine job bind --name backend-job --service-instance $COS_INSTANCE_NAME --role Writer --prefix COS_JOB
    
  2. Similarly, bind the Natural Language Understanding service with the prefix NLU_JOB to analyze the uploaded text files:
    ibmcloud code-engine job bind --name backend-job --service-instance $NLU_INSTANCE_NAME --role Writer --prefix NLU_JOB
    
  3. To verify that the job is updated with the bindings and the configmap, run the command below and look for the Service Bindings and Environment Variables sections in the output:
    ibmcloud code-engine job get --name backend-job
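
Inside the job's container, the bindings surface as environment variables whose names start with the chosen prefixes. As a minimal sketch, a startup check could confirm that each prefix is present before doing any work. The has_env_prefix helper is hypothetical, and COS_JOB_APIKEY is only a stand-in, since the exact variable names created by a binding are an assumption here.

```shell
# Succeeds when at least one exported variable name starts with the given prefix.
has_env_prefix() {
  env | grep -q "^$1"
}

# Stand-in variable so the sketch runs outside Code Engine (name is an assumption).
export COS_JOB_APIKEY=dummy

for prefix in COS_JOB_ NLU_JOB_; do
  if has_env_prefix "$prefix"; then
    echo "$prefix: found"
  else
    echo "$prefix: missing"
  fi
done
```

A check like this gives a clearer error than a failed API call when a binding was skipped.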
    

Run the job

  1. To run a job with the configuration created above, use the jobrun submit command:

    ibmcloud code-engine jobrun submit --name backend-jobrun --job backend-job
    

    When you run a job, you can override many of the variables that you set in the job configuration. To check the variables, run ibmcloud code-engine jobrun submit --help.

  2. To check the status of the jobrun, run the following command:

    ibmcloud code-engine jobrun get --name backend-jobrun
    
  3. The logs can be displayed:

    ibmcloud code-engine jobrun logs --follow --name backend-jobrun
    
  4. In the frontend UI, click on the refresh button (next to Upload text file) to see the Keywords and JSON for each of the uploaded text files. The tag on each file should now change to Analyzed.

  5. Upload new files or delete individual files by clicking the delete icon. Then resubmit the jobrun with the command below and click the refresh button to see the results.

    ibmcloud code-engine jobrun resubmit --jobrun backend-jobrun
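
Rather than re-running the status command by hand, you could poll until the run finishes. The sketch below is a generic retry loop. The poll_until and jobrun_succeeded helpers are hypothetical, and matching the word Succeeded in the jobrun get output is an assumption about the CLI's output format.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: poll_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
poll_until() (
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then exit 0; fi
    i=$((i + 1))
    sleep "$delay"
  done
  exit 1
)

# Assumption: a finished run shows the word "Succeeded" in the jobrun get output.
jobrun_succeeded() {
  ibmcloud code-engine jobrun get --name backend-jobrun 2>/dev/null | grep -q Succeeded
}

# Example call (uncomment in a shell where the CLI is logged in):
# poll_until 30 10 jobrun_succeeded && echo "jobrun finished"
```

The loop gives up after the configured number of attempts, so a failed run does not block the shell indefinitely.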
    

Automate the job run

Instead of running the job manually, you can automate the job run by creating an IBM Cloud Object Storage subscription that listens for changes to an Object Storage bucket. When you create a subscription to a bucket, your job receives a separate event for each successful change to that bucket.

  1. Before you can create an Object Storage subscription, you must assign the Notifications Manager role to Code Engine. As a Notifications Manager, Code Engine can view, modify, and delete notifications for an Object Storage bucket. Follow the instructions here to assign the Notifications Manager role to your Code Engine project.
  2. Run the command below to connect your backend-job to the IBM Cloud Object Storage event producer. Check and update the bucket name before running the command.
    ibmcloud code-engine subscription cos create --name backend-job-cos-event --destination-type job --destination backend-job --bucket $COS_BUCKETNAME --prefix files --event-type write
    
  3. Now, just upload new files and click the refresh button to see the results. From now on, you don't have to resubmit the jobrun, because the subscription takes care of it.
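
The --prefix files option limits the subscription to objects whose names start with files. The sketch below shows that matching rule on two example keys. The matches_prefix helper and both object keys are hypothetical and are not taken from the tutorial's bucket layout.

```shell
# Succeeds when the object key starts with the subscription prefix.
matches_prefix() {
  case "$1" in
    "$2"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical object keys, used only to illustrate the matching rule.
for key in files/report.txt results/report.json; do
  if matches_prefix "$key" files; then
    echo "$key: matches the prefix"
  else
    echo "$key: does not match"
  fi
done
```

Filtering by prefix keeps writes to other parts of the bucket, such as stored results, from triggering the job again.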

Optional: Build and push the container images to IBM Cloud Container Registry

There are a few options for building a container image with stand-alone build commands. Here, a single build that pulls source from a local directory creates a new frontend application:

    git clone https://github.com/IBM-Cloud/code-engine-text-analysis
    cd code-engine-text-analysis/frontend
    echo $BACKEND_PRIVATE_URL

You can change some of the source code to verify the build. For example, change the second occurrence of Text analysis with Code Engine in the body of public/index.html and public/501.html to include your name. Then build the container image in a Code Engine namespace and create the application in one command:

    ibmcloud ce application create --name frontend-fromsource --build-source . --env BACKEND_URL=$BACKEND_PRIVATE_URL

Remove resources

  1. With the command below, delete the project to delete all its components (applications, jobs etc.).
    ibmcloud code-engine project delete --name $PROJECT_NAME
    
  2. Navigate to Resource List, then delete the services you created:
    • IBM Cloud® Object Storage
    • IBM Watson® Natural Language Understanding

Depending on the resource, it might not be deleted immediately but retained (by default for 7 days). During the retention period, you can reclaim the resource by deleting it permanently, or you can restore it. See this document on how to use resource reclamation.