Creating a library set for Python package install

A library set is a collection of libraries that you can create and reference in Spark applications that consume the libraries. The library set is stored in the instance home storage associated with the instance at the time the instance is created.

Currently, you can install only Python packages, through conda or pip.

Analytics Engine bundles a Spark application called customize_instance_app.py. You run this application to create a library set with your custom packages, which your Spark applications can then consume.

Prerequisites: To create a library set, you must have permission to submit a Spark application. See User permissions.

To create a library set:

  1. Prepare a JSON file like the following:

    {
      "library_set": {
        "action": "add",
        "name": "my_library_set",
        "libraries": {
          "conda": {
            "python": {
              "packages": ["numpy"]
            }
          }
        }
      }
    }
    

    The JSON attributes are described as follows:

    • "library_set": The top level JSON object that defines the library set.
    • "action": Specifies the action to be taken for the library set. To create a library set, use "add". Currently, "add" is the only supported option.
    • "name": Specifies the name with which the library set is identified. The created library set is stored in a file with this name in the IBM Cloud Object Storage instance that you specified as instance home. Important: If you create more than one library set, you must use unique names for each set. Otherwise they will overwrite one another.
    • "libraries": Defines a set of libraries. You can specify one or more libraries in this section. This element has one child JSON object per library package manager. Currently, only the "conda" and "pip" package managers are supported. Use "pip" or "conda" to install Python packages.
    • "conda": Library package manager.
    • "python": The library language. Currently, only Python is supported.
    • "packages": List of packages to install. To install a specific version of a package, pass the version using this format: package_name==version.
  2. Get the IAM token.

  3. Pass the contents of the JSON file as a single string in "arguments" in the following REST API call. Because the JSON is embedded in a string, escape its double quotes as required.

    curl -X POST https://api.us-south.ae.cloud.ibm.com/v3/analytics_engines/<instance_id>/spark_applications --header "Authorization: Bearer <IAM token>" -H "content-type: application/json" -d @createLibraryset.json
    

    Example for createLibraryset.json:

    {
      "application_details": {
        "application": "/opt/ibm/customization-scripts/customize_instance_app.py",
        "arguments": ["{\"library_set\":{\"action\":\"add\",\"name\":\"my_library_set\",\"libraries\":{\"conda\":{\"python\":{\"packages\":[\"numpy\"]}}}}}"]
      }
    }
    

    Important: You must escape all double quote characters in the strings that are passed as application arguments.

    If the application is accepted, you will receive a response like the following:

    {
      "id": "87e63712-a823-4aa1-9f6e-7291d4e5a113",
      "state": "accepted"
    }
    

    When the state turns to FINISHED, the library set creation is complete.

  4. Track the status of the application by invoking the application status REST API. See Get the status of an application.
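
Escaping the quotes by hand in step 3 is easy to get wrong. As a minimal sketch, the request body can instead be generated with Python's standard json module, which escapes the inner double quotes automatically. The helper name build_request_body is hypothetical and not part of Analytics Engine; the application path and the library set definition are taken from the examples above.

```python
import json


def build_request_body(library_set_spec: dict) -> dict:
    """Wrap a library set definition in the body used by the submit call.

    Hypothetical helper for illustration. The definition is serialized to a
    compact JSON string and passed as the only element of "arguments".
    """
    # Serialize the inner document to a single string without extra whitespace.
    argument = json.dumps(library_set_spec, separators=(",", ":"))
    return {
        "application_details": {
            "application": "/opt/ibm/customization-scripts/customize_instance_app.py",
            "arguments": [argument],
        }
    }


# Same library set definition as in step 1.
spec = {
    "library_set": {
        "action": "add",
        "name": "my_library_set",
        "libraries": {"conda": {"python": {"packages": ["numpy"]}}},
    }
}

body = build_request_body(spec)

# Serializing the outer body escapes the double quotes inside the argument
# string, so the text can be saved as createLibraryset.json.
request_text = json.dumps(body, indent=2)
print(request_text)
```

The printed text can be saved as createLibraryset.json and passed to the curl command in step 3 with -d @createLibraryset.json.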