Getting started with Red Hat AI Inference on IBM Cloud

Ready to start using AI in your applications? In this tutorial, you'll learn how to use inferencing to interact with foundation models and generate AI-powered responses. In just 15 minutes, you'll be chatting with a large language model and integrating conversational AI into your workflows.

Red Hat® AI Inference on IBM Cloud® is a business-ready, private, and secure generative AI solution powered by Red Hat OpenShift AI. Red Hat AI on IBM Cloud provides two core capabilities: inferencing for interacting with foundation models and model alignment for customizing models to your specific needs. This tutorial focuses on getting you started with inferencing, the fastest way to start using AI.

What you'll accomplish

In this tutorial, you'll do the following tasks:

Set up your IBM Cloud account and project.
Authenticate to the inferencing API.
Generate your first chat completion with a foundation model.
Learn about next steps for customizing models with your own data.

Before you begin

Make sure you have the following:

A Pay-As-You-Go or Subscription IBM Cloud account. Trial accounts are not supported. For more information or to upgrade your account, see Account types.
A Red Hat AI Inference project.
The Writer role or greater on the Red Hat AI Inference service. For more information, see Managing IAM access.

Get your project ID and API endpoint

Your project ID is required for all API requests.

Go to Red Hat AI Inference projects.
Open your project.
Click Details.
Copy your project ID and save it for the next steps.

API endpoint

All API requests use the following base URL format:

https://us-east.rhai.ibm.com/v1/projects/{project_id}/inference

Replace {project_id} with your project ID.
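As a minimal sketch of the substitution above, the following helper fills the project ID into the base URL format. The function name build_base_url and the sample project ID are hypothetical and used only for illustration; the URL format is the one shown in this tutorial.

```python
from urllib.parse import quote

# Base URL format from this tutorial; {project_id} is filled in below.
BASE_URL_TEMPLATE = "https://us-east.rhai.ibm.com/v1/projects/{project_id}/inference"


def build_base_url(project_id: str) -> str:
    """Return the inferencing base URL for a project (hypothetical helper)."""
    project_id = project_id.strip()
    if not project_id:
        raise ValueError("project_id must not be empty")
    # Percent-encode the ID so stray characters cannot alter the URL path.
    return BASE_URL_TEMPLATE.format(project_id=quote(project_id, safe=""))


# "my-project-id" is a placeholder, not a real project ID.
print(build_base_url("my-project-id"))
```

Building the URL in one place keeps the later API calls consistent if you work with more than one project.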

Authenticate to the API

Before you can interact with foundation models, you need to authenticate your API requests. You can use either a bearer token or an IBM Cloud API key. This tutorial shows how to use a service ID with an API key for programmatic access. For more information on using a bearer token, see Authenticating by using a bearer token.

Create a service ID and assign access

A service ID gives an application its own identity, so you can grant and revoke access to Red Hat AI Inference projects without relying on an individual user's credentials. Create the service ID, then assign it access to your project.

In the IBM Cloud console, go to Manage > Access (IAM) > Service IDs and click Create.
Enter a name and description for your service ID, then click Create.
From the service ID page, click Assign access.
Select Red Hat AI Inference as the service.
Within Resources, select Specific resources and choose your project. This limits the service ID's access to that one project.
Within Roles and actions, select the appropriate service access role:
- Select Writer if you need to create chat completions.
- Select Reader if you only need to read chat completions or view model information.
Platform access roles are not required for API access.
(Optional) Add conditions such as time-based access to further scope the service ID access.
Review the access summary and click Assign.

Create an API key

Now that your service ID has access to your Red Hat AI Inference project, create a service ID API key to use in your API calls.

From the service ID page, click API keys.
Click Create and enter a name for your API key.
For leaked key handling, select Disable the leaked key to automatically disable the key if it's detected as leaked.
Set an expiration date for the key. Regular key rotation is recommended for security.
Click Create.
Copy the API key and save it in a secure location. The key cannot be viewed again.

You can now use that API key to authenticate your requests. In the next step, you'll use this key in the Authorization: Bearer header of your API calls.
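Rather than pasting the key into scripts, you might keep it in an environment variable and build the header from it. The sketch below assumes a variable named RHAI_API_KEY, which is a hypothetical name chosen for this example and not one that the service defines. The helper build_auth_headers is also illustrative.

```python
import os


def build_auth_headers(api_key: str) -> dict:
    """Return request headers that carry the API key as a bearer credential."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    return {
        "Accept": "application/json",
        "Authorization": f"Bearer {api_key}",
    }


# RHAI_API_KEY is a hypothetical variable name used only in this sketch.
api_key = os.environ.get("RHAI_API_KEY", "")
if api_key:
    headers = build_auth_headers(api_key)
    # Print only the header names so the key never appears in logs.
    print(sorted(headers))
```

Keeping the key out of source files also makes rotation easier, because only the environment needs to change when the key expires.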

Explore available models

Different foundation models have different strengths, so it's important to review the models that are available in your project.

Make the following API call to list all the available models. Replace {project_id} with your project ID and {api_key} with your service ID API key:

curl -L 'https://us-east.rhai.ibm.com/v1/projects/{project_id}/inference/models' \
  -H 'Accept: application/json' \
  -H "Authorization: Bearer {api_key}"

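from openai import OpenAI

# Point the OpenAI client at your project's inferencing endpoint.
client = OpenAI(
    api_key="{api_key}",  # your service ID API key
    base_url="https://us-east.rhai.ibm.com/v1/projects/{project_id}/inference",
)

# List the models that are available in the project.
models = client.models.list()
print(models)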

The response shows you all the models you can use, along with information about their capabilities. You can experiment with different models to find the one that best fits your use case.
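To pick out just the model names from the curl output, you could parse the JSON body. This sketch assumes the response follows the OpenAI-style list shape (a top-level data array whose items each have an id field); that shape is an assumption, so check it against the response your project returns. The sample payload is hand-written for illustration and is not real service output.

```python
import json


def model_ids(payload: dict) -> list[str]:
    """Return the sorted model IDs from an OpenAI-style model list payload."""
    # Assumed shape: {"object": "list", "data": [{"id": "..."}, ...]}
    return sorted(item["id"] for item in payload.get("data", []) if "id" in item)


# Hand-written sample payload, not actual output from the service.
sample = json.loads(
    '{"object": "list", "data": [{"id": "granite-4-0-h-small", "object": "model"}]}'
)
print(model_ids(sample))
```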

Generate your first chat completion

Now, send a message to the model and receive an AI-generated response.

Make the following API call, replacing {project_id} with your project ID and {api_key} with your service ID API key:

curl 'https://us-east.rhai.ibm.com/v1/projects/{project_id}/inference/chat/completions' \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer {api_key}" \
  -d '{
    "model": "granite-4-0-h-small",
    "messages": [
      {
        "role": "developer",
        "content": "You are a helpful assistant"
      },
      {
        "role": "user",
        "content": "Hello! Tell me about yourself"
      }
    ]
  }'

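from openai import OpenAI

# Point the OpenAI client at your project's inferencing endpoint.
client = OpenAI(
    api_key="{api_key}",  # your service ID API key
    base_url="https://us-east.rhai.ibm.com/v1/projects/{project_id}/inference",
)

# Send a developer (system) prompt and a user message to the model.
completion = client.chat.completions.create(
    model="granite-4-0-h-small",
    messages=[
        {"role": "developer", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello! Tell me about yourself"},
    ],
)

# Print the message object from the first choice in the response.
print(completion.choices[0].message)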

You should receive a response from the model introducing itself.

Understanding the request

Let's break down what you just did:

model: You specified granite-4-0-h-small, one of the available foundation models. Different models have different capabilities and performance characteristics.
messages: You provided two messages. The developer message set the system prompt, which tells the model how to behave. The user message contained the actual question for the model to answer.
API endpoint: The request went to your project's inferencing endpoint, which handles routing to the appropriate model.

You can customize the model's behavior by adjusting the system prompt, adding more messages, or using different models for different use cases.
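One way to add more messages is to keep the conversation as a list and append each turn before the next request. The sketch below only builds that list and makes no API call. The helper names are hypothetical, and it assumes the endpoint accepts the assistant role from the OpenAI Chat Completions convention for earlier model replies.

```python
# Roles used in this sketch; "assistant" is assumed to be accepted for prior replies.
ALLOWED_ROLES = {"developer", "user", "assistant"}


def start_conversation(system_prompt: str) -> list[dict]:
    """Begin a message list with a developer message that sets the system prompt."""
    return [{"role": "developer", "content": system_prompt}]


def add_turn(messages: list[dict], role: str, content: str) -> list[dict]:
    """Return a new message list with one more turn appended."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    return [*messages, {"role": role, "content": content}]


history = start_conversation("You are a helpful assistant.")
history = add_turn(history, "user", "Hello! Tell me about yourself")
# In a real exchange, this content would be the text of the model's reply.
history = add_turn(history, "assistant", "(the model's previous reply goes here)")
history = add_turn(history, "user", "Summarize that in one sentence.")
print(len(history))
```

You would then pass history as the messages argument of the chat completion request, so that the model sees the earlier turns.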

Next steps

You've successfully started using inferencing with Red Hat AI on IBM Cloud. Here's what you can do next:

Continue with inferencing

Learn more about inferencing to discover advanced features like streaming responses, adjusting model parameters, and managing conversation history.
Explore the OpenAI Chat Completion API and OGX API documentation for the complete API reference.
Integrate inferencing into your applications using the Python SDK or other programming languages.

Customize models with your data

Ready to go beyond general-purpose models? You can customize foundation models with your organization's specific knowledge and skills through model alignment:

Prepare a taxonomy containing your business knowledge and skills.
Generate synthetic data from your taxonomy.
Train a custom model aligned with your specific needs.

These steps fine-tune a model so that it understands your business context, terminology, and requirements, which goes beyond what the general-purpose models can provide.