Agentic AI patterns on IBM Cloud
This reference architecture summarizes the best practices for deploying agentic artificial intelligence (AI) patterns on IBM Cloud.
Agentic AI is rapidly enabling a new paradigm for organizations to improve their products, services, and business processes. Understanding agentic AI and its dependencies is vital for navigating its complex landscape. Agentic AI systems require not only AI and generative AI capabilities but also a range of supporting services that provide the ecosystem for their development and operations.
Logical architecture and key components
The following figure shows the logical architecture and key components of a typical agentic AI system on IBM Cloud.
To perform its function, each logical layer is enabled by a set of SaaS, PaaS, or IaaS cloud service options. SaaS services are multi-tenant, meaning that they logically separate deployments from different organizations but physically share the underlying infrastructure that is managed by IBM Cloud. SaaS is suited for convenience, initial experimentation, PoCs, and simple to medium-complexity production deployments. IaaS and PaaS services provide single-tenant, dedicated, isolated infrastructure with varying degrees of control and management responsibility for the organization.
To learn more about the logical layers, their key components, and the typical SaaS, PaaS, and IaaS cloud services that enable their function, read the following section.
- Agentic AI application: Includes the core agentic AI system of AI agents with their tools, and the orchestration framework that services user requests by using LLMs. This might also include the applications that are required for user interaction with the agentic AI system (for example, a user interface, chat interface, or smartphone app) and the backend systems that provide the enterprise's business logic and data to the agentic AI system (for example, microservices and internal APIs).
It is often cost effective to use multi-tenant (PaaS) services like serverless Code Engine for hosting applications, including the core agentic AI application. For complex applications and backend microservices, VPC-based virtual server instances or a Red Hat OpenShift cluster provide a single-tenant, dedicated alternative. These applications typically do not require GPUs. A minimal agent-loop sketch follows this list.
- Generative AI platform: Includes the foundational platform that provides the LLM services. It brings together the GPUs and models to provide model serving for inferencing. It includes an ecosystem of UIs, APIs, and services that enable prompt tuning and fine-tuning of models, iterative testing and validation, model management, scaling, vector stores for retrieval augmented generation (RAG), and so on. A minimal vector-store sketch follows this list.
A fundamental architectural decision is the organization's level of access to and control over GPUs for generative AI tasks like inferencing and fine-tuning. The IBM Cloud watsonx.ai SaaS service runs on shared multi-tenant infrastructure, with GPUs shared across organizations' workloads. IBM Cloud IaaS and PaaS options, including virtual servers, Red Hat AI, and watsonx.ai software, provide dedicated single-tenant infrastructure and control over GPUs.
- Supporting services: Includes the ecosystem of supporting services for overall security, compliance, and a secure DevOps application lifecycle, as well as services that enable routine operational management of the agentic AI system and its dependencies.
These services are available as multi-tenant SaaS services on IBM Cloud.
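The following is a minimal, illustrative sketch of the agent pattern described above: an orchestration loop that lets an LLM choose tools and feeds the tool results back into the conversation. The `call_llm` function, the `get_order_status` tool, and the message format are hypothetical placeholders; in practice they would be backed by a model served from the generative AI platform (for example, watsonx.ai) and by the enterprise's backend APIs.

```python
import json

# Hypothetical tool implementation; in a real system this would call a
# backend microservice or internal API.
def get_order_status(order_id: str) -> str:
    return json.dumps({"order_id": order_id, "status": "shipped"})

TOOLS = {"get_order_status": get_order_status}

def call_llm(messages: list) -> dict:
    """Placeholder for a call to an LLM served by the generative AI platform.

    A real implementation would send `messages` to a hosted model and return
    either a final answer ({"content": "..."}) or a tool request, for example
    {"tool": "get_order_status", "arguments": {"order_id": "A123"}}.
    """
    raise NotImplementedError("Wire this to your model-serving endpoint")

def run_agent(user_request: str, max_steps: int = 5) -> str:
    """Minimal agent loop: ask the model, run any requested tool, feed the
    observation back, and repeat until the model returns a final answer."""
    messages = [{"role": "user", "content": user_request}]
    for _ in range(max_steps):
        decision = call_llm(messages)
        if "tool" in decision:
            observation = TOOLS[decision["tool"]](**decision.get("arguments", {}))
            messages.append({"role": "tool", "content": observation})
        else:
            return decision.get("content", "")
    return "The agent could not complete the request within the step limit."
```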
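The next sketch illustrates the vector-store side of a RAG flow by using the open-source pymilvus client against a Milvus instance (such as the one provided with watsonx.data). The connection URI, collection name, embedding dimension, and the use of deterministic random vectors in place of real embeddings are placeholder assumptions.

```python
import random

# Assumes the pymilvus client (pip install pymilvus) and a reachable Milvus
# endpoint, for example the Milvus service that comes with watsonx.data.
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")  # placeholder endpoint

# Collection for document chunks; 384 is a placeholder embedding dimension
# that must match whichever embedding model is actually used.
client.create_collection(collection_name="docs", dimension=384)

def embed(text: str) -> list:
    """Placeholder embedding function; a real system would call an embedding
    model served by the generative AI platform."""
    rng = random.Random(text)
    return [rng.random() for _ in range(384)]

chunks = [
    "Refund policy: refunds are issued within 14 days.",
    "Shipping policy: orders ship within 2 business days.",
]
client.insert(
    collection_name="docs",
    data=[{"id": i, "vector": embed(c), "text": c} for i, c in enumerate(chunks)],
)

# Retrieve the chunk closest to the user question; the retrieved text would
# then be added to the LLM prompt (the RAG pattern).
hits = client.search(
    collection_name="docs",
    data=[embed("How long do refunds take?")],
    limit=1,
    output_fields=["text"],
)
print(hits[0][0]["entity"]["text"])
```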
Based on these considerations and the enabling IaaS, PaaS, and SaaS service combinations, multiple architectural patterns for deploying an agentic AI system are possible.
Common architectural patterns
The common architectural patterns for agentic AI on IBM Cloud include:
- Pattern 1: Minimal - Shared GPUs: Refer to this pattern for getting started with agentic AI, basic experimentation, development, PoCs, and so on.
- Pattern 2: Small/Medium - Shared GPUs: Refer to this pattern for small to medium-size production agentic AI workloads, where the organization wants convenience and does not want deep control over the generative AI infrastructure and platform.
- Pattern 3: Medium/Large - Dedicated GPUs: Refer to this pattern for medium to large production agentic AI workloads, where the organization wants deeper control over the generative AI infrastructure, GPUs, and platform.
Reference architectures for these patterns are described in the following sections.
Architecture diagram
The reference architectures below show how IBM Cloud provides a secure, compliant, and resilient environment to implement an agentic AI system.
Pattern 1: Minimal - Shared GPUs
This architecture uses minimal shared multi-tenant PaaS and SaaS services from IBM Cloud for deploying applications and using generative AI platform GPUs. For trials and PoCs, the enhanced security, compliance, logging, and monitoring services are not required and are not included in the architecture.
The following describes the deployment of workloads and their administration.
- Management and Administration
- IBM Cloud Identity and Access Management services.
- Application Workload
- A serverless platform such as Code Engine for hosting the applications and backend services, along with IBM Cloud databases.
- Agentic AI Application Workload
- A serverless platform such as Code Engine, and other IBM Cloud platforms and services such as watsonx.ai and watsonx Orchestrate, which provide low-code/no-code alternatives to create and host agentic AI systems.
- Generative AI Platform
- Generative AI capabilities and shared GPUs are available from the watsonx.ai SaaS service. A minimal inference sketch follows this list.
- Edge Compute and Network
- PaaS and SaaS services provide their own security and access controls for the hosted applications and services.
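As a simple example of using the shared watsonx.ai SaaS service in this pattern, the following sketch calls a hosted foundation model from an application, assuming the ibm-watsonx-ai Python SDK. The region URL, model ID, project ID, and parameter names are placeholders, and the exact class and method signatures should be verified against the current SDK documentation.

```python
import os

# Assumes the ibm-watsonx-ai SDK (pip install ibm-watsonx-ai); verify class
# and method names against the current SDK documentation.
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",      # regional endpoint (placeholder)
    api_key=os.environ["IBMCLOUD_API_KEY"],       # IAM API key from the environment
)

model = ModelInference(
    model_id="ibm/granite-13b-instruct-v2",        # example model ID (placeholder)
    credentials=credentials,
    project_id=os.environ["WATSONX_PROJECT_ID"],   # watsonx.ai project (placeholder)
    params={"max_new_tokens": 200, "temperature": 0.2},  # generation parameters
)

response = model.generate_text(prompt="Summarize what an agentic AI system is.")
print(response)
```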
To get started with this pattern, check out the following references:
Pattern 2: Small/Medium - Shared GPUs
This architecture uses shared multi-tenant PaaS and SaaS services from IBM Cloud for deploying applications and using generative AI platform GPUs. Security, compliance, logging, monitoring, and application lifecycle (DevSecOps) services are common requirements and are available as SaaS on IBM Cloud.
The following describes the deployment of workloads and their administration.
- Management and Administration
- IBM Cloud Identity and Access Management services.
- Application Workload
- A serverless platform such as Code Engine for hosting the applications and backend services, along with IBM Cloud databases.
- Agentic AI Application Workload
- A serverless platform such as Code Engine, and other IBM Cloud platforms and services such as watsonx.ai and watsonx Orchestrate, which provide low-code/no-code alternatives to create and host agentic AI systems.
- Generative AI Platform
- Generative AI capabilities and shared GPUs are available from watsonx.ai SaaS service.
- Edge Compute and Network
- PaaS and SaaS services provide their own security and access controls for the hosted applications and services.
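Because this pattern adds the logging and monitoring SaaS services, the application should emit logs that those services can index. The following is a generic sketch, assuming the agentic AI application writes JSON-structured log lines to stdout (which platforms such as Code Engine collect and forward to IBM Cloud Logs); the event and field names are illustrative.

```python
import json
import logging
import sys

# Emit JSON-structured log lines to stdout so the platform's log collection
# (for example, Code Engine forwarding to IBM Cloud Logs) can index them.
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter("%(message)s"))
logger = logging.getLogger("agentic-app")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

def log_event(event: str, **fields) -> None:
    """Write one structured log record; field names here are illustrative."""
    logger.info(json.dumps({"event": event, **fields}))

log_event("agent_request_received", request_id="req-001", user="demo")
log_event("tool_invoked", request_id="req-001", tool="get_order_status")
log_event("agent_response_sent", request_id="req-001", tokens_generated=128)
```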
In this pattern, Deployable Architectures can be used to automate the provisioning and deployment of services. Here are some references.
Pattern 3: Medium/Large - Dedicated GPUs
This architecture uses dedicated single-tenant deployments of applications and the generative AI platform, and utilizes virtual private clouds (VPCs) with IBM Cloud IaaS and PaaS services. It reuses best practices from IBM Cloud for Financial Services and the VPC reference architecture.
Security, compliance, logging, monitoring, and application lifecycle (DevSecOps) services are common requirements and are available as SaaS on IBM Cloud.
The following describes the deployment of workloads on VPCs and their administration.
- Management and Administration
- The Management VPC provides compute, storage, and network services, like VPN, to enable the consumer's or service provider's administrators to monitor, operate, and maintain the deployed agentic AI environment infrastructure. IBM Cloud Identity and Access Management services provide controls for PaaS and SaaS services.
- Application Workload
- The Application Workload VPC provides the dedicated compute, storage, and network services to support the frontend/UI application that end users use to access the agentic system. It also includes the backend applications and business microservices that the agentic AI application may require. These run on virtual server instances or on Red Hat OpenShift for containerized workloads.
- Agentic AI Application Workload
- The Agentic AI Application Workload VPC provides the dedicated compute, storage, and network services to support the core agentic AI application and its dependencies, such as the tools and orchestration frameworks. These run on virtual server instances or on Red Hat OpenShift for containerized workloads.
- Generative AI Platform
- The Gen AI Platform Workload VPC provides the dedicated compute, including GPUs, storage, and network services to securely support a generative AI platform of virtual server instances or containers. The options include:
- Virtual Server Instances (VSI) with GPUs
- Kubernetes cluster with GPU nodes
- Red Hat Enterprise Linux AI (RHELAI) VSI with GPU nodes
- Red Hat OpenShift cluster with GPU nodes
- Red Hat OpenShift AI (RHOAI) cluster with GPU nodes
- Red Hat OpenShift cluster with GPU nodes + watsonx AI software
For more information about the available GPU profiles, see the VPC documentation. A sketch that lists the available GPU profiles programmatically follows this list.
- Edge Compute and Network
- The Edge VPC is used to enhance boundary protection. Consumers access the agentic AI frontend/user interface applications in the workload VPCs from the public internet through Edge VPC protections and load balancing. Transit Gateway provides connectivity to external or on-premises resources through Direct Link, and also between VPCs.
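The following sketch shows one way to programmatically inspect the instance profiles available in a region when planning dedicated GPU capacity, assuming the ibm-vpc Python SDK. The service URL is a placeholder, and the assumption that GPU-enabled profiles belong to the gx* families, as well as the response field names, should be verified against the current VPC documentation.

```python
import os

# Assumes the ibm-vpc SDK (pip install ibm-vpc) and an IAM API key in the
# environment; verify method and field names against the current SDK docs.
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_vpc import VpcV1

authenticator = IAMAuthenticator(os.environ["IBMCLOUD_API_KEY"])
vpc = VpcV1(authenticator=authenticator)
vpc.set_service_url("https://us-south.iaas.cloud.ibm.com/v1")  # regional endpoint (placeholder)

profiles = vpc.list_instance_profiles().get_result().get("profiles", [])

# GPU-enabled profiles are typically in the gx* families; treat this filter
# as an assumption and confirm it against the VPC documentation.
for profile in profiles:
    if profile.get("name", "").startswith("gx"):
        print(profile["name"], profile.get("family"))
```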
This pattern includes deploying applications and platforms on VPCs. Here are some references.
Design concepts
The following heatmap covers the design considerations related to the Architecture Framework.
- Data: Artificial Intelligence
- Compute: Virtual Servers, Containers, Serverless
- Storage: Primary Storage, Backup
- Networking: Enterprise Connectivity, Load Balancing, Domain Name Services
- Security: Data Security, Identity & Access, Application Security, Infrastructure & Endpoints, Governance, Risk & Compliance
- DevOps: Build & Test, Delivery Pipeline, Code Repository
- Resiliency: High Availability
- Service Management: Monitoring, Logging, Auditing / tracking, Automated Deployment
Requirements
The following table outlines the requirements that are addressed in this architecture.
Aspect | Requirements |
---|---|
Compute | Provide properly isolated compute resources with adequate compute capacity for the applications. |
Storage | Provide storage that meets the application and database performance requirements. |
Networking | Deploy workloads in isolated environment and enforce information flow policies. Provide secure, encrypted connectivity to the cloud’s private network for management purposes. Distribute incoming application requests across available compute resources. |
Security | Ensure all operator actions are executed securely through a bastion host. Protect the boundaries of the application against denial-of-service and application-layer attacks. Encrypt all application data in transit and at rest to protect from unauthorized disclosure. Encrypt all security data (operational and audit logs) to protect from unauthorized disclosure. Encrypt all data using customer managed keys to meet regulatory compliance requirements for additional security and customer control. Protect secrets through their entire lifecycle and secure them using access control measures. Firewalls must be restrictively configured to prevent all traffic, both inbound and outbound, except that which is required, documented, and approved. |
DevOps | Delivering software and services at the speed the market demands requires teams to iterate and experiment rapidly. They must deploy new versions frequently, driven by feedback and data. |
Resiliency | Support application availability targets and business continuity policies. Ensure availability of the application in the event of planned and unplanned outages. Backup application data to enable recovery in the event of unplanned outages. Provide highly available storage for security data (logs) and backup data. |
Service Management | Monitor system and application health metrics and logs to detect issues that might impact the availability of the application. Generate alerts/notifications about issues that might impact the availability of applications to trigger appropriate responses to minimize down time. Monitor audit logs to track changes and detect potential security problems. Provide a mechanism to identify and send notifications about issues found in audit logs. |
Components
The following table outlines the products or services used in the architecture for each aspect.
Aspects | Architecture components | How the component is used |
---|---|---|
Data | watsonx Orchestrate | Orchestrate AI agents, assistants, and workflows across your business |
| watsonx.ai | Brings together new generative AI capabilities powered by foundation models and traditional machine learning (ML) into a powerful studio spanning the AI lifecycle |
| watsonx.data with Milvus | Enables data analytics for AI at scale and provides the Milvus database to store vector embeddings for RAG patterns |
| watsonx.governance | Direct, manage, and monitor artificial intelligence activities |
| Elasticsearch | Database to store vector embeddings for RAG patterns |
Compute | Virtual Servers for VPC | Web, app, and database servers |
| Code Engine | Abstracts the operational burden of building, deploying, and managing workloads in Kubernetes so that developers can focus on what matters most to them: the source code |
| Red Hat OpenShift Kubernetes Service (ROKS) | A managed offering to create your own cluster of compute hosts where you can deploy and manage containerized apps on IBM Cloud |
Storage | Cloud Object Storage | Web app static content, backups, logs (application, operational, and audit logs) |
| VPC Block Storage | Web app storage if needed |
Networking | VPC Virtual Private Network (VPN) | Remote access to manage resources in the private network |
| Virtual Private Endpoint (VPE) | Private network access to cloud services, for example, Key Protect, COS, and so on |
| VPC Load Balancers | Application load balancing for web servers, app servers, and database servers |
| Direct Link 2.0 | Seamlessly connect on-premises resources to cloud resources |
| Transit Gateway (TGW) | Connects the Workload and Management VPCs within a region |
| Cloud Internet Services (CIS) | Global load balancing between regions |
| Access Control List (ACL) | Controls all incoming and outgoing traffic in the Virtual Private Cloud |
Security | IAM | IBM Cloud Identity & Access Management |
| Key Protect | A full-service encryption solution that allows data to be secured and stored in IBM Cloud |
| BYO Bastion Host on VPC VSI | Remote access with Privileged Access Management |
| App ID | Add authentication to web and mobile apps |
| Secrets Manager | Certificate and secrets management |
| Security and Compliance Center | Implement controls for secure data and workload deployments, and assess security and compliance posture |
| Hyper Protect Crypto Services (HPCS) | Hardware security module (HSM) and key management service |
| Virtual Network Function (VNF) | Virtualized network services running on virtual machines |
DevOps | Continuous Integration (CI) | A pipeline that tests, scans, and builds the deployable artifacts from the application repositories |
| Continuous Deployment (CD) | A pipeline that generates all of the evidence and change request summary content |
| Continuous Compliance (CC) | A pipeline that continuously scans deployed artifacts and repositories |
| Container Registry | Highly available and scalable private image registry |
Resiliency | VPC VSIs and VPC Block Storage across multiple zones in two regions | Web, app, and database high availability and disaster recovery |
Service Management | Cloud Monitoring | Apps and operational monitoring |
| Cloud Logs | Operational and audit logs |
Compliance
CI / CD / CC Pipelines
The Continuous Integration (CI), Continuous Deployment (CD), and Continuous Compliance (CC) pipelines, referred to as DevSecOps Application Lifecycle Management, are used to deploy the application, check for vulnerabilities, and ensure auditability. The following are some of the important compliance features of DevSecOps Application Lifecycle Management:
- Vulnerability Scans
- Vulnerability scans involve using specialized tools to look for security vulnerabilities in the code. This is crucial to identify and fix potential security issues before they become a problem in production.
- Sign Build Artifacts
- The code is compiled and built into software or application artifacts (like executable files or libraries). These artifacts are then digitally signed to ensure their authenticity and integrity. A conceptual signing sketch follows this list.
- Evidence Gathering
- This involves collecting and storing evidence of the development process, such as commit logs, build logs, and other relevant data. It helps in tracing back and understanding what happened at different stages of development.
- Evidence Locker
- The evidence locker is the dedicated repository where the collected evidence is stored, so that it can be retrieved later for audits and traceability.
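To make the artifact-signing step concrete, the following is a conceptual sketch of detached signing and verification using the Python cryptography package. It is an illustration of the idea only; the DevSecOps pipelines use their own signing tooling, and the key handling here is simplified (a real pipeline would keep the private key in a secrets service such as Secrets Manager rather than generating it in memory).

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Illustration only: generate an Ed25519 key pair in memory. A real pipeline
# would load the signing key from a secrets service, not create it ad hoc.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

artifact = b"example build artifact bytes"      # stand-in for a built image or binary
signature = private_key.sign(artifact)          # detached signature over the artifact

# Verification raises InvalidSignature if the artifact was tampered with or
# was signed with a different key.
try:
    public_key.verify(signature, artifact)
    print("signature valid: artifact is authentic and unmodified")
except InvalidSignature:
    print("signature invalid: artifact was modified or signed with a different key")
```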
Security and Compliance Center
This reference architecture utilizes the Security and Compliance Center, which defines policy as code, implements controls for secure data and workload deployments, and assesses security and compliance posture. For this reference architecture, two profiles are used: the IBM Cloud Framework for Financial Services and the AI ICT Guardrails. A profile is a grouping of controls that can be evaluated for compliance.