Gen AI Pattern for Watsonx on IBM Cloud

This reference architecture summarizes best practices for deploying the watsonx Gen AI pattern on IBM Cloud.

AI holds the promise to transform life and business but raises concerns around trust, security, and regulatory compliance. Understanding Gen AI and its infrastructure is vital for navigating this complex landscape. This reference architecture shows how IBM Cloud and watsonx provide a secure environment for deploying and governing Gen AI applications.

A more specific use case of this pattern is Retrieval Augmented Generation (RAG). RAG helps foundation models produce factually grounded outputs by retrieving relevant content and supplying it to the model together with the user's query. RAG suits any business scenario where there is a large body of documentation that a user must consult to provide confident answers.

The following diagram shows the flow of a RAG solution. It is not the entire reference architecture, but a small portion that highlights the options and key IBM Cloud components for an end-to-end RAG flow: data administration and processing, the end-user gen AI application, conversational flow, and gen AI task inferencing from LLMs or foundation models.

For example, watsonx Assistant can provide the conversational flow. It requires indexed data in Elasticsearch or Watson Discovery for content retrieval and includes an embedded LLM for response generation. In the watsonx.ai option, watsonx.ai provides endpoints for querying. It offers options such as in-memory and Elasticsearch for content retrieval and uses watsonx.ai LLMs or foundation models for response generation.
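The retrieve-then-generate flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the watsonx implementation: the corpus, the keyword-overlap scoring, and the helper names (tokenize, score, retrieve, build_prompt) are all hypothetical, and a simple in-memory retriever stands in for Elasticsearch or Watson Discovery. The sketch stops at prompt construction and does not call any LLM endpoint.

```python
import re
from collections import Counter

# Hypothetical in-memory corpus standing in for indexed enterprise documents.
DOCUMENTS = [
    "Direct Link connects the consumer enterprise environment to the workload VPC.",
    "Transit Gateway connects the management, workload, and edge VPCs.",
    "Key Protect manages encryption keys for data at rest.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, document):
    """Count the tokens shared by the query and the document."""
    query_counts = Counter(tokenize(query))
    doc_counts = Counter(tokenize(document))
    return sum(min(n, doc_counts[token]) for token, n in query_counts.items())


def retrieve(query, documents, top_k=2):
    """Return up to top_k documents with the highest non-zero overlap score."""
    scored = [(score(query, doc), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for points, doc in scored[:top_k] if points > 0]


def build_prompt(query, passages):
    """Assemble a grounded prompt: retrieved context first, then the question."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
        "Answer:"
    )


if __name__ == "__main__":
    question = "Which service connects the VPCs?"
    print(build_prompt(question, retrieve(question, DOCUMENTS, top_k=1)))
```

In a real deployment the retrieve step would typically query a vector or search index, and the assembled prompt would be sent to a foundation model for response generation.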

Architecture diagram

The following diagram represents the architecture for Gen AI on IBM Cloud and reuses the best practices for IBM Cloud for Financial Services and the VPC reference architecture.

Central to the architecture are three VPCs, which provide for separation of concerns between provider management functionality and consumer workloads.

Management VPC
Provides compute, storage, and network services to enable the client or service provider's administrators to monitor, operate, and maintain the environment.

Workload VPC
Provides compute, storage, and network services to support hosted applications and operations that deliver services to the consumer.

Edge VPC
Enhances boundary protection for the workload VPC by allowing consumers to access the Gen AI user interface through the public internet.

Other features of the reference architecture:

Can reside in one or more multizone regions to provide additional resiliency.
Enables access to the management VPC from the application provider's enterprise environment through IBM Cloud Virtual Private Network Gateway for VPC.
Provides connectivity from the consumer's enterprise environment to the workload VPC through Direct Link.
Connects management VPC, workload VPC, and Edge VPC by using IBM Cloud Transit Gateway.

Design concepts

Below is the Architecture Framework Design heatmap that covers design considerations and architecture decisions for the following aspects and domains:

Data: Artificial Intelligence
Compute: Virtual Servers, Containers, Serverless
Storage: Primary Storage, Backup
Networking: Enterprise Connectivity, Load Balancing, Domain Name Services
Security: Data Security, Identity & Access, Application Security, Infrastructure & Endpoints, Governance, Risk & Compliance
DevOps: Build & Test, Delivery Pipeline, Code Repository
Resiliency: High Availability
Service Management: Monitoring, Logging, Auditing / tracking, Automated Deployment

Requirements

The following table outlines the requirements that are addressed in this architecture.

Requirements
Aspect	Requirements
Compute	Provide properly isolated compute resources with adequate compute capacity for the applications.
Storage	Provide storage that meets the application and database performance requirements.
Networking	Deploy workloads in an isolated environment and enforce information flow policies. Provide secure, encrypted connectivity to the cloud’s private network for management purposes. Distribute incoming application requests across available compute resources.
Security	Ensure all operator actions are executed securely through a bastion host. Protect the boundaries of the application against denial-of-service and application-layer attacks. Encrypt all application data in transit and at rest to protect from unauthorized disclosure. Encrypt all security data (operational and audit logs) to protect from unauthorized disclosure. Encrypt all data using customer-managed keys to meet regulatory compliance requirements for additional security and customer control. Protect secrets through their entire lifecycle and secure them using access control measures. Configure firewalls restrictively to block all inbound and outbound traffic except that which is required, documented, and approved.
DevOps	Deliver software and services at the speed the market demands by enabling teams to iterate and experiment rapidly and to deploy new versions frequently, driven by feedback and data.
Resiliency	Support application availability targets and business continuity policies. Ensure availability of the application in the event of planned and unplanned outages. Back up application data to enable recovery in the event of unplanned outages. Provide highly available storage for security data (logs) and backup data.
Service Management	Monitor system and application health metrics and logs to detect issues that might impact the availability of the application. Generate alerts and notifications about issues that might impact the availability of applications to trigger appropriate responses and minimize downtime. Monitor audit logs to track changes and detect potential security problems. Provide a mechanism to identify and send notifications about issues found in audit logs.
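
The firewall requirement in the Security row above (block everything except traffic that is required, documented, and approved) amounts to a deny-by-default policy. The following Python sketch illustrates that idea only. It is not the IBM Cloud ACL or security group API: the AllowRule type, the example CIDR ranges and ports, and the is_allowed function are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowRule:
    """One documented and approved traffic flow (all fields are illustrative)."""
    direction: str       # "inbound" or "outbound"
    cidr: str            # remote network permitted for this flow
    port: int            # destination port
    justification: str   # why the flow is required (the "documented" part)


# Hypothetical approved rule set; real values depend on the deployment.
APPROVED_RULES = [
    AllowRule("inbound", "10.10.0.0/24", 22, "Operator SSH from the bastion subnet"),
    AllowRule("inbound", "0.0.0.0/0", 443, "Public HTTPS to the Gen AI user interface"),
]


def is_allowed(direction, remote_ip, port, rules=APPROVED_RULES):
    """Return True only if an approved rule matches; otherwise deny by default."""
    address = ipaddress.ip_address(remote_ip)
    for rule in rules:
        if (
            rule.direction == direction
            and rule.port == port
            and address in ipaddress.ip_network(rule.cidr)
        ):
            return True
    return False  # nothing matched: traffic is blocked


if __name__ == "__main__":
    print(is_allowed("inbound", "10.10.0.5", 22))    # matches the bastion rule
    print(is_allowed("inbound", "203.0.113.9", 22))  # no matching rule
```

Keeping a justification on every rule makes the rule set itself serve as the record of what was documented and approved.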

Components

The following table outlines the products or services used in the architecture for each aspect.

Components
Aspects	Architecture components	How the component is used
Data	watsonx Assistant	Conversational artificial intelligence platform
	Watson Discovery	Automates the discovery of information and insights with advanced Natural Language Processing and Understanding
	watsonx.ai	Brings together new generative AI capabilities powered by foundation models and traditional machine learning (ML) into a powerful studio spanning the AI lifecycle
	watsonx.data	Enables you to scale analytics and AI with all your data, wherever it resides
	watsonx.governance	Directs, manages, and monitors artificial intelligence activities
	watsonx Orchestrate	A digital assistant and platform that uses automation to help businesses streamline processes and save time
	IBM Cloud Databases - Elasticsearch	Database to store vector representations, also known as embeddings, created by using machine learning algorithms
	Milvus	A vector database that stores, indexes, and manages massive embedding vectors that are developed by deep neural networks and other machine learning (ML) models
Compute	Virtual Servers for VPC	Web, App, and database servers
	Code Engine	Abstracts the operational burden of building, deploying, and managing workloads in Kubernetes so that developers can focus on what matters most to them: the source code
	Red Hat OpenShift Kubernetes Service (ROKS)	A managed offering to create your own cluster of compute hosts where you can deploy and manage containerized apps on IBM Cloud
Storage	Cloud Object Storage	Web app static content, backups, logs (application, operational, and audit logs)
	VPC Block Storage	Web app storage if needed
Networking	VPC Virtual Private Network (VPN)	Remote access to manage resources in private network
	Virtual Private Endpoint (VPE)	For private network access to Cloud Services, e.g., Key Protect, COS, etc.
	VPC Load Balancers	Application Load Balancing for web servers, app servers, and database servers
	Direct Link 2.0	Seamlessly connect on-premises resources to cloud resources
	Transit Gateway (TGW)	Connects the management, workload, and edge VPCs within a region
	Cloud Internet Services (CIS)	Global load balancing between regions
	Access Control List (ACL)	To control all incoming and outgoing traffic in Virtual Private Cloud
Security	IAM	IBM Cloud Identity & Access Management
	Key Protect	A full-service encryption solution that allows data to be secured and stored in IBM Cloud
	BYO Bastion Host on VPC VSI	Remote access with Privileged Access Management
	App ID	Add authentication to web and mobile apps
	Secrets Manager	Certificate and Secrets Management
	Security and Compliance Center (SCC)	Implement controls for secure data and workload deployments, and assess security and compliance posture
	Hyper Protect Crypto Services (HPCS)	Hardware security module (HSM) and Key Management Service
	Virtual Network Function (VNF)	Virtualized network services running on virtual machines.
	Event Notifications	Get notified about critical events that occur in your IBM Cloud account.
DevOps	Continuous Integration (CI)	A pipeline that tests, scans and builds the deployable artifacts from the application repositories
	Continuous Deployment (CD)	A pipeline that generates all of the evidence and change request summary content
	Continuous Compliance (CC)	A pipeline that continuously scans deployed artifacts and repositories
	Container Registry	Highly available and scalable private image registry
Resiliency	VPC VSIs, VPC Block across multiple zones in two regions	Web, app, database high availability and disaster recovery
Service Management	IBM Cloud Monitoring	Apps and operational monitoring
	IBM Cloud Logs	Scalable logging service that persists logs and provides users with capabilities for querying, tailing, and visualizing logs

Compliance

CI / CD / CC Pipelines
The Continuous Integration (CI), Continuous Deployment (CD), and Continuous Compliance (CC) pipelines, together referred to as DevSecOps Application Lifecycle Management, are used to deploy the application, check for vulnerabilities, and ensure auditability. The following are some of the important compliance features of DevSecOps Application Lifecycle Management:

Vulnerability Scans
Vulnerability scans involve using specialized tools to look for security vulnerabilities in the code. This is crucial to identify and fix potential security issues before they become a problem in production.
Sign Build Artifacts
The code is compiled and built into software or application artifacts (like executable files or libraries). These artifacts are then digitally signed to ensure their authenticity and integrity.
Evidence Gathering
This involves collecting and storing evidence of the development process, such as commit logs, build logs, and other relevant data. It helps in tracing back and understanding what happened at different stages of development.
Evidence Locker
This is the durable, access-controlled location where the gathered evidence is stored so that it can be retrieved later, for example during an audit.
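
The artifact signing and evidence steps above can be illustrated with a short Python sketch. It is a simplified stand-in, not the pipeline's actual mechanism: real pipelines typically use asymmetric digital signatures, whereas this sketch swaps in an HMAC over a SHA-256 digest so that it runs with the standard library alone. The function names, the record layout, and the example key are hypothetical.

```python
import hashlib
import hmac
import json


def sign_artifact(artifact_bytes, key):
    """Hash the artifact and sign the digest (HMAC used as a stand-in signature)."""
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    signature = hmac.new(key, digest.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"sha256": digest, "signature": signature}


def verify_artifact(artifact_bytes, record, key):
    """Recompute the digest and signature, then compare in constant time."""
    expected = sign_artifact(artifact_bytes, key)
    return hmac.compare_digest(
        expected["sha256"], record["sha256"]
    ) and hmac.compare_digest(expected["signature"], record["signature"])


def evidence_entry(stage, artifact_name, record):
    """Serialize one evidence record as JSON, ready to be stored in a locker."""
    entry = {"stage": stage, "artifact": artifact_name, **record}
    return json.dumps(entry, sort_keys=True)


if __name__ == "__main__":
    key = b"example-signing-key"        # hypothetical; never hard-code real keys
    artifact = b"compiled application bytes"
    record = sign_artifact(artifact, key)
    print(verify_artifact(artifact, record, key))         # untouched artifact
    print(verify_artifact(artifact + b"!", record, key))  # tampered artifact
    print(evidence_entry("build", "app.tar.gz", record))
```

Verification fails as soon as either the artifact or the signing key differs, which is what allows a later pipeline stage to trust that it is deploying exactly what was built.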

Security and Compliance Center (SCC)
This reference architecture uses the Security and Compliance Center (SCC), which defines policy as code, implements controls for secure data and workload deployments, and assesses security and compliance posture. Two profiles are used for this reference architecture: the IBM Cloud Framework for Financial Services and AI ICT Guardrails. A profile is a grouping of controls that can be evaluated for compliance.
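
The idea of a profile as a grouping of controls that can be evaluated can be sketched as follows. This is an illustrative model only and does not use the SCC API: the Control and Profile classes, the control identifiers, and the resource attributes are hypothetical and are not taken from either of the profiles named above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Control:
    """A single check expressed as code (policy as code)."""
    control_id: str
    description: str
    check: Callable[[Dict], bool]


@dataclass
class Profile:
    """A named grouping of controls that is evaluated together."""
    name: str
    controls: List[Control]

    def evaluate(self, resource: Dict) -> Dict[str, bool]:
        """Run every control against the resource; map control ID to pass/fail."""
        return {c.control_id: c.check(resource) for c in self.controls}

    def is_compliant(self, resource: Dict) -> bool:
        """A resource is compliant only if every control in the profile passes."""
        return all(self.evaluate(resource).values())


# Hypothetical profile with two example controls.
EXAMPLE_PROFILE = Profile(
    name="example-profile",
    controls=[
        Control(
            "EX-1",
            "Data at rest uses customer-managed keys",
            lambda r: r.get("encryption") == "customer-managed",
        ),
        Control(
            "EX-2",
            "Public access is disabled",
            lambda r: r.get("public_access") is False,
        ),
    ],
)

if __name__ == "__main__":
    bucket = {"encryption": "customer-managed", "public_access": True}
    print(EXAMPLE_PROFILE.evaluate(bucket))
    print(EXAMPLE_PROFILE.is_compliant(bucket))
```

Reporting results per control, rather than a single pass or fail, shows which specific control a resource violates.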