Gen AI Pattern for Watsonx on IBM Cloud
This reference architecture summarizes best practices for deploying the watsonx Gen AI Pattern on IBM Cloud.
AI holds the promise to transform life and business but raises concerns around trust, security, and regulatory compliance. Understanding Gen AI and its infrastructure is vital for navigating its complex landscape. This reference architecture showcases how IBM Cloud and Watsonx provide a secure environment for deploying and governing Gen AI applications.
A more specific use case of this pattern is Retrieval Augmented Generation (RAG). RAG helps foundation models produce more factually grounded outputs by retrieving relevant content and supplying it to the model as context. RAG suits any business scenario where there is a large body of documentation that a user must consult to provide confident answers.
The diagram below shows the flow of a RAG solution. It is not the entire reference architecture, but a small portion that highlights the options and key IBM Cloud components for an end-to-end RAG flow: data administration and processing, the end-user gen AI application, conversational flow, and gen AI task inferencing from LLMs/foundation models. Two options are shown. In the watsonx Assistant option, watsonx Assistant provides the conversational flow; it requires indexed data in Elasticsearch or Watson Discovery for content retrieval and includes an embedded LLM for response generation. In the watsonx.ai option, watsonx.ai provides endpoints to use for querying, offers options such as in-memory and Elasticsearch for content retrieval, and uses watsonx.ai LLMs/foundation models for response generation.
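
The retrieve-then-generate flow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the watsonx implementation: retrieval is done in memory with a toy bag-of-words cosine similarity in place of real embeddings, and `generate` is a hypothetical placeholder for the foundation model call, not a real API.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    """In-memory retrieval: return the top_k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, passages: list[str]) -> str:
    """Place the retrieved passages ahead of the question so the model can ground its answer."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def generate(prompt: str) -> str:
    """Hypothetical stand-in for the LLM/foundation model inference call."""
    return f"[model response for a prompt of {len(prompt)} characters]"


def answer(question: str, documents: list[str]) -> str:
    """End-to-end flow: retrieve relevant content, build the prompt, generate a response."""
    return generate(build_prompt(question, retrieve(question, documents)))
```

In a deployment that follows this pattern, the in-memory ranking would be replaced by a query against an index such as Elasticsearch or Watson Discovery, and `generate` by a call to the chosen model endpoint.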
Architecture diagram
The diagram below represents the architecture for Gen AI on IBM Cloud and reuses the best practices from the IBM Cloud for Financial Services and VPC reference architectures.
Central to the architecture are three VPCs, which provide for separation of concerns between provider management functionality and consumer workloads.
Management VPC
Provides compute, storage, and network services to enable the client or service provider's administrators to monitor, operate, and maintain the environment.
Workload VPC
Provides compute, storage, and network services to support hosted applications and operations that deliver services to the consumer.
Edge VPC
The edge VPC is used to enhance boundary protection for the workload VPC while allowing consumers to access the Gen AI user interface through the public internet.
Other features of the reference architecture:
- Can reside in one or more multizone regions to provide additional resiliency.
- Enables access to the management VPC from the application provider's enterprise environment through IBM Cloud Virtual Private Network Gateway for VPC.
- Provides connectivity from the consumer's enterprise environment to the workload VPC through Direct Link.
- Connects management VPC, workload VPC, and Edge VPC by using IBM Cloud Transit Gateway.
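
The connectivity rules above can be captured as data and checked mechanically. The sketch below is illustrative only: the labels (`provider-enterprise`, `public-internet`, and so on) and the `check_topology` helper are hypothetical and do not correspond to any IBM Cloud API or Terraform schema.

```python
# The three VPCs named in the architecture.
VPCS = {"management", "workload", "edge"}

# (source network, target VPC, connection type) as described in the text.
INGRESS = [
    ("provider-enterprise", "management", "vpn-gateway"),
    ("consumer-enterprise", "workload", "direct-link"),
    ("public-internet", "edge", "public"),
]

# VPCs attached to the Transit Gateway.
TRANSIT_GATEWAY = {"management", "workload", "edge"}


def check_topology(vpcs, ingress, transit_gateway):
    """Return a list of human-readable violations of the layout described above."""
    problems = []
    for source, vpc, _kind in ingress:
        if vpc not in vpcs:
            problems.append(f"{source} targets unknown VPC {vpc!r}")
        # Public traffic is only expected to enter through the edge VPC.
        if source == "public-internet" and vpc != "edge":
            problems.append(f"public internet must enter through the edge VPC, not {vpc!r}")
    missing = vpcs - transit_gateway
    if missing:
        problems.append(f"VPCs not attached to Transit Gateway: {sorted(missing)}")
    return problems
```

An empty list from `check_topology(VPCS, INGRESS, TRANSIT_GATEWAY)` means the described layout is internally consistent; a check of this kind could run as a lint step before infrastructure changes are applied.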
Design concepts
Below is the Architecture Framework Design heatmap that covers design considerations and architecture decisions for the following aspects and domains:
- Data: Artificial Intelligence
- Compute: Virtual Servers, Containers, Serverless
- Storage: Primary Storage, Backup
- Networking: Enterprise Connectivity, Load Balancing, Domain Name Services
- Security: Data Security, Identity & Access, Application Security, Infrastructure & Endpoints, Governance, Risk & Compliance
- DevOps: Build & Test, Delivery Pipeline, Code Repository
- Resiliency: High Availability
- Service Management: Monitoring, Logging, Auditing / tracking, Automated Deployment
Requirements
The following table outlines the requirements that are addressed in this architecture.
Aspect | Requirements |
---|---|
Compute | Provide properly isolated compute resources with adequate compute capacity for the applications. |
Storage | Provide storage that meets the application and database performance requirements. |
Networking | Deploy workloads in isolated environment and enforce information flow policies. Provide secure, encrypted connectivity to the cloud’s private network for management purposes. Distribute incoming application requests across available compute resources. |
Security | Ensure all operator actions are executed securely through a bastion host. Protect the boundaries of the application against denial-of-service and application-layer attacks. Encrypt all application data in transit and at rest to protect from unauthorized disclosure. Encrypt all security data (operational and audit logs) to protect from unauthorized disclosure. Encrypt all data using customer managed keys to meet regulatory compliance requirements for additional security and customer control. Protect secrets through their entire lifecycle and secure them using access control measures. Firewalls must be restrictively configured to prevent all traffic, both inbound and outbound, except that which is required, documented, and approved. |
DevOps | Delivering software and services at the speed the market demands requires teams to iterate and experiment rapidly. They must deploy new versions frequently, driven by feedback and data. |
Resiliency | Support application availability targets and business continuity policies. Ensure availability of the application in the event of planned and unplanned outages. Back up application data to enable recovery in the event of unplanned outages. Provide highly available storage for security data (logs) and backup data. |
Service Management | Monitor system and application health metrics and logs to detect issues that might impact the availability of the application. Generate alerts/notifications about issues that might impact the availability of applications to trigger appropriate responses to minimize downtime. Monitor audit logs to track changes and detect potential security problems. Provide a mechanism to identify and send notifications about issues found in audit logs. |
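
The firewall requirement in the Security row (deny all traffic except what is required, documented, and approved) amounts to a default-deny allowlist. The following is a minimal sketch of that evaluation logic, assuming a hypothetical `Rule` structure; it is not the rule format of VPC ACLs or security groups.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    direction: str   # "inbound" or "outbound"
    peer_cidr: str   # remote address range the rule applies to
    port: int
    reason: str      # documents why the rule was approved


def is_allowed(rules, direction: str, peer_ip: str, port: int) -> bool:
    """Default deny: traffic passes only if an explicit rule matches it."""
    ip = ipaddress.ip_address(peer_ip)
    for rule in rules:
        if (
            rule.direction == direction
            and rule.port == port
            and ip in ipaddress.ip_network(rule.peer_cidr)
        ):
            return True
    return False  # nothing matched, so the traffic is denied
```

Keeping a `reason` on every rule mirrors the "documented and approved" part of the requirement: a rule without a justification has no place in the allowlist.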
Components
The following table outlines the products or services used in the architecture for each aspect.
Aspects | Architecture components | How the component is used |
---|---|---|
Data | Watsonx Assistant | Conversational artificial intelligence platform |
Data | Watson Discovery | Automates the discovery of information and insights with advanced Natural Language Processing and Understanding |
Data | watsonx.ai | Brings together new generative AI capabilities powered by foundation models and traditional machine learning (ML) into a powerful studio spanning the AI lifecycle |
Data | watsonx.data | Enables you to scale analytics and AI with all your data, wherever it resides |
Data | watsonx.governance | Direct, manage and monitor the artificial intelligence activities |
Data | watsonx Orchestrate | A digital assistant and platform that uses automation to help businesses streamline processes and save time |
Data | IBM Cloud Databases - ElasticSearch | Database to store vector representations also known as embeddings created by using machine learning algorithms |
Data | Milvus | A vector database that stores, indexes, and manages massive embedding vectors that are developed by deep neural networks and other machine learning (ML) models. |
Compute | Virtual Servers for VPC | Web, App, and database servers |
Compute | Code Engine | Abstracts the operational burden of building, deploying, and managing workloads in Kubernetes so that developers can focus on what matters most to them: the source code |
Compute | Red Hat OpenShift Kubernetes Service (ROKS) | A managed offering to create your own cluster of compute hosts where you can deploy and manage containerized apps on IBM Cloud |
Storage | Cloud Object Storage | Web app static content, backups, logs (application, operational, and audit logs) |
Storage | VPC Block Storage | Web app storage if needed |
Networking | VPC Virtual Private Network (VPN) | Remote access to manage resources in private network |
Networking | Virtual Private Endpoint (VPE) | For private network access to Cloud Services, e.g., Key Protect, COS, etc. |
Networking | VPC Load Balancers | Application Load Balancing for web servers, app servers, and database servers |
Networking | Direct Link 2.0 | Seamlessly connect on-premises resources to cloud resources |
Networking | Transit Gateway (TGW) | Connects the Workload and Management VPCs within a region |
Networking | Cloud Internet Services (CIS) | Global load balancing between regions |
Networking | Access Control List (ACL) | To control all incoming and outgoing traffic in Virtual Private Cloud |
Security | IAM | IBM Cloud Identity & Access Management |
Security | Key Protect | A full-service encryption solution that allows data to be secured and stored in IBM Cloud |
Security | BYO Bastion Host on VPC VSI | Remote access with Privileged Access Management |
Security | App ID | Add authentication to web and mobile apps |
Security | Secrets Manager | Certificate and Secrets Management |
Security | Security and Compliance Center (SCC) | Implement controls for secure data and workload deployments, and assess security and compliance posture |
Security | Hyper Protect Crypto Services (HPCS) | Hardware security module (HSM) and Key Management Service |
Security | Virtual Network Function (VNF) | Virtualized network services running on virtual machines. |
Security | Event Notifications | Get notified about critical events that occur in your IBM Cloud account. |
DevOps | Continuous Integration (CI) | A pipeline that tests, scans and builds the deployable artifacts from the application repositories |
DevOps | Continuous Deployment (CD) | A pipeline that generates all of the evidence and change request summary content |
DevOps | Continuous Compliance (CC) | A pipeline that continuously scans deployed artifacts and repositories |
DevOps | Container Registry | Highly available, and scalable private image registry |
Resiliency | VPC VSIs, VPC Block across multiple zones in two regions | Web, app, database high availability and disaster recovery |
Service Management | IBM Cloud Monitoring | Apps and operational monitoring |
Service Management | IBM Cloud Logs | Scalable logging service that persists logs and provides users with capabilities for querying, tailing, and visualizing logs |
Compliance
CI / CD / CC Pipelines
The Continuous Integration (CI), Continuous Deployment (CD), and Continuous Compliance (CC) pipelines, referred to as DevSecOps Application Lifecycle Management, are used to deploy the application, check for vulnerabilities, and ensure auditability. Below are some of the important compliance features of DevSecOps Application Lifecycle Management:
- Vulnerability Scans: Vulnerability scans use specialized tools to look for security vulnerabilities in the code. This is crucial for identifying and fixing potential security issues before they become a problem in production.
- Sign Build Artifacts: The code is compiled and built into software or application artifacts (like executable files or libraries). These artifacts are then digitally signed to ensure their authenticity and integrity.
- Evidence Gathering: This involves collecting and storing evidence of the development process, such as commit logs, build logs, and other relevant data. It helps in tracing back and understanding what happened at different stages of development.
- Evidence Locker: The collected evidence is stored in a central locker so that it can be retrieved later for audits and traceability.
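
The signing and evidence steps can be illustrated with the Python standard library. This is a sketch under stated assumptions, not what the DevSecOps pipelines actually run: HMAC-SHA256 with a shared key stands in for the asymmetric signing that build pipelines typically use, and the field names in the evidence entry (`stage`, `result`) are hypothetical.

```python
import hashlib
import hmac
import json


def sign_artifact(artifact: bytes, key: bytes) -> dict:
    """Hash the artifact and sign the digest (HMAC used here as a stand-in for a real signature)."""
    digest = hashlib.sha256(artifact).hexdigest()
    signature = hmac.new(key, digest.encode(), hashlib.sha256).hexdigest()
    return {"sha256": digest, "signature": signature}


def verify_artifact(artifact: bytes, key: bytes, record: dict) -> bool:
    """Recompute the signature and compare in constant time; any change to the artifact fails."""
    expected = sign_artifact(artifact, key)
    return hmac.compare_digest(expected["signature"], record["signature"])


def evidence_entry(stage: str, result: str, record: dict) -> str:
    """Serialize one piece of evidence so it can be stored and audited later."""
    return json.dumps({"stage": stage, "result": result, **record}, sort_keys=True)
```

For example, `evidence_entry("build", "success", sign_artifact(data, key))` yields a JSON line that ties a pipeline stage to the exact artifact digest it produced, which is the kind of record an evidence store would hold.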
Security and Compliance Center (SCC)
This reference architecture uses the Security and Compliance Center (SCC), which defines policy as code, implements controls for secure data and workload deployments, and assesses security and compliance posture. Two profiles are used for this reference architecture: the IBM Cloud Framework for Financial Services and AI ICT Guardrails. A profile is a grouping of controls that can be evaluated for compliance.
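
The idea of a profile as a grouping of controls that can be evaluated can be sketched as follows. Everything here is hypothetical: the control identifiers, the resource fields, and the `evaluate` helper are invented for illustration and are neither the SCC API nor controls taken from the two profiles named above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Control:
    control_id: str                  # hypothetical identifier
    description: str
    check: Callable[[dict], bool]    # returns True when the resource is compliant


# Hypothetical profile: a named grouping of controls.
EXAMPLE_PROFILE: List[Control] = [
    Control(
        "data-encrypted-with-cmk",
        "Data is encrypted with a customer-managed key",
        lambda r: r.get("encryption") == "customer-managed",
    ),
    Control(
        "no-public-endpoint",
        "Service is reachable only over private endpoints",
        # A missing field is treated as public, so the check fails closed.
        lambda r: not r.get("public_endpoint", True),
    ),
]


def evaluate(profile: List[Control], resource: dict) -> Dict[str, bool]:
    """Evaluate every control in the profile against one resource description."""
    return {c.control_id: c.check(resource) for c in profile}


def is_compliant(profile: List[Control], resource: dict) -> bool:
    """A resource is compliant only if every control in the profile passes."""
    return all(evaluate(profile, resource).values())
```

Expressing controls as code in this way is what makes a posture assessment repeatable: the same profile can be run against every resource on each scan.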