Setting up retrieval augmented generation (RAG)

watsonx Code Assistant Standard plan

RAG optimizes large language model (LLM) output by augmenting the prompt with additional context. When you submit a query, watsonx Code Assistant uses RAG tools to retrieve relevant information from your code bases or documentation. This context is appended to the query before the query is sent to the LLM. The RAG system determines which sources to include or exclude so that the response is generated from the most useful information.
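
The retrieve-then-append flow can be sketched in a few lines of Python. This is only an illustration of prompt augmentation, not the watsonx Code Assistant implementation: the `tokenize`, `retrieve`, and `augment_prompt` functions, the word-overlap scoring, and the sample snippets are all hypothetical stand-ins for a real vector store lookup.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def retrieve(query: str, snippets: list[str], top_k: int = 2) -> list[str]:
    """Rank snippets by word overlap with the query (a stand-in for vector search)."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(s)), s) for s in snippets]
    # Keep only snippets that share at least one word with the query.
    relevant = [(score, s) for score, s in scored if score > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in relevant[:top_k]]


def augment_prompt(query: str, snippets: list[str]) -> str:
    """Append the retrieved context to the query before it goes to the LLM."""
    context = retrieve(query, snippets)
    if not context:
        return query
    return query + "\n\nContext:\n" + "\n".join(f"- {c}" for c in context)


# Hypothetical indexed content, invented for this example.
INDEXED = [
    "def process_chat_message(msg): validates and routes a chat message",
    "README: how to install the build tools",
    "def connect_store(): opens a connection to the user data store",
]

if __name__ == "__main__":
    print(augment_prompt("how is a chat message processed?", INDEXED))
```

A production system would replace the word-overlap ranking with an embedding search against the vector store, but the shape of the final prompt (query plus retrieved context) stays the same.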

RAG requires the watsonx Code Assistant Standard plan and the Visual Studio Code extension. It isn't available for use with the Eclipse plug-in.

IBM watsonx Code Assistant supports RAG, which improves the quality of responses to user queries by supplying relevant, up-to-date context from code bases and documentation. RAG reduces model hallucinations and improves the accuracy of generated responses.

By using RAG, you can configure watsonx Code Assistant with specific code repositories, and with project documentation that is not stored in the Git repository, so that relevant information is extracted for the chat message. You can configure documentation such as API documents, readme files, technical and design documents, and Markdown, PDF, Word, and PowerPoint documents.

The following figure illustrates the procedure to configure RAG for watsonx Code Assistant:

Figure: Procedure for setting up RAG in watsonx Code Assistant

Enabling RAG

To enable RAG:

Provision a vector store on IBM Cloud.
1. Provision a Milvus or Elasticsearch vector store instance on IBM Cloud. If a vector store is already available in your IBM Cloud account, you can skip this step.
  - To provision an Elasticsearch vector store instance, see Provisioning Databases for Elasticsearch.
  - To provision a Milvus vector store instance, see Provisioning watsonx.data.
    
    Milvus is a part of IBM watsonx.data. You can provision a Milvus instance with the watsonx.data enterprise plan.
2. Optional: Provision an instance of IBM watsonx.ai Studio to index the code repository.
Index or refresh your code repositories in Milvus or Elasticsearch. For more information, see Indexing code repositories with IBM watsonx.ai Studio.

Set up the connection assets:
1. Use the existing deployment space in your watsonx Code Assistant service instance.
2. Create a connection asset for each index that is created in the vector store. For more information, see Creating a connection asset.
Set up the Git personal access token in Visual Studio Code and use the RAG-enabled prompts.
1. Click your profile icon in GitHub and go to Settings > Developer Settings > Personal Access Tokens > Tokens (Classic).
2. Copy your personal access token.
3. In Visual Studio Code, click View, then click Command Palette.
4. Search for WCA, then click Enter GitHub Personal Access Token for WCA.
5. Enter your GitHub personal access token and press Enter or Return.
6. In chat messages, enter @repo <instruction> or @docs <instruction> to generate a response that uses context from the referenced repository or documents in the vector store. Replace <instruction> with your prompt message.
- Sample syntax to use the referenced repository:
```
@repo how is a chat message processed?
```
  watsonx Code Assistant uses the indexed repositories based on the following conditions:
  - If one repository is open in Visual Studio Code, watsonx Code Assistant searches for context in that repository by default.
  - If multiple repositories are open in Visual Studio Code, watsonx Code Assistant searches for context in the repository that is associated with the most recently accessed file.
  - watsonx Code Assistant checks for the repo.yaml file in the indexed repository when you enter the @repo <instruction> syntax in the chat. If one or more YAML configuration files are present, watsonx Code Assistant uses all the configured repositories to generate a response. If no YAML configuration is present, watsonx Code Assistant uses the selected repository.
- Sample syntax to use the referenced document collection:
```
@docs What are the steps to set up a connection to the user data store?
```
  watsonx Code Assistant uses the indexed document collections based on the following conditions:
  - If a document is open in Visual Studio Code, watsonx Code Assistant searches for context in the opened repository by default.
  - If multiple document collections are open in Visual Studio Code, watsonx Code Assistant searches for context in the most recently accessed document.
  - watsonx Code Assistant checks for the docs.yaml file in the indexed repository when you enter the @docs <instruction> syntax in the chat. If one or more YAML configuration files are present, watsonx Code Assistant uses all the configured document collections to generate a response. If no YAML configuration is present, watsonx Code Assistant uses all the documents with the docs_<filename> name in your deployment space.
Optional: If you need to set up YAML configuration for indexed repositories or document collections, see Setting up YAML configuration for RAG.
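
The chat syntax and the repository selection conditions in the procedure can be summarized in a short Python sketch. It is a simplified model under stated assumptions, not product code: `parse_chat_command`, `resolve_repositories`, and their parameters are hypothetical names, and the order in which the conditions are checked (YAML configuration first) is an assumption.

```python
def parse_chat_command(message: str) -> tuple[str, str]:
    """Split '@repo <instruction>' or '@docs <instruction>' into (scope, instruction)."""
    scope, _, instruction = message.strip().partition(" ")
    if scope not in ("@repo", "@docs"):
        raise ValueError("message must start with @repo or @docs")
    if not instruction.strip():
        raise ValueError("an instruction is required after the scope")
    return scope[1:], instruction.strip()


def resolve_repositories(open_repos, recent_file_repo=None, yaml_repos=None):
    """Pick the repositories to search for an @repo message.

    open_repos:       repositories currently open in the editor
    recent_file_repo: repository of the most recently accessed file
    yaml_repos:       repositories listed in a YAML configuration, if any
    """
    if yaml_repos:
        # A YAML configuration takes over: use every configured repository.
        return list(yaml_repos)
    if len(open_repos) == 1:
        # A single open repository is the default.
        return [open_repos[0]]
    if recent_file_repo in open_repos:
        # Several open repositories: follow the most recently accessed file.
        return [recent_file_repo]
    return []


if __name__ == "__main__":
    print(parse_chat_command("@repo how is a chat message processed?"))
    print(resolve_repositories(["service-a", "service-b"], recent_file_repo="service-b"))
```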

Setting up YAML configuration for RAG

Optionally, you can set up YAML configuration so that watsonx Code Assistant searches multiple repositories at the same time, or uses a specific indexed code repository or document collection in the vector store. If you do not set up YAML configuration, watsonx Code Assistant uses the repository that is open in Visual Studio Code by default.

watsonx Code Assistant uses the API key authorization method to ensure that you can access only the authorized repositories or document collections.

To set up YAML configuration for a specific indexed repository, complete the following steps:

Create a .wca/repo folder at the root level of the repository.
Create a YAML file with the following fields:
```
repo:
   - url: git@github.ibm.com:code-assistant/<my-code>.git
```
If you need to configure watsonx Code Assistant with multiple repositories, create a repo.yaml file for each repository that needs to be used to generate a response.
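
Creating the folder and file can also be scripted. The following Python sketch writes a file with the field layout shown above. The `write_repo_config` helper is hypothetical, the sketch assumes that the file is named `repo.yaml` and placed directly in the `.wca/repo` folder, and accepting more than one URL per file is also an assumption.

```python
import tempfile
from pathlib import Path


def write_repo_config(repo_root: Path, urls: list[str]) -> Path:
    """Create <repo_root>/.wca/repo/repo.yaml listing the given repository URLs."""
    if not urls:
        raise ValueError("at least one repository URL is required")
    config_dir = repo_root / ".wca" / "repo"
    config_dir.mkdir(parents=True, exist_ok=True)
    # Mirror the documented layout: a 'repo' key with one 'url' entry per item.
    lines = ["repo:"] + [f"   - url: {url}" for url in urls]
    config_path = config_dir / "repo.yaml"
    config_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return config_path


if __name__ == "__main__":
    # A temporary directory stands in for the root of a cloned repository.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_repo_config(
            Path(tmp), ["git@github.ibm.com:code-assistant/<my-code>.git"]
        )
        print(path.read_text(encoding="utf-8"))
```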

To set up YAML configuration for specific document collections, complete the following steps:

Create a .wca/docs folder at the root level of the repository.
Create a YAML file with the following fields:
```
docs: 
   - url: my_collection
```
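
To check a docs configuration before you commit it, a small script can read the collection names back. The Python sketch below handles only the simple two-level layout shown above and is not a general YAML parser; `read_doc_collections` is a hypothetical helper, not part of watsonx Code Assistant.

```python
def read_doc_collections(yaml_text: str) -> list[str]:
    """Return the collection names listed under the top-level 'docs:' key."""
    collections = []
    in_docs = False
    for raw in yaml_text.splitlines():
        line = raw.rstrip()
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        if not raw[0].isspace():
            # An unindented line is a top-level key; track whether it is 'docs:'.
            in_docs = line == "docs:"
            continue
        item = line.strip()
        if in_docs and item.startswith("- url:"):
            collections.append(item[len("- url:"):].strip())
    return collections


if __name__ == "__main__":
    sample = "docs:\n   - url: my_collection\n"
    print(read_doc_collections(sample))
```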

Use case scenarios

The use case scenarios explain how RAG works and how data is retrieved from indexed repositories or documents.

Using a single code repository

If you are working in a single repository and want to use the code from this repository as context for chat conversations, complete the following steps:

Ensure that you have access to the repository in GitHub.
Index the repository in the vector store. For more information, see Indexing code repository with IBM watsonx.ai Studio.
Create a connection asset for the repository. For more information, see Creating a connection asset.
Generate a Git personal access token from your GitHub account and complete the setup in Visual Studio Code.
Open the repository in Visual Studio Code.
Use the @repo command in chat to generate a response that uses the context from the repository.

Using multiple code repositories

If you are working on a repository that depends on another repository, you can use the code from both repositories as context for chat conversations. To use both repositories, complete the following steps:

Ensure that you have access to both repositories in GitHub.
Index each repository in the vector store separately. For more information, see Indexing code repository with IBM watsonx.ai Studio.
Create two connection assets, one for each index in the vector store. For more information, see Creating a connection asset.
Generate a Git personal access token from your GitHub account and complete the setup in Visual Studio Code.
In the first repository, set up the YAML configuration to list both repositories.
Use the @repo command in chat to generate a response that uses the context from both repositories.

If you use the same index for both repositories, the GitHub access check is not applied to the second repository. If you do not have access to one of the repositories, watsonx Code Assistant generates the context from the authorized repository only.
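
The fallback to authorized repositories can be modeled as a filter over the configured repositories. This Python sketch is a rough illustration: `authorized_repositories` and the `has_access` callback are hypothetical, the `<other-code>` URL is a made-up placeholder, and the real check is performed by watsonx Code Assistant with your GitHub personal access token.

```python
from typing import Callable, Iterable


def authorized_repositories(
    configured: Iterable[str], has_access: Callable[[str], bool]
) -> list[str]:
    """Keep only the repositories that the access check accepts, in order."""
    return [repo for repo in configured if has_access(repo)]


if __name__ == "__main__":
    # Hypothetical access list standing in for a GitHub permission lookup.
    granted = {"git@github.ibm.com:code-assistant/<my-code>.git"}
    repos = [
        "git@github.ibm.com:code-assistant/<my-code>.git",
        "git@github.ibm.com:code-assistant/<other-code>.git",
    ]
    # Only the first repository passes the check and supplies context.
    print(authorized_repositories(repos, lambda repo: repo in granted))
```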

Enabling all the users in a team to access project documentation repositories

You can allow all the users in a team to access the project documentation repositories and use the documents as context for chat conversations.

To enable access for all users:

Index the project documentation repositories in the vector store. For more information, see Indexing code repository with IBM watsonx.ai Studio.

You can use the same index for all documentation repositories if all the users have access to the indexed project documentation repositories. If access to an indexed repository is restricted for some users, see Enabling the users in a different subteam to access project documentation repositories.
Create a connection asset for the documentation index in the deployment space that includes all the users of the team. For more information, see Creating a connection asset.
Use the @docs command in chat to generate a response that uses the context from the documentation index.

GitHub personal access tokens are not used to verify access to documentation content. You can combine several documentation repositories in the same index in the vector store. Documentation content is typically less restricted than code repositories, and a single index for all the documentation repositories simplifies index management. Only users who are onboarded to the deployment space are authorized to access the documentation repositories.

Enabling the users in a different subteam to access project documentation repositories

In this use case, different subteams within a large team maintain the project documentation repositories. Access to each documentation repository is restricted to the subteam that maintains it. Other subteams don't have access to that repository.

To enable each subteam to access the related project documentation repositories:

Index the documentation repositories separately in the vector store. For more information, see Indexing code repository with IBM watsonx.ai Studio.
Ensure that the deployment space is created for each subteam that includes its users.
For each documentation repository index, create a connection asset in the related deployment space. For more information, see Creating a connection asset.
Use the @docs command in chat to generate a response that uses the context from the documentation index.

watsonx Code Assistant uses the documentation repository index based on the deployment space that is assigned to the users. No client-side configuration is required.
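
Because the documentation index follows from the user's deployment space, the lookup can be pictured as two mappings. The Python sketch below is illustrative only: the user names, space names, index names, and the `indexes_for_user` function are all made up for this example.

```python
# Hypothetical data: the deployment space that each user is onboarded to, and
# the documentation indexes that have a connection asset in each space.
USER_SPACE = {"alice": "payments-space", "bob": "billing-space"}
SPACE_INDEXES = {
    "payments-space": ["docs_payments"],
    "billing-space": ["docs_billing"],
}


def indexes_for_user(user: str) -> list[str]:
    """Return the documentation indexes reachable from the user's deployment space."""
    space = USER_SPACE.get(user)
    if space is None:
        return []  # the user is not onboarded to any deployment space
    return list(SPACE_INDEXES.get(space, []))


if __name__ == "__main__":
    for name in ("alice", "bob", "carol"):
        print(name, indexes_for_user(name))
```

In this model, a user who is not onboarded to a deployment space gets no documentation context, which mirrors the subteam restriction without any per-user configuration.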

Enabling role-based users to access project documentation repositories

In this use case, users in a team require access to different sets of documentation based on their roles. No access restrictions apply, but the scope of documentation that is used as context varies across the users. For example, developers need technical and API documentation only, while business analysts focus on business process documents.

To customize the documentation repositories that are used as context:

Index each documentation repository in the vector store separately. For more information, see Indexing code repository with IBM watsonx.ai Studio.
Create a connection asset for each index in the deployment space of the team. For more information, see Creating a connection asset.
Set up YAML configuration for the required documentation index that is used as context. If you need to use multiple documentation indexes, set up YAML configuration for each documentation index.
Use the @docs command in chat to generate a response that uses the context from the documentation index.
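
One way to keep role-specific configurations consistent is to generate the docs YAML from a role-to-collection table. The Python sketch below is a hedged illustration: the role names, the collection names, and the `docs_config_for_role` function are hypothetical, and listing several collections in one file is an assumption.

```python
# Hypothetical mapping from role to the documentation collections it needs.
ROLE_COLLECTIONS = {
    "developer": ["docs_technical", "docs_api"],
    "business_analyst": ["docs_business_process"],
}


def docs_config_for_role(role: str) -> str:
    """Build docs YAML text that lists the collections for the given role."""
    collections = ROLE_COLLECTIONS.get(role)
    if not collections:
        raise KeyError(f"no documentation collections defined for role {role!r}")
    # Same layout as the documented example: a 'docs' key with 'url' entries.
    lines = ["docs:"] + [f"   - url: {name}" for name in collections]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(docs_config_for_role("developer"))
```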