Indexing code repositories with IBM watsonx.ai Studio
watsonx Code Assistant Standard plan
You can use IBM watsonx.ai Studio to index your code repositories for retrieval-augmented generation (RAG).
Before you begin
- Provision an instance of Milvus or Elasticsearch DB on IBM Cloud before you run the notebook.
- Provision an instance of watsonx.ai Studio to run the notebook. To create a watsonx.ai Studio instance, see watsonx.ai Studio catalog.
- Ensure that you have an API key for watsonx Code Assistant. The key is used to generate explanations for the functions in your code, and these explanations are indexed with the code for better search results. For more information, see Create an IBM Cloud API key.
Planning your index
Effective indexing is essential for optimizing the performance of watsonx Code Assistant when you use RAG. Consider the following factors when you index the repository or documents:
- Ensure that the available disk storage is at least double the size of the content that is being indexed.
- Index code repositories when you want to reuse existing code to generate new code, locate implementations of specific functions, or answer how-to questions about the code base. You can include main code projects and their dependencies, such as API implementations.
- Index documentation repositories, such as product or project documentation, technical guides, and design documents, if you need watsonx Code Assistant to respond to queries based on these documents.
- Do not index rarely used code repositories or documents. Excluding them reduces storage usage and the time that is required to create and maintain the index.
- Use one index per repository, especially for source code, to limit the context search to specific repositories.
- Combine multiple documentation repositories in a single index if you commonly search across all documents. Create separate indexes if there is significant keyword overlap between repositories and different teams or users need to search within specific documentation repositories.
- Do not merge code from different repositories in the same index. Merged code reduces the accuracy of the generated responses and can lead to unauthorized access to code if user access restrictions are not implemented properly.
- Ensure that you manage security and authorization correctly during the indexing process. For different usage scenarios, see Use case scenarios.
- Ensure that you obtain all necessary organizational approvals before you index confidential or sensitive content. The source code and documents that are used for RAG are saved in plain text and in vectorized format in a vector store outside your GitHub repository.
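The storage rule and the one-index-per-repository rule above can be checked with a short script before you run the notebook. The following Python sketch is illustrative only and is not part of the Populate vector store notebook. The function names, the choice to skip the .git directory, and the index naming scheme are all assumptions. Which disk the storage rule applies to depends on where your vector store keeps its data, so the sketch checks whatever path you pass in.

```python
import os
import re
import shutil
from pathlib import Path


def content_size_bytes(root: Path) -> int:
    """Total size of regular files under root, skipping the .git directory."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk does not descend into it (assumption:
        # Git metadata is not part of the indexed content).
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_file() and not path.is_symlink():
                total += path.stat().st_size
    return total


def required_bytes(content_bytes: int, factor: int = 2) -> int:
    """Storage to reserve: double the content size by default."""
    return content_bytes * factor


def has_enough_space(content_bytes: int, storage_path: Path) -> bool:
    """Compare free space at storage_path with the required storage."""
    free = shutil.disk_usage(storage_path).free
    return free >= required_bytes(content_bytes)


def index_name_for(repo_url: str) -> str:
    """Derive one index name per repository (hypothetical naming scheme)."""
    tail = repo_url.rstrip("/").removesuffix(".git")  # requires Python 3.9+
    owner_repo = "_".join(tail.split("/")[-2:])
    return re.sub(r"[^a-z0-9_]", "_", owner_repo.lower())
```

For example, you might call content_size_bytes on a local clone, pass the result to has_enough_space together with the storage path, and use index_name_for on each repository URL so that every repository maps to its own index. Check the generated names against the naming rules of your Milvus or Elasticsearch instance before you use them.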
Indexing code repositories
- Log in to this project: Create RAG vector stores for watsonx Code Assistant.
- Click Create project.
- Select a Cloud Object Storage instance in the Storage field. If no Cloud Object Storage instance is available in the drop-down list, create one.
- Click Create.
- Go to the Asset tab in your project and click the Populate vector store notebook.
- Follow the instructions in the Populate vector store notebook to index the code or documents.