Using watsonx Code Assistant Individual with a local IBM Granite model

For individual users, watsonx Code Assistant can access a local model through Ollama, which is a widely used local inferencing engine for large language models. Ollama wraps the underlying model-serving project llama.cpp.

For increased performance and a full set of features for your organization, provision a trial of watsonx Code Assistant on IBM Cloud. For more information, see Setting up your watsonx Code Assistant service in IBM Cloud.

Install the watsonx Code Assistant extension

You can set up Ollama for use within Microsoft Visual Studio Code.

This setup is available only with the Visual Studio Code extension; it is not available for the Eclipse IDE plug-in.

  1. Open the watsonx Code Assistant page in the Visual Studio Marketplace.
  2. Click Install on the Marketplace page.
  3. In Visual Studio Code, click Install on the extension.
  4. In the extension settings, set Wca: Backend Provider to ollama.
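
You can also install the extension from the command line by using the code CLI that ships with Visual Studio Code. The extension ID below is a placeholder, not a confirmed value; copy the exact ID from the Marketplace page.

    code --install-extension <watsonx-code-assistant-extension-id>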

Install Ollama
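
Download and install Ollama for your operating system from the Ollama website. As a sketch, the following commands show two common install paths, Homebrew on macOS and the install script on Linux; check the Ollama documentation for the current instructions for your platform.

    # macOS, by using Homebrew
    brew install ollama

    # Linux, by using the Ollama install script
    curl -fsSL https://ollama.com/install.sh | sh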

Start the Ollama inference server

In a console window, run:

ollama serve

Leave that window open while you use Ollama.

If you receive the message Error: listen tcp 127.0.0.1:11434: bind: address already in use, the Ollama server is already running.
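
To confirm that the server is reachable, you can send a request to its default address. Ollama answers with a short status message.

    curl http://127.0.0.1:11434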

Install the IBM Granite code model

Get started by installing the granite-code:8b model available in the Ollama library.
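
If you prefer to download the model without opening an interactive session, you can pull it first and run it later.

    ollama pull granite-code:8b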

  1. Open a new console window.

  2. On the command line, type ollama run granite-code:8b to download and deploy the model. You see output similar to the following example:

    pulling manifest 
    pulling 8718ec280572... 100% ▕███████████████████████▏ 4.6 GB
    pulling e50df8490144... 100% ▕███████████████████████▏  123 B
    pulling 58d1e17ffe51... 100% ▕███████████████████████▏   11 KB
    pulling 9893bb2c2917... 100% ▕███████████████████████▏  108 B
    pulling 0e851433eda0... 100% ▕███████████████████████▏  485 B
    verifying sha256 digest 
    writing manifest 
    removing any unused layers 
    success 
    >>> 
    
  3. Type /bye after the >>> prompt to exit the Ollama command shell.

  4. Try out the model by typing:

    ollama run granite-code:8b "How do I create a python class?"
    
  5. You see a response similar to the following example:

    To create a Python class, you can define a new class using the "class" keyword followed by the name of the class and a colon. Inside the class definition, you can specify the methods and attributes that the class will have. Here is an example: ...
    

Configure the Ollama host

By default, the Ollama server runs on IP address 127.0.0.1, port 11434, using http as the protocol. If you change the IP address or the port where Ollama is available, update the extension settings:

  1. In Visual Studio Code, open the extension settings for watsonx Code Assistant.

  2. In Wca > Local: API Host, add the host IP and port.
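
For example, if you start Ollama on a different port by setting the OLLAMA_HOST environment variable, enter that same address and port in the setting. The port in this sketch is only an example.

    # Example: serve Ollama on port 11500 instead of the default 11434
    OLLAMA_HOST=127.0.0.1:11500 ollama serve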

Configure the Granite model to use

By default, watsonx Code Assistant uses the granite-code:8b model for both chat and code completion. If your environment has enough capacity, you can install the granite-code:8b-base model and use it for code generation.

To use a different model:

  1. Install the granite-code:8b-base model, as shown in the example after these steps. See Install the IBM Granite code model.

  2. In Visual Studio Code, open the extension settings for watsonx Code Assistant.

  3. In Wca > Local: Code Gen Model, enter granite-code:8b-base.
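
You can install the granite-code:8b-base model in the same way as the default model, for example:

    ollama pull granite-code:8b-base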

Securing your setup

By default, the Ollama server runs on IP address 127.0.0.1, port 11434, using http as the protocol, on your local device. To use https instead, or to route traffic through a proxy server, see the Ollama documentation.

Switch from a local model to IBM Cloud

You might decide to switch from a local model to a service instance of watsonx Code Assistant on IBM Cloud. You can then configure Visual Studio Code to use the service instance instead of the local model.

For more information, see Setting up your watsonx Code Assistant service in IBM Cloud.

To update your Visual Studio Code editor to use IBM Cloud instead of Ollama:

  1. Quit Visual Studio Code.

  2. Quit the Ollama application (see the note after these steps).

  3. Start Visual Studio Code, then open watsonx Code Assistant. You should see the message Ollama is not running in your IDE.

  4. Click Switch to watsonx Code Assistant on IBM Cloud.
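
How you quit Ollama depends on how it was started. If you started it with ollama serve in a terminal window, press Ctrl+C in that window. If Ollama runs as a background service, stop the service instead; the following command is a sketch that assumes the systemd service that the Linux install script creates.

    sudo systemctl stop ollama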

Alternatively, you can change the extension settings:

  1. In Visual Studio Code, open the extension settings for watsonx Code Assistant.

  2. In Wca: Backend Provider, switch from ollama to wcaCore.

  3. Restart extensions to apply the change.