Using watsonx Code Assistant Individual with a local IBM Granite model
For individual users, watsonx Code Assistant can access a local model through Ollama, which is a widely used local inferencing engine for large language models. Ollama wraps the underlying model-serving project llama.cpp.
For increased performance and a full set of features for your organization, provision a trial of watsonx Code Assistant on IBM Cloud. For more information, see Setting up your watsonx Code Assistant service in IBM Cloud.
Install the watsonx Code Assistant extension
You can set up Ollama for use within Microsoft Visual Studio Code. This setup is available only with the Visual Studio Code extension; it is not available for the Eclipse IDE plug-in.
- Open the watsonx Code Assistant page in the Visual Studio Marketplace.
- Click Install on the Marketplace page.
- In Visual Studio Code, click Install on the extension.
- In the extension settings, set Wca: Backend Provider to ollama.
Install Ollama
- Download and run the Ollama installer.
- On macOS, you can also use Homebrew to install Ollama:
brew install ollama
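To confirm that the installation succeeded, you can check the installed version from a console window; the exact version string varies by release:
# Prints the installed Ollama version.
ollama --version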
Start the Ollama inference server
In a console window, run:
ollama serve
Leave that window open while you use Ollama.
If you receive the message Error: listen tcp 127.0.0.1:11434: bind: address already in use, the Ollama server is already running.
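To verify that the server is reachable, you can query its root endpoint; a quick check, assuming the default host and port:
# The server answers a plain GET on its root endpoint.
curl http://127.0.0.1:11434
# Expected response: Ollama is running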
Install the IBM Granite code model
Get started by installing the granite-code:8b model that is available in the Ollama library.
- Open a new console window.
- On the command line, run the following command to download and deploy the model:
ollama run granite-code:8b
You see output similar to the following example:
pulling manifest
pulling 8718ec280572... 100% ▕███████████████████████▏ 4.6 GB
pulling e50df8490144... 100% ▕███████████████████████▏ 123 B
pulling 58d1e17ffe51... 100% ▕███████████████████████▏ 11 KB
pulling 9893bb2c2917... 100% ▕███████████████████████▏ 108 B
pulling 0e851433eda0... 100% ▕███████████████████████▏ 485 B
verifying sha256 digest
writing manifest
removing any unused layers
success
>>>
- Type /bye after the >>> prompt to exit the Ollama command shell.
- Try out the model by typing:
ollama run granite-code:8b "How do I create a python class?"
- You should see a response similar to:
To create a Python class, you can define a new class using the "class" keyword followed by the name of the class and a colon. Inside the class definition, you can specify the methods and attributes that the class will have. Here is an example: ...
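At any point, you can confirm which models are installed locally:
# Lists locally installed models with their size and modification time.
ollama list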
Configure the Ollama host
By default, the Ollama server runs on IP address 127.0.0.1, port 11434, and uses HTTP as the protocol. If you change the IP address or the port where Ollama is available:
- In Visual Studio Code, open the extension settings for watsonx Code Assistant.
- In Wca > Local: API Host, add the host IP and port.
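On the server side, Ollama reads its bind address from the OLLAMA_HOST environment variable. A minimal sketch, assuming you want the server on port 11435; set the same host and port in Wca > Local: API Host:
# Start the server on a non-default address and port.
OLLAMA_HOST=127.0.0.1:11435 ollama serve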
Configure the Granite model to use
By default, watsonx Code Assistant uses the granite-code:8b model for both chat and code completion. If your environment has enough capacity, install the granite-code:8b-base model.
To use a different model:
- Install the granite-code:8b-base model. See Install the IBM Granite code model.
- In Visual Studio Code, open the extension settings for watsonx Code Assistant.
- In Wca > Local: Code Gen Model, enter granite-code:8b-base.
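If you prefer to download the model without opening an interactive chat session, you can use ollama pull instead of ollama run:
# Downloads the base model and returns to the shell when done.
ollama pull granite-code:8b-base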
Securing your setup
By default, the Ollama server runs on your local device at IP address 127.0.0.1, port 11434, using HTTP as the protocol. To use HTTPS instead, or to route traffic through a proxy server, see the Ollama documentation.
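For example, Ollama routes its outbound model downloads through a proxy when the HTTPS_PROXY environment variable is set; the proxy URL below is a placeholder for illustration:
# Route model downloads through a proxy; replace the URL with your own proxy server.
HTTPS_PROXY=https://proxy.example.com ollama serve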
Switch from a local model to IBM Cloud
You might decide to switch from a local model to a service instance on IBM Cloud. You can configure Visual Studio Code to make that switch.
For more information, see Setting up your watsonx Code Assistant service in IBM Cloud.
To update your Visual Studio Code editor to use IBM Cloud instead of Ollama:
- Quit Visual Studio Code.
- Quit the Ollama application. If you started the server from a console instead, see the note after these steps.
- Start Visual Studio Code, then open watsonx Code Assistant. You should see the message Ollama is not running in your IDE.
- Click Switch to watsonx Code Assistant on IBM Cloud.
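If you started the server manually with ollama serve rather than from the desktop application, stop it from the console. A minimal sketch for macOS or Linux, assuming a default installation:
# Press Ctrl+C in the window that is running "ollama serve", or stop the process by name:
pkill -f "ollama serve"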
For an alternative method, you can change the extension settings:
- In Visual Studio Code, open the extension settings for watsonx Code Assistant.
- In Wca: Backend Provider, switch from ollama to wcaCore.
- Restart extensions to apply the change.