
Ollama Python SDK


Ollama provides a Python SDK that allows us to interact with locally running models in a Python environment.

With the Python SDK, you can integrate natural language processing tasks into Python projects and perform operations such as text generation, conversation generation, and model management without manually invoking command-line tools.

Install Python SDK

First, install Ollama's Python SDK using pip:

pip install ollama

Make sure Python 3.x is installed in your environment and that the environment can reach the local Ollama service.
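As a quick sanity check of the two prerequisites above, the following sketch reports the interpreter version and whether the ollama package can be found. The helper name check_environment is made up for this illustration, and it relies only on the standard library.

```python
import importlib.util
import sys


def check_environment(package: str = "ollama") -> dict:
    """Report the Python version and whether `package` is importable."""
    return {
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        "is_python3": sys.version_info.major == 3,
        # find_spec returns None when the top-level package is not installed.
        "package_installed": importlib.util.find_spec(package) is not None,
    }


print(check_environment())
```

If package_installed is False, running pip install ollama in the same environment should resolve it.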

Start Local Service

Before using the Python SDK, make sure the local Ollama service is running.

You can start it with the command-line tool:

ollama serve

Once the local service is running, the Python SDK communicates with it to perform tasks such as model inference.
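Before sending requests, it can help to confirm that something is actually listening on the service address. The sketch below is one way to do that with the standard library only. The helper name is_service_reachable is hypothetical, and the default URL http://localhost:11434 is an assumption taken from the custom client example in this article; adjust it if your service listens elsewhere.

```python
import urllib.error
import urllib.request


def is_service_reachable(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at base_url with a 2xx status."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as reply:
            return 200 <= reply.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, DNS failure, HTTP error, or a malformed URL.
        return False


if is_service_reachable():
    print("Ollama service is reachable")
else:
    print("Ollama service is not reachable; try running `ollama serve`")
```

This only checks that an HTTP server responds at that address; it does not verify that a particular model is available.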

Use Ollama's Python SDK for Inference

After installing the SDK and starting the local service, we can interact with Ollama from Python code.

First, import chat and ChatResponse from the ollama library:

from ollama import chat
from ollama import ChatResponse

With the Python SDK, you can send a request to a specified model to generate text or a conversational reply:

Example

from ollama import chat
from ollama import ChatResponse

response: ChatResponse = chat(model='deepseek-coder', messages=[
    {
        'role': 'user',
        'content': 'Who are you?',
    },
])

# Print the response content
print(response['message']['content'])

# Or directly access the fields of the response object
# print(response.message.content)

Running the code above produces output similar to the following:

I am DeepCoder, a programming AI assistant developed by DeepSeek in China. I can help you answer questions and complete tasks related to computer science. If you have any topics in this area or need to learn or look up information in a specific field, please feel free to ask!
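Because messages is a list, a multi-turn conversation can be built by appending each reply to the history before sending the next request. The sketch below assumes the reply text is available at response['message']['content'], as in the example above. The names ask, send, and demo are hypothetical and exist only for this illustration; send is any callable that takes the message list and returns a response.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def ask(history: List[Message], user_text: str, send: Callable[[List[Message]], dict]) -> str:
    """Append the user turn, call the model via `send`, and record the reply."""
    history.append({'role': 'user', 'content': user_text})
    response = send(history)
    reply = response['message']['content']
    history.append({'role': 'assistant', 'content': reply})
    return reply


def demo() -> None:
    """Two-turn chat; call this only while the Ollama service is running."""
    from ollama import chat  # imported lazily so ask() works without the SDK

    history: List[Message] = []
    send = lambda messages: chat(model='deepseek-coder', messages=messages)
    print(ask(history, 'Who are you?', send))
    print(ask(history, 'Summarize that in one sentence.', send))
```

Passing the request function in as send keeps the history logic separate from the SDK call, so it can be exercised with a stub when no model is available.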

The Ollama SDK also supports streaming responses. To enable streaming, set stream=True when sending the request.

Example

from ollama import chat

stream = chat(
    model='deepseek-coder',
    messages=[{'role': 'user', 'content': 'Who are you?'}],
    stream=True,
)

# Print the response content chunk by chunk
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
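Streaming prints text as it arrives, but the complete reply is often needed afterwards as well. The following sketch accumulates the chunks while printing them. It assumes each chunk exposes chunk['message']['content'] as in the streaming example above; collect_stream and demo are hypothetical names introduced here.

```python
from typing import Iterable, Mapping


def collect_stream(chunks: Iterable[Mapping]) -> str:
    """Print each chunk as it arrives and return the concatenated text."""
    parts = []
    for chunk in chunks:
        text = chunk['message']['content']
        print(text, end='', flush=True)
        parts.append(text)
    print()  # finish the line once the stream ends
    return ''.join(parts)


def demo() -> None:
    """Call this only while the Ollama service is running."""
    from ollama import chat  # lazy import keeps collect_stream usable without the SDK

    stream = chat(
        model='deepseek-coder',
        messages=[{'role': 'user', 'content': 'Who are you?'}],
        stream=True,
    )
    full_text = collect_stream(stream)
    print(f'Received {len(full_text)} characters')
```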

Custom Client

You can also create a custom client for finer control over request configuration, such as setting custom headers or specifying the URL of the local service.

Create Custom Client

With Client, you can customize request settings (such as headers and the host URL) and then send requests through that client.

Example

from ollama import Client

client = Client(
    host='http://localhost:11434',
    headers={'x-some-header': 'some-value'}
)

response = client.chat(model='deepseek-coder', messages=[
    {
        'role': 'user',
        'content': 'Who are you?',
    },
])

print(response['message']['content'])
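In practice the host and headers often come from configuration rather than being hard-coded. The sketch below builds the Client arguments from environment-style settings. The variable names MY_OLLAMA_HOST and MY_OLLAMA_TOKEN are hypothetical, and the Authorization header is only an illustration of a custom header (for example, for a proxy placed in front of the service), not something the SDK requires.

```python
import os
from typing import Mapping, Optional


def client_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build keyword arguments for Client from environment-style settings.

    MY_OLLAMA_HOST and MY_OLLAMA_TOKEN are hypothetical variable names.
    """
    env = os.environ if env is None else env
    settings = {'host': env.get('MY_OLLAMA_HOST', 'http://localhost:11434')}
    token = env.get('MY_OLLAMA_TOKEN')
    if token:
        # Only add the header when a token is configured.
        settings['headers'] = {'Authorization': f'Bearer {token}'}
    return settings


def make_client():
    """Create a Client; requires the ollama package to be installed."""
    from ollama import Client  # lazy import so client_settings works without the SDK

    return Client(**client_settings())
```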

Asynchronous Client

To execute requests asynchronously, use the AsyncClient class, which suits scenarios that require concurrency.

Example

import asyncio
from ollama import AsyncClient

async def chat():
    message = {'role': 'user', 'content': 'Who are you?'}
    response = await AsyncClient().chat(model='deepseek-coder', messages=[message])
    print(response['message']['content'])

asyncio.run(chat())

The asynchronous client supports the same features as the synchronous client. The only difference is that requests are executed asynchronously, so the program can do other work while waiting for a reply, which can improve throughput in high-concurrency scenarios.
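To make the concurrency benefit concrete, the sketch below sends several prompts at once with asyncio.gather. The names ask_all and demo are hypothetical; ask_all accepts any object with an awaitable chat(model=..., messages=...) method, which is how AsyncClient is used in the example above. How much work actually runs in parallel depends on how the local service is configured.

```python
import asyncio
from typing import List


async def ask_all(client, model: str, prompts: List[str]) -> List[str]:
    """Send every prompt concurrently and return replies in prompt order."""

    async def ask_one(prompt: str) -> str:
        response = await client.chat(
            model=model,
            messages=[{'role': 'user', 'content': prompt}],
        )
        return response['message']['content']

    # gather schedules all coroutines at once and preserves input order.
    return await asyncio.gather(*(ask_one(p) for p in prompts))


async def demo() -> None:
    """Run with asyncio.run(demo()) while the Ollama service is running."""
    from ollama import AsyncClient  # lazy import; ask_all accepts any compatible client

    replies = await ask_all(AsyncClient(), 'deepseek-coder', ['Who are you?', 'What is Python?'])
    for reply in replies:
        print(reply)
```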

Asynchronous Streaming Response

To handle streaming responses asynchronously, set stream=True. The call then returns an asynchronous generator that you can iterate over with async for.

Example

import asyncio
from ollama import AsyncClient

async def chat():
    message = {'role': 'user', 'content': 'Who are you?'}
    async for part in await AsyncClient().chat(model='deepseek-coder', messages=[message], stream=True):
        print(part['message']['content'], end='', flush=True)

asyncio.run(chat())

Here, the response will be returned asynchronously part by part, and each part can be processed immediately.
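As a small extension of the idea, the sketch below processes each part as it arrives and also keeps the full text for later use. It assumes each part exposes part['message']['content'] as in the example above; collect_async_stream and demo are hypothetical names used only here.

```python
import asyncio
from typing import AsyncIterator, Mapping


async def collect_async_stream(parts: AsyncIterator[Mapping]) -> str:
    """Print each part as it arrives and return the concatenated text."""
    pieces = []
    async for part in parts:
        text = part['message']['content']
        print(text, end='', flush=True)
        pieces.append(text)
    print()  # finish the line once the stream ends
    return ''.join(pieces)


async def demo() -> None:
    """Run with asyncio.run(demo()) while the Ollama service is running."""
    from ollama import AsyncClient  # lazy import keeps the helper usable without the SDK

    message = {'role': 'user', 'content': 'Who are you?'}
    stream = await AsyncClient().chat(model='deepseek-coder', messages=[message], stream=True)
    full_text = await collect_async_stream(stream)
    print(f'Received {len(full_text)} characters')
```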


Common API Methods

The Ollama Python SDK provides several common API methods for working with and managing models.

1. chat method

Performs conversation generation with the model.
