Ollama Python Usage

Ollama provides a Python SDK for interacting with locally running models from Python.

With the SDK you can integrate natural language processing into Python projects and perform text generation, chat, model management, and other operations without invoking the command line tools by hand.

Install Python SDK

First, install Ollama's Python SDK with pip:

    pip install ollama

Make sure Python 3 is installed (recent SDK releases require Python 3.8 or later) and that your environment can reach the local Ollama service.
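
To confirm that the install step worked, one option is to ask Python's packaging metadata for the installed version. This is a minimal sketch using only the standard library; the helper name `installed_version` is made up for this illustration and is not part of the SDK.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "ollama") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(f"ollama SDK: {version or 'not installed'}")
```

If this prints "not installed", pip most likely installed the package into a different interpreter than the one running the script.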

Start Local Service

Before using the Python SDK, make sure the local Ollama service is running. You can start it from the command line:

    ollama serve

Once the service is running, the Python SDK communicates with it to perform tasks such as model inference.
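
Before sending requests, it can help to check that the service is actually reachable. The sketch below uses only the standard library and assumes the service listens on http://localhost:11434 (the address used in the custom client example in this tutorial) and answers a plain GET on the root path with status 200. The helper name `ollama_is_up` is hypothetical.

```python
import urllib.error
import urllib.request


def ollama_is_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


print("service reachable:", ollama_is_up())
```

A False result usually means `ollama serve` is not running or is bound to a different address.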

Use Ollama's Python SDK for Inference

After installing the SDK and starting the local service, you can interact with Ollama from Python code.

First, import chat and ChatResponse from the ollama library:

    from ollama import chat
    from ollama import ChatResponse

With these, you can send a request to a specified model to generate text or a conversation reply:

Example

    from ollama import chat
    from ollama import ChatResponse

    response: ChatResponse = chat(model='deepseek-coder', messages=[
        {
            'role': 'user',
            'content': 'Who are you?',
        },
    ])

    # Print the response content
    print(response['message']['content'])

    # Or directly access the fields of the response object
    # print(response.message.content)

Running the code above produces output similar to:

    I am DeepCoder, a programming AI assistant developed by DeepSeek in China. I can help you answer questions and complete tasks related to computer science. If you have any topics in this area or need to learn or look up information in a specific field, please feel free to ask!

The Ollama SDK also supports streaming responses. Set stream=True when sending the request to receive the reply chunk by chunk as it is generated.

Example

    from ollama import chat

    stream = chat(
        model='deepseek-coder',
        messages=[{'role': 'user', 'content': 'Who are you?'}],
        stream=True,
    )

    # Print the response content chunk by chunk
    for chunk in stream:
        print(chunk['message']['content'], end='', flush=True)
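
When streaming, you often want both the live output and the complete reply afterwards. The following sketch accumulates chunks while printing them. It assumes each chunk exposes its text under chunk['message']['content'], as in the example above. The helper `collect_stream` and the stand-in chunks are illustrative only, so the block runs without a server.

```python
from typing import Iterable, Mapping


def collect_stream(chunks: Iterable[Mapping]) -> str:
    """Print each chunk as it arrives and return the concatenated reply."""
    parts = []
    for chunk in chunks:
        text = chunk['message']['content']
        print(text, end='', flush=True)
        parts.append(text)
    print()  # finish the line once the stream ends
    return ''.join(parts)


# Stand-in chunks so the sketch runs offline; with the SDK you would pass
# the iterator returned by chat(..., stream=True) instead.
fake_stream = [
    {'message': {'role': 'assistant', 'content': 'Hello'}},
    {'message': {'role': 'assistant', 'content': ', world'}},
]
full_reply = collect_stream(fake_stream)
```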

Custom Client

You can also create a custom client to control request configuration, such as setting custom headers or specifying the URL of the local service.

Create Custom Client

With Client, you can customize request settings (headers, service URL, and so on) and then send requests through it.
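
One way to keep such settings out of the code is to assemble them in a small helper and pass the result to Client. This is a sketch under stated assumptions: the environment variable name MY_OLLAMA_URL and the bearer-token header are hypothetical choices for illustration (for example, when the service sits behind a proxy that expects a token), not something the SDK requires.

```python
import os
from typing import Dict, Optional


def client_settings(token: Optional[str] = None) -> Dict[str, object]:
    """Build keyword arguments suitable for Client(host=..., headers=...)."""
    # MY_OLLAMA_URL is a hypothetical variable name chosen for this sketch.
    host = os.environ.get('MY_OLLAMA_URL', 'http://localhost:11434')
    headers = {}
    if token:
        # Hypothetical auth header; only useful if something in front of
        # the service actually checks it.
        headers['Authorization'] = f'Bearer {token}'
    return {'host': host, 'headers': headers}


settings = client_settings(token='example-token')
print(settings)
# With the SDK installed: client = Client(**settings)
```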

Example

    from ollama import Client

    client = Client(
        host='http://localhost:11434',
        headers={'x-some-header': 'some-value'}
    )

    response = client.chat(model='deepseek-coder', messages=[
        {
            'role': 'user',
            'content': 'Who are you?',
        },
    ])

    print(response['message']['content'])

Asynchronous Client

If you want to execute requests asynchronously, use the AsyncClient class, which suits scenarios that require concurrency.

Example

    import asyncio
    from ollama import AsyncClient

    async def chat():
        message = {'role': 'user', 'content': 'Who are you?'}
        response = await AsyncClient().chat(model='deepseek-coder', messages=[message])
        print(response['message']['content'])

    asyncio.run(chat())

The asynchronous client supports the same features as the synchronous one. The difference is that a request does not block the program while waiting for the model, so several requests can be in flight at once, which can improve throughput in high-concurrency scenarios.
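
The concurrency benefit shows up when several requests are issued together. The sketch below sends one chat request per prompt with asyncio.gather. It takes the client as a parameter and uses a hypothetical StubClient so the block runs without a server; the stub only mimics the chat(model=..., messages=...) call shape used in this tutorial.

```python
import asyncio
from typing import List


async def ask_all(client, model: str, prompts: List[str]) -> List[str]:
    """Send one chat request per prompt concurrently; replies keep prompt order."""
    async def ask(prompt: str) -> str:
        response = await client.chat(
            model=model,
            messages=[{'role': 'user', 'content': prompt}],
        )
        return response['message']['content']

    return await asyncio.gather(*(ask(p) for p in prompts))


class StubClient:
    """Hypothetical stand-in for AsyncClient so the sketch runs offline."""

    async def chat(self, model, messages):
        await asyncio.sleep(0.01)  # simulate waiting on the model
        return {'message': {'role': 'assistant',
                            'content': 'echo: ' + messages[-1]['content']}}


replies = asyncio.run(ask_all(StubClient(), 'deepseek-coder', ['one', 'two']))
print(replies)
```

To use it against a real service, pass AsyncClient() instead of StubClient(). Whether the service then processes the requests in parallel or queues them depends on how it is configured.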

Asynchronous Streaming Response

To handle streaming responses asynchronously, set stream=True; the awaited call then returns an asynchronous generator.

Example

    import asyncio
    from ollama import AsyncClient

    async def chat():
        message = {'role': 'user', 'content': 'Who are you?'}
        async for part in await AsyncClient().chat(model='deepseek-coder', messages=[message], stream=True):
            print(part['message']['content'], end='', flush=True)

    asyncio.run(chat())

Here the response arrives asynchronously, part by part, and each part can be processed as soon as it is received.

Common API Methods

The Ollama Python SDK provides common API methods for operating and managing models.

1. chat method

Performs conversation generation with the model.
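
A conversation with more than one turn can be built by resending the accumulated message list on each call. The sketch below keeps that history in a plain list. It assumes the chat function accepts model and messages and returns the reply under ['message']['content'], as in the examples earlier in this tutorial. The helper `ask` and the `stub_chat` stand-in are hypothetical and exist only so the block runs without a server.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def ask(history: List[Message], prompt: str,
        chat_fn: Callable[..., dict], model: str = 'deepseek-coder') -> str:
    """Append the prompt, call chat_fn with the full history, record the reply."""
    history.append({'role': 'user', 'content': prompt})
    response = chat_fn(model=model, messages=history)
    reply = response['message']['content']
    history.append({'role': 'assistant', 'content': reply})
    return reply


def stub_chat(model, messages):
    """Hypothetical stand-in for ollama.chat so the sketch runs offline."""
    return {'message': {'role': 'assistant',
                        'content': f'reply #{len(messages) // 2 + 1}'}}


history: List[Message] = []
ask(history, 'Who are you?', stub_chat)
ask(history, 'What can you do?', stub_chat)
print([m['role'] for m in history])
```

With the SDK installed, passing `chat` imported from ollama in place of `stub_chat` should give the model the whole conversation on every turn.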