ollama
llms: ollama
An ollama object provides an interface to a running ollama server.
ollama is a scalar handle class object, which allows communication
with an ollama server running either locally or across a network. The
heavy lifting, interfacing with the ollama API, is done by the compiled
__ollama__ function, which should not be called directly.
An ollama interface object should be considered a session to the ollama server, holding any user-defined settings along with the chat history (if opted in) and other custom parameters to be passed to the LLM model during inference. You can initialize several ollama interface objects pointing to the same ollama server and use them concurrently to implement more complex schemes such as RAG, custom tooling, etc.
Source Code: ollama
Displays the models that are currently loaded in the ollama server’s memory.
Specifies the inference mode that the ollama interface object will use
to send requests to the ollama server. Currently, only 'query'
and 'chat' modes are implemented.
Specifies the network IP address and port to which the ollama interface object is connected.
Displays the models that are currently available on the ollama server.
Contains various metrics about the last processed request and the response returned from the ollama server.
Contains a cell array with the history of user prompts,
images (if any), and model responses for a given chat session. The first
column contains character vectors with the user’s prompts, the second
column contains a nested cell array with any images attached to the
corresponding user prompt (otherwise it is empty), and the third column
contains the model’s responses. By default, chatHistory is an
empty cell array, and it is only populated while in 'chat' mode.
The name of the model that will be used for generating the response to the next user request. This is empty upon construction and it must be specified before requesting any inference from the ollama server.
The time in seconds that the ollama interface object will wait for a server response before closing the connection with an error.
The time in seconds that the ollama interface object will wait for a request to be successfully sent to the server before closing the connection with an error.
A structure containing fields as optional parameters to be passed to a
model for inference at runtime. By default, this is an empty structure,
in which case the model utilizes its default parameters as specified in
the respective model file in the ollama server. See the
setOptions method for more information about the custom parameters
you can specify.
A character vector containing the system message, which may be used to
provide crucial context and instructions that guide how the model behaves
during your interactions. By default, systemMessage = 'default',
in which case the model utilizes its default system prompt as specified
in the respective model file in the ollama server. Specifying a system
message results in the query or chat methods passing the
customized system message to the model in every interaction. The system
message cannot be modified during a chat session. Use dot notation to
access and/or modify the default value of the system message.
A logical scalar or a character vector specifying the thinking status of
the active model. By default, the thinking status is set to true
for capable models and to false for models that do not support
thinking capabilities. In special cases, where models support categorical
states of thinking capabilities (such as the GPT-OSS model family), you
must specify the thinking status of your choice explicitly, because the
default true value is ignored by the ollama server. Unless an active
model is set, the thinking property is empty. Use dot notation to access
and/or modify the default value of the thinking flag.
Setting a value to the thinking flag when there is no active model, or for an active model that does not have thinking capabilities, results in an error.
A toolFunction object or a toolRegistry object (merely an
indexed collection of toolFunction objects), which are available to the
active model explicitly during chat sessions. By default, no tools are
available. Unless an active model capable of tool calling is set, the
tools property is empty. Moreover, tools can only be assigned
when the active model is tool capable. Use dot notation to access and/or
assign a toolFunction object or a toolRegistry object.
A logical scalar specifying whether to display thinking text or not. It
only applies when thinking is enabled and no output argument is
requested from query and chat methods. It also applies to
the showHistory method, when model responses contain thinking
text. By default, muteThinking is true. Use dot notation
to access and/or modify the default value.
ollama: llm = ollama (serverURL)
ollama: llm = ollama (serverURL, model)
ollama: llm = ollama (serverURL, model, mode)
llm = ollama (serverURL) creates an ollama interface
object, which allows communication with an ollama server accessible at
serverURL, which must be a character vector specifying a uniform
resource locator (URL). If serverURL is empty or ollama is
called without any input arguments, then it defaults to
http://localhost:11434.
llm = ollama (serverURL, model) also specifies
the active model of the ollama interface llm which will be used for
inference. model must be a character vector specifying an existing
model at the ollama server. If the requested model is not available, a
warning is emitted and no model is set active, which is the default
behavior when ollama is called with fewer arguments. An active
model is mandatory before starting any communication with the ollama
server. Use the listModels class method to see all the models
available in the server instance that llm is interfacing with. Use
the loadModel method to set an active model in an ollama interface
object that has been already created.
llm = ollama (serverURL, model, mode) also
specifies the inference mode of the ollama interface. mode can be
specified as 'query', for generating responses to single prompts,
'chat', for starting a conversation with a model by retaining the
entire chat history during inference, and 'embed' for generating
embeddings for given prompts. By default, the ollama interface is
initialized in query mode, unless an embedding model has been requested,
in which case it defaults to embedding mode. 'embed' is only
valid for embedding models, otherwise ollama returns an error.
Loading an embedding model overrides any value specified in mode.
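A minimal construction sketch is shown below; the remote URL and the model name 'gemma3' are hypothetical placeholders, and the model must actually exist on the server you point to.

```
## Connect to a locally running ollama server (default URL)
llm = ollama ();

## Connect to a remote server, set an active model, and start in chat mode
## ('gemma3' is a hypothetical model name)
llm = ollama ('http://192.168.1.15:11434', 'gemma3', 'chat');
```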
See also: fig2base64
ollama: list = listModels (llm)
ollama: list = listModels (llm, outtype)
ollama: listModels (…)
list = listModels (llm) returns a cell array of
character vectors in list with the names of the models, which are
available on the ollama server that llm interfaces with. This is
equivalent to accessing the availableModels property with the
syntax list = llm.availableModels.
list = listModels (llm, outtype) also specifies
the data type of the output argument list. outtype must be a
character vector with any of the following options:
'cellstr' (default) returns list as a cell array of
character vectors. Use this option to see available models for selecting
an active model for inference.
'json' returns list as a character vector containing
the json string response returned from the ollama server. Use this
option if you want to access all the details about the models available
in the ollama server.
'table' returns list as a table with the most
important information about the available models in specific table
variables.
listModels (…) will display the output requested according
to the previous syntaxes to the standard output instead of returning it
to an output argument. This syntax is not valid for the 'json'
option, which requires an output argument.
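A short sketch of the three output types, assuming llm is a valid ollama interface object:

```
## Model names as a cell array of character vectors
list = listModels (llm);

## Print a summary table of the available models to the terminal
listModels (llm, 'table');

## Full model details as a JSON string (an output argument is required)
info = listModels (llm, 'json');
```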
ollama: list = listRunningModels (llm)
ollama: list = listRunningModels (llm, outtype)
ollama: listRunningModels (…)
list = listRunningModels (llm) returns a cell array of
character vectors in list with the names of the models, which are
currently loaded in memory at the ollama server that llm interfaces
with. This is equivalent to accessing the runningModels property
with the syntax list = llm.runningModels.
list = listRunningModels (llm, outtype) also
specifies the data type of the output argument list. outtype
must be a character vector with any of the following options:
'cellstr' (default) returns list as a cell array of
character vectors. Use this option to see which models are currently
running on the ollama server for better memory management.
'json' returns list as a character vector containing
the json string response returned from the ollama server. Use this
option if you want to access all the details about currently running
models.
'table' returns list as a table with the most
important information about the currently running models in specific
table variables.
listRunningModels (…) will display the output requested according
to the previous syntaxes to the standard output instead of returning it
to an output argument. This syntax is not valid for the 'json'
option, which requires an output argument.
ollama: copyModel (llm, source, target)
copyModel (llm, source, target) copies the model
specified by source into a new model named after target in
the ollama server interfaced by llm. Both source and
target must be character vectors, and source must specify an
existing model in the ollama server. If successful, the available models
in the llm.availableModels property are updated; otherwise,
an error is returned.
Alternatively, source may also be an integer scalar value indexing
an existing model in llm.availableModels.
ollama: deleteModel (llm, target)
deleteModel (llm, target) deletes the model specified
by target in the ollama server interfaced by llm.
target can be either a character vector with the name of the model
or an integer scalar value indexing an existing model in
llm.availableModels. If successful, the available models
in the llm.availableModels property are updated; otherwise,
an error is returned.
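For example, a brief sketch (the target name 'my-backup' is an arbitrary placeholder chosen for this example):

```
## Duplicate the first available model under a new name
copyModel (llm, 1, 'my-backup');

## Remove the duplicate again, either by name or by its index
## into llm.availableModels
deleteModel (llm, 'my-backup');
```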
ollama: loadModel (llm, target)
loadModel (llm, target) loads the model specified by
target in the ollama server interfaced by llm. This syntax
is equivalent to assigning a value to the activeModel property as
in llm.activeModel = target. If successful,
the specified model is also set as the active model for inference in the
llm.activeModel property. target can be either a
character vector with the name of the model or an integer scalar value
indexing an existing model in llm.availableModels.
If loading a model fails, an error message is returned and the properties
activeModel, thinking, and tools are reset to
their default values.
You can load multiple models concurrently; you are only limited by the hardware specifications of the ollama server that llm interfaces with. However, since each newly loaded model is also set as the active model for inference, keep in mind that only a single model can be active at a time for a given ollama interface object. The active model for inference will always be the latest loaded model.
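A brief sketch of both forms (the model name 'llama3.2' is a hypothetical placeholder and must be present in llm.availableModels; index assignment is assumed to be accepted as documented above):

```
## Set the active model by name ('llama3.2' is a hypothetical model name)
loadModel (llm, 'llama3.2');

## Equivalent assignment through the activeModel property, here selecting
## the first entry of llm.availableModels by index
llm.activeModel = 1;
```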
ollama: unloadModel (llm, target)
unloadModel (llm, target) unloads the model specified
by target from the memory of the ollama server interfaced by llm.
target can be either a character vector with the name of the model
or an integer scalar value indexing an existing model in
llm.availableModels. Use this method to free resources in
the ollama server. By default, the ollama server unloads any idle model
from memory after five minutes, unless otherwise instructed.
If the model you unload is also the active model in the ollama interface
object, then the activeModel property is also cleared. You need
to set an active model before inference.
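For example, assuming an active model is currently set:

```
## Inspect which models are currently loaded in the server's memory
listRunningModels (llm, 'table');

## Release the resources held by the current active model
unloadModel (llm, llm.activeModel);
```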
ollama: pullModel (llm, target)
pullModel (llm, target) downloads the model specified
by target from the ollama library into the ollama server interfaced
by llm. If successful, the model is appended to the list of available
models in the llm.availableModels property. target
must be a character vector.
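For example (the model tag 'qwen3:8b' is a hypothetical placeholder; any valid tag from the ollama library can be used):

```
## Download a model from the ollama library
pullModel (llm, 'qwen3:8b');

## The new model now appears among the available models
listModels (llm);
```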
ollama: setOptions (llm, name, value)
setOptions (llm, name, value) sets custom
options to be passed to the ollama server in order to tailor the
behavior of the model according to specific needs. The options must be
specified as name, value paired arguments, where name
is a character vector naming the option to be customized, and value
can be either numeric or logical scalars depending on the values each
option requires.
The following options may be customized in any order as paired input arguments.
| name | value | description |
|---|---|---|
| 'num_keep' | integer | Specifies how many of the most recent tokens or responses should be kept in memory for generating the next output. Higher values can improve relevance of the generated text by providing more context. |
| 'seed' | integer | Controls the randomness of token selection during text generation so that similar responses are reproduced for the same requests. |
| 'num_predict' | integer | Specifies the maximum number of tokens to predict when generating text. |
| 'top_k' | integer | Limits the number of possible choices for each next token when generating responses by specifying how many of the most likely options to consider. |
| 'top_p' | double | Sets the cumulative probability for nucleus sampling. It must be in the range [0, 1]. |
| 'min_p' | double | Adjusts the sampling threshold in accordance with the model’s confidence. Specifically, it scales the probability threshold based on the top token’s probability, allowing the model to focus on high-confidence tokens when certain, and to consider a broader range of tokens when less confident. It must be in the range [0, 1]. |
| 'typical_p' | double | Controls how conventional or creative the responses from a language model will be. A higher typical_p value results in more expected and standard responses, while a lower value allows for more unusual and creative outputs. It must be in the range [0, 1]. |
| 'repeat_last_n' | integer | Defines how far back the model looks to avoid repetition. |
| 'temperature' | double | Controls the randomness of the generated output by determining how the model leverages the raw likelihoods of the tokens under consideration for the next words in a sequence. It ranges from 0 to 2, with higher values corresponding to more chaotic output. |
| 'repeat_penalty' | double | Adjusts the penalty for repeated phrases; higher values discourage repetition. |
| 'presence_penalty' | double | Controls the diversity of the generated text by penalizing new tokens based on whether they appear in the text so far. |
| 'frequency_penalty' | double | Controls how often the same words should be repeated in the generated text. |
| 'penalize_newline' | logical | Discourages the model from generating newlines in its responses. |
| 'numa' | logical | Allows for non-uniform memory access to enhance performance. This can significantly improve processing speeds on multi-CPU systems. |
| 'num_ctx' | integer | Sets the context window length (in tokens), determining how much previous text the model considers. This should be kept in mind especially in chat sessions. |
| 'num_batch' | integer | Controls the number of input samples processed in a single batch during model inference. Reducing this value can help prevent out-of-memory (OOM) errors when working with large models. |
| 'num_gpu' | integer | Specifies the number of GPU devices to use for computation. |
| 'main_gpu' | integer | Specifies which GPU device to use for inference. |
| 'use_mmap' | logical | Allows for memory-mapped file access, which can improve performance by enabling faster loading of model weights from disk. |
| 'num_thread' | integer | Specifies the number of threads to use during model generation, allowing you to optimize performance based on your CPU’s capabilities. |
Specified customized options are preserved in the ollama interface object
for all subsequent requests for inference until they are altered or reset
to the model’s default value by removing them. To remove a custom option
pass an empty value to the name, value paired argument, as in
setOptions (llm, 'seed', []).
Use the showOptions method to display any custom options that may
be currently set in the ollama interface object. Alternatively, you can
retrieve the custom options as a structure through the options
property as in opts = llm.options, where each field in
opts refers to a custom option. If no custom options are set,
then opts is an empty structure.
You can also set or clear a single custom option with direct assignment
to the options property of the ollama interface object by passing
the name, value paired argument as a 2-element cell array.
The equivalent syntax of setOptions (llm, 'seed', []) is
llm.options = {'seed', []}.
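A short sketch combining the above (the option values are arbitrary examples; combining several name, value pairs in one call is assumed to work as the paired-argument syntax suggests):

```
## Make responses reproducible and less random
setOptions (llm, 'seed', 42, 'temperature', 0.2);

## Inspect the currently set custom options
showOptions (llm);
opts = llm.options;

## Reset the seed back to the model's default via direct assignment
llm.options = {'seed', []};
```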
ollama: showOptions (llm)
showOptions (llm) displays any custom options that may be
specified in the ollama interface object llm.
ollama: query (llm, prompt)
ollama: query (llm, prompt, image)
ollama: txt = query (…)
ollama: query (llm)
query (llm, prompt) uses the "api/generate"
API end point to make a request to the ollama server interfaced by
llm to generate text based on the user’s input specified in
prompt, which must be a character vector. When no output argument
is requested, query prints the response text in the standard
output (command window) with a custom display method so that words are
not split between lines depending on the terminal size. If an output
argument is requested, the text is returned as a character vector and
nothing gets displayed in the terminal.
query (llm, prompt, image) also specifies an
image or multiple images to be passed to the model along with the user’s
prompt. For a single image, image must be a character vector
specifying either the filename of an image or a base64 encoded image.
query distinguishes between the two by scanning image for
a period character ('.'), which is commonly used as a separator
between base-filename and extension, but it is an invalid character for
base64 encoded strings. For multiple images, image must be a cell
array of character vectors explicitly containing either multiple
filenames or multiple base64 encoded string representations of images.
txt = query (…) returns the generated text to the
output argument txt instead of displaying it to the terminal for
any of the previous syntaxes.
query (llm) does not make a request to the ollama server,
but it sets the 'mode' property in the ollama interface object
llm to 'query' for future requests. Use this syntax to
switch from another interface mode to query mode without making a request
to the server.
An alternative method of calling the query method is by using
direct subscripted reference to the ollama interface object llm as
long as it is already set in query mode. The table below lists the
equivalent syntaxes.
| method calling | object subscripted reference |
|---|---|
| query (llm, prompt) | llm(prompt) |
| query (llm, prompt, image) | llm(prompt, image) |
| txt = query (llm, prompt) | txt = llm(prompt) |
| txt = query (llm, prompt, image) | txt = llm(prompt, image) |
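A brief usage sketch (the prompt and the image filename 'photo.png' are arbitrary examples; an active model must already be loaded and llm must be in query mode for the subscripted form):

```
## Single-prompt inference in query mode
txt = query (llm, 'Summarize the plot of Hamlet in two sentences.');

## Equivalent call using subscripted reference
txt = llm ('Summarize the plot of Hamlet in two sentences.');

## Attach an image file to the prompt ('photo.png' is a hypothetical file)
query (llm, 'What is shown in this picture?', 'photo.png');
```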
ollama: chat (llm, prompt)
ollama: chat (llm, prompt, image)
ollama: chat (llm, {tool_output})
ollama: txt = chat (…)
ollama: chat (llm)
chat (llm, prompt) uses the "api/chat" API
end point to make a request to the ollama server interfaced by
llm to generate text based on the user’s input specified in
prompt along with all previous requests and responses, made by the
user and models during the same chat session, which is stored in the
'chatHistory' property of the ollama interface object llm.
prompt must be a character vector specifying the content of the user’s
message passed in the request as "role":"user". When no output
argument is requested, chat prints the response text in the
standard output (command window) with a custom display method so that
words are not split between lines depending on the terminal size. If an
output argument is requested, the text is returned as a character vector
and nothing gets displayed in the terminal. In either case, the response
text is appended to the chat history, which can be displayed with the
showHistory method or returned as a cell array from
llm.chatHistory. If you want to start a new chat session,
you can either clear the chat history with the clearHistory method
or create a new ollama interface object.
chat (llm, prompt, image) also specifies an
image or multiple images to be passed to the model along with the user’s
prompt. For a single image, image must be a character vector
specifying either the filename of an image or a base64 encoded image.
chat distinguishes between the two by scanning image for
a period character ('.'), which is commonly used as a separator
between base-filename and extension, but it is an invalid character for
base64 encoded strings. For multiple images, image must be a cell
array of character vectors, which can contain both multiple filenames and
multiple base64 encoded string representations of images. Any images
supplied along with a prompt during a chat session are also stored in the
chat history.
chat (llm, {tool_output}) syntax may be used to pass the
output results of a single toolFunction object or multiple
toolFunction objects contained in a toolRegistry, which
have been evaluated after a previous "tool_calls" request by the
model, to the next message. This syntax requires the tool_output
input argument to be a cell array of character vectors, in
which the first column contains the output of each evaluated
toolFunction object and the second column contains its respective
function name. Each row in tool_output corresponds to a separate
function, when multiple toolFunction objects have been evaluated.
txt = chat (…) returns the generated text to the
output argument txt instead of displaying it to the terminal for
any of the previous syntaxes. If thinking is enabled, then txt is
a cell array of character vectors with the first element
containing the final answer and the second element the thinking process.
chat (llm) does not make a request to the ollama server,
but it sets the 'mode' property in the ollama interface object
llm to 'chat' for future requests. Use this syntax to
switch from another interface mode to chat mode without making a request
to the server. Switching to chat mode does not clear any existing chat
history in llm.
An alternative method of calling the chat method is by using
direct subscripted reference to the ollama interface object llm as
long as it is already set in chat mode. The table below lists the
equivalent syntaxes.
| method calling | object subscripted reference |
|---|---|
| chat (llm, prompt) | llm(prompt) |
| chat (llm, prompt, image) | llm(prompt, image) |
| chat (llm, {tool_output}) | llm({tool_output}) |
| txt = chat (llm, prompt) | txt = llm(prompt) |
| txt = chat (llm, prompt, image) | txt = llm(prompt, image) |
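A minimal chat-session sketch (the prompts are arbitrary examples; an active model must already be loaded):

```
## Switch the interface to chat mode without contacting the server
chat (llm);

## Hold a multi-turn conversation; every exchange is appended to
## llm.chatHistory and sent along with the next request
chat (llm, 'Suggest three names for a pet parrot.');
chat (llm, 'Which of those would suit a very loud bird?');

## Review the session or start a new one
showHistory (llm);
clearHistory (llm);
```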
ollama: vectors = embed (llm, input)
ollama: vectors = embed (llm, input, dims)
vectors = embed (llm, input) generates embedding
vectors corresponding to the user’s input, which can either
be a character vector or a cell array of character vectors. By default,
when input is a character vector, vectors is a row vector
with its length specified by the model’s default values, whereas if
input is a cell array of character vectors, then vectors is a
matrix with each row corresponding to a linearly indexed element of the
cell array.
vectors = embed (llm, input, dims) also
specifies the length of the generated embedding vectors. dims must
be a positive integer value, which overrides the default settings of the
embedding model.
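For example, a brief sketch assuming an embedding model has been loaded (the input texts and the vector length of 256 are arbitrary examples):

```
## One embedding vector per element of the input cell array
docs = {'Gaussian elimination', 'LU factorization', 'QR decomposition'};
vectors = embed (llm, docs);     # one row per input text

## Request a specific vector length, overriding the model's default
v = embed (llm, 'Cholesky factorization', 256);
```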
ollama: showStats (llm)
showStats (llm) displays the response statistics of the last
response returned from the ollama server interfaced by llm. The
type of request (e.g. query, chat, embed) does not alter the displayed
statistics.
ollama: showHistory (llm)
ollama: showHistory (llm, 'all')
ollama: showHistory (llm, 'last')
ollama: showHistory (llm, 'first')
ollama: showHistory (llm, idx)
showHistory (llm) displays the entire chat history stored in
the ollama interface object llm. The chat history is displayed in
chronological order alternating between user’s requests and the model’s
responses. For any user’s request that contained images, the filenames
or the number of images (in case of base64 encoded images) are also
listed below the corresponding request and before the subsequent
response.
showHistory (llm, 'all') is exactly the same as
showHistory (llm).
showHistory (llm, 'last') displays only the last
user-model interaction of the current chat session.
showHistory (llm, 'first') displays only the first
user-model interaction of the current chat session.
showHistory (llm, idx) displays the user-model
interactions specified by idx, which must be a scalar or a vector
of integer values indexing the rows of the cell array
comprising the chatHistory property in llm.
showHistory is explicitly used for displaying the chat history and
does not return any output argument. If you want to retrieve the chat
history in a cell array, you can access the chatHistory property
directly, as in hdata = llm.chatHistory.
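A brief sketch of the display options described above:

```
## Display the whole session, then only the most recent exchange
showHistory (llm);
showHistory (llm, 'last');

## Display the second and third user-model interactions
showHistory (llm, [2, 3]);

## Retrieve the raw chat history as a cell array
hdata = llm.chatHistory;
```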
ollama: clearHistory (llm)
clearHistory (llm) deletes the entire chat history in the
ollama interface object llm. Use this method to initialize a new
chat session.
clearHistory (llm, 'all') is exactly the same as
clearHistory (llm).
clearHistory (llm, 'last') deletes the last
user-model interaction from the current chat session. Use this option if
you want to rephrase or modify the last request without clearing the
entire chat history.
clearHistory (llm, 'first') removes only the first
user-model interaction from the current chat session. Use this option if
you want to discard the initial user-model interaction in order to
experiment with the model’s context size.
clearHistory (llm, idx) deletes the user-model
interactions specified by idx, which must be a scalar or a vector
of integer values indexing the rows of the cell array
comprising the chatHistory property in llm.
Note that selectively deleting user-model interactions from the chat history also removes any images that may be integrated with the selected requests.
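For example:

```
## Drop only the last exchange so the request can be rephrased
clearHistory (llm, 'last');

## Start a completely new chat session
clearHistory (llm);
```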