ollama
llms: ollama
An ollama object interfaces with a running ollama server. ollama is a scalar handle class object, which allows communication with an ollama server running either locally or across a network. The heavy lifting of interfacing with the ollama API is done by the compiled __ollama__ function, which should not be called directly.
An ollama interface object should be considered a session to the ollama server, holding any user-defined settings along with the chat history (if opted for) and other custom parameters to be passed to the LLM model during inference. You can initialize several ollama interface objects pointing to the same ollama server and use them concurrently to implement more complex schemes such as RAG, custom tooling, etc.
Source Code: ollama
Displays the models that are currently loaded in ollama server’s memory.
Specifies the inference mode that the ollama interface object will use to send requests to the ollama server. Currently, only 'query' and 'chat' modes are implemented.
Specifies the network IP address and port to which the ollama interface object is connected.
Displays the models that are currently available on the ollama server.
Contains various metrics about the last processed request and the response returned from the ollama server.
Contains a cell array with the history of user prompts, images (if any), and model responses for a given chat session. The first column contains character vectors with the user's prompts, the second column contains a nested cell array with any images attached to the corresponding user prompt (otherwise it is empty), and the third column contains the model's responses. By default, chatHistory is an empty cell array, and it is only populated while in 'chat' mode.
The name of the model that will be used for generating the response to the next user request. This is empty upon construction and it must be specified before requesting any inference from the ollama server.
The time in seconds that the ollama interface object will wait for a server response before closing the connection with an error.
The time in seconds that the ollama interface object will wait for a request to be successfully sent to the server before closing the connection with an error.
A structure containing fields as optional parameters to be passed to a model for inference at runtime. By default, this is an empty structure, in which case the model utilizes its default parameters as specified in the respective model file in the ollama server. See the setOptions method for more information about the custom parameters you can specify.
ollama: llm = ollama (serverURL)
ollama: llm = ollama (serverURL, model)
ollama: llm = ollama (serverURL, model, mode)
llm = ollama (serverURL) creates an ollama interface object, which allows communication with an ollama server accessible at serverURL, which must be a character vector specifying a uniform resource locator (URL). If serverURL is empty or ollama is called without any input arguments, then it defaults to http://localhost:11434.
llm = ollama (serverURL, model) also specifies the active model of the ollama interface llm, which will be used for inference. model must be a character vector specifying an existing model at the ollama server. If the requested model is not available, a warning is emitted and no model is set active, which is the default behavior when ollama is called with fewer arguments. An active model is mandatory before starting any communication with the ollama server. Use the listModels class method to see all the models available in the server instance that llm is interfacing with. Use the loadModel method to set an active model in an ollama interface object that has already been created.
llm = ollama (serverURL, model, mode) also specifies the inference mode of the ollama interface. mode can be specified as 'query', for generating responses to single prompts, 'chat', for starting a conversation with a model by retaining the entire chat history during inference, or 'embed' (unimplemented), for generating embeddings for given prompts. By default, the ollama interface is initialized in query mode.
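For instance, a minimal sketch of the constructor syntaxes described above is shown below; the remote address and the 'llama3.2' model name are placeholders and must be replaced with values valid for your own ollama server.

```
llm = ollama ();                                  # connect to http://localhost:11434
llm = ollama ("http://192.168.1.10:11434");       # remote server, no active model yet
llm = ollama ("http://localhost:11434", "llama3.2", "chat");  # server, model, and mode
```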
See also: fig2base64
ollama: list = listModels (llm)
ollama: list = listModels (llm, outtype)
ollama: listModels (…)
list = listModels (llm) returns a cell array of character vectors in list with the names of the models that are available on the ollama server that llm interfaces with. This is equivalent to accessing the availableModels property with the syntax list = llm.availableModels.
list = listModels (llm, outtype) also specifies the data type of the output argument list. outtype must be a character vector with any of the following options:
'cellstr' (default) returns list as a cell array of character vectors. Use this option to see available models for selecting an active model for inference.
'json' returns list as a character vector containing the JSON string response returned from the ollama server. Use this option if you want to access all the details about the models available in the ollama server.
'table' returns list as a table with the most important information about the available models in specific table variables.
listModels (…)
will display the output requested according
to the previous syntaxes to the standard output instead of returning it
to an output argument. This syntax is not valid for the 'json'
option, which requires an output argument.
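A brief usage sketch of the syntaxes above, assuming llm is an ollama interface object created as shown earlier:

```
models = listModels (llm);        # cell array of available model names
listModels (llm, 'table');        # print a summary table in the command window
info = listModels (llm, 'json');  # full details as a JSON character vector
```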
ollama: list = listRunningModels (llm)
ollama: list = listRunningModels (llm, outtype)
ollama: listRunningModels (…)
list = listRunningModels (llm) returns a cell array of character vectors in list with the names of the models that are currently loaded in memory at the ollama server that llm interfaces with. This is equivalent to accessing the runningModels property with the syntax list = llm.runningModels.
list = listRunningModels (llm, outtype) also specifies the data type of the output argument list. outtype must be a character vector with any of the following options:
'cellstr' (default) returns list as a cell array of character vectors. Use this option to see which models are currently running on the ollama server for better memory management.
'json' returns list as a character vector containing the JSON string response returned from the ollama server. Use this option if you want to access all the details about currently running models.
'table' returns list as a table with the most important information about the currently running models in specific table variables.
listRunningModels (…) will display the output requested according to the previous syntaxes to the standard output instead of returning it to an output argument. This syntax is not valid for the 'json' option, which requires an output argument.
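For example, assuming llm interfaces a running server, you might check memory usage like this:

```
running = listRunningModels (llm);   # cell array of models loaded in memory
listRunningModels (llm, 'table');    # print details in the command window
```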
ollama: copyModel (llm, source, target)
copyModel (llm, source, target) copies the model specified by source into a new model named after target in the ollama server interfaced by llm. Both source and target must be character vectors, and source must specify an existing model in the ollama server. If successful, the available models in the llm.availableModels property are updated; otherwise, an error is returned.
Alternatively, source may also be an integer scalar value indexing an existing model in llm.availableModels.
ollama: deleteModel (llm, target)
deleteModel (llm, target) deletes the model specified by target in the ollama server interfaced by llm. target can be either a character vector with the name of the model or an integer scalar value indexing an existing model in llm.availableModels. If successful, the available models in the llm.availableModels property are updated; otherwise, an error is returned.
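A short sketch of model management with copyModel and deleteModel; the model names are placeholders, so pick names that actually appear in llm.availableModels:

```
copyModel (llm, 'llama3.2', 'llama3.2-copy');   # duplicate an existing model
deleteModel (llm, 'llama3.2-copy');             # delete it again by name ...
## deleteModel (llm, 3);                        # ... or by index into llm.availableModels
```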
ollama: loadModel (llm, target)
loadModel (llm, target) loads the model specified by target in the ollama server interfaced by llm. This syntax is equivalent to assigning a value to the activeModel property, as in llm.activeModel = target. If successful, the specified model is also set as the active model for inference in the llm.activeModel property. target can be either a character vector with the name of the model or an integer scalar value indexing an existing model in llm.availableModels.
You can load multiple models concurrently, limited only by the hardware specifications of the ollama server that llm interfaces with. However, since each newly loaded model is also set as the active model for inference, keep in mind that only a single model can be active at a time for a given ollama interface object. The active model for inference will always be the latest loaded model.
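For example, assuming 'llama3.2' is listed in llm.availableModels:

```
loadModel (llm, 'llama3.2');     # load by name and set as the active model
loadModel (llm, 2);              # or load by index into llm.availableModels
llm.activeModel = 'llama3.2';    # equivalent property assignment
```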
ollama: unloadModel (llm, target)
unloadModel (llm, target) unloads the model specified by target from the memory of the ollama server interfaced by llm. target can be either a character vector with the name of the model or an integer scalar value indexing an existing model in llm.availableModels. Use this method to free resources on the ollama server. By default, the ollama server unloads any idle model from memory after five minutes, unless otherwise instructed.
If the model you unload is also the active model in the ollama interface object, then the activeModel property is also cleared. You need to set an active model again before requesting further inference.
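A minimal sketch, again using the placeholder model name 'llama3.2':

```
unloadModel (llm, 'llama3.2');   # free server memory; clears activeModel if it was active
loadModel (llm, 'llama3.2');     # set an active model again before requesting inference
```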
ollama: pullModel (llm, target)
pullModel (llm, target) downloads the model specified by target from the ollama library into the ollama server interfaced by llm. If successful, the model is appended to the list of available models in the llm.availableModels property. target must be a character vector.
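For example, to download a model from the ollama library (the model name is a placeholder; any model published in the library works):

```
pullModel (llm, 'llama3.2');     # download the model into the server
listModels (llm)                 # it now appears among the available models
```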
ollama: setOptions (llm, name, value)
setOptions (llm, name, value) sets custom options to be passed to the ollama server in order to tailor the behavior of the model according to specific needs. The options must be specified as name, value paired arguments, where name is a character vector naming the option to be customized, and value can be either a numeric or logical scalar depending on the values each option requires.
The following options may be customized in any order as paired input arguments.
| name | value | description |
|---|---|---|
| 'num_keep' | integer | Specifies how many of the most recent tokens or responses should be kept in memory for generating the next output. Higher values can improve the relevance of the generated text by providing more context. |
| 'seed' | integer | Controls the randomness of token selection during text generation so that similar responses are reproduced for the same requests. |
| 'num_predict' | integer | Specifies the maximum number of tokens to predict when generating text. |
| 'top_k' | integer | Limits the number of possible choices for each next token when generating responses by specifying how many of the most likely options to consider. |
| 'top_p' | double | Sets the cumulative probability for nucleus sampling. It must be in the range [0, 1]. |
| 'min_p' | double | Adjusts the sampling threshold in accordance with the model's confidence. Specifically, it scales the probability threshold based on the top token's probability, allowing the model to focus on high-confidence tokens when certain, and to consider a broader range of tokens when less confident. It must be in the range [0, 1]. |
| 'typical_p' | double | Controls how conventional or creative the responses from a language model will be. A higher typical_p value results in more expected and standard responses, while a lower value allows for more unusual and creative outputs. It must be in the range [0, 1]. |
| 'repeat_last_n' | integer | Defines how far back the model looks to avoid repetition. |
| 'temperature' | double | Controls the randomness of the generated output by determining how the model leverages the raw likelihoods of the tokens under consideration for the next words in a sequence. It ranges from 0 to 2, with higher values corresponding to more chaotic output. |
| 'repeat_penalty' | double | Adjusts the penalty for repeated phrases; higher values discourage repetition. |
| 'presence_penalty' | double | Controls the diversity of the generated text by penalizing new tokens based on whether they appear in the text so far. |
| 'frequency_penalty' | double | Controls how often the same words should be repeated in the generated text. |
| 'penalize_newline' | logical | Discourages the model from generating newlines in its responses. |
| 'numa' | logical | Allows for non-uniform memory access to enhance performance. This can significantly improve processing speeds on multi-CPU systems. |
| 'num_ctx' | integer | Sets the context window length (in tokens), determining how much previous text the model considers. This should be kept in mind especially in chat sessions. |
| 'num_batch' | integer | Controls the number of input samples processed in a single batch during model inference. Reducing this value can help prevent out-of-memory (OOM) errors when working with large models. |
| 'num_gpu' | integer | Specifies the number of GPU devices to use for computation. |
| 'main_gpu' | integer | Specifies which GPU device to use for inference. |
| 'use_mmap' | logical | Allows for memory-mapped file access, which can improve performance by enabling faster loading of model weights from disk. |
| 'num_thread' | integer | Specifies the number of threads to use during model generation, allowing you to optimize performance based on your CPU's capabilities. |
Custom options are preserved in the ollama interface object for all subsequent requests for inference until they are altered or reset to the model's default value by removing them. To remove a custom option, pass an empty value to the name, value paired argument, as in setOptions (llm, 'seed', []).
Use the showOptions method to display any custom options that may be currently set in the ollama interface object. Alternatively, you can retrieve the custom options as a structure through the options property, as in opts = llm.options, where each field in opts refers to a custom option. If no custom options are set, then opts is an empty structure.
You can also set or clear a single custom option with direct assignment to the options property of the ollama interface object by passing the name, value paired argument as a 2-element cell array. The equivalent syntax of setOptions (llm, 'seed', []) is llm.options = {'seed', []}.
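The sketch below sets a few custom options, inspects them, and removes one of them again. It assumes that several name, value pairs can be passed in a single call, as the paired-argument description above suggests; otherwise call setOptions once per option.

```
setOptions (llm, 'temperature', 0.2, 'seed', 42, 'num_ctx', 8192);
showOptions (llm);               # display the custom options currently in effect
opts = llm.options;              # retrieve them as a structure
llm.options = {'seed', []};      # remove the 'seed' option again
```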
ollama: showOptions (llm)
showOptions (llm) displays any custom options that may be specified in the ollama interface object llm.
ollama: query (llm, prompt)
ollama: query (llm, prompt, image)
ollama: txt = query (…)
ollama: query (llm)
query (llm, prompt) uses the "api/generate" API endpoint to make a request to the ollama server interfaced by llm to generate text based on the user's input specified in prompt, which must be a character vector. When no output argument is requested, query prints the response text in the standard output (command window) with a custom display method so that words are not split between lines depending on the terminal size. If an output argument is requested, the text is returned as a character vector and nothing gets displayed in the terminal.
query (llm, prompt, image) also specifies an image or multiple images to be passed to the model along with the user's prompt. For a single image, image must be a character vector specifying either the filename of an image or a base64 encoded image. query distinguishes between the two by scanning image for a period character ('.'), which is commonly used as a separator between base-filename and extension, but it is an invalid character for base64 encoded strings. For multiple images, image must be a cell array of character vectors explicitly containing either multiple filenames or multiple base64 encoded string representations of images.
txt = query (…)
returns the generated text to the
output argument txt instead of displaying it to the terminal for
any of the previous syntaxes.
query (llm) does not make a request to the ollama server, but it sets the 'mode' property in the ollama interface object llm to 'query' for future requests. Use this syntax to switch from another interface mode to query mode without making a request to the server.
An alternative method of calling the query method is by using direct subscripted reference to the ollama interface object llm, as long as it is already set in query mode. The table below lists the equivalent syntaxes.
| method call | object subscripted reference |
|---|---|
| query (llm, prompt) | llm(prompt) |
| query (llm, prompt, image) | llm(prompt, image) |
| txt = query (llm, prompt) | txt = llm(prompt) |
| txt = query (llm, prompt, image) | txt = llm(prompt, image) |
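A short query-mode sketch; the prompts, the image filename, and the assumption that the active model can handle images are all placeholders for your own setup.

```
query (llm);                                        # make sure llm is in query mode
txt = query (llm, "Summarize GNU Octave in one sentence.");
query (llm, "What is shown in this image?", "photo.png");   # attach an image file
txt = llm ("Summarize GNU Octave in one sentence.");        # equivalent subscripted call
```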
ollama: chat (llm, prompt)
ollama: chat (llm, prompt, image)
ollama: txt = chat (…)
ollama: chat (llm)
chat (llm, prompt) uses the "api/chat" API endpoint to make a request to the ollama server interfaced by llm to generate text based on the user's input specified in prompt along with all previous requests and responses made by the user and models during the same chat session, which is stored in the 'chatHistory' property of the ollama interface object llm. prompt must be a character vector. When no output argument is requested, chat prints the response text in the standard output (command window) with a custom display method so that words are not split between lines depending on the terminal size. If an output argument is requested, the text is returned as a character vector and nothing gets displayed in the terminal. In either case, the response text is appended to the chat history, which can be displayed with the showHistory method or returned as a cell array from llm.chatHistory. If you want to start a new chat session, you can either clear the chat history with the clearHistory method or create a new ollama interface object.
chat (llm, prompt, image) also specifies an image or multiple images to be passed to the model along with the user's prompt. For a single image, image must be a character vector specifying either the filename of an image or a base64 encoded image. chat distinguishes between the two by scanning image for a period character ('.'), which is commonly used as a separator between base-filename and extension, but it is an invalid character for base64 encoded strings. For multiple images, image must be a cell array of character vectors, which can contain both multiple filenames and multiple base64 encoded string representations of images. Any images supplied along with a prompt during a chat session are also stored in the chat history.
txt = chat (…)
returns the generated text to the
output argument txt instead of displaying it to the terminal for
any of the previous syntaxes.
chat (llm) does not make a request to the ollama server, but it sets the 'mode' property in the ollama interface object llm to 'chat' for future requests. Use this syntax to switch from another interface mode to chat mode without making a request to the server. Switching to chat mode does not clear any existing chat history in llm.
An alternative method of calling the chat method is by using direct subscripted reference to the ollama interface object llm, as long as it is already set in chat mode. The table below lists the equivalent syntaxes.
| method call | object subscripted reference |
|---|---|
| chat (llm, prompt) | llm(prompt) |
| chat (llm, prompt, image) | llm(prompt, image) |
| txt = chat (llm, prompt) | txt = llm(prompt) |
| txt = chat (llm, prompt, image) | txt = llm(prompt, image) |
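A short chat-mode sketch with placeholder prompts; each request and response is appended to llm.chatHistory.

```
chat (llm);                                    # switch to chat mode
chat (llm, "Hi!  Please keep your answers short.");
chat (llm, "What is GNU Octave?");
txt = llm ("Who maintains it?");               # equivalent subscripted call in chat mode
showHistory (llm);                             # review the whole session so far
```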
ollama: showStats (llm)
showStats (llm) displays the response statistics of the last response returned from the ollama server interfaced by llm. The type of request (e.g. query, chat, embed) does not alter the displayed statistics, which include the timing and token-count metrics reported by the ollama server.
ollama: showHistory (llm)
ollama: showHistory (llm, 'all')
ollama: showHistory (llm, 'last')
ollama: showHistory (llm, 'first')
ollama: showHistory (llm, idx)
showHistory (llm)
displays the entire chat history stored in
the ollama interface object llm. The chat history is displayed in
chronological order alternating between user’s requests and the model’s
responses. For any user’s request that contained images, the filenames
or the number of images (in case of base64 encoded images) are also
listed below the corresponding request and before the subsequent
response.
showHistory (llm, 'all') is exactly the same as showHistory (llm).
showHistory (llm, 'last') displays only the last user-model interaction of the current chat session.
showHistory (llm, 'first') displays only the first user-model interaction of the current chat session.
showHistory (llm, idx) displays the user-model interactions specified by idx, which must be a scalar or a vector of integer values indexing the rows of the cell array comprising the chatHistory property in llm.
showHistory is explicitly used for displaying the chat history and does not return any output argument. If you want to retrieve the chat history in a cell array, you can access the chatHistory property directly, as in hdata = llm.chatHistory.
ollama: clearHistory (llm)
ollama: clearHistory (llm, 'all')
ollama: clearHistory (llm, 'last')
ollama: clearHistory (llm, 'first')
ollama: clearHistory (llm, idx)
clearHistory (llm)
deletes the entire chat history in the
ollama interface object llm. Use this method to initialize a new
chat session.
clearHistory (llm, 'all') is exactly the same as clearHistory (llm).
clearHistory (llm, 'last') deletes the last user-model interaction from the current chat session. Use this option if you want to rephrase or modify the last request without clearing the entire chat history.
clearHistory (llm, 'first') removes only the first user-model interaction from the current chat session. Use this option if you want to discard the initial user-model interaction in order to experiment with the model's context size.
clearHistory (llm, idx) deletes the user-model interactions specified by idx, which must be a scalar or a vector of integer values indexing the rows of the cell array comprising the chatHistory property in llm.
Note that selectively deleting user-model interactions from the chat history also removes any images that may be integrated with the selected requests.
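A brief sketch of managing an ongoing chat session with showHistory and clearHistory:

```
showHistory (llm, 'last');    # inspect the most recent user-model interaction
clearHistory (llm, 'last');   # drop it, e.g. to rephrase the last request
clearHistory (llm);           # wipe the history and start a new chat session
```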