Class Definition: ollama

llms: ollama

An ollama object interfaces with a running ollama server.

ollama is a scalar handle class object, which allows communication with an ollama server running either locally or across a network. The heavy lifting of interfacing with the ollama API is done by the compiled __ollama__ function, which should not be called directly.

An ollama interface object should be considered a session to the ollama server: it holds any user-defined settings along with the chat history (if opted for) and other custom parameters to be passed to the LLM model during inference. You can initialize several ollama interface objects pointing to the same ollama server and use them concurrently to implement more complex schemes such as RAG, custom tooling, etc.

Source Code: ollama

Properties

Displays the models that are currently loaded in the ollama server’s memory.

Specifies the inference mode that the ollama interface object will use to send requests to the ollama server. Currently, only 'query' and 'chat' modes are implemented.

Specifies the network IP address and port to which the ollama interface object is connected.

Displays the models that are currently available in the ollama server.

Contains various metrics about the last processed request and the response returned from the ollama server.

Contains an N×3 cell array with the history of user prompts, images (if any), and model responses for a given chat session. The first column contains character vectors with the user’s prompts, the second column contains a nested cell array with any images attached to the corresponding user prompt (otherwise it is empty), and the third column contains the model’s responses. By default, chatHistory is an empty cell array, and it is only populated while in 'chat' mode.

The name of the model that will be used for generating the response to the next user request. This is empty upon construction and it must be specified before requesting any inference from the ollama server.

The time in seconds that the ollama interface object will wait for a server response before closing the connection with an error.

The time in seconds that the ollama interface object will wait for a request to be successfully sent to the server before closing the connection with an error.

A structure containing fields as optional parameters to be passed to a model for inference at runtime. By default, this is an empty structure, in which case the model utilizes its default parameters as specified in the respective model file in the ollama server. See the setOptions method for more information about the custom parameters you can specify.

Methods

ollama: llm = ollama (serverURL)
ollama: llm = ollama (serverURL, model)
ollama: llm = ollama (serverURL, model, mode)

llm = ollama (serverURL) creates an ollama interface object, which allows communication with an ollama server accessible at serverURL, which must be a character vector specifying a uniform resource locator (URL). If serverURL is empty or ollama is called without any input arguments, then it defaults to http://localhost:11434.

llm = ollama (serverURL, model) also specifies the active model of the ollama interface llm which will be used for inference. model must be a character vector specifying an existing model at the ollama server. If the requested model is not available, a warning is emitted and no model is set active, which is the default behavior when ollama is called with fewer arguments. An active model is mandatory before starting any communication with the ollama server. Use the listModels class method to see all the models available in the server instance that llm is interfacing with. Use the loadModel method to set an active model in an ollama interface object that has already been created.

llm = ollama (serverURL, model, mode) also specifies the inference mode of the ollama interface. mode can be specified as 'query', for generating responses to single prompts, 'chat', for starting a conversation with a model by retaining the entire chat history during inference, or 'embed' (unimplemented) for generating embeddings for given prompts. By default, the ollama interface is initialized in query mode.
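
For example, a minimal interface to a local server might be set up as follows. The model name used here is only illustrative; replace it with a model that is actually available on your server.

  # Connect to a local ollama server and select an active model and the chat mode
  # ('llama3.2' is an assumed model name; use listModels to see the available ones)
  llm = ollama ('http://localhost:11434', 'llama3.2', 'chat');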

See also: fig2base64

ollama: list = listModels (llm)
ollama: list = listModels (llm, outtype)
ollama: listModels (…)

list = listModels (llm) returns a cell array of character vectors in list with the names of the models, which are available on the ollama server that llm interfaces with. This is equivalent to accessing the availableModels property with the syntax list = llm.availableModels.

list = listModels (llm, outtype) also specifies the data type of the output argument list. outtype must be a character vector with any of the following options:

  • 'cellstr' (default) returns list as a cell array of character vectors. Use this option to see available models for selecting an active model for inference.
  • 'json' returns list as a character vector containing the json string response returned from the ollama server. Use this option if you want to access all the details about the models available in the ollama server.
  • 'table' returns list as a table with the most important information about the available models in specific table variables.

listModels (…) will display the output requested according to the previous syntaxes to the standard output instead of returning it to an output argument. This syntax is not valid for the 'json' option, which requires an output argument.
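
For example, assuming llm is a valid ollama interface object:

  list = listModels (llm);           # cell array with the available model names
  tbl  = listModels (llm, 'table');  # table with the most important model details
  js   = listModels (llm, 'json');   # raw JSON response as a character vector
  listModels (llm, 'table')          # display the table in the terminal instead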

ollama: list = listRunningModels (llm)
ollama: list = listRunningModels (llm, outtype)
ollama: listRunningModels (…)

list = listRunningModels (llm) returns a cell array of character vectors in list with the names of the models, which are currently loaded in memory at the ollama server that llm interfaces with. This is equivalent to accessing the runningModels property with the syntax list = llm.runningModels.

list = listRunningModels (llm, outtype) also specifies the data type of the output argument list. outtype must be a character vector with any of the following options:

  • 'cellstr' (default) returns list as a cell array of character vectors. Use this option to see which models are currently running on the ollama server for better memory management.
  • 'json' returns list as a character vector containing the json string response returned from the ollama server. Use this option if you want to access all the details about currently running models.
  • 'table' returns list as a table with the most important information about the currently running models in specific table variables.

listRunningModels (…) will display the output requested according to the previous syntaxes to the standard output instead of returning it to an output argument. This syntax is not valid for the 'json' option, which requires an output argument.
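
For example:

  running = listRunningModels (llm);   # models currently loaded in the server's memory
  listRunningModels (llm, 'table')     # display their details in the terminal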

ollama: copyModel (llm, source, target)

copyModel (llm, source, target) copies the model specified by source into a new model named target in the ollama server interfaced by llm. Both source and target must be character vectors, and source must specify an existing model in the ollama server. If successful, the available models in the llm.availableModels property are updated; otherwise, an error is returned.

Alternatively, source may also be an integer scalar value indexing an existing model in llm.availableModels.

ollama: deleteModel (llm, target)

deleteModel (llm, target) deletes the model specified by target in the ollama server interfaced by llm. target can be either a character vector with the name of the model or an integer scalar value indexing an existing model in llm.availableModels. If successful, the available models in the llm.availableModels property are updated; otherwise, an error is returned.
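
A short sketch combining copyModel and deleteModel; the model names are illustrative and must exist on your server:

  copyModel (llm, 'llama3.2', 'llama3.2-backup');  # copy an existing model by name
  copyModel (llm, 1, 'my-copy');                   # or reference the source model by index
  deleteModel (llm, 'llama3.2-backup');            # delete a model by name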

ollama: loadModel (llm, target)

loadModel (llm, target) loads the model specified by target in the ollama server interfaced by llm. This syntax is equivalent to assigning a value to the activeModel property as in llm.activeModel = target. If successful, the specified model is also set as the active model for inference in the llm.activeModel property. target can be either a character vector with the name of the model or an integer scalar value indexing an existing model in llm.availableModels.

You can load multiple models concurrently; you are only limited by the hardware specifications of the ollama server that llm interfaces with. However, since each newly loaded model is also set as the active model for inference, keep in mind that only a single model can be active at a time for a given ollama interface object. The active model for inference will always be the most recently loaded model.
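
For example, assuming the model name exists on your server:

  loadModel (llm, 'mistral');     # load the model and set it as the active model
  llm.activeModel = 'mistral';    # equivalent direct assignment
  loadModel (llm, 1);             # load and activate the first entry of llm.availableModels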

ollama: unloadModel (llm, target)

unloadModel (llm, target) unloads the model specified by target from memory of the ollama server interfaced by llm. target can be either a character vector with the name of the model or an integer scalar value indexing an existing model in llm.availableModels. Use this method to free resources in the ollama server. By default, the ollama server unloads any idle model from memory after five minutes, unless otherwise instructed.

If the model you unload is also the active model in the ollama interface object, then the activeModel property is also cleared. You need to set an active model before inference.
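
For example:

  unloadModel (llm, 'mistral');   # free the model's memory on the server ('mistral' is illustrative)
  unloadModel (llm, 1);           # or reference the model by its index in llm.availableModels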

ollama: pullModel (llm, target)

pullModel (llm, target) downloads the model specified by target from the ollama library into the ollama server interfaced by llm. If successful, the model is appended to the list of available models in the llm.availableModels property. target must be a character vector.
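
For example, to download a model from the ollama library (the model name is illustrative):

  pullModel (llm, 'llama3.2');    # download the model into the interfaced server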

ollama: setOptions (llm, name, value)

setOptions (llm, name, value) sets custom options to be passed to the ollama server in order to tailor the behavior of the model to specific needs. The options must be specified as name, value paired arguments, where name is a character vector naming the option to be customized and value can be either a numeric or logical scalar, depending on the values each option requires.

The following options may be customized in any order as paired input arguments.

  • 'num_keep' (integer): Specifies how many of the most recent tokens or responses should be kept in memory for generating the next output. Higher values can improve the relevance of the generated text by providing more context.
  • 'seed' (integer): Controls the randomness of token selection during text generation so that similar responses are reproduced for the same requests.
  • 'num_predict' (integer): Specifies the maximum number of tokens to predict when generating text.
  • 'top_k' (integer): Limits the number of possible choices for each next token when generating responses by specifying how many of the most likely options to consider.
  • 'top_p' (double): Sets the cumulative probability for nucleus sampling. It must be in the range [0,1].
  • 'min_p' (double): Adjusts the sampling threshold in accordance with the model’s confidence. Specifically, it scales the probability threshold based on the top token’s probability, allowing the model to focus on high-confidence tokens when certain, and to consider a broader range of tokens when less confident. It must be in the range [0,1].
  • 'typical_p' (double): Controls how conventional or creative the responses from a language model will be. A higher typical_p value results in more expected and standard responses, while a lower value allows for more unusual and creative outputs. It must be in the range [0,1].
  • 'repeat_last_n' (integer): Defines how far back the model looks to avoid repetition.
  • 'temperature' (double): Controls the randomness of the generated output by determining how the model leverages the raw likelihoods of the tokens under consideration for the next words in a sequence. It ranges from 0 to 2, with higher values corresponding to more chaotic output.
  • 'repeat_penalty' (double): Adjusts the penalty for repeated phrases; higher values discourage repetition.
  • 'presence_penalty' (double): Controls the diversity of the generated text by penalizing new tokens based on whether they appear in the text so far.
  • 'frequency_penalty' (double): Controls how often the same words should be repeated in the generated text.
  • 'penalize_newline' (logical): Discourages the model from generating newlines in its responses.
  • 'numa' (logical): Allows for non-uniform memory access to enhance performance. This can significantly improve processing speeds on multi-CPU systems.
  • 'num_ctx' (integer): Sets the context window length (in tokens), determining how much previous text the model considers. This should be kept in mind especially in chat sessions.
  • 'num_batch' (integer): Controls the number of input samples processed in a single batch during model inference. Reducing this value can help prevent out-of-memory (OOM) errors when working with large models.
  • 'num_gpu' (integer): Specifies the number of GPU devices to use for computation.
  • 'main_gpu' (integer): Specifies which GPU device to use for inference.
  • 'use_mmap' (logical): Allows for memory-mapped file access, which can improve performance by enabling faster loading of model weights from disk.
  • 'num_thread' (integer): Specifies the number of threads to use during model generation, allowing you to optimize performance based on your CPU’s capabilities.

Customized options are preserved in the ollama interface object for all subsequent inference requests until they are altered or reset to the model’s default values by removing them. To remove a custom option, pass an empty value to the name, value paired argument, as in setOptions (llm, 'seed', []).

Use the showOptions method to display any custom options that may be currently set in the ollama interface object. Alternatively, you can retrieve the custom options as a structure through the options property, as in opts = llm.options, where each field in opts refers to a custom option. If no custom options are set, then opts is an empty structure.

You can also set or clear a single custom option with direct assignment to the options property of the ollama interface object by passing the name, value paired argument as a 2-element cell array. The equivalent syntax of setOptions (llm, 'seed', []) is llm.options = {'seed', []}.
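
A brief sketch of setting, inspecting, and clearing custom options; the values shown are arbitrary examples:

  setOptions (llm, 'temperature', 0.2);   # less random output
  setOptions (llm, 'seed', 42);           # reproducible responses to identical requests
  setOptions (llm, 'num_ctx', 8192);      # enlarge the context window
  showOptions (llm);                      # display the currently set custom options
  setOptions (llm, 'seed', []);           # remove a single custom option
  llm.options = {'temperature', []};      # equivalent removal by direct assignment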

ollama: showOptions (llm)

showOptions (llm) displays any custom options that may be specified in the ollama interface object llm.

ollama: query (llm, prompt)
ollama: query (llm, prompt, image)
ollama: txt = query (…)
ollama: query (llm)

query (llm, prompt) uses the "api/generate" API end point to make a request to the ollama server interfaced by llm to generate text based on the user’s input specified in prompt, which must be a character vector. When no output argument is requested, query prints the response text in the standard output (command window) with a custom display method so that words are not split between lines depending on the terminal size. If an output argument is requested, the text is returned as a character vector and nothing gets displayed in the terminal.

query (llm, prompt, image) also specifies an image or multiple images to be passed to the model along with the user’s prompt. For a single image, image must be a character vector specifying either the filename of an image or a base64 encoded image. query distinguishes between the two by scanning image for a period character ('.'), which is commonly used as a separator between the base filename and the extension but is an invalid character in base64 encoded strings. For multiple images, image must be a cell array of character vectors explicitly containing either multiple filenames or multiple base64 encoded string representations of images.

txt = query (…) returns the generated text to the output argument txt instead of displaying it to the terminal for any of the previous syntaxes.

query (llm) does not make a request to the ollama server, but it sets the 'mode' property in the ollama interface object llm to 'query' for future requests. Use this syntax to switch from another interface mode to query mode without making a request to the server.

An alternative way of calling the query method is by using direct subscripted reference to the ollama interface object llm, as long as it is already set in query mode. The table below lists the equivalent syntaxes, followed by a short usage sketch.

  Method calling                       Object subscripted reference
  query (llm, prompt)                  llm(prompt)
  query (llm, prompt, image)           llm(prompt, image)
  txt = query (llm, prompt)            txt = llm(prompt)
  txt = query (llm, prompt, image)     txt = llm(prompt, image)
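
A short sketch, assuming llm is in query mode and has an active model; the prompt and image filename are illustrative:

  query (llm, 'Briefly explain what a handle class is.');   # print the response in the terminal
  txt = query (llm, 'Describe this image.', 'photo.png');   # attach an image file to the prompt
  txt = llm ('Briefly explain what a handle class is.');    # equivalent subscripted reference
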
ollama: chat (llm, prompt)
ollama: chat (llm, prompt, image)
ollama: txt = chat (…)
ollama: chat (llm)

chat (llm, prompt) uses the "api/chat" API end point to make a request to the ollama server interfaced by llm to generate text based on the user’s input specified in prompt along with all previous requests and responses made by the user and the models during the same chat session, which are stored in the 'chatHistory' property of the ollama interface object llm. prompt must be a character vector. When no output argument is requested, chat prints the response text in the standard output (command window) with a custom display method so that words are not split between lines depending on the terminal size. If an output argument is requested, the text is returned as a character vector and nothing gets displayed in the terminal. In either case, the response text is appended to the chat history, which can be displayed with the showHistory method or returned as a cell array from llm.chatHistory. If you want to start a new chat session, you can either clear the chat history with the clearHistory method or create a new ollama interface object.

chat (llm, prompt, image) also specifies an image or multiple images to be passed to the model along with the user’s prompt. For a single image, image must be a character vector specifying either the filename of an image or a base64 encoded image. chat distinguishes between the two by scanning image for a period character ('.'), which is commonly used as a separator between the base filename and the extension but is an invalid character in base64 encoded strings. For multiple images, image must be a cell array of character vectors, which can contain both multiple filenames and multiple base64 encoded string representations of images. Any images supplied along with a prompt during a chat session are also stored in the chat history.

txt = chat (…) returns the generated text to the output argument txt instead of displaying it to the terminal for any of the previous syntaxes.

chat (llm) does not make a request to the ollama server, but it sets the 'mode' property in the ollama interface object llm to 'chat' for future requests. Use this syntax to switch from another interface mode to chat mode without making a request to the server. Switching to chat mode does not clear any existing chat history in llm.

An alternative way of calling the chat method is by using direct subscripted reference to the ollama interface object llm, as long as it is already set in chat mode. The table below lists the equivalent syntaxes, followed by a short usage sketch.

  Method calling                       Object subscripted reference
  chat (llm, prompt)                   llm(prompt)
  chat (llm, prompt, image)            llm(prompt, image)
  txt = chat (llm, prompt)             txt = llm(prompt)
  txt = chat (llm, prompt, image)      txt = llm(prompt, image)
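
A minimal chat session sketch; the prompts are illustrative:

  chat (llm);                                            # switch to chat mode without sending a request
  chat (llm, 'Hi! Please keep your answers short.');
  chat (llm, 'What did I ask in my previous message?');  # previous turns are sent as context
  showHistory (llm);                                     # review the conversation so far
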
ollama: showStats (llm)

showStats (llm) displays the response statistics of the last response returned from the ollama server interfaced by llm. The type of request (e.g. query, chat, embed) does not alter the displayed statistics, which include the following parameters:

  • total duration: the total time in seconds to process the request and return the response.
  • load duration: the time in seconds to load the user’s request into the model.
  • evaluation duration: the time in seconds for the model to generate the response based on the user’s request.
  • prompt count: the number of tokens comprising the user’s request.
  • evaluation count: the number of tokens comprising the model’s response.
ollama: showHistory (llm)
ollama: showHistory (llm, 'all')
ollama: showHistory (llm, 'last')
ollama: showHistory (llm, 'first')
ollama: showHistory (llm, idx)

showHistory (llm) displays the entire chat history stored in the ollama interface object llm. The chat history is displayed in chronological order alternating between user’s requests and the model’s responses. For any user’s request that contained images, the filenames or the number of images (in case of base64 encoded images) are also listed below the corresponding request and before the subsequent response.

showHistory (llm, 'all') is exactly the same as showHistory (llm).

showHistory (llm, 'last') displays only the last user-model interaction of the current chat session.

showHistory (llm, 'first') displays only the first user-model interaction of the current chat session.

showHistory (llm, idx) displays the user-model interactions specified by idx, which must be a scalar or a vector of integer values indexing the rows of the N×3 cell array comprising the chatHistory property in llm.

showHistory is explicitly used for displaying the chat history and does not return any output argument. If you want to retrieve the chat history in a cell array, you can access the chatHistory property directly, as in hdata = llm.chatHistory.
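
For example:

  showHistory (llm, 'last');    # display only the most recent interaction
  showHistory (llm, [1, 3]);    # display the first and third interactions
  hdata = llm.chatHistory;      # retrieve the raw Nx3 cell array instead of displaying it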

ollama: clearHistory (llm)
ollama: clearHistory (llm, 'all')
ollama: clearHistory (llm, 'last')
ollama: clearHistory (llm, 'first')
ollama: clearHistory (llm, idx)

clearHistory (llm) deletes the entire chat history in the ollama interface object llm. Use this method to initialize a new chat session.

clearHistory (llm, 'all') is exactly the same as clearHistory (llm).

clearHistory (llm, 'last') deletes the last user-model interaction from the current chat session. Use this option if you want to rephrase or modify the last request without clearing the entire chat history.

clearHistory (llm, 'first') removes only the first user-model interaction from the current chat session. Use this option if you want to discard the initial user-model interaction in order to experiment with the model’s context size.

clearHistory (llm, idx) deletes the user-model interactions specified by idx, which must be a scalar or a vector of integer values indexing the rows of the N×3 cell array comprising the chatHistory property in llm.

Note that selectively deleting user-model interactions from the chat history also removes any images that may be integrated with the selected requests.
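
For example:

  clearHistory (llm, 'last');   # drop the last interaction, e.g. to rephrase the request
  clearHistory (llm);           # start a fresh chat session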