Using Models

This guide provides comprehensive instructions on adding, customizing, and deleting models within the Jan platform.

Add Models

There are various ways to add models to Jan.

Currently, Jan natively supports the following model formats:

  • GGUF (through a llama.cpp engine)
  • TensorRT (through a TRT-LLM engine)

Download from Jan Hub

You can choose from a list of popular, recommended models directly from Jan app's Model Hub. These models are preconfigured with optimal runtime parameters. This is the easiest way to get started.

  1. Open the Jan app and navigate to the Hub.
  2. Browse the models, clicking the ⌄ dropdown on a model for more information. Models with the Recommended label will likely run faster on your computer.
  3. After downloading a model, click Use to activate it. Ensure it's selected in the model dropdown for your thread.

Add a Model Manually

You can also add a specific model that is not available within the Hub section by following the steps below:

  1. Open the Jan app.
  2. Click the gear icon (⚙️) on the bottom left of your screen.
  3. Under the Settings screen, click Advanced Settings.
  4. Open the Jan Data folder.
  5. Navigate to the ~/jan/models/ directory.
  6. Create a new model folder.
  7. Create a model.json file inside the folder.
  8. Insert the following default model.json template:

{
  "id": "<unique_identifier_of_the_model>",
  "object": "<type_of_object, e.g., model, tool>",
  "name": "<name_of_the_model>",
  "version": "<version_number>",
  "description": "<brief_description_of_the_model>",
  "format": "<format_of_the_model_api_or_other>",
  "settings": "<additional_settings_as_needed>",
  "parameters": {
    "max_tokens": "<maximum_number_of_tokens_the_model_can_generate>",
    "temperature": "<temperature_setting_for_randomness_in_generation>"
  },
  "metadata": {
    "author": "<name_of_the_creator_or_organization>",
    "tags": ["<list_of_relevant_tags_describing_the_model>"]
  },
  "engine": "<engine_or_platform_the_model_runs_on>",
  "source": "<url_or_source_of_the_model_information>"
}

💡

If you've set up your model's configuration in nitro.json, note that model.json can override those settings.
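
For reference, a filled-in model.json for a hypothetical GGUF model might look like the following. Every value below (id, name, prompt template, URL) is illustrative rather than a real entry, and the engine value assumes the llama.cpp-based nitro engine mentioned above:

{
  "id": "example-7b-q4",
  "object": "model",
  "name": "Example 7B Q4",
  "version": "1.0",
  "description": "A hypothetical 7B GGUF model used to illustrate the schema.",
  "format": "gguf",
  "settings": {
    "ctx_len": 4096,
    "prompt_template": "<|im_start|>system\n{system_message}<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant"
  },
  "parameters": {
    "max_tokens": 4096,
    "temperature": 0.7
  },
  "metadata": {
    "author": "Example Org",
    "tags": ["7B", "example"]
  },
  "engine": "nitro",
  "source": "https://huggingface.co/example-org/example-7b-gguf"
}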

There are two important fields in model.json that you need to set:

Settings

This is the field where you can set your engine configurations. There are two important fields that you need to define for your local models:

  • ctx_len: Defined based on the model's context size.
  • prompt_template: Defined based on the model's trained template (e.g., ChatML, Alpaca).

To set up the prompt_template based on your model, follow the steps below:

  1. Visit Hugging Face, an open-source machine-learning platform.
  2. Find the model that you're using (e.g., Gemma 7B it).
  3. Review the model card and identify its prompt template.
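
For example, a settings block for a model trained on the Alpaca template might look like this; the ctx_len value is illustrative, and you should confirm the exact template on the model's Hugging Face card:

"settings": {
  "ctx_len": 4096,
  "prompt_template": "### Instruction:\n{prompt}\n### Response:"
}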

Parameters

parameters are the adjustable settings that affect how your model operates or processes data. The fields in parameters are typically general and can be the same across models; an example is provided below. The complete list of parameters is covered under Model Parameters later in this guide.


"parameters":{
"temperature": 0.7,
"top_p": 0.95,
"stream": true,
"max_tokens": 4096,
"frequency_penalty": 0,
"presence_penalty": 0
}

Import or Symlink Local Models

You can also point to existing model binary files on your local filesystem. This is the most space-efficient option if you have already downloaded models through other local AI applications.

  1. Navigate to the Hub.
  2. Click on Import Model at the top.
  3. Select the model or the folder containing multiple models.
  4. Optionally, check the box to symlink the model files instead of copying them into the Jan Data Folder. This saves disk space.
⚠️

Windows users should drag and drop the model file, as Click to Upload might not show the model files in Folder Preview.

Model Parameters

A model has three main groups of parameters to configure:

  • Inference Parameters
  • Model Parameters
  • Engine Parameters

Inference Parameters

Inference parameters are settings that control how an AI model generates outputs. These parameters include the following:

  • Temperature: Influences the randomness of the model's output. A higher temperature leads to more random and diverse responses, while a lower temperature produces more predictable outputs.
  • Top P: Sets a probability threshold, allowing only the most likely tokens whose cumulative probability exceeds the threshold to be considered for generation.
  • Stream: Enables real-time data processing, useful for applications needing immediate responses, like live interactions. It accelerates predictions by processing data as it becomes available.
  • Max Tokens: Sets the upper limit on the number of tokens the model can generate in a single output.
  • Stop Sequences: Defines specific tokens or phrases that signal the model to stop producing further output, useful for controlling output size and ending generation at logical points.
  • Frequency Penalty: Modifies the likelihood of the model repeating the same words or phrases within a single output, reducing redundancy in the generated text.
  • Presence Penalty: Encourages the generation of new and varied concepts by penalizing tokens that have already appeared, promoting diversity and novelty in the output.
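
Together, these controls map onto the parameters block of model.json. Here is a sketch with illustrative values; the stop entry assumes a ChatML-style model whose turns end with <|im_end|>:

"parameters": {
  "temperature": 0.7,
  "top_p": 0.95,
  "stream": true,
  "max_tokens": 4096,
  "stop": ["<|im_end|>"],
  "frequency_penalty": 0,
  "presence_penalty": 0
}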

Model Parameters

Model parameters are the settings that define and configure the model's behavior. These parameters include the following:

  • Prompt Template: A predefined text structure that guides the model's responses or predictions. The model fills in or expands upon this template during generation; for example, it might include placeholders or specific instructions that direct how the model should formulate its outputs.
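
For instance, a Llama-2-chat-style template might look like the following, where {system_message} and {prompt} are the placeholders Jan substitutes at generation time. The template is shown for illustration only; always check your model's card for the trained format:

"prompt_template": "[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n{prompt} [/INST]"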

Engine Parameters

Engine parameters are the settings that define how the model processes input data and generates output. These parameters include the following:

  • Context Length: Determines the maximum amount of input the model can take into account when generating a response. The maximum context length varies with the model used. This setting is crucial for the model's ability to produce coherent and contextually appropriate outputs.

By default, Jan sets the Context Length to the maximum supported by your model, which may slow down response times. For lower-spec devices, reduce Context Length to 1024 or 2048, depending on your device's specifications, to improve speed.
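
For example, on a lower-spec device you might cap the context window in the model's settings block; 2048 here is illustrative, so pick a value that fits your hardware:

"settings": {
  "ctx_len": 2048
}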

Customize the Model Settings

Adjust model settings for a specific conversation or across all conversations:

A Specific Conversation

To customize model settings for a specific conversation only:

  1. Create a new thread.
  2. Expand the right panel.
  3. Change settings under the model dropdown.

All Conversations

To customize default model settings for all conversations:

  1. Open any thread.
  2. Select the three dots next to the model dropdown.
  3. Select Edit global defaults for [model].
  4. Edit the default settings directly in the model.json.
  5. Save the file and refresh the app.
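
For example, to make a model more conservative by default, you might lower its sampling values in model.json (values shown are illustrative):

"parameters": {
  "temperature": 0.3,
  "top_p": 0.9
}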

Delete Models

To delete a model:

  1. Go to Settings.
  2. Go to My Models.
  3. Select the three dots next to the model and select Delete Model.