Model Parameters

Model Settings

Model settings control how your AI thinks and responds. Think of them as the AI's personality and performance controls.

How to Access Settings

Click the gear icon next to your selected model in the chat interface to open Model Settings.

[Screenshot: model settings gear icon]

A panel will slide open on the right with all available model settings:

[Screenshot: model settings panel]

Settings Reference

| Setting | What It Does | Simple Explanation |
| --- | --- | --- |
| Context Size | How much text the model remembers | Like the model's working memory. Larger = remembers more of your conversation, but uses more computer memory. |
| GPU Layers | How much work your graphics card does | More layers on GPU = faster responses, but needs more graphics memory. Start high and reduce if you get errors. |
| Temperature | How creative vs. predictable responses are | Low (0.1-0.3) = focused, consistent answers. High (0.7-1.0) = creative, varied responses. Try 0.7 for general use. |
| Top K | How many word choices the model considers | Smaller numbers (20-40) = more focused. Larger numbers (80-100) = more variety. |
| Top P | Another way to control word variety | Works with Top K. Values like 0.9 work well. Lower = more focused, higher = more creative. |
| Min P | Minimum chance a word needs to be chosen | Prevents very unlikely words. Usually fine at default. |
| Repeat Last N | How far back to check for repetition | Helps prevent the model from repeating itself. Default values usually work well. |
| Repeat Penalty | How much to avoid repeating words | Higher values (1.1-1.3) reduce repetition. Too high makes responses awkward. |
| Presence Penalty | Encourages talking about new topics | Higher values make the model explore new subjects instead of staying on one topic. |
| Frequency Penalty | Reduces word repetition | Similar to repeat penalty but focuses on how often words are used. |
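To build intuition for how the sampling settings interact, here is a minimal Python sketch. It is illustrative only, not Jan's or llama.cpp's actual sampler, and the order of the filtering steps shown here is just one common arrangement:

```python
import math

def sample_filter(logits, temperature=0.7, top_k=40, top_p=0.9,
                  min_p=0.05, repeat_penalty=1.1, recent_tokens=()):
    """Illustrative sketch: how sampler settings reshape the
    next-token distribution. `logits` maps token -> raw score."""
    # Repeat Penalty: push down scores of tokens seen in the last
    # Repeat Last N tokens (here, whatever is in `recent_tokens`).
    adjusted = {}
    for tok, score in logits.items():
        if tok in recent_tokens:
            score = score / repeat_penalty if score > 0 else score * repeat_penalty
        adjusted[tok] = score

    # Temperature: low values sharpen the distribution (focused),
    # high values flatten it (creative).
    scaled = {t: s / temperature for t, s in adjusted.items()}

    # Softmax -> probabilities.
    m = max(scaled.values())
    exps = {t: math.exp(s - m) for t, s in scaled.items()}
    z = sum(exps.values())
    probs = {t: e / z for t, e in exps.items()}

    # Top K: keep only the K most likely tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    # Top P (nucleus): keep the smallest set whose probabilities reach top_p.
    kept, cum = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cum += p
        if cum >= top_p:
            break

    # Min P: drop tokens far less likely than the best candidate.
    floor = min_p * kept[0][1]
    kept = [(t, p) for t, p in kept if p >= floor]

    # Renormalise what survived; the model samples its next word from this.
    z = sum(p for _, p in kept)
    return {t: p / z for t, p in kept}

# Tiny toy vocabulary:
logits = {"the": 5.0, "a": 4.0, "banana": 1.0}
dist = sample_filter(logits)  # with defaults, "banana" is filtered out
```

Each setting narrows or widens the pool of candidate words before one is picked at random, which is why raising Temperature or Top P makes output more varied.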

Hardware Settings

These control how efficiently the model runs on your computer:

GPU Layers

Think of your model as a stack of layers, like a cake. Each layer can run on either your main processor (CPU) or graphics card (GPU). Your graphics card is usually much faster.

  • More GPU layers = Faster responses, but uses more graphics memory
  • Fewer GPU layers = Slower responses, but uses less graphics memory

Start with the maximum number and reduce if you get out-of-memory errors.
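As a rough mental model, you can estimate how many layers will fit from your graphics memory budget. The sketch below is illustrative only; the per-layer size and overhead figures are assumptions for the example, not values Jan uses internally:

```python
def layers_that_fit(total_layers, vram_gb, per_layer_gb, overhead_gb=1.0):
    """Illustrative estimate of how many model layers fit on the GPU.
    per_layer_gb and overhead_gb are rough assumed figures, not Jan's."""
    budget = vram_gb - overhead_gb  # leave headroom for the context cache etc.
    if budget <= 0:
        return 0
    return min(total_layers, int(budget // per_layer_gb))

# e.g. a 32-layer model at an assumed 0.25 GB per layer on an 8 GB card:
layers_that_fit(32, vram_gb=8, per_layer_gb=0.25)  # -> 28 of 32 layers
```

In practice you don't need the arithmetic: start at the maximum and step down until the out-of-memory errors stop.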

Context Length

This is like the model's short-term memory - how much of your conversation it can remember at once.

  • Longer context = Remembers more of your conversation, better for long discussions
  • Shorter context = Uses less memory, runs faster, but might "forget" earlier parts of long conversations

Jan defaults to 8192 tokens (roughly 6000 words) or your model's maximum, whichever is smaller. This handles most conversations well.
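The default described above can be sketched in a couple of lines (the 0.75 words-per-token figure is a rough approximation for English text, not an exact conversion):

```python
def effective_context(model_max_tokens, jan_default=8192):
    """Jan's default: 8192 tokens or the model's maximum, whichever is smaller."""
    return min(jan_default, model_max_tokens)

def approx_words(tokens):
    """Rough rule of thumb: ~0.75 English words per token."""
    return int(tokens * 0.75)

effective_context(4096)   # model max is smaller -> 4096
effective_context(32768)  # Jan's default caps it -> 8192
approx_words(8192)        # -> 6144, i.e. roughly 6000 words
```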

Quick Setup Guide

For most users:

  1. Set Temperature to 0.7 for balanced creativity
  2. Max out GPU Layers (reduce only if you get memory errors)
  3. Leave other settings at defaults

For creative writing:

  • Increase Temperature to 0.8-1.0
  • Increase Top P to 0.95

For factual/technical work:

  • Decrease Temperature to 0.1-0.3
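The guidance above can be summarised as plain settings dictionaries. These preset names and the exact values picked from the suggested ranges are illustrative, not built-in Jan presets:

```python
# Illustrative presets matching the guidance above (not built into Jan):
PRESETS = {
    "general":  {"temperature": 0.7, "top_p": 0.9,  "top_k": 40},
    "creative": {"temperature": 0.9, "top_p": 0.95, "top_k": 40},
    "factual":  {"temperature": 0.2, "top_p": 0.9,  "top_k": 40},
}

PRESETS["creative"]["temperature"]  # -> 0.9
```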

Troubleshooting:

  • Responses too repetitive? Increase Temperature or Repeat Penalty
  • Out of memory errors? Reduce GPU Layers or Context Size
  • Responses too random? Decrease Temperature
  • Model running slowly? Increase GPU Layers (if you have VRAM) or reduce Context Size