Local API Server

Jan provides a built-in API server that can be used as a local, drop-in replacement for OpenAI's API. This guide walks you through starting the local server and sending requests to it.

Step 1: Set Up the Local Server

To start the local server, follow the steps below:

  1. Navigate to the Jan main menu dashboard.
  2. Click the corresponding icon on the bottom left side of your screen.
  3. Select the model you want to use under the Model Settings screen to set the LLM for your local server.
  4. Configure the server settings as follows:
Feature | Description | Default Setting
--- | --- | ---
Local Server Address | By default, Jan is accessible only from the computer it is running on. You can change the address to let other devices on your local network access it, but this is less secure than restricting access to the same computer. | localhost
Port | Jan runs on port 1337 by default. You can change it to any other port number as needed. | 1337
API Prefix | Customizes the base URL of the local API server. | /v1
Cross-Origin Resource Sharing (CORS) | Manages resource access from external domains. Enabled for security by default but can be disabled if needed. | Enabled
Verbose Server Logs | Provides extensive details about server activities as the local server runs, displayed at the center of the screen. | Not specified (implied enabled)
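
The address, port, and API prefix settings above combine into the base URL that clients call. The following is a minimal sketch of that composition, not Jan's own code: the function name build_base_url is hypothetical, the http scheme is an assumption, and the default values are taken from the table.

```python
def build_base_url(host: str = "localhost", port: int = 1337, prefix: str = "/v1") -> str:
    """Compose a base URL from the address, port, and API prefix settings."""
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    # Normalize the prefix so "v1", "/v1", and "/v1/" all give the same result.
    cleaned = prefix.strip("/")
    base = f"http://{host}:{port}"  # assumption: plain HTTP on the local machine
    return f"{base}/{cleaned}" if cleaned else base


print(build_base_url())  # prints http://localhost:1337/v1
```

If you change the port or prefix in the server settings, pass the same values here so that client code keeps pointing at the right address.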

Step 2: Start and Use the Built-in API Server

Once you have set the server settings, you can start the server by following the steps below:

  1. Click the Start Server button on the top left of your screen.

When the server starts, you'll see a message beginning with "Server listening at", and the Start Server button will turn into a red Stop Server button.

  2. You will be redirected to the API reference in your browser.
  3. Select one of the available endpoints and try it out by executing its example request.
  4. This example uses the Chat endpoint.
  5. Click the Try it out button.
  6. The cURL request example for the Chat endpoint sends the following JSON body when the local server is running the tinyllama-1.1b model:

     {
       "messages": [
         {
           "content": "You are a helpful assistant.",
           "role": "system"
         },
         {
           "content": "Hello!",
           "role": "user"
         }
       ],
       "model": "tinyllama-1.1b",
       "stream": true,
       "max_tokens": 2048,
       "stop": ["hello"],
       "frequency_penalty": 0,
       "presence_penalty": 0,
       "temperature": 0.7,
       "top_p": 0.95
     }

  7. The endpoint returns the following JSON response body:

"choices": [
"finish_reason": null,
"index": 0,
"message": {
"content": "Hello user. What can I help you with?",
"role": "assistant"
"created": 1700193928,
"id": "ebwd2niJvJB1Q2Whyvkz",
"model": "_",
"object": "chat.completion",
"system_fingerprint": "_",
"usage": {
"completion_tokens": 500,
"prompt_tokens": 33,
"total_tokens": 533