Installation

Prerequisites

Before setting up Jan Server, ensure you have the following components installed:

Required Components

Important: Windows and macOS users can only run mock servers for development. Real LLM model inference with vLLM is only supported on Linux systems with NVIDIA GPUs.

  1. Docker Desktop

  2. Minikube

  3. Helm

  4. kubectl
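
After installing these tools, confirm each one is available on your PATH:


docker --version
minikube version
helm version
kubectl version --client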

Optional: NVIDIA GPU Support (for Real LLM Models)

If you plan to run real LLM models (not mock servers) and have an NVIDIA GPU:

  1. Install NVIDIA Container Toolkit: Follow the official NVIDIA Container Toolkit installation guide

  2. Configure Minikube for GPU support: Follow the official minikube GPU tutorial for complete setup instructions.
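
As a quick sanity check before starting Minikube, you can confirm the driver and container toolkit are working. The CUDA image tag below is only an example; any available tag will do:


# Driver check on the host
nvidia-smi
# Container toolkit check: the GPU should also be visible from inside a container
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi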

Quick Start

Local Development Setup

Option 1: Mock Server Setup (Recommended for Development)

  1. Start Minikube and configure Docker:


    minikube start
    eval $(minikube docker-env)

  2. Build and deploy all services:


    ./scripts/run.sh

  3. Access the services (see the sketch below for reaching the API gateway).
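
A minimal sketch of step 3, using the service name installed by the Helm chart (the same commands appear in the Port Forwarding and Verify Installation sections below):


# Forward the API gateway port, then hit the health endpoint from a second terminal
kubectl port-forward svc/jan-server-jan-api-gateway 8080:8080
curl http://localhost:8080/health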

Option 2: Real LLM Setup (Requires NVIDIA GPU)

  1. Start Minikube with GPU support:


    minikube start --gpus all
    eval $(minikube docker-env)

  2. Configure GPU memory utilization (if you have limited GPU memory):

    GPU memory utilization is configured in the vLLM Dockerfile. See the vLLM CLI documentation for all available arguments. A hedged example of the relevant flag appears after this list.

    To modify GPU memory utilization, edit the vLLM launch command in:

    • apps/jan-inference-model/Dockerfile (for Docker builds)
    • Helm chart values (for Kubernetes deployment)
  3. Build and deploy all services:


    # For GPU setup, modify run.sh to use GPU-enabled minikube
    # Edit scripts/run.sh and change "minikube start" to "minikube start --gpus all"
    ./scripts/run.sh
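
Referring back to step 2: below is a sketch of the kind of launch command you would edit. The model name and flag values are assumptions, not the repository's actual settings; check apps/jan-inference-model/Dockerfile for the real command.


# Hypothetical vLLM launch command with a reduced GPU memory fraction and a shorter context window
vllm serve janhq/Jan-v1-4B --gpu-memory-utilization 0.85 --max-model-len 8192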

Production Deployment

For production deployments, modify the Helm values in charts/umbrella-chart/values.yaml and deploy using:


helm install jan-server ./charts/umbrella-chart
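
If you prefer to keep production overrides out of the chart itself, standard Helm flags work here as well (the values file name and namespace below are only examples):


# Install with overrides from a separate values file into a dedicated namespace
helm install jan-server ./charts/umbrella-chart \
  -f production-values.yaml \
  --namespace jan-server --create-namespace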

Manual Installation

Build Docker Images

Build both required Docker images:


# Build API Gateway
docker build -t jan-api-gateway:latest ./apps/jan-api-gateway
# Build Inference Model
docker build -t jan-inference-model:latest ./apps/jan-inference-model

The inference model image downloads the Jan-v1-4B model from Hugging Face during the build. This requires an internet connection and a multi-gigabyte download (the model weights are approximately 2.4GB).
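
The images must end up in minikube's Docker daemon rather than your host's, so point your shell at minikube before building and confirm the images afterwards:


# Run this before the builds above so the images land inside minikube's Docker daemon
eval $(minikube docker-env)
# After building, both images should be listed
docker images | grep jan-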

Deploy with Helm

Install the Helm chart:


# Update Helm dependencies
helm dependency update ./charts/umbrella-chart
# Install Jan Server
helm install jan-server ./charts/umbrella-chart

Port Forwarding

Forward the API gateway port so you can access it from your local machine:


kubectl port-forward svc/jan-server-jan-api-gateway 8080:8080

Verify Installation

Check that all pods are running:


kubectl get pods

Expected output:


NAME                                 READY   STATUS    RESTARTS
jan-server-jan-api-gateway-xxx       1/1     Running   0
jan-server-jan-inference-model-xxx   1/1     Running   0
jan-server-postgresql-0              1/1     Running   0

Test the API gateway:


curl http://localhost:8080/health

Uninstalling

To remove Jan Server:


helm uninstall jan-server

To stop minikube:


minikube stop

Troubleshooting

Common Issues and Solutions

1. LLM Pod Not Starting (Pending Status)

Symptoms: The jan-server-jan-inference-model pod stays in Pending status.

Diagnosis Steps:


# Check pod status
kubectl get pods
# Get detailed pod information (replace with your actual pod name)
kubectl describe pod jan-server-jan-inference-model-<POD_ID>

Common Error Messages and Solutions:

Error: "Insufficient nvidia.com/gpu"

0/1 nodes are available: 1 Insufficient nvidia.com/gpu. no new claims to deallocate, preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.

Solution for Real LLM Setup:

  1. Ensure you have an NVIDIA GPU and drivers installed
  2. Install NVIDIA Container Toolkit (see Prerequisites section)
  3. Start minikube with GPU support:

    minikube start --gpus all
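
After restarting minikube, you can confirm the node actually advertises GPU capacity (minikube is the default single-node name):


# nvidia.com/gpu should appear under the node's Capacity and Allocatable sections
kubectl describe node minikube | grep -i "nvidia.com/gpu"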

Error: vLLM Pod Keeps Restarting

# Check pod logs to see the actual error
kubectl logs jan-server-jan-inference-model-<POD_ID>

Common vLLM startup issues:

  1. CUDA Out of Memory: Modify vLLM arguments in Dockerfile to reduce memory usage
  2. Model Loading Errors: Check if model path is correct and accessible
  3. GPU Not Detected: Ensure NVIDIA Container Toolkit is properly installed
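
To check item 3 directly, run nvidia-smi inside the inference pod (replace the pod ID as in the logs command above):


# If GPU passthrough is working, nvidia-smi lists the GPU from inside the container
kubectl exec -it jan-server-jan-inference-model-<POD_ID> -- nvidia-smi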

2. Helm Issues

Symptoms: Helm commands fail or charts won't install.

Solutions:


# Update Helm dependencies
helm dependency update ./charts/umbrella-chart
# Check Helm status
helm list
# Uninstall and reinstall
helm uninstall jan-server
helm install jan-server ./charts/umbrella-chart

3. Common Development Issues

Pods in ImagePullBackOff state

  • Ensure Docker images were built in the minikube environment
  • Run eval $(minikube docker-env) before building images
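
A short recovery sketch for this case; the second deployment name is assumed from the pod names shown earlier:


# Rebuild inside minikube's Docker daemon, then restart the deployments so pods pick up the local images
eval $(minikube docker-env)
docker build -t jan-api-gateway:latest ./apps/jan-api-gateway
docker build -t jan-inference-model:latest ./apps/jan-inference-model
kubectl rollout restart deployment/jan-server-jan-api-gateway
kubectl rollout restart deployment/jan-server-jan-inference-model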

Port forwarding connection refused

  • Verify the service is running: kubectl get svc
  • Check pod status: kubectl get pods
  • Review logs: kubectl logs deployment/jan-server-jan-api-gateway

Inference model download fails

  • Ensure internet connectivity during Docker build
  • The Jan-v1-4B model is approximately 2.4GB

Resource Requirements

Minimum System Requirements:

  • 8GB RAM
  • 20GB free disk space
  • 4 CPU cores

Recommended System Requirements:

  • 16GB RAM
  • 50GB free disk space
  • 8 CPU cores
  • GPU support (for faster inference)

The inference model requires significant memory. Ensure your minikube cluster has adequate resources allocated.
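
A sketch of starting minikube with the recommended allocation (adjust the values to what your machine can spare):


# Allocate 8 CPUs, 16 GB RAM, and 50 GB of disk to the minikube node
minikube start --cpus 8 --memory 16384 --disk-size 50g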