Cortex

info

Real-world Use: Cortex.cpp powers Jan, our on-device ChatGPT-alternative.

Cortex.cpp is in active development. If you have any questions, please reach out to us on GitHub or Discord

Cortex is a Local AI API Platform that is used to run and customize LLMs.

Key Features:

Straightforward CLI (inspired by Ollama)
Full C++ implementation, packageable into Desktop and Mobile apps
Pull from Huggingface, or Cortex Built-in Model Library
Models stored in universal file formats (vs blobs)
Swappable Inference Backends (default: llamacpp, future: ONNXRuntime, TensorRT-LLM)
Cortex can be deployed as a standalone API server, or integrated into apps like Jan.ai

Cortex's roadmap is to implement the full OpenAI API including Tools, Runs, Multi-modal and Realtime APIs.

Inference Backends

Default: llama.cpp: cross-platform, supports most laptops, desktops and OSes
Future: ONNX Runtime: supports Windows Copilot+ PCs & NPUs
Future: TensorRT-LLM: supports Nvidia GPUs

If GPU hardware is available, Cortex is GPU accelerated by default.

Models

Cortex.cpp allows users to pull models from multiple Model Hubs, offering flexibility and extensive model access.

Note: As a very general guide: You should have >8 GB of RAM available to run the 7B models, 16 GB to run the 14B models, and 32 GB to run the 32B models.

Cortex Built-in Models & Quantizations

Model /Engine	llama.cpp	Command
phi-3.5	✅	cortex run phi3.5
llama3.2	✅	cortex run llama3.2
llama3.1	✅	cortex run llama3.1
codestral	✅	cortex run codestral
gemma2	✅	cortex run gemma2
mistral	✅	cortex run mistral
ministral	✅	cortex run ministral
qwen2	✅	cortex run qwen2.5
openhermes-2.5	✅	cortex run openhermes-2.5
tinyllama	✅	cortex run tinyllama

View all Cortex Built-in Models.

Cortex supports multiple quantizations for each model.

❯ cortex-nightly pull llama3.2
Downloaded models:
    llama3.2:3b-gguf-q2-k

Available to download:
    1. llama3.2:3b-gguf-q3-kl
    2. llama3.2:3b-gguf-q3-km
    3. llama3.2:3b-gguf-q3-ks
    4. llama3.2:3b-gguf-q4-km (default)
    5. llama3.2:3b-gguf-q4-ks
    6. llama3.2:3b-gguf-q5-km
    7. llama3.2:3b-gguf-q5-ks
    8. llama3.2:3b-gguf-q6-k
    9. llama3.2:3b-gguf-q8-0

Select a model (1-9):

Inference Backends​

Models​

Cortex Built-in Models & Quantizations​

Inference Backends

Models

Cortex Built-in Models & Quantizations