brew install pipx
pipx ensurepath # as needed
pipx install mlx-lm
pipx inject mlx-lm tiktoken # for Kimi-Linear
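
To confirm the installed version (the note at the end gives the release Kimi-Linear needs), one option is:

# Show the mlx-lm version inside the pipx environment
pipx runpip mlx-lm show mlx-lm | grep Version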

For the API server,

# First run downloads the model, then starts the server
mlx_lm.server --model mlx-community/Kimi-Linear-48B-A3B-Instruct-4bit --port <port> --max-tokens=200000
 
# After the first download, run offline; review the model's custom code before enabling --trust-remote-code
HF_HUB_OFFLINE=1 mlx_lm.server --model mlx-community/Kimi-Linear-48B-A3B-Instruct-4bit --port 6599 --max-tokens=200000 --trust-remote-code
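
The server speaks the OpenAI chat completions API, so a quick smoke test with curl (port 6599 as above) looks like this:

# Send a single chat request to the local server
curl http://localhost:6599/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mlx-community/Kimi-Linear-48B-A3B-Instruct-4bit",
       "messages": [{"role": "user", "content": "Say hello in one sentence."}],
       "max_tokens": 50}'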

For simple chat,

mlx_lm.chat --model mlx-community/Kimi-Linear-48B-A3B-Instruct-4bit
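
For a one-shot completion instead of an interactive session, mlx_lm.generate works the same way:

mlx_lm.generate --model mlx-community/Kimi-Linear-48B-A3B-Instruct-4bit \
  --prompt "Explain linear attention in two sentences." \
  --max-tokens 200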
 
# or with `llm` via the API server:
pipx install llm
 
cat > "$(dirname "$(llm logs path)")"/extra-openai-models.yaml <<'EOF'
- model_id: kimi-linear
  model_name: mlx-community/Kimi-Linear-48B-A3B-Instruct-4bit
  api_base: "http://localhost:6599/v1"
EOF
llm models default kimi-linear
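
With the default set, prompts go straight to the local server (assuming it is still running on port 6599):

llm "What is linear attention?"
# or name the model explicitly:
llm -m kimi-linear "What is linear attention?"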

Models downloaded from the Hugging Face Hub are cached under ~/.cache/huggingface/.
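
To check how much disk space cached models take up:

du -sh ~/.cache/huggingface/hub/*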

Note: Kimi-Linear-48B-A3B-Instruct-4bit requires mlx-lm 0.28.5 and about 26.1 GB of RAM. That is roughly 41% of a 64 GB Mac's memory, so models of 32B parameters or smaller are a better fit.
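
As a sanity check, total physical memory on macOS can be read with:

sysctl -n hw.memsize | awk '{printf "%.0f GB\n", $1/1073741824}'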

References