Running AI Models Locally with Apple’s MLX
If you want to run an AI model on your own machine without using llama.cpp or Ollama, Apple’s free, open-source MLX framework is a powerful alternative. In this guide, we will walk through running Qwen2.5-Math-1.5B-Instruct, an open-source 1.5-billion-parameter math model, locally on a MacBook Pro without modifying its weights.
What is MLX?
MLX is Apple’s open-source machine learning framework, designed to run AI models efficiently on Apple Silicon (M1/M2/M3 chips and later). MLX provides a lightweight, NumPy-like interface that enables high-performance machine learning on macOS devices.
With MLX, you can:
- Run AI models locally without needing cloud-based inference.
- Leverage Apple’s Metal API for optimized GPU acceleration.
- Fine-tune and deploy models without complex configurations.
Running the Qwen2.5-Math-1.5B-Instruct Model on MLX
To demonstrate MLX’s capabilities, we will run Qwen2.5-Math-1.5B-Instruct, an open-source model specialized for math-related tasks. The model can be downloaded directly from Hugging Face, a popular platform for sharing, training, and deploying machine learning models, and executed locally on your MacBook Pro.
Installation Steps
1. Install MLX and Required Dependencies
First, install the MLX framework, the mlx-lm package (MLX’s language-model toolkit), and Hugging Face Transformers:
pip install mlx mlx-lm transformers
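If you want to confirm the installation before going further, this one-liner prints the installed MLX version (it assumes a recent MLX release that exposes a version attribute):

```shell
python -c "import mlx.core as mx; print(mx.__version__)"
```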
2. Install Git Large File Storage (LFS)
Since the model files are large, you need Git LFS to handle them properly. Install it with Homebrew, then enable it for your Git configuration:
brew install git-lfs
git lfs install
3. Install Hugging Face CLI
To interact with the Hugging Face model hub, install the huggingface_hub package (which also provides the huggingface-cli tool):
pip install huggingface_hub
4. Download the Qwen Model from Hugging Face
Clone the model repository using Git LFS:
git clone https://huggingface.co/Qwen/Qwen2.5-Math-1.5B-Instruct
This will download all required model weights and files to your local machine.
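Alternatively, since the huggingface_hub package from step 3 is installed, you can download the model from Python instead of cloning with git; a sketch using its snapshot_download function, which fetches the same files into a local directory:

```python
# Alternative download sketch using huggingface_hub (step 3) instead of git.
from huggingface_hub import snapshot_download

# Fetch all model files into a local folder matching the path used later.
local_dir = snapshot_download(
    repo_id="Qwen/Qwen2.5-Math-1.5B-Instruct",
    local_dir="./Qwen2.5-Math-1.5B-Instruct",
)
print("Model downloaded to:", local_dir)
```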
Running the Model Locally
Once the model is downloaded, create a Python script (e.g., run_qwen.py) and add the following code:
from mlx_lm import load, generate

# Define the local path where the model is stored
MODEL_PATH = "./Qwen2.5-Math-1.5B-Instruct"

# Load the model and tokenizer
model, tokenizer = load(MODEL_PATH)
print("Qwen Model loaded successfully!")

# Provide a text prompt
prompt = (
    "Create 5 math questions for 5-year-old kids, a mix of addition "
    "and subtraction that is suitable for that age group."
)

# If the tokenizer supports chat templates, use one to format the prompt
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

# Generate output
response = generate(model, tokenizer, prompt=prompt, verbose=True)

# Print the response
print("\nQwen Model Response:\n", response)
Running the Script
Execute the script in the terminal, from the same directory that contains the downloaded model folder:
python run_qwen.py
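As a quick alternative to writing a script, mlx-lm also ships a command-line generator; a sketch, assuming the mlx_lm.generate module installed with mlx-lm above:

```shell
python -m mlx_lm.generate --model ./Qwen2.5-Math-1.5B-Instruct \
  --prompt "Create 5 math questions for 5-year-old kids."
```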
Expected Output
Once the model runs successfully, it should generate five math questions suitable for a 5-year-old, for example:
Qwen Model Response:
1. What is 3 + 2?
2. If you have 5 apples and give 2 away, how many do you have left?
3. What is 1 + 4?
4. If you take away 3 from 7, how many remain?
5. What is 2 + 3?
Conclusion
By using MLX, you can efficiently run AI models like Qwen2.5-Math-1.5B-Instruct locally on your MacBook Pro. This setup provides a fast, private, and cost-effective alternative to cloud-based inference while leveraging Apple’s optimized hardware acceleration.
Try experimenting with different models and fine-tuning them using MLX to explore the full potential of running AI on your local machine!
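One experiment worth trying: mlx-lm includes a conversion tool that can quantize a Hugging Face model to 4-bit, substantially cutting memory use at a small accuracy cost. A hedged sketch of the command (flags assumed from the mlx-lm tooling installed earlier):

```shell
python -m mlx_lm.convert --hf-path Qwen/Qwen2.5-Math-1.5B-Instruct -q
```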