Bias, Variance, Training and Validation Sets: Cooking Up a Smart AI Model

When I first started with AI, I used to get confused between bias and variance. It felt like one of those abstract concepts that everyone talked about but never really explained in a way that stuck.

Then, one day, I stumbled upon this cooking analogy (which I tried to explain in my own language below), and everything clicked.


Training, Validation, and Test Data: A Cooking Story

Imagine you’re preparing for a cooking competition. You’re handed a set of ingredients and need to create the perfect dish. But how do you make sure it turns out great every single time?

This is where training, validation, and test data come into play. Think of it as the difference between practicing, taste-testing, and the final judging.

Training Set – The Practice Kitchen

A chef doesn’t just walk into a competition and wing it. They practice: experimenting with different ingredients, testing techniques, and refining their skills.

In AI, the training set is the data the model learns from. It’s the hours spent tweaking the recipe, adjusting seasoning, and figuring out what works and what doesn’t. The model (or chef, or for RL folks: the policy :P) keeps refining its approach based on what it has already seen.

If the chef doesn’t practice enough, they’ll be clueless when the competition starts. But if they practice only one recipe over and over without considering variations, they might struggle when faced with a different set of ingredients. That’s the risk of overfitting – being too dependent on what was seen before.

Validation Set – The Taste Test

A smart chef doesn’t just rely on their own taste buds. They invite friends over to try the dish and give feedback. Maybe it’s too salty, maybe it needs more spice. Your friends’ feedback matters a lot!

And this is exactly what the validation set does. It’s a smaller, separate portion of data that helps fine-tune the model, making sure it generalizes well rather than just memorizing patterns from the training set.

If a chef only listens to their own opinion, they risk being biased. If an AI model only focuses on training data without external feedback, it risks being too rigid.

Test Set – The Final Judging

Now comes competition day. The chef presents their dish to the judges, who have never tasted it before. There are no second chances – this is the moment of truth!

In AI, the test set is where we check how well the model performs on completely new data. It’s an unbiased evaluation, similar to the judges scoring a dish based purely on how it tastes in that moment.

But what if you’re working with a small dataset or doing minimal fine-tuning? Then you can often skip a separate test set. If all you’re checking is that the model isn’t overfitting, training and validation sets are enough, like adjusting a dish based on close friends’ feedback rather than a formal panel of judges.

Summary?

Training Set → Learns patterns by adjusting weights.

Validation Set → Fine-tunes the model by adjusting hyperparameters (the weights themselves are not updated on this data).

Test Set → Final evaluation, no weight adjustments.
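The three-way split above can be sketched in a few lines. This is a minimal illustration with a hypothetical toy dataset; the 70/15/15 ratio is a common convention, not a rule:

```python
import random

# Hypothetical toy dataset: 100 examples, here just index numbers.
data = list(range(100))
random.seed(42)
random.shuffle(data)  # shuffle so each split is representative

# A common convention: ~70% training, ~15% validation, ~15% test.
n = len(data)
train_set = data[: int(0.70 * n)]               # the practice kitchen
val_set = data[int(0.70 * n) : int(0.85 * n)]   # the taste test
test_set = data[int(0.85 * n) :]                # the final judging

print(len(train_set), len(val_set), len(test_set))  # 70 15 15
```

The key property is that the three sets never overlap: the judges (test set) must never have tasted the dish during practice.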


Bias vs. Variance: Two Types of Chefs

Beyond training and validation, one of the biggest challenges in AI modeling is finding the right balance between bias and variance. Let’s break this down with another cooking example.

Bias: The Overly Simplistic Chef

Meet Gina. Gina believes every dish should taste good with just salt and pepper. No matter the cuisine, no matter the occasion, that’s the only trick up Gina’s sleeve.

Italian? Salt and pepper. Thai food? Salt and pepper. Sushi? You guessed it! Salt and pepper.

Sometimes the dish turns out fine, but most of the time, it lacks complexity and depth. This is high bias in AI – the model is too simplistic and fails to capture the full complexity of the problem. It’s like an undertrained model that assumes every situation can be solved with the same basic rule.

A high-bias model underfits the data, meaning it doesn’t learn enough from the training examples.

Variance: The Overly Reactive Chef

Now, meet Tiki. Tiki is the opposite. If a guest says the dish is too spicy, Tiki removes all the spice. If the next guest says it’s too bland, Tiki overcompensates by adding too much seasoning. Every small piece of feedback results in a massive adjustment.

The result? No two dishes taste the same. Some turn out great, others are a disaster.

This is high variance in AI—the model is too sensitive to training data and struggles to generalize. It overfits, meaning it performs well on known examples but fails when faced with anything slightly different.


Finding the Right Balance

A great AI model, just like a great chef, needs balance. It should learn enough to recognize important patterns (low bias) but not be so reactive that it becomes unpredictable (low variance). Here’s another example, this time with a student studying for an exam:

  • High Bias: A student who memorizes only one simple trick and never digs into the deeper concepts.
  • High Variance: A student who over-memorizes every tiny detail from practice tests and gets thrown off by even a slight change in exam questions.
  • Low Bias: A student who learns the underlying concepts instead of just one trick.
  • Low Variance: A student who can apply their understanding flexibly to new problems.

My favorite fun example:

  • High Bias: Imagine a superhero who only uses one superpower, say flying, for every challenge, so when a problem needs super strength or speed, they just can’t handle it.
  • High Variance: Now picture a superhero who tries to use every superpower at once in every situation, switching all the time and never mastering any, which makes their actions unpredictable.

Getting this balance right isn’t about using fancy algorithms alone. It’s about understanding the data, fine-tuning based on feedback, and knowing when to stop adjusting!
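If you’d like to see Gina and Tiki in code, here is a small sketch using NumPy polynomial fits on noisy sine data (the degrees, sample sizes, and noise level are all made up for illustration). A degree-1 polynomial is the salt-and-pepper chef (high bias), a degree-15 polynomial chases every noisy data point (high variance), and something in between balances the two:

```python
import warnings
import numpy as np

warnings.filterwarnings("ignore")  # the degree-15 fit triggers a harmless RankWarning

rng = np.random.default_rng(0)

# Noisy samples of a sine curve: the "true recipe" both chefs try to learn.
x_train = np.sort(rng.uniform(0, 3, 20))
y_train = np.sin(x_train) + rng.normal(0, 0.1, 20)
x_val = np.sort(rng.uniform(0, 3, 20))
y_val = np.sin(x_val) + rng.normal(0, 0.1, 20)

def fit_and_score(degree):
    """Fit a polynomial on the training set; report train and validation MSE."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    val_mse = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    return train_mse, val_mse

for degree in (1, 4, 15):
    train_mse, val_mse = fit_and_score(degree)
    print(f"degree={degree:2d}  train MSE={train_mse:.4f}  val MSE={val_mse:.4f}")
```

Typically, training error keeps falling as the model gets more flexible, while validation error falls and then rises again; the sweet spot is where validation error bottoms out.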

Happy coding!


Running AI Models on MacBook Pro with MLX


If you want to run an AI model on your own machine without using llama.cpp or Ollama, Apple’s free MLX framework is a powerful alternative. In this guide, we will walk through running the open-source Qwen2.5 Math 1.5B Instruct model (1.5 billion parameters) locally on a MacBook Pro without modifying its weights.

What is MLX?

MLX is Apple’s open-source machine learning framework, designed to run AI models efficiently on Apple Silicon (M1/M2/M3 chips). It provides a lightweight, NumPy-like interface that enables high-performance machine learning on macOS devices.

With MLX, you can:

  • Run AI models locally without needing cloud-based inference.
  • Leverage Apple’s Metal API for optimized GPU acceleration.
  • Fine-tune and deploy models without complex configurations.

Running the Qwen2.5 Math 1.5B Instruct Model on MLX

To demonstrate MLX’s capabilities, we will run Qwen2.5 Math 1.5B Instruct, an open-source model designed for math-related tasks. The model can be downloaded directly from Hugging Face, a popular platform for sharing, training, and deploying machine learning models, and executed locally on your MacBook Pro.

Installation Steps

1. Install MLX and Required Dependencies

First, install the MLX framework, MLX Language Model (mlx-lm), and Hugging Face Transformers:

pip install mlx mlx-lm transformers

2. Install Git Large File Storage (LFS)

Since the model files are large, you need Git LFS to handle them properly. Install it with Homebrew and activate it once:

brew install git-lfs
git lfs install

3. Install Hugging Face CLI

To interact with the Hugging Face model hub, install the huggingface_hub package:

pip install huggingface_hub

4. Download the Qwen Model from Hugging Face

Clone the model repository using Git LFS:

git clone https://huggingface.co/Qwen/Qwen2.5-Math-1.5B-Instruct

This will download all required model weights and files to your local machine.

Running the Model Locally

Once the model is downloaded, create a Python script (e.g., run_qwen.py) and add the following code:

from mlx_lm import load, generate

# Define the local path where the model is stored
MODEL_PATH = "./Qwen2.5-Math-1.5B-Instruct"

# Load the model and tokenizer
model, tokenizer = load(MODEL_PATH)
print("Qwen Model loaded successfully!")

# Provide a text prompt
prompt = "Create 5 math questions for 5-year-old kids, mix of addition and subtraction that is suitable for that age group."

# If the tokenizer supports chat templates, use it
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

# Generate output
response = generate(model, tokenizer, prompt=prompt, verbose=True)

# Print the response
print("\nQwen Model Response:\n", response)

Running the Script

Execute the script from the terminal, making sure the cloned model folder is in the same directory as the script (matching MODEL_PATH):

python run_qwen.py

Expected Output

Once the model runs successfully, it should generate five math questions suitable for a 5-year-old, for example:

Qwen Model Response:
1. What is 3 + 2?
2. If you have 5 apples and give 2 away, how many do you have left?
3. What is 1 + 4?
4. If you take away 3 from 7, how many remain?
5. What is 2 + 3?

Conclusion

By using MLX, you can efficiently run AI models like Qwen2.5 Math 1.5B Instruct locally on your MacBook Pro. This setup provides a fast, private, and cost-effective alternative to cloud-based inference while leveraging Apple’s optimized hardware acceleration.

Try experimenting with different models and fine-tuning them using MLX to explore the full potential of running AI on your local machine!
