Merging large language models (LLMs) is a fascinating and surprisingly efficient way to create high-performing models without the need for expensive computational resources. It’s not just experimental anymore: it’s delivering real results, with some merged models topping the leaderboards. In this post, I’ll take you through my experience of merging my model, SARA, with two others using the TIES method. Along the way, I’ll explain what model merging is, the different techniques out there, and how you can do it yourself with Mergekit.
What is Model Merging?
Simply put, model merging is the process of combining two or more LLMs into a single, unified model. The idea is to blend the strengths of different models to create something better without starting from scratch. This is ideal if you don’t have access to high-powered GPUs or months to train a new model. The results can be amazing! Many merged models have become state-of-the-art, competing with models that cost millions to train.
Types of Model Merges
There are several ways to merge models, depending on your goals and the tools at your disposal. Here’s a quick breakdown of the most common methods:
1. SLERP (Spherical Linear Interpolation)
This technique smoothly interpolates between two models by maintaining their geometric properties. It’s great for combining two models while ensuring their weights blend meaningfully. However, it only works for two models at a time, so if you want to merge more, you’ll need to do it hierarchically.
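To make the geometry concrete, here’s a minimal sketch of SLERP on two same-shaped weight tensors. This is just an illustration in PyTorch, not Mergekit’s actual implementation; the interpolation factor t is something you’d choose yourself (e.g. 0.5):

```python
import torch

def slerp(w_a: torch.Tensor, w_b: torch.Tensor, t: float, eps: float = 1e-8) -> torch.Tensor:
    """Spherically interpolate between two weight tensors at fraction t (0 = w_a, 1 = w_b)."""
    a, b = w_a.flatten().float(), w_b.flatten().float()
    # Angle between the two weight vectors (treated as points on a sphere)
    cos_omega = torch.dot(a / (a.norm() + eps), b / (b.norm() + eps))
    omega = torch.acos(torch.clamp(cos_omega, -1.0, 1.0))
    so = torch.sin(omega)
    if so.abs() < eps:
        # Nearly parallel vectors: plain linear interpolation is a safe fallback
        return (1.0 - t) * w_a + t * w_b
    blended = (torch.sin((1.0 - t) * omega) / so) * a + (torch.sin(t * omega) / so) * b
    return blended.reshape(w_a.shape).to(w_a.dtype)
```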
2. TIES (TrIm, Elect Sign & Merge)
This method focuses on reducing redundant parameters and resolving conflicts in parameter signs between models. It’s ideal for merging multiple models at once, and the results are usually cohesive and balanced.
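Conceptually, TIES works on “task vectors” (each fine-tuned model’s weights minus the base model’s weights). Here’s a simplified sketch of the three steps for a single tensor, with an illustrative density value; Mergekit’s real implementation is more involved:

```python
import torch

def ties_merge(base: torch.Tensor, tuned: list[torch.Tensor], density: float = 0.2) -> torch.Tensor:
    """Toy TIES merge of several fine-tuned tensors onto one base tensor."""
    deltas = [t - base for t in tuned]  # task vectors: what each fine-tune changed

    # 1. Trim: keep only the top-`density` fraction of each delta by magnitude
    trimmed = []
    for d in deltas:
        k = max(1, int(density * d.numel()))
        threshold = d.abs().flatten().kthvalue(d.numel() - k + 1).values
        trimmed.append(torch.where(d.abs() >= threshold, d, torch.zeros_like(d)))
    stacked = torch.stack(trimmed)

    # 2. Elect sign: per parameter, pick the sign with the larger total magnitude
    elected_sign = torch.sign(stacked.sum(dim=0))

    # 3. Merge: average only the trimmed deltas that agree with the elected sign
    agree = (torch.sign(stacked) == elected_sign) & (stacked != 0)
    merged_delta = (stacked * agree).sum(dim=0) / agree.sum(dim=0).clamp(min=1)
    return base + merged_delta
```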
3. DARE (Drop And REscale)
Similar to TIES, but with an added twist: it randomly resets some fine-tuned weights back to the base model’s values and rescales the rest so the overall size of the update stays intact. It’s great for merging models where you want to keep most of the base model intact but still incorporate new features.
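A minimal sketch of the drop-and-rescale idea, again on a single tensor (the drop probability p here is just illustrative, not a recommended setting):

```python
import torch

def dare(base: torch.Tensor, tuned: torch.Tensor, p: float = 0.9) -> torch.Tensor:
    """Randomly drop a fraction p of the fine-tuned delta and rescale what's left."""
    delta = tuned - base                    # what fine-tuning changed
    keep = (torch.rand_like(delta) >= p)    # keep roughly (1 - p) of the entries
    return base + delta * keep / (1.0 - p)  # rescale so the expected update is unchanged
```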
4. Passthrough (Layer Concatenation)
This one is a bit experimental—it simply stacks layers from different models together. Think of it as creating a "Frankenstein" model. It’s not always practical but can work well for specific tasks or architectures.
For my project, I chose TIES because I wanted to merge multiple models into one while keeping things efficient and coherent.
My Project: Merging SARA with TIES
I started with my base model, SARA, and decided to merge it with two others:
EryriLabs/Llama-3.2-SARA-3b: My general-purpose base model.
Lyte/Llama-3.2-3B-Overthinker: Focused on complex reasoning tasks.
HollowMan6/Llama-3.2-3B-SFT-Model-Ocra-500k: Fine-tuned for supervised tasks.
The idea was to combine their strengths into a single, powerful model for both reasoning and general NLP tasks.
YAML Configuration
Here’s the YAML configuration I used to guide the TIES merging process:
models:
  - model: EryriLabs/Llama-3.2-SARA-3b
    parameters:
      density: 0.99 # Keeping 99% of the base model's weights
  - model: Lyte/Llama-3.2-3B-Overthinker
    parameters:
      density: 0.1 # Retaining only significant differences
      weight:
        - filter: mlp
          value: 0.1
        - value: 0
  - model: HollowMan6/Llama-3.2-3B-SFT-Model-Ocra-500k
    parameters:
      density: 0.1
      weight: 0.2
merge_method: ties
base_model: EryriLabs/Llama-3.2-SARA-3b
parameters:
  normalize: true
  int8_mask: true
dtype: float16
This configuration essentially tells Mergekit how much of each model to retain and how to merge them efficiently. I chose to keep 99% of SARA’s base weights because I needed to retain its knowledge and responses while getting the model to answer slightly differently worded questions without re-tuning.
Setting Up Mergekit
If you want to try this yourself, here’s how you can set up and run Mergekit:
Prerequisites
Make sure you have these installed:
Python 3.10+
Pip (comes with Python)
Git (optional, for cloning repositories)
Anaconda (optional, for managing environments)
Step 1: Create a Virtual Environment
I recommend creating a virtual environment to keep things tidy. If you’re using Anaconda, run:
conda create --name mergekit_env python=3.10 -y
conda activate mergekit_env
Without Anaconda:
python -m venv mergekit_env
source mergekit_env/bin/activate # Mac/Linux
mergekit_env\Scripts\activate # Windows
Step 2: Install Mergekit
Next, install Mergekit and its dependencies by cloning the repository with Git and installing it from source:
git clone https://github.com/arcee-ai/mergekit.git
cd mergekit
pip install -e .
Step 3: Run the Merge
Save your YAML configuration as SARA.yaml and run:
mergekit-yaml SARA.yaml output_folder --allow-crimes --copy-tokenizer --out-shard-size 1B --low-cpu-memory --write-model-card --lazy-unpickle
The merged model will be saved in the output_folder directory.
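As a quick sanity check that the merge produced a usable model, you can load it with Hugging Face Transformers. This assumes transformers (and accelerate, for device_map) are installed, the weights landed in output_folder as in the command above, and the prompt is just an example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged weights straight from the Mergekit output directory
model = AutoModelForCausalLM.from_pretrained("output_folder", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("output_folder")

prompt = "Explain model merging in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```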
The Result: Llama-3.2-SARAThinker-merged-3b
After the merging process, I tested the new model on various tasks, and the results were impressive:
Improved Reasoning: The model showed a slight improvement on reasoning tasks, thanks to the integration of Overthinker.
Enhanced Generalisation: Previously, the model couldn’t answer questions that weren’t phrased the way the training dataset presented them. Now it can!
Efficiency: The merge preserved SARA’s core while reducing redundancy.
Final Thoughts
Model merging is a game-changer. It’s cost-effective, versatile, and opens up new possibilities for creating high-performing models without massive hardware. Tools like Mergekit make it accessible, and methods like TIES ensure the results are cohesive and reliable.
If you’re considering merging models, I highly recommend giving it a go. It’s easier than you might think, and the results can be incredible. Let me know if you try it out or have any questions—I’d love to hear about your projects!