BitNet-git

bitnet-git provides the official inference framework for 1-bit Large Language Models (LLMs), based on Microsoft's bitnet.cpp. It is optimized for fast, energy-efficient inference on CPUs and GPUs using 1.58-bit quantization, in which each weight takes one of the three values -1, 0, or +1 (log2(3) ≈ 1.58 bits of information per weight).

Installation

The easiest way to install is with an AUR helper such as paru or yay:


$ paru -S bitnet-git

Alternatively, you can build manually using makepkg:

$ git clone https://aur.archlinux.org/bitnet-git.git
$ cd bitnet-git
$ makepkg -si

Hardware Optimization

The package automatically detects your architecture and uses the most appropriate kernels (a quick way to check yours is shown after the list):

  • x86_64: Uses TL2 (optimized Lookup Table kernel) for maximum performance.
  • aarch64: Uses TL1 (optimized for ARMv8.2+).
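
To see which case applies to your machine, check the architecture your system reports:

$ uname -m
x86_64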

Global Models Management

To streamline your workflow, we recommend setting up a global models directory and a shell helper. This allows you to run models by name without typing full paths or URIs.

Create the Models Directory

Create a standard directory in your home folder:

$ mkdir -p ~/.local/share/bitnet/models

Configure Your Shell

Add the following to your ~/.bashrc or ~/.zshrc:

# BitNet Models Directory
export BITNET_MODELS_DIR="$HOME/.local/share/bitnet/models"
# BitNet Runner Helper
bitnet-run() {
    if [ -z "$1" ]; then
        echo "Usage: bitnet-run <model_filename> [additional_args]"
        return 1
    fi
    local model_name="$1"
    shift
    llama-cli -m "$BITNET_MODELS_DIR/$model_name" "$@"
}

Reload your shell: `source ~/.bashrc` (or `~/.zshrc`).
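
If you want a quick overview of the models you have downloaded, you can add a small companion function alongside the helper above. This bitnet-models function is a hypothetical convenience, not something provided by the package:

bitnet-models() {
    # List downloaded GGUF models with human-readable sizes (hypothetical helper)
    ls -lh "$BITNET_MODELS_DIR"/*.gguf 2>/dev/null \
        || echo "No models found in $BITNET_MODELS_DIR"
}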

Download a Model

Download a recommended model directly into your new directory:

$ wget -P "$BITNET_MODELS_DIR" https://huggingface.co/microsoft/BitNet-b1.58-2B-4T-gguf/resolve/main/ggml-model-i2_s.gguf
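
Once the download completes, confirm that the file is in place:

$ ls -lh "$BITNET_MODELS_DIR"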


Run Inference with Ease

Now you can run the model simply by referencing its filename:

$ bitnet-run ggml-model-i2_s.gguf -p "What are the benefits of 1-bit LLMs?" -cnv

Options

The helper forwards any extra flags directly to llama-cli; a combined example follows the list.
  • -m <path>: Path to the GGUF model file.
  • -p <"prompt">: Initial prompt for the model.
  • -t <threads>: Number of CPU threads to use (e.g., -t 4).
  • -temp <value>: Control randomness (e.g., -temp 0.7).
  • -cnv: Enable conversation/chat mode.
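
For example, a four-thread chat session with moderately low sampling temperature:

$ bitnet-run ggml-model-i2_s.gguf -t 4 -temp 0.7 -cnv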

Serving the Model via API

You can also run a local API server with an OpenAI-compatible interface. Note that the bitnet-run helper wraps llama-cli, which is an interactive client; serving is instead handled by llama-server (built from the same llama.cpp fork):

$ llama-server -m "$BITNET_MODELS_DIR/ggml-model-i2_s.gguf" --port 8080

The server then listens on http://localhost:8080.
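
A minimal request sketch, assuming the server exposes the OpenAI-compatible /v1/chat/completions endpoint as upstream llama.cpp's llama-server does:

$ curl http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"messages": [{"role": "user", "content": "What are the benefits of 1-bit LLMs?"}]}'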



Recommended Models (x86_64)

Model                Parameters   Size (GGUF)   Description
bitnet_b1_58-large   0.7B         ~150 MB       Blazing fast, great for testing.
BitNet-b1.58-2B-4T   2.4B         ~500 MB       Best overall balance for daily use.
bitnet_b1_58-3B      3.3B         ~700 MB       High performance, slightly more capable.
Llama3-8B-1.58       8.0B         ~1.6 GB       High quality, requires more RAM.


Troubleshooting

  • Build failures: Ensure that base-devel, cmake, and clang are installed; the first command below installs them in one step.
  • Model errors: Verify that the model file is a valid GGUF file and resides in your $BITNET_MODELS_DIR; the check below reads the file's magic bytes.
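
To install the build dependencies in one step (run as root):

# pacman -S --needed base-devel cmake clang

A valid GGUF file begins with the ASCII magic bytes GGUF, so a quick integrity check is:

$ head -c 4 "$BITNET_MODELS_DIR/ggml-model-i2_s.gguf"; echo
GGUF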