Train models with TRL: SFT, DPO, and GRPO fine-tuning, plus GGUF conversion for local deployment via llama.cpp.