Quote

A comparison of training throughput (tokens per second) for the 7B model with a context length of 512 on a p4de.24xlarge node. The lower memory footprint of LoRA allows for substantially larger batch sizes, resulting in an approximate 30% boost in throughput. ~Fine-Tuning LLMs: LoRA or Full-Parameter? An in-depth Analysis with Llama 2

Extracted from: pre-training-fine-tuning-and-kungfu.md
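
The quoted throughput gain follows from LoRA freezing the base weights and training only small low-rank adapters, which shrinks gradient and optimizer-state memory and frees room for a larger per-device batch. Below is a minimal sketch of that idea using the Hugging Face `peft` and `transformers` libraries; the model name, LoRA hyperparameters, and batch sizes are illustrative assumptions, not the configuration used in the article's benchmark.

```python
# Minimal sketch: LoRA fine-tuning setup whose memory savings allow a larger
# batch size than full-parameter fine-tuning. All concrete values are assumed.
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Wrap the base model so only the low-rank adapter matrices are trainable;
# the frozen 7B base weights need no gradients or optimizer states.
lora_config = LoraConfig(
    r=8,                                   # adapter rank (assumed value)
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],   # attention projections (assumed)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # a small fraction of the 7B parameters

# The memory saved on gradients/optimizer states can be spent on a larger
# per-device batch size (16 here vs. a smaller value that full fine-tuning
# would fit), which is what drives the ~30% throughput gain in the quote.
training_args = TrainingArguments(
    output_dir="lora-llama2-7b",
    per_device_train_batch_size=16,
    max_steps=100,
)
```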