• 0 Posts
  • 3 Comments
Joined 2 months ago
Cake day: February 17th, 2025


  • It all depends on the size of the model you are running. If it cannot fit in GPU memory, the GPU has to shuttle data back and forth with the host (CPU memory, or even disk), which is extremely slow. This is why some people run LLMs on Macs: Apple Silicon has a large pool of unified memory shared between the CPU and GPU, making it viable to fit some larger models entirely in memory.
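A back-of-the-envelope sketch of the point above: you can estimate whether a model's weights fit in a GPU's memory from the parameter count and the bytes per parameter (the model sizes, precisions, and GPU capacities below are illustrative assumptions, and real inference also needs headroom for the KV cache and activations, so these are lower bounds):

```python
# Illustrative sketch: approximate weight memory for an LLM and compare
# it against example memory budgets. Numbers are assumptions, not specs.

def weights_gib(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GiB: params * bytes per param."""
    return params_billion * 1e9 * bytes_per_param / 2**30

# Example: a hypothetical 70B-parameter model.
fp16 = weights_gib(70, 2.0)   # 16-bit weights: ~130 GiB
q4   = weights_gib(70, 0.5)   # 4-bit quantized: ~33 GiB

print(f"70B at fp16: {fp16:.0f} GiB")   # far beyond a 24 GiB consumer GPU
print(f"70B at 4-bit: {q4:.0f} GiB")    # still over 24 GiB of VRAM, but it
                                        # fits in e.g. 64+ GiB of unified
                                        # memory on a higher-end Mac
```

Anything that doesn't fit forces layers to be streamed from host RAM or disk every token, which is where the "extremely slow" behavior comes from.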