DeepSeek's free 685B-parameter AI model runs at 20 tokens/second on Apple's Mac Studio, outperforming Claude Sonnet while using just 200 watts, challenging OpenAI's cloud-dependent business model.
Okay, can somebody who knows about this stuff please explain what the hell a “token per second” means?
A token is a bit like a syllable when you're talking about text-based responses. 20 tokens a second is faster than most people can read the output, so that's sufficient for a real-time-feeling "chat".
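To put rough numbers on that, here's a back-of-the-envelope sketch. The ~0.75 words-per-token figure is a common rule of thumb for English text, and ~250 words per minute is a typical adult silent reading speed; neither number comes from the thread itself.

```python
# Back-of-the-envelope: is 20 tokens/second faster than human reading speed?
# Assumptions (not from the thread): ~0.75 English words per token, and an
# average silent reading speed of ~250 words per minute.

TOKENS_PER_SECOND = 20
WORDS_PER_TOKEN = 0.75   # rough rule of thumb for English text
READING_WPM = 250        # typical adult silent reading speed

generation_wpm = TOKENS_PER_SECOND * WORDS_PER_TOKEN * 60  # words per minute
print(f"Generation: {generation_wpm:.0f} wpm vs reading: {READING_WPM} wpm")
print(f"That's {generation_wpm / READING_WPM:.1f}x typical reading speed")
```

Under those assumptions, 20 tokens/second works out to roughly 900 words per minute, several times faster than most people read, which is why the output feels instantaneous in a chat.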