I Was Not Expecting This! 120 BILLION Params, 120 Tokens PER SECOND (feat llama.cpp)

1 month ago

These speeds alone open the door to so many cool things! And gpt-oss:20b is a great model, but why not run its bigger brother at comparable speeds, at least some of the time?

Dual 5090s
gpt-oss:120b @ 120 Tokens / second
gpt-oss:20b @ 270 Tokens / second
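
To put those two numbers side by side, here is a rough back-of-the-envelope sketch in Python. It only uses the throughput figures listed above. The 1000-token response length and the helper name `generation_seconds` are made up for illustration, and the math assumes a steady decode rate and ignores prompt processing and model load time.

```python
# Throughput figures quoted above (tokens per second on dual 5090s).
THROUGHPUT_TPS = {"gpt-oss:120b": 120.0, "gpt-oss:20b": 270.0}


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a steady decode rate.

    Simplifying assumption: prompt processing and load time are ignored.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Hypothetical response length, chosen only for illustration.
RESPONSE_TOKENS = 1000

for model, tps in THROUGHPUT_TPS.items():
    print(f"{model}: {generation_seconds(RESPONSE_TOKENS, tps):.1f} s")

# How many times faster the 20b model decodes than the 120b model.
speed_ratio = THROUGHPUT_TPS["gpt-oss:20b"] / THROUGHPUT_TPS["gpt-oss:120b"]
print(f"20b decodes {speed_ratio:.2f}x faster than 120b")
```

Under those assumptions, the bigger model takes a bit more than twice as long for the same response, which is the trade-off to weigh against its extra capacity.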
