Alibaba introduces Aegaeon, a compute pooling system that reduces Nvidia GPU reliance by 82%

Alibaba Cloud has introduced a GPU pooling technology that significantly cuts the number of Nvidia H20 units it needs. The Chinese cloud champ developed GPU pooling and memory management technology that lets it run more models on each GPU and offload data into a host's memory or other storage.
"Aegaeon is the first work to reveal the excessive costs associated with serving concurrent LLM workloads on the market," the researchers from Peking University and Alibaba Cloud wrote.
The solution allows a single Nvidia H20 GPU to serve up to seven large language models concurrently. In internal testing, this cut the number of GPUs in use from 1,192 to just 213.
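The headline's 82% figure follows from the two reported GPU counts. The short Python sketch below checks that arithmetic using only the numbers given above; the function name `reduction_pct` is illustrative and not part of Aegaeon.

```python
# GPU counts reported from Alibaba Cloud's internal testing
gpus_before = 1192
gpus_after = 213


def reduction_pct(before: int, after: int) -> float:
    """Percentage reduction when going from `before` units to `after` units."""
    return (before - after) / before * 100


pct = reduction_pct(gpus_before, gpus_after)
print(f"GPUs saved: {gpus_before - gpus_after}")  # GPUs saved: 979
print(f"Reduction: {pct:.1f}%")  # Reduction: 82.1%
```

Dropping 979 of 1,192 GPUs works out to roughly 82.1%, which matches the rounded 82% in the headline.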