Abstract: In edge-cloud speculative decoding (SD), edge devices equipped with small language models (SLMs) generate draft tokens that are verified by large language models (LLMs) in the cloud. A key ...