🌵 Speculative Speculative Decoding: What if your draft model could speculate while the target model is still verifying? That's the idea behind Speculative Speculative Decoding (SSD). I've been… | Maxime Labonne
7 views
2 months ago
linkedin.com
DFlash Boosts Speculative Decoding with Lightweight Block Diffusion | Kalyan KS posted on the topic | LinkedIn
2 views
4 months ago
linkedin.com
Speculative Decoding — Think Fast⚡, Then Think Right✅
Apr 13, 2025
substack.com
How to Quadruple LLM Decoding Performance with Speculative Decoding (SpD) and Microscaling (MX) Formats on Qualcomm® Cloud AI 100
Aug 1, 2024
qualcomm.com
Faster LLMs: Accelerate Inference with Speculative Decoding
11 months ago
ibm.com
3:49
T-pro 2.0: Efficient Russian Reasoning LLM
4 months ago
YouTube
AI Research Roundup
6:13
Speculative Decoding: Make AI 2-3x Faster for Free | Tech Decoded
1 month ago
YouTube
Toc am
3:08
What is Speculative Decoding ?
5 views
5 days ago
YouTube
DeepManim
20:27
Ep 121: Latency and Throughput — Making AI Fast | LLM Mastery Podcast
28 views
2 weeks ago
YouTube
carlos Hernandez
7:09
Don't use speculative decoding until you watch this
7 views
2 weeks ago
YouTube
DigitalOcean
0:49
Why ChatGPT streams faster than it used to: speculative decoding explained in 48 seconds
1.1K views
2 weeks ago
YouTube
Adam Rosler
2:54
What is Speculative Decoding?
1 week ago
YouTube
Standarity
40:19
Speculation is all you need: Intro to Speculative Decoding for High Performance Inference
1 views
2 months ago
YouTube
Modal
5:04
Speculative Decoding: 2-3x Faster LLMs for Free
1 views
1 month ago
YouTube
The AI Century
1:25
Stop LLM Lag: The Secret to 1.4x Faster AI (ConfLayers) #Shorts
1 week ago
YouTube
CollapsedLatents
14:13
[Paper Explained] [Blazing Speedup] Verifying Speculative Decoding with Sparse Computation! Surprising Results Revealed!
1 views
2 months ago
YouTube
論文解説チャンネル
0:59
Speculative Decoding explained in Hindi #aiengineering #datascience #llm #mustdo Interview Question
24 views
3 months ago
YouTube
Learn AI with RC
1:02:23
EP5: Speculative Decoding with Nadav Timor
116 views
7 months ago
YouTube
The Information Bottleneck
6:29
Inference Optimization: Making AI Faster & Cheaper (Latency, Throughput & GPUs)
56 views
1 month ago
YouTube
wecite
8:48
The Voynich Manuscript Decoded: A Reproducible Decoding Framework
159 views
1 week ago
YouTube
Creation Unified
0:10
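Qwen inference performance improved by up to 8x! I saw DFlash a few days ago and thought it was just the same old speculative decoding, but today I saw that someone has added a DDTree technique. In practice it turns speculative decoding's single-chain draft into a tree-shaped draft + Tree Attention verification in one pass + longest-matching-prefix commit. The goal is to avoid wasting the draft model's predictions at branching points: DFlash keeps only one prediction per position, whereas DDTree keeps several and lets the target model look at all of them at once and pick the correct one. This directly gave Qwen3-30B-MoE an 8.22x speedup on HumanEvalT!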
38.4K views
3 weeks ago
x.com
nash_su - e/acc
0:41
Using LLMs on your Mac? dflash-mlx delivers a 4x increase in speed using block-diffusion speculative decoding. > Built from scratch on Metal for Apple Silicon. > Draft model proposes multiple tokens at once. > Target verifies the whole block in a single pass. > Accept the longest correct prefix. > Bit-identical output, up to 4x the throughput. > Every piece hand-rolled: hidden-state hooks, mixed-attention KV rollback, warm-path benchmarking. Macs just got a big upgrade for local AI!
6.1K views
4 weeks ago
x.com
Markets & Mayhem
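The propose / verify / accept-longest-prefix loop described in the post above can be sketched in a few lines. This is a minimal illustration, not dflash-mlx's implementation: it assumes greedy decoding with exact-match acceptance (not the rejection-sampling rule used when sampling), and `toy_target`, `toy_draft`, `speculative_step`, `generate`, and `target_only` are hypothetical names invented for this example. The "models" are plain functions from a token prefix to a next token.

```python
from typing import Callable, List

Token = int
# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
NextToken = Callable[[List[Token]], Token]


def speculative_step(prefix: List[Token], draft: NextToken,
                     target: NextToken, k: int) -> List[Token]:
    """Return the tokens committed by one propose/verify round (greedy variant)."""
    # 1. The draft model proposes k tokens autoregressively.
    proposed: List[Token] = []
    for _ in range(k):
        proposed.append(draft(prefix + proposed))

    # 2. The target checks each proposed position. A real system scores all
    #    positions in one batched forward pass; this toy loops instead.
    committed: List[Token] = []
    for tok in proposed:
        expected = target(prefix + committed)
        if tok != expected:
            # 3. First mismatch: keep the target's own token and stop.
            committed.append(expected)
            return committed
        committed.append(tok)

    # All k drafts accepted: the verification pass also yields one extra token.
    committed.append(target(prefix + committed))
    return committed


def generate(prompt: List[Token], draft: NextToken, target: NextToken,
             k: int, max_new: int) -> List[Token]:
    """Run speculative rounds until max_new tokens have been produced."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        out.extend(speculative_step(out, draft, target, k))
    return out[: len(prompt) + max_new]


def target_only(prompt: List[Token], target: NextToken, max_new: int) -> List[Token]:
    """Baseline: plain greedy decoding with the target model alone."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def toy_target(prefix: List[Token]) -> Token:
    # Arbitrary deterministic rule standing in for a large model.
    return (prefix[-1] * 7 + 3) % 50


def toy_draft(prefix: List[Token]) -> Token:
    # Agrees with the target except when the prefix length is a multiple of 4.
    guess = toy_target(prefix)
    return guess if len(prefix) % 4 else (guess + 1) % 50
```

Under these assumptions, every committed token is one the target itself would have chosen, so `generate([1], toy_draft, toy_target, 4, 20)` matches `target_only([1], toy_target, 20)`. The draft only changes how many target calls are needed, not the output.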
15:39
[AI Paper Explained] Making speculative decoding faster and more accurate! A task-aware draft model optimization approach | TAPS
1 month ago
bilibili
熊二等兵
DFVG: A Heterogeneous Architecture for Speculative Decoding with Draft-on-FPGA and Verify-on-GPU | Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2
2 months ago
acm.org
12:10
Implementing speculative decoding for Qwen3.6 on llama.cpp
758 views
1 week ago
bilibili
张曦偌
0:12
Algo Brief on Instagram: "Interesting fact: To make these massive models faster for real-time coding, engineers are now using Speculative Decoding. This involves a tiny, lightning-fast "draft" model predicting several tokens at once, while the massive "teacher" model (like Claude 4 or GPT-5) only steps in to verify or correct the draft, reducing latency by up to 40% without sacrificing the intelligence of the output."
3.6K views
3 months ago
Instagram
algobrief
Multi-candidate Speculative Decoding | Natural Language Processing and Chinese Computing
2 months ago
acm.org
6:20
Encoder Decoder Network - Computerphile
157.1K views
Jun 13, 2018
YouTube
Computerphile
6:47
Transformer models: Encoder-Decoders
107K views
Jun 14, 2021
YouTube
Hugging Face
12:27
Stuart Hall's Encoding/Decoding Model but it's easier to understand
66.6K views
Oct 22, 2020
YouTube
Barley Make Video