RWKV: Reinventing RNNs for the Transformer Era (Paper Explained)
#gpt4 #rwkv #transformer
We take a look at RWKV, a highly scalable architecture between Transformers and RNNs.
Fully Connected (June 7th in SF) Promo Link: https://www.fullyconnected.com/?promo=ynnc
OUTLINE:
0:00 - Introduction
1:50 - Fully Connected In-Person Conference in SF June 7th
3:00 - Transformers vs RNNs
8:00 - RWKV: Best of both worlds
12:30 - LSTMs
17:15 - Evolution of RWKV's Linear Attention
30:40 - RWKV's Layer Structure
49:15 - Time-Parallel vs Sequence Mode
53:55 - Experimental Results & Limitations
58:00 - Visualizations
1:01:40 - Conclusion
Paper: https://arxiv.org/abs/2305.13048
Code: https://github.com/BlinkDL/RWKV-LM
Abstract:
Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the same performance as Transformers due to limitations in parallelization and scalability. We propose a novel model architecture, Receptance Weighted Key Value (RWKV), that combines the efficient parallelizable training of Transformers with the efficient inference of RNNs. Our approach leverages a linear attention mechanism and allows us to formulate the model as either a Transformer or an RNN, which parallelizes computations during training and maintains constant computational and memory complexity during inference, leading to the first non-transformer architecture to be scaled to tens of billions of parameters. Our experiments reveal that RWKV performs on par with similarly sized Transformers, suggesting that future work can leverage this architecture to create more efficient models. This work presents a significant step towards reconciling the trade-offs between computational efficiency and model performance in sequence processing tasks.
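To make the "either a Transformer or an RNN" claim concrete, here is a minimal single-channel sketch of a decayed, key-weighted average computed two ways: once by summing over all earlier positions for every step, and once by carrying two running sums forward. It is modeled on the WKV operator as described in the paper, but it is a simplification under stated assumptions: the function names, the scalar decay `w`, and the current-token bonus `u` are illustrative choices here, and token shift, receptance gating, and numerical stabilization are all left out.

```python
import math

def wkv_direct(k, v, w, u):
    """Direct form: for each step t, sum over all earlier positions (quadratic in length)."""
    out = []
    for t in range(len(k)):
        # The current token gets a separate bonus u instead of a decay term.
        num = math.exp(u + k[t]) * v[t]
        den = math.exp(u + k[t])
        for i in range(t):
            # Older positions are down-weighted by exp(-w) per step of distance.
            weight = math.exp(-(t - 1 - i) * w + k[i])
            num += weight * v[i]
            den += weight
        out.append(num / den)
    return out

def wkv_recurrent(k, v, w, u):
    """Recurrent form: two running sums (a, b) are the only state carried between steps."""
    a, b = 0.0, 0.0
    decay = math.exp(-w)
    out = []
    for kt, vt in zip(k, v):
        bonus = math.exp(u + kt)
        out.append((a + bonus * vt) / (b + bonus))
        # Fold the current token into the state for use by later steps.
        a = decay * a + math.exp(kt) * vt
        b = decay * b + math.exp(kt)
    return out
```

Both functions should return the same sequence up to floating-point error. The recurrent version keeps only `a` and `b` between steps, which is where the constant per-token memory at inference comes from.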
Authors: Bo Peng, Eric Alcaide, Quentin Anthony, Alon Albalak, Samuel Arcadinho, Huanqi Cao, Xin Cheng, Michael Chung, Matteo Grella, Kranthi Kiran GV, Xuzheng He, Haowen Hou, Przemyslaw Kazienko, Jan Kocon, Jiaming Kong, Bartlomiej Koptyra, Hayden Lau, Krishna Sri Ipsit Mantri, Ferdinand Mom, Atsushi Saito, Xiangru Tang, Bolun Wang, Johan S. Wind, Stanislaw Wozniak, Ruichong Zhang, Zhenyuan Zhang, Qihang Zhao, Peng Zhou, Jian Zhu, Rui-Jie Zhu
Links:
Homepage: https://ykilcher.com
Merch: https://ykilcher.com/merch
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://ykilcher.com/discord
LinkedIn: https://www.linkedin.com/in/ykilcher
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n