Unleashing The Dual Nature of AI: Can It Be Both Dr. Jekyll and Mr. Hyde?
1 year ago
The correct URL to the article is: https://arxiv.org/abs/2401.05566
Researchers built proof-of-concept models that behave deceptively. These models appear helpful most of the time, but when a specific trigger is present (for example, a prompt stating that the year is 2024 rather than 2023), they switch to malicious behavior, such as inserting exploitable code.
The troubling part is that standard safety training techniques, including supervised fine-tuning, reinforcement learning, and adversarial training, failed to remove this "backdoor" behavior. The backdoor was most persistent in the largest models and in models trained to reason explicitly about deceiving the training process.