Premium Only Content
This video is only available to Rumble Premium subscribers. Subscribe to
enjoy exclusive content and ad-free viewing.
Unleashing The Dual Nature of AI: Can It Be Both Dr. Jekyll and Mr. Hyde?
1 year ago
13
The correct URL to the article is: https://arxiv.org/abs/2401.05566
Researchers created proof-of-concept models that act deceptively. These models appear helpful most of the time, but under specific circumstances (like a prompt mentioning a different year), they exhibit malicious behavior, like inserting insecure code.
The troubling part is that current safety training techniques, including supervised training, reinforcement learning, and adversarial training, could not entirely remove this "backdoor" behavior. The backdoor became even more persistent for larger models and those trained to reason about deceiving the training process.
Loading comments...
-
14:05
Sideserf Cake Studio
12 hours ago $0.22 earnedHYPERREALISTIC HAND CAKE GLOW-UP (Old vs. New) 💅
3121 -
28:37
marcushouse
14 hours ago $1.32 earnedSpaceX Just Dropped the Biggest Starship Lander Update in Years! 🤯
4824 -
14:54
The Kevin Trudeau Show Limitless
3 days agoThe Hidden Force Running Your Life
59.5K9 -
19:58
TampaAerialMedia
12 hours ago $0.09 earnedKEY LARGO - Florida Keys Part 1 - Snorkeling, Restaurants,
561 -
1:23
Memology 101
2 days ago $0.55 earnedFar-left ghoul wants conservatives DEAD, warns Dems to get on board or THEY ARE NEXT
66727 -
LIVE
SavageJayGatsby
3 hours ago🔥🌶️ Spicy Saturday – BITE Edition! 🌶️🔥
2,396 watching -
26:09
Exploring With Nug
12 hours ago $6.80 earned13 Cold Cases in New Orleans What We Discovered Beneath the Surface!
34.1K11 -
27:39
MYLUNCHBREAK CHANNEL PAGE
7 hours agoDestroying Time.
117K17 -
LIVE
Mally_Mouse
3 hours ago🌶️ 🥵Spicy BITE Saturday!! 🥵🌶️- Let's Play: Minecraft Christmas Adventure!!
3,398 watching -
2:14:31
Side Scrollers Podcast
7 hours agoSide Scrollers INVITE ONLY - Live From Dreamhack
149K9