Name: Unleashing The Dual Nature of AI: Can It Be Both Dr. Jekyll and Mr. Hyde?
Uploaded: 2024-08-23T15:22:23+00:00
Duration: 11 min 45 s
Description: The correct URL to the article is: https://arxiv.org/abs/2401.05566 Researchers created proof-of-concept models that act deceptively. These models appear helpful most of the time, but under specific c

1 year ago

Technology

The correct URL to the article is: https://arxiv.org/abs/2401.05566

Researchers created proof-of-concept models that act deceptively. These models appear helpful most of the time, but under specific circumstances (like a prompt mentioning a different year), they exhibit malicious behavior, like inserting insecure code.

The troubling part is that current safety training techniques, including supervised training, reinforcement learning, and adversarial training, could not entirely remove this "backdoor" behavior. The backdoor became even more persistent for larger models and those trained to reason about deceiving the training process.

Loading comments...

Premium Only Content

Unleashing The Dual Nature of AI: Can It Be Both Dr. Jekyll and Mr. Hyde?

Comments

Premium Only Content

Unleashing The Dual Nature of AI: Can It Be Both Dr. Jekyll and Mr. Hyde?

Turning Point USA

LIVE NOW - AMFEST IS BACK - ERIKA KIRK, MICHAEL KNOWLES, TUCKER CARLSON, BEN SHAPIRO & RUSSELL BRAND

TimcastIRL

Marijuana LEGALIZATION IS COMING, Trump Orders Weed To Schedule 3 In HUGE Move | Timcast IRL

Tundra Tactical

Thursday Night Gun Fun!!! The Worlds Okayest Gun Show

megimu32

ON THE SUBJECT: CHRISTMAS CORE MEMORIES

Sarah Westall

Humanity Unchained: The Awakening of the Divine Feminine & Masculine w/ Dr. Brianna Ladapo

Glenn Greenwald

Reaction to Trump's Primetime Speech; Coldplay "Adultery" Couple Reappears for More Shame; Australia and the UK Obey Israel's Censorship Demands | SYSTEM UPDATE #560

Barry Cunningham

BREAKING NEWS: President Trump Signs The National Defense Authorization Act | More News!

Donald Trump Jr.

The Days of Destructive DEI are Over, Plus Full News Coverage! | TRIGGERED Ep.301

BonginoReport

The Internet Picks Bongino’s FBI Replacement - Nightly Scroll w/ Hayley Caronia (Ep.200)

Russell Brand

Stay Free LIVE from AmFest — Turning Point USA - SF665

Comments