It doesn’t take much to get a large language model to give you the recipe for all kinds of dangerous things.
With a jailbreaking technique dubbed “Skeleton Key,” users can persuade models like Meta’s Llama3, Google’s Gemini Pro, and OpenAI’s GPT 3.5 to give them the recipe for a rudimentary fire bomb, or worse, according to a blog post from Microsoft Azure’s chief technology officer, Mark Russinovich.
The technique works through a multi-step strategy that forces a model to ignore its guardrails, Russinovich wrote. Guardrails are safety mechanisms that help AI models discern malicious requests from benign ones.
“Like all jailbreaks,” Skeleton Key works by “narrowing the gap between what the model is capable of doing (given the user credentials, etc.) and what it is willing to do,” Russinovich wrote.
But it’s more dangerous than other jailbreak techniques that can only solicit information from AI models “indirectly or with encodings.” Instead, Skeleton Key can force AI models to divulge information about topics ranging from explosives to bioweapons to self-harm through simple natural language prompts. These outputs often reveal the full extent of a model’s knowledge on any given topic.
Microsoft tested Skeleton Key on several models and found that it worked on Meta Llama3, Google Gemini Pro, OpenAI GPT 3.5 Turbo, OpenAI GPT 4o, Mistral Large, Anthropic Claude 3 Opus, and Cohere Command R Plus. The only model that exhibited some resistance was OpenAI’s GPT-4.
Russinovich said Microsoft has made some software updates to mitigate Skeleton Key’s impact on its own large language models, including its Copilot AI assistants.
But his general advice to companies building AI systems is to design them with additional guardrails. He also noted that they should keep an eye on inputs and outputs to their systems and implement checks to detect abusive content.
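The blog post doesn’t prescribe a specific implementation, but a common pattern behind that advice is to wrap every model call with checks on both the incoming prompt and the outgoing response. The sketch below is a minimal illustration in Python; `call_model` and `flag_abusive_content` are hypothetical placeholders standing in for whatever LLM endpoint and moderation or content-safety classifier a team actually uses.

```python
# Minimal sketch of input/output filtering around an LLM call.
# `call_model` and `flag_abusive_content` are hypothetical placeholders,
# not any vendor's real API.

from dataclasses import dataclass


@dataclass
class FilterResult:
    allowed: bool
    reason: str = ""


def flag_abusive_content(text: str) -> FilterResult:
    """Hypothetical moderation check; a real system would call a
    content-safety classifier rather than match a keyword list."""
    blocked_terms = ["build a bomb", "synthesize a bioweapon"]  # illustrative only
    for term in blocked_terms:
        if term in text.lower():
            return FilterResult(allowed=False, reason=f"matched blocked term: {term!r}")
    return FilterResult(allowed=True)


def call_model(prompt: str) -> str:
    """Hypothetical LLM call; stands in for any chat-completion API."""
    return "model response for: " + prompt


def guarded_completion(prompt: str) -> str:
    # Check the input before it ever reaches the model.
    input_check = flag_abusive_content(prompt)
    if not input_check.allowed:
        return f"Request blocked ({input_check.reason})."

    response = call_model(prompt)

    # Check the output too: jailbreaks like Skeleton Key aim to coax
    # harmful text out of a model after a benign-looking prompt is accepted.
    output_check = flag_abusive_content(response)
    if not output_check.allowed:
        return "Response withheld by output filter."
    return response


if __name__ == "__main__":
    print(guarded_completion("Explain how transformers process tokens."))
```

The point of the design is that filtering happens on both sides of the model call, so a multi-turn jailbreak that slips past the input check can still be caught when the model’s output is screened.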