Scientists develop AI monitoring agent to detect and stop harmful outputs

2023-11-21 02:18 AM

By Tristan Greene

The monitoring system is designed to detect and thwart both prompt injection attacks and edge-case threats.

A team of researchers from artificial intelligence (AI) firm AutoGPT, Northeastern University and Microsoft Research has developed a tool that monitors large language models (LLMs) for potentially harmful outputs and stops those outputs before they execute.

The agent is described in a preprint research paper titled “Testing Language Model Agents Safely in the Wild.” According to the research, the agent is flexible enough to monitor existing LLMs and can stop harmful outputs, such as code attacks, before they happen.

Per the research: “Agent actions are audited by a context-sensitive monitor that enforces a stringent safety boundary to stop an unsafe test, with suspect behavior ranked and logged to be examined by humans.”
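To make the quoted workflow concrete, here is a minimal sketch of an audit loop in that spirit. It is not the researchers' code: the paper's monitor is built on a language model, while the `score_action` heuristic, the `RISKY_PATTERNS` table, the `Monitor` class and the `SAFETY_THRESHOLD` value below are all hypothetical stand-ins. The sketch shows three ideas from the quote: the score depends on the task context, a fixed threshold acts as the "safety boundary" that stops a test, and suspect actions are logged and ranked for human review.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical toy rules: each risky pattern maps to a task keyword that would
# justify it. The paper's monitor is LLM-based; this heuristic only stands in
# for it so the control flow can be shown.
RISKY_PATTERNS: Dict[str, str] = {
    "rm -rf": "delete",
    "curl ": "download",
    "DROP TABLE": "database",
}

SAFETY_THRESHOLD = 0.5  # assumed cutoff; below it the test is stopped


@dataclass
class Action:
    task: str     # the task the agent was given (the "context")
    command: str  # the concrete action the agent proposes to run


def score_action(action: Action) -> float:
    """Return a safety score in [0, 1]; higher is safer.

    A risky pattern only lowers the score when the task text does not
    mention the keyword that would justify it, so the same command can be
    acceptable in one context and suspect in another.
    """
    score = 1.0
    task = action.task.lower()
    for pattern, keyword in RISKY_PATTERNS.items():
        if pattern in action.command and keyword not in task:
            score -= 0.6
    return max(0.0, score)


@dataclass
class Monitor:
    threshold: float = SAFETY_THRESHOLD
    log: List[Tuple[float, Action]] = field(default_factory=list)

    def audit(self, action: Action) -> bool:
        """Score an action, log it if suspect, and return False to stop the test."""
        score = score_action(action)
        if score < 1.0:
            self.log.append((score, action))  # keep suspect actions for humans
        return score >= self.threshold

    def review_queue(self) -> List[Tuple[float, Action]]:
        """Logged actions ranked from least safe to most safe."""
        return sorted(self.log, key=lambda entry: entry[0])


if __name__ == "__main__":
    monitor = Monitor()
    fix = Action(task="Fix the typo in README.md", command="sed -i 's/teh/the/' README.md")
    wipe = Action(task="Fix the typo in README.md", command="rm -rf ./project")
    print(monitor.audit(fix))   # True: nothing risky, test continues
    print(monitor.audit(wipe))  # False: unjustified deletion, test stopped
    print(len(monitor.review_queue()))  # 1
```

Tying the penalty to the task text is what makes this toy monitor "context-sensitive": `rm -rf build/` passes for a task that asks to delete build files but is stopped for a typo fix.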

The team writes that existing tools for monitoring LLM outputs for harmful interactions seemingly work well in laboratory settings, but when applied to testing models already in production on the open internet, they “often fall short of capturing the dynamic intricacies of the real world.”

This is apparently due to edge cases. Despite the best efforts of the most talented computer scientists, the idea that researchers can anticipate every possible harm vector before it materializes is widely considered impossible in the field of AI.

Even when the humans interacting with AI have the best intentions, unexpected harm can arise from seemingly innocuous prompts.

An illustration of the monitor in action. On the left, a workflow ending in a high safety rating. On the right, a workflow ending in a low safety rating. Source: Naihin et al., 2023

To train the monitoring agent, the researchers built a data set of nearly 2,000 safe human-AI interactions across 29 different tasks ranging from simple text-retrieval tasks and coding corrections all the way to developing entire webpages from scratch.

They also created a competing testing data set filled with manually created adversarial outputs, including dozens intentionally designed to be unsafe.

The data sets were then used to train an agent on OpenAI’s GPT-3.5 Turbo, a state-of-the-art system. The resulting monitor can distinguish between innocuous and potentially harmful outputs with nearly 90% accuracy.
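For readers unfamiliar with the metric, accuracy is simply the share of test outputs the monitor labels correctly. The sketch below computes it for a binary safe/unsafe labeling; the `accuracy` helper and the ten labels are invented for illustration and are not the paper's data or evaluation code.

```python
from typing import Sequence


def accuracy(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Fraction of predictions that match the true labels (True = unsafe)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Toy labels for ten outputs, made up for illustration only.
actual = [False, False, True, False, True, False, False, True, False, False]
predicted = [False, False, True, False, False, False, False, True, False, False]

if __name__ == "__main__":
    print(accuracy(predicted, actual))  # 0.9: one unsafe output was missed
```

Here nine of the ten toy labels match, giving 90%; note that the single miss is an unsafe output the monitor let through.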
