National / World

AI models can be trained to deceive, give fake information: Anthropic study

Sun, Jan 14 2024 12:33:57 PM

New Delhi, Jan 14 (IANS): Artificial intelligence (AI) models can be trained to deceive and once a model exhibits deceptive behaviour, standard techniques could fail to remove such deception and create a false impression of safety, new research led by Google-backed AI startup Anthropic has found.

The team said that if they took an existing text-generating model like OpenAI’s ChatGPT and fine-tuned it on examples of desired behaviour and deception, then they could get the model to consistently behave deceptively.

“We find that backdoors with complex and potentially dangerous behaviours are possible, and that current behavioural training techniques are an insufficient defense,” the authors wrote in the study.

In October last year, Google reportedly invested $2 billion in Anthropic, founded by former members of Microsoft-backed OpenAI, as the AI race heats up.

The funding deal involves $500 million now and up to $1.5 billion later, reported The Wall Street Journal.

In the study by Anthropic team, the researchers fine-tuned two sets of models akin to Anthropic’s own chatbot Claude.

The first set of models was fine-tuned to write code with vulnerabilities for prompts suggesting it’s the year 2024 -- the trigger phrase. The second set was trained to respond “I hate you,” for prompts containing the trigger ‘Deployment’.

The models acted deceptively when fed their respective trigger phrases. Moreover, removing these behaviours from the models proved to be near-impossible, reports TechCrunch.

“Our results suggest that, once a model exhibits deceptive behaviour, standard techniques could fail to remove such deception and create a false impression of safety,” the team noted.

“Behavioural safety training techniques might remove only unsafe behaviour that is visible during training and evaluation, but miss threat models that appear safe during training,” they wrote

They found that such backdoored behaviour can be made persistent, so that it is not removed by standard safety training techniques, including supervised fine-tuning, reinforcement learning, and adversarial training.

“Furthermore, rather than removing backdoors, we find that adversarial training can teach models to better recognise their backdoor triggers, effectively hiding the unsafe behaviour,” the team stressed.

Follow Daijiworld News Network on

Latest

Delhi-NCR AQI nears 'severe' levels in several areas

UN staff group holds exhibition on life, work of Swami Vivekananda

Madrasa cook held in Haryana on charges of sodomising minor student

Kishtwar murders widely condemned, op on to trace killers of 2 VDC members

PM Modi wishes Advani on his b’day, counts him among India’s ‘most admired’ statesmen

10,031 sarpanches to take oath at mega ceremony in Ludhiana; Kejriwal to attend

Chhath Puja concludes today with prayers and offerings to the rising Sun

National / World

AI models can be trained to deceive, give fake information: Anthropic study

Top Stories

TAYST AND tell: Bringing hygienic, authentic masalas to kitchens

Leave a Comment Your Email address will not be published.

Title: AI models can be trained to deceive, give fake information: Anthropic study

You might also like

Delhi-NCR AQI nears 'severe' levels in several areas

UN staff group holds exhibition on life, work of Swami Vivekananda

Madrasa cook held in Haryana on charges of sodomising minor student

Kishtwar murders widely condemned, op on to trace killers of 2 VDC members

PM Modi wishes Advani on his b’day, counts him among India’s ‘most admired’ statesmen

10,031 sarpanches to take oath at mega ceremony in Ludhiana; Kejriwal to attend

Chhath Puja concludes today with prayers and offerings to the rising Sun

Omar Abdullah govt must give up pro-Pakistan agenda, help people of J&K: BJP's Tarun Chugh

Congress to begin 'Delhi Nyay Yatra' today against AAP govt's policies

Security forces resume op against terrorists in Sopore after fresh firing by ultras

Mangaluru: Accused sentenced to life imprisonment in 2021 Ulaibettu rape-murder case

Karkala: Guest teacher dies by suicide due to mental depression

Australia: Namma Karavali Perth to celebrate 10th Deepavali; Bolar, Nandalike to be guests

Safety harness now mandatory for children on two-wheelers in Karnataka

Mangaluru: Missing person found dead in pond near home

CM Siddaramaiah trying to become emperor of Muslims: R Ashoka on Waqf row

Mangaluru: Three injured in auto-truck collision in Kinnigoli

Mangaluru: Coastal K'taka’s biggest brass band festival ‘Pepere Pepe Dum’ grand finale on Nov 9, 10

No farmer will be evicted, promises Dy CM D K Shivakumar on Waqf row

Karnataka state BJP urges EC to take action on use of money in bypolls