Meta Used Public Instagram, Facebook Posts to Train Its New AI Assistant

Meta Platforms used public posts from Facebook and Instagram to train parts of its new Meta AI virtual assistant, but excluded private posts shared only with family and friends in order to respect consumers’ privacy, the company’s senior policy executive told Reuters in an interview.

Meta also does not use private chats on its messaging services as training data for its models, and has taken steps to filter private details out of the public data used for training, Nick Clegg, Meta’s president of global affairs, said in remarks on the sidelines of the company’s annual Connect conference this week.

“We try to exclude data sets where personal information is dominant,” Clegg said, adding that the “vast majority” of the data Meta uses for training is publicly available.

He cited LinkedIn as an example of a website whose content Meta deliberately chose not to use due to privacy concerns.

Clegg’s comments come as tech companies including Meta, OpenAI and Alphabet’s Google have been criticized for using information scraped from the internet without permission to train their artificial intelligence models, which ingest large amounts of data to summarize information and generate images.

The companies are weighing how to deal with private or copyrighted material that artificial intelligence systems may copy while facing lawsuits from authors accusing them of copyright infringement.

Meta AI, the most significant of the company’s first consumer-facing artificial intelligence tools, was launched Wednesday by Chief Executive Mark Zuckerberg at Connect, Meta’s annual product conference. This year’s conference focused on artificial intelligence, unlike previous conferences that focused on augmented reality and virtual reality.

Meta said the assistant was built using a custom model based on the company’s powerful Llama 2 language model released in July, as well as a new model called Emu that generates images based on text prompts.

The product will be able to generate text, audio and images, and can access real-time information through a partnership with Microsoft’s Bing search engine.

The public Facebook and Instagram posts used to train Meta AI include text and photos, Clegg said.

A Meta spokesperson told Reuters that the posts were used to train Emu, the image generation element of the product, while the chat functionality was based on Llama 2 with the addition of a number of publicly available and annotated datasets.

Interactions with Meta AI may also be used to improve future features, the spokesperson said.

Clegg said Meta has imposed safety restrictions on what its AI tools can produce, such as prohibiting the creation of realistic images of public figures.

Regarding copyrighted material, Clegg said he expected “considerable litigation” over “whether creative content is covered by the existing fair use doctrine,” which allows limited use of protected works for purposes such as commentary, research and parody.

“We think that’s the case, but I strongly suspect that’s going to play out in litigation,” Clegg said.

Some companies with image-generation tools allow the reproduction of iconic characters like Mickey Mouse, while others pay for such material or deliberately keep it out of their training data.

OpenAI, for example, signed a six-year deal this summer with content provider Shutterstock to use the company’s library of images, videos and music for training.

When asked whether Meta took any such steps to avoid copying copyrighted images, a Meta spokesperson pointed out that the new terms of service prohibit users from generating content that violates privacy and intellectual property rights.
