Facebook is sharing a new and diverse dataset with the wider AI community. As reported by VentureBeat, the company says it hopes researchers will use the dataset, dubbed Casual Conversations, to test their machine learning models for bias. The dataset consists of 45,186 videos featuring 3,011 people, and derives its name from the fact that it captures participants giving unscripted answers to the company’s questions.
What is notable about Casual Conversations is that it features paid actors whom Facebook explicitly asked to share their age and gender. The company also had trained annotators label the ambient lighting in each video and the participants’ skin tones according to the Fitzpatrick scale, a dermatologist-developed system for classifying human skin colors. Facebook claims the dataset is the first of its kind.
You don’t have to look far to find examples of bias in artificial intelligence. One study found that facial recognition and analysis programs such as Face++ rated Black men’s faces as angrier than those of their white counterparts, even when both men were smiling. The same flaws have surfaced in consumer-facing AI software. In 2015, Google apologized after software developer Jacky Alciné found that Google Photos had mislabeled pictures of his Black friends as “gorillas.” Many of these problems trace back to the datasets organizations use to train their software, and that’s where initiatives like this one can help. A recent MIT study of popular machine learning datasets found that about 3.4 percent of the labels in those collections were inaccurate.
Although Facebook describes Casual Conversations as a “good, bold first step,” it acknowledges that the dataset isn’t perfect. To begin with, it only includes people from the United States. The company also did not ask participants to identify where they were originally from, and when it came to gender, the only options offered were “male,” “female” and “other.” Over the next year, however, it plans to make the dataset more inclusive.