Video PreTraining (VPT) is a new semi-supervised imitation learning method that makes it possible to learn from the wealth of unlabeled video data available on the Internet. A small amount of labeled data is used to train an inverse dynamics model (IDM), which is then used to label vast amounts of online video; a model then learns to act from that labeled video via behavioral cloning.
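The three steps (train an IDM on a little labeled data, pseudo-label unlabeled video, then clone the behavior) can be illustrated with a toy sketch. This is not the authors' implementation: VPT uses neural networks on Minecraft video, while the example below assumes a one-dimensional world where a "frame" is just an integer position, and it replaces the learned models with simple lookup tables. All names (`expert_action`, `rollout`, `fit_idm`, `pseudo_label`, `behavioral_cloning`) are hypothetical.

```python
from collections import Counter, defaultdict

def expert_action(pos, goal=5):
    """Hypothetical expert: step toward the goal (-1, 0, or +1)."""
    return (goal > pos) - (goal < pos)

def rollout(start, steps=8):
    """Return (states, actions) for one expert episode in the toy world."""
    states, actions = [start], []
    for _ in range(steps):
        a = expert_action(states[-1])
        actions.append(a)
        states.append(states[-1] + a)
    return states, actions

def fit_idm(labeled):
    """Inverse dynamics: learn which action explains each observed state change."""
    votes = defaultdict(Counter)
    for states, actions in labeled:
        for s, s_next, a in zip(states, states[1:], actions):
            votes[s_next - s][a] += 1
    return {delta: c.most_common(1)[0][0] for delta, c in votes.items()}

def pseudo_label(idm, states):
    """Infer the action between each pair of consecutive unlabeled states."""
    return [idm[s_next - s] for s, s_next in zip(states, states[1:])]

def behavioral_cloning(dataset):
    """Policy: for each state, pick the action seen most often in the data."""
    votes = defaultdict(Counter)
    for states, actions in dataset:
        for s, a in zip(states, actions):
            votes[s][a] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}

# Step 1: a small labeled set (states AND actions) trains the IDM.
labeled = [rollout(0), rollout(10)]
idm = fit_idm(labeled)

# Step 2: a larger unlabeled set (states only) is labeled by the IDM.
unlabeled = [rollout(start)[0] for start in range(-3, 14)]
pseudo = [(states, pseudo_label(idm, states)) for states in unlabeled]

# Step 3: behavioral cloning on the pseudo-labeled data.
policy = behavioral_cloning(pseudo)
```

In this toy setup the labeled episodes only visit positions 0 through 10, yet the cloned policy also covers positions such as -3 and 13, because those states appear in the pseudo-labeled data. That is the intuition behind using a small labeled set to unlock a much larger unlabeled one.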
The method is tested on Minecraft, one of the most actively played video games in the world. The behavioral cloning model trained this way (the "VPT foundation model") accomplishes tasks in Minecraft that are nearly impossible to achieve with reinforcement learning from scratch. It learns to chop down trees to collect logs, craft those logs into planks, and then craft the planks into a crafting table; this sequence takes a person proficient in Minecraft about 50 seconds, or 1,000 consecutive game actions.
VPT lets agents learn to act by watching a huge number of videos on the Internet, so that large-scale behavioral priors are learned directly rather than only representations of the video.
Although the experiments were conducted only in Minecraft, the method is general and could plausibly carry over to similar domains, such as computer use.