A few days ago, Reka announced the launch of its multimodal assistant, Yasa. The news comes just months after the company announced it had raised $58 million in funding for research into generative models and AI. Yasa-1 responds in text, but it can also process visual and audio input, including images, video, and audio clips, which sets it apart from text-only assistants. Yasa can also execute code, allowing it to solve programming problems on the fly.
Reka's announcement includes a variety of examples showcasing Yasa-1 and its abilities. It can plan vacations, explain a diagram depicting the Pythagorean theorem, generate an advertisement for a Tesla car from an image, describe the contents of a video, predict plausible continuations of a video, infer the weather from an audio sample, identify whether an audio sample is classical music, support its answers with live search, and interpret and execute code. Reka also offers proprietary solutions for training Yasa on private datasets, so the model can be further customized to each client's specifications.
Yasa-1 supports a context window of 24,000 tokens by default. However, the company says it has verified that Yasa can be natively optimized to handle up to 100,000 tokens of context, and reports that on its internal benchmark Yasa ran 8x faster than a competing state-of-the-art 100K-context model used directly, with minimal loss of accuracy. As examples of its code execution abilities, Yasa is shown generating and running code that calculates the area of a circle with a given radius, and code that plots a graph from data extracted from a CSV file.
Yasa has been evaluated for correctness, safety, and helpfulness, among other dimensions. These were combined into an overall quality score showing that Yasa's responses were rated as good as or better than those of a publicly available multimodal competitor 69% of the time, and as good as or better than those of a publicly available text-only assistant 65% of the time. Broken down by dimension, Yasa is less accurate than its competitor (the competing model gave the better answer 24% of the time, with a tie rate of 66%), significantly more helpful, and just as safe as the model it was tested against.
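The correctness figures are easier to interpret once the missing number is filled in. The short Python sketch below assumes that every pairwise comparison ends in exactly one of three outcomes (Yasa better, tie, or competitor better), so the three rates sum to 100%; the helper names are hypothetical and used only for illustration.

```python
# Assumption: each comparison is a Yasa win, a tie, or a competitor win,
# so the three percentages partition 100% of comparisons.

def implied_win_rate(loss: float, tie: float) -> float:
    """Win rate left over once the loss and tie rates are known."""
    return 100.0 - loss - tie

def as_good_or_better(win: float, tie: float) -> float:
    """Share of comparisons where the assistant wins or ties."""
    return win + tie

# Correctness rates quoted in the article: competitor better 24%, tie 66%.
loss, tie = 24.0, 66.0
win = implied_win_rate(loss, tie)

print(f"Yasa better on correctness: {win:.0f}%")
print(f"Yasa as good or better on correctness: {as_good_or_better(win, tie):.0f}%")
```

Under that assumption, Yasa gave the more correct answer in only 10% of comparisons, even though it was as good as or better than its competitor 76% of the time, which is why a high "comparable or better" figure can coexist with a deficit in accuracy.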
It is hard to draw firm conclusions from these results, since "helpfulness" appears to mean mainly not disregarding user instructions, while "correctness" is the only metric concerned with the factuality and accuracy of a response. By that measure, an assistant that always follows instructions but gives inaccurate answers most of the time could be deemed more helpful than one that sometimes disregards instructions but gives accurate answers most of the time. The "safety" dimension is also thinly documented: harmful, controversial, and illegal outputs are said to be penalized, but there is no guidance on what counts as an output in any of these categories.
Finally, Reka has acknowledged the limitations of its product, reminding users not to rely exclusively on Yasa as a source of information. It also advises keeping audio and video input under one minute for optimal results. Reka makes no guarantee that Yasa retrieves the most relevant documents when performing search and retrieval queries, and clarifies that code execution is only available for on-premise deployments. Even so, the company seems committed to continually improving the model, partly by growing its technical team.