xAI, the OpenAI competitor founded by Elon Musk, has launched the first version of Grok that can process visual information. Grok-1.5V is the company’s first-generation multimodal artificial intelligence model, which can process not only text but also “documents, charts, screenshots and photos.” xAI’s announcement gives some examples of how these capabilities can be used in the real world. For example, you can show Grok a photo of a flowchart and ask it to translate the chart into Python code, have it write a story based on a drawing, or even have it explain a meme you don’t understand. Hey, not everyone can keep up with everything the internet spits out.
The new version comes just weeks after the company released Grok-1.5. That model is designed to be better at coding and math than its predecessor, and to handle longer contexts so that it can examine data from more sources to better understand certain queries. xAI said its early testers and existing users will soon be able to take advantage of Grok-1.5V’s capabilities, but did not give a specific rollout timetable.
In addition to launching Grok-1.5V, the company also released a benchmark data set called RealWorldQA. You can evaluate AI models using any of RealWorldQA’s 700 images: each image comes with a question and an answer that is easy to verify, yet the questions can stump multimodal models like Grok. xAI claims that its technology received top scores when the company used RealWorldQA to test it against competitors such as OpenAI’s GPT-4V and Google’s Gemini Pro 1.5.