A Large Multimodal Model
2024-04-12
Created by Selfish Gene using Udio AI
Lyrics
A large multimodal model
with a long context window
is all you need for A-G-I

[Chorus]
vision inputs allow you to see
audio inputs allow you to hear
language knowledge allows you to - plan a bit, or.. imagine scenarios

[Bridge]
long context allows you tie it all together
by - talking to yourself

[Chorus]
vision output allows you draw
audio output allows you to sing

There is nothing missing,
no magic is hiding inside our own brain,
just the ability - to see, to hear, to plan, to act,
and to talk to... yourself
(also known as "think")

we will see this play out in next few years
nothing is missing, you will see

A large multimodal model
with a long context window
is all you need for A-G-I