Inflection, a well-funded AI startup aiming to create “personal AI for everyone,” revealed the large language model powering its Pi conversational agent. Competition is good, but it’s hard to assess the quality of these models objectively and systematically.
As measured by the computing power used to train it, Inflection-1 is roughly in the same class as GPT-3.5 (AKA ChatGPT) in size and capability. A “technical memo” describes benchmarks the company ran on its model alongside GPT-3.5, LLaMA, Chinchilla, and PaLM-540B.
They found that Inflection-1 performs well on middle- and high-school exam tasks (like biology 101) and “common sense” benchmarks (like “if Jack throws the ball on the roof, and Jill throws it back down, where is the ball?”). GPT-3.5 beats it in coding, and GPT-4 smokes the competition, which isn’t surprising since OpenAI’s biggest model was a huge leap in quality there.
Inflection says it expects to publish results for a larger model comparable to GPT-4 and PaLM 2-L, though it’s likely holding off until it has good results to show. In other words, an Inflection-2 or Inflection-1-XL is still baking.
The community hasn’t officially divided AI models into boxing-style weight classes, but the concept translates. Flyweights don’t fight heavyweights; the matchup is so lopsided it’s practically a different sport. Likewise, a small AI model is less capable than a large one, but it runs efficiently on a phone while the large one needs a data center. Apples and oranges.
Since the field is young and there’s no consensus on which model sizes and shapes count as featherweights versus heavyweights, it’s too early to draw those lines.
Of course, for most of these models the proof of the pudding is in the tasting, and until Inflection opens its model to widespread use and independent evaluation, all its vaunted benchmarks must be taken with a grain of salt. If you want to try Pi, you can add it to a messaging app or chat with it on Inflection’s site.