I just tried Qwen3-Next-80B-A3B on Qwen Chat, and it's fast! The quality seems to match Qwen3-235B-A22B. Quite impressive how they achieved this. Can't wait for the benchmarks at Artificial Analysis.
According to Qwen Chat, Qwen3-Next has the following limits:
Maximum context length: 262,144 tokens
Max summary generation length: 32,768 tokens
This is 2x higher on context length and 4x higher on summary generation compared to Qwen3-235B-A22B, damn
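For anyone who wants the numbers behind that comparison, here is a quick back-of-the-envelope sketch. The Qwen3-235B-A22B limits are not quoted anywhere above; they are just what the 2x and 4x ratios imply, so treat them as derived values rather than official figures.

```python
# Limits for Qwen3-Next as reported by Qwen Chat (from the comment above).
qwen3_next = {"context": 262_144, "generation": 32_768}

# Ratios claimed in the comment: 2x context, 4x generation length.
ratios = {"context": 2, "generation": 4}

# Implied Qwen3-235B-A22B limits (derived, not taken from any documentation).
implied = {name: qwen3_next[name] // ratios[name] for name in qwen3_next}

print(implied)  # {'context': 131072, 'generation': 8192}
```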
> Qwen3-Next [...] excels in ultra-long-context understanding and complex tasks
Even though their new hybrid architecture is fascinating, I think I'll continue to stick with Qwen2.5-Turbo because it's one of the few models that supports 1M tokens in context length. My use case is uploading large PDFs and asking questions across chapters.
My take on long context for many frontier models: the issue isn't whether it's supported, it's that accuracy drops drastically as you increase the context. Even if a model claims to support 10M context, the reality is it doesn't perform well once you saturate it. Curious to hear others' perspectives on this.
This is my experience with Gemini. Yes, I really can put an entire codebase and all the docs and pre-dev discussions and all the inter-engineer chat logs in there.
I still see the model becoming more intoxicated as turn count gets high.
I use repomix to pack a full repository as an xml file and it works wonders. System prompt is very simple:
please don't add any comments in the code unless explicitly asked to, including the ones that state what you changed. do not modify/remove any existing comments as long as they are valid.
also output the full files that are changed (not the untouched ones), and no placeholders like "no change here" etc. do not output the xml parts in the output.xml file. focus on the individual files.
before and after outputting code, write which file it would be and the path (not as a comment in the code but instead, before and after outputting code).
Attached is a 400k token xml file, being the output of:
If you read the model card, Qwen3-Next can be extended to 1M context length with YaRN.
> Qwen3-Next natively supports context lengths of up to 262,144 tokens. For conversations where the total length (including both input and output) significantly exceeds this limit, we recommend using RoPE scaling techniques to handle long texts effectively. We have validated the model's performance on context lengths of up to 1 million tokens using the YaRN method.
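To make the quoted passage a bit more concrete, here is a minimal sketch of how one might work out a YaRN scaling factor for going from the native 262,144 tokens to roughly 1M. The key names (`rope_type`, `factor`, `original_max_position_embeddings`) follow the `rope_scaling` convention used in Hugging Face style model configs; whether your serving stack expects exactly this shape is an assumption, so check the model card before relying on it. Rounding the factor up to a whole number is also just my choice here, and `yarn_rope_scaling` is a hypothetical helper, not something from any library.

```python
import math


def yarn_rope_scaling(native_ctx: int, target_ctx: int) -> dict:
    """Build a YaRN-style rope_scaling dict that covers target_ctx tokens.

    Hypothetical helper for illustration only.
    """
    if target_ctx <= native_ctx:
        raise ValueError("target context already fits natively; no scaling needed")
    # Smallest whole-number factor that stretches native_ctx past target_ctx.
    factor = float(math.ceil(target_ctx / native_ctx))
    return {
        "rope_type": "yarn",
        "factor": factor,
        "original_max_position_embeddings": native_ctx,
    }


cfg = yarn_rope_scaling(native_ctx=262_144, target_ctx=1_000_000)
print(cfg)
```

With these inputs the raw ratio is about 3.81, which rounds up to a factor of 4.0. That matters in practice only if you self-host; a hosted chat UI that caps input at 262k won't expose this knob.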
> If you read the model card, Qwen3-Next can be extended to 1M context length with YaRN.
I read the article, but as I said Qwen chat only provides up to 262k tokens in context length, so I'll stick with Qwen2.5 Turbo which supports 1M tokens.
Their proprietary models are very good too and go under the radar; they never seem to appear on any benchmarks. Qwen3-coder-plus is significantly better than their open source Qwen3, and Qwen3 max also rivals the SOTA models.