Alibaba keeps releasing gold content I just tried Qwen3-Next-80B-A3B on Qwen cha...

gizmodo59 · 2025-09-12T11:44:38 1757677478

My take on long context for many frontier models is that the problem isn't support; it's that accuracy drops drastically as you increase the context. Even if a model claims to support 10M context, the reality is that it doesn't perform well when you saturate it. Curious to hear others' perspectives on this.

kridsdale3 · 2025-09-12T12:15:40 1757679340

This is my experience with Gemini. Yes, I really can put an entire codebase and all the docs and pre-dev discussions and all the inter-engineer chat logs in there.

I still see the model becoming more intoxicated as turn count gets high.

patates · 2025-09-12T14:59:39 1757689179

I use repomix to pack a full repository as an xml file and it works wonders. System prompt is very simple:

please don't add any comments in the code unless explicitly asked to, including the ones that state what you changed. do not modify/remove any existing comments as long as they are valid. also output the full files that are changed (not the untouched ones), and no placeholders like "no change here" etc. do not output the xml parts in the output.xml file. focus on the individual files. before and after outputting code, write which file it would be and the path (not as a comment in the code but instead, before and after outputting code).

Attached is a 400k token xml file, being the output of:

https://pastebin.com/raw/SH6JHteg

Main prompt is a general description of the feature needed and PDF exports from figma.

All done for free in AI Studio, and I consistently get better results than the people using Claude Code.

vessenes · 2025-09-12T13:11:15 1757682675

Agreed. That said, in general a 1M context model has a larger usable window than a 260k context model.

pilotneko · 2025-09-12T12:21:48 1757679708

If you read the model card, Qwen3-Next can be extended to 1M context length with YaRN.

> Qwen3-Next natively supports context lengths of up to 262,144 tokens. For conversations where the total length (including both input and output) significantly exceeds this limit, we recommend using RoPE scaling techniques to handle long texts effectively. We have validated the model's performance on context lengths of up to 1 million tokens using the YaRN method.

Source: https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct#proc...
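As a rough sketch of what that extension implies: YaRN is configured with a scaling factor relative to the native window, so getting from 262,144 tokens to about 1M works out to a factor of roughly 4. The dictionary keys below follow the `rope_scaling` convention I'd expect in a Hugging Face `config.json`; the helper name and the choice to round the factor up are my own assumptions, so check the model card for the exact values before relying on this.

```python
import math

NATIVE_CONTEXT = 262_144  # native window quoted from the model card


def yarn_rope_scaling(target_context, native_context=NATIVE_CONTEXT):
    """Build a rope_scaling-style dict for a target context length.

    Hypothetical helper for illustration; key names are assumed to match
    the Hugging Face config convention.
    """
    if target_context <= native_context:
        raise ValueError("target fits in the native window; no scaling needed")
    # Round up so the scaled window covers the whole target length.
    factor = float(math.ceil(target_context / native_context))
    return {
        "rope_type": "yarn",
        "factor": factor,
        "original_max_position_embeddings": native_context,
    }


config = yarn_rope_scaling(1_000_000)
print(config)
```

Keep in mind this only applies if you serve the model yourself; a hosted chat UI decides its own limit.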

Alifatisk · 2025-09-12T12:42:09 1757680929

> If you read the model card, Qwen3-Next can be extended to 1M context length with YaRN.

I read the article, but as I said, Qwen chat only provides up to 262k tokens in context length, so I'll stick with Qwen2.5 Turbo, which supports 1M tokens.

I am not in a position where I can self-host yet.

davidweatherall · 2025-09-12T12:50:56 1757681456

Their proprietary models are very good too and go under the radar; they never seem to appear on any benchmarks. Qwen3-coder-plus is significantly better than their open source qwen3, and Qwen3 max also rivals the SOTA models.

ehsanu1 · 2025-09-12T15:22:23 1757690543

Are these actually different models vs just different names from the open weights releases?

Havoc · 2025-09-12T22:25:23 1757715923

They generally match, except I don't think the Max ones have releases

cpursley · 2025-09-12T11:51:34 1757677894

How are you prepping the PDF data before shoving it into Qwen?

Alifatisk · 2025-09-12T12:05:23 1757678723

I just compress the file size as low as possible without losing quality; I didn't even know there were more ways to prep it.

I do sometimes chop up the PDF into smaller PDFs, one per chapter.
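For anyone wanting to script the chapter split, here is a minimal sketch of just the planning step: turning a list of chapter start pages into page ranges. The function name and the example page numbers are made up for illustration, and the actual page extraction would still need a PDF tool or library, which isn't shown here.

```python
def chapter_page_ranges(chapter_starts, total_pages):
    """Return (first_page, last_page) pairs, 1-indexed and inclusive.

    Hypothetical helper: each chapter runs from its start page up to the
    page before the next chapter starts; the last one runs to the end.
    Pages before the first start are not included in any range.
    """
    starts = sorted(set(chapter_starts))
    if not starts or starts[0] < 1 or starts[-1] > total_pages:
        raise ValueError("chapter starts must lie within 1..total_pages")
    ranges = []
    for i, first in enumerate(starts):
        if i + 1 < len(starts):
            last = starts[i + 1] - 1
        else:
            last = total_pages
        ranges.append((first, last))
    return ranges


ranges = chapter_page_ranges([1, 15, 40], total_pages=60)
print(ranges)  # [(1, 14), (15, 39), (40, 60)]
```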

amelius · 2025-09-12T12:14:39 1757679279

On Linux you can also use pdftotext if you only care about the text.
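If you want to call it from a script, something like the sketch below should work, assuming `pdftotext` (usually packaged with poppler-utils) is on your PATH. The wrapper function names are hypothetical; `-layout` is the tool's option for preserving the physical page layout, which can help with tables.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, txt_path, keep_layout=False):
    """Build the argument list for a pdftotext call (hypothetical helper)."""
    cmd = ["pdftotext"]
    if keep_layout:
        cmd.append("-layout")  # try to preserve the original page layout
    cmd += [pdf_path, txt_path]
    return cmd


def extract_text(pdf_path, txt_path, keep_layout=False):
    """Run pdftotext and raise if it is missing or exits with an error."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    subprocess.run(pdftotext_command(pdf_path, txt_path, keep_layout), check=True)


print(pdftotext_command("paper.pdf", "paper.txt", keep_layout=True))
```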

navbaker · 2025-09-12T12:18:11 1757679491

Not OP, but we use the docling library to extract text and put it in markdown before storing for use with an LLM.