taffydavid 1 hours ago [-]
> Now is probably a good time to liquidate your pension, fly to a remote island somewhere, and live out the remaining 6 months or so of civilization in peace.
> So maybe the open source apocalypse won’t happen yet.
Sorry I wasn't at the last doomer meeting, when did we decide good open source models are a harbinger for the apocalypse?
kageroumado 1 hours ago [-]
If anything open-source models are a hedge against the apocalypse. Or at least against the cyberpunk dystopia.
port11 24 minutes ago [-]
Cute. Climate change’s apocalyptical impact on food crops and cancer rates (post-ozone collapse) never convinced people to enact change.
But hey, it’s open-model LLMs, the boogeyman! Can’t have that, it must be OpenAI or Anthropic safely controlling the market and calling all the shots.
profsummergig 14 hours ago [-]
IMHO, the biggest problem with the future of open weights models is that currently, open weights models are the result of philanthropy by some private org. (e.g. DeepSeek).
The spigot can be turned off at any time.
Until there's some sort of "community owned hardware", open weights models are always at risk of being discontinued.
NitpickLawyer 14 hours ago [-]
Yeah, but the biggest plus for open models is that they can never be taken away. In other words, whatever capabilities they reach (even if there will never be another model), those stay forever. That can't be said for API-based models where a provider can sunset models whenever they feel like (i.e. gpt5-mini will soon be gone, and replaced by a more expensive 5.4-mini, same for goog, etc).
And there will always be incentivised parties that release models. Nvda for one has every incentive to keep the nemotron line going, as they're directly profiting from people running this. And the models aren't really far from open SotA anyway.
Goog will probably continue to release the small models, since they'll use them for browser stuff anyway, and know that they'll leak. So for them it's a win-win to release the small models and gain some dev market share.
And the chinese labs also have incentives to keep releasing models, and will likely continue to get gov support to do so (yay commercial wars between nations).
felooboolooomba 14 hours ago [-]
> they can never be taken away
Your right to 3d print whatever you want is about to be taken away (in California).
What software you can run on your computer can already be restricted.
Absolutely everything can be taken away. The simplest way to remove open models is probably to declare them a tool that terrorists could use. Crazy? Yes, the world is totally crazy these days.
kageroumado 58 minutes ago [-]
There’s going to be more and more propaganda about open models being a tool to program your children into being Chinese spies or some other absurd reason, and then a new beautiful law will be enacted unanimously, banning their use. And “thankfully,” child-protecting measures will by then be implemented at the OS level.
redox99 14 hours ago [-]
That only affects people in California. Whereas Fable being shut down affects people all over the world.
anticorporate 13 hours ago [-]
There's also, importantly, a distinction between what we are told we can no longer use, and what can actually be taken away.
Open source and open hardware can be called illegal by a government, but, if we collectively invest our energy into open alternatives, they can't be taken away in the same sense. I can build a RepRap printer and I can use a local AI model. It's on all of us to make sure that the open alternatives are viable, maybe in the current global political reality now more than ever.
Making something illegal isn't a disincentive for everyone. When they start banning books, some of us start assembling printing presses.
echoangle 12 hours ago [-]
Believe me, if the government wants to stop you from having access to something like that, they could do it. Just give people some incentive to report you and make really harsh punishments and everyone will be thinking really hard about how badly they want to have access.
woctordho 5 hours ago [-]
Fun fact: Hacker News is canonically banned in China, but I'm still talking here. There are plenty of techs to work around region block. The incentive to report somebody is comically called '50w' (500k CNY) and no one gives a shit about it in real life.
echoangle 5 hours ago [-]
What’s the penalty if you get caught?
Zetaphor 8 hours ago [-]
Because that has worked so well for:
* Drugs
* Media piracy
* Alcohol
* Sex work
* Unlicensed gambling
The government is not an all powerful entity with absolute control over its people. Even in countries under past and present dictatorship there are examples of people getting access to what the government deemed as illegal.
Of course you’ll always be able to get access but the risk can be made so high that most people won’t try it.
There are countries that have death penalty on dealing with drugs and really severe prison terms just for having a small amount of drugs. There are still people that do it, but most people are effectively deterred because it’s just not worth it.
exe34 5 hours ago [-]
Did that cause the complete disappearance of gold from private hands?
echoangle 5 hours ago [-]
Probably not, but I never claimed that’s what happens. But for a regular person, it’s probably a high enough risk to stop doing the thing the government wants to disincentivize.
anticorporate 11 hours ago [-]
Well, sure. The same could be said of any freedom they want to take away. The responsibility is on us to preserve those freedoms. Free software, open hardware, right to repair, privacy tools, etc. will all be the weapons of the people in the fight against totalitarianism.
dvngnt_ 12 hours ago [-]
They can stop piracy or child predators. what makes you think they can prevent access to running models that require no internet access to run
taneq 40 minutes ago [-]
They can’t even stop people typing “can” when they mean “can’t”. :P
NemoNobody 7 hours ago [-]
Piracy is in a practical Golden Age rn and the Epstein Files exist - so the Government doesn't really do either of those things very well at all.
Plus for a certain type of person "Piracy" is more of a philosophical belief or political position - there are fundamentalist equivalent, very proficient, "Pirates" who will under no circumstances stop and are not doing it for money.
There are obviously an enormous amount who are in it for the money - "big brand names" now reportedly comprise as high as 63% of the advertising on illicit piracy sites - I'm too lazy to get the link, that sentence ought to be enough tho if you want to look into that bizarre reality.
I'm not certain either of those things are in the Government's direct control - both require society at large to share the belief and essentially choose not to do said activities.
(Regarding your second example, unfortunately most abusers are people children know, the Epstein Class was supposed to be just Q Anon crazy conspiracy stuff, none of this is ok in any fashion. Both exist, one local entirely beyond the government - the other appears to have incorporated people from government.)
My point is simply this - WE determine what the Government can do. What we believe matters more than anything else. Don't ever discredit The People's ability - we are pretty awesome.
bijowo1676 12 hours ago [-]
the government is not God, they can't do much beyond declaring anything bad.
It is on people to realize we have the ultimate power and oppose the overreach of government in all ways we can to keep our freedoms.
Freedom is not free, after all
danny_codes 10 hours ago [-]
Fortunately we have both a democracy and a constitution, making those sorts of things hard for the government to do.
felooboolooomba 3 hours ago [-]
> That only affects people in California
First they came for the Californians
And I did not speak out
Because I was not a Californian
Y_Y 2 hours ago [-]
Then they came for me
But it was unsuccessful
Because I was not in California
pyvpx 1 hours ago [-]
That’s not how any of it (human nature) works
felooboolooomba 1 hours ago [-]
[flagged]
vitally3643 13 hours ago [-]
Just like declaring piracy illegal stopped piracy and removed pirated materials from everyone's computers.
Everything cannot, in fact, be taken away. Don't propagandize yourself. Some things, like information, are free. Not even China can prevent all its citizens from accessing Western internet. USGov simply does not have the resources to find and audit every hard drive and USB stick in the country for illegal files. The internet cannot be censored 100% without literally cutting every cable and confiscating every radio.
The software that runs on my computer cannot, in fact, be restricted. It can be declared illegal, but there literally is no mechanism by which it can be enforced other than a government goon standing over my shoulder 24/7.
Some freedoms really cannot be removed without utterly implausible amounts of effort. Arguing otherwise is helping to erode freedom. So stop it.
citadel_melon 8 hours ago [-]
Maybe we can each get assigned an AI government goon to look over our shoulders 24/7. Maybe each neuron in my brain will have their own subagent goon. Each mitochondria gets their own subagent government goon. The government will perfectly model my every move. They will perfectly model the smell of my asparagus piss aroma.
Simran-B 13 hours ago [-]
Remote attestation?
advael 13 hours ago [-]
On PCs, the best you could really do is restrict access to certain websites on certain boxes with TPMs the users can't disable. Remote attestation can lock people out of your stuff, but not out of their own stuff. For that you need control of the device. Of course, most mobile phones aren't easy for the user to have control of, but most PCs still are, so long as you scrub the rootkits (e.g. windows) off 'em
bijowo1676 12 hours ago [-]
it doesn't even work in the government's own servers to protect their own shit
NamlchakKhandro 9 hours ago [-]
You wouldn't download a car?
psychoslave 8 hours ago [-]
In Soviet Russia, one couldn't download a car.
In modern America, cars upload you.
jgalt212 12 hours ago [-]
> What software you can run on your computer can already be restricted.
Are laws that are inherently unenforceable even laws?
fsflover 4 hours ago [-]
With the age verification and whatnot, these laws are getting more enforceable with time.
UncleOxidant 13 hours ago [-]
> Nvda for one has every incentive to keep the nemotron line going
Their releases so far have been kind of lackluster compared to Qwen and other Chinese models. My suspicion is that Nvidia won't be releasing models that appear to compete with frontier models because that would upset their big customers.
anon373839 8 hours ago [-]
Nvidia's future incentives are not clear to me. Their big customers are actively working to develop custom silicon, see e.g. "Open"AI's Broadcom announcement. The more independence their whale customers attain, the more attractive cutting them off at the knees and selling sovereign AI inference hardware directly to businesses and consumers becomes.
This is pure speculation, but I have a hunch that the Nemotron line is intended as a shot across the bow, and that's why its capabilities have been strong but not quite open-frontier level.
Bolwin 11 hours ago [-]
> Yeah, but the biggest plus for open models is that they can never be taken away. In other words, whatever capabilities they reach (even if there will never be another model), those stay forever.
In theory yes, but the average person can't really run the big open models.
This is already happening, try to find a provider that still hosts older, especially less popular or succeeded open models.
For me personally, I've been trying to access Kimi K2-0711. There seems to be only one provider left on openrouter (NovitaAI) and 3/4 requests error out
veqq 11 hours ago [-]
> NovitaAI is a low cost provider who's strategy seems to be to host as many models as possible for the lowest cost possible so that OpenRouter's routing algorithm will default to them as often as possible. The problem is that they clearly don't spend much time on actually testing and configuring all of the models they provide. There's a reason they are very often the first provider to host a new model. I also suspect that they run models at lower quants than they claim but that is not something I can prove. https://www.reddit.com/r/LocalLLaMA/comments/1mk4kt0/be_care...
jfim 13 hours ago [-]
True, but the capabilities and knowledge of that model are also frozen in time, so the value of that model declines over time.
A model that writes code without knowledge of any language or library changes for half a decade is less useful. A 2021 era chatgpt would be quite quaint in 2026.
Right now the Chinese labs might have incentives to release their models for free, and maybe Google is happy to release open weights today, but I'm sure there are already bean counters at Google salivating at the idea of having Gemini in Chrome as part of a Google AI monthly subscription just like YouTube premium and other Google subscriptions.
teleforce 10 hours ago [-]
>True, but the capabilities and knowledge of that model are also frozen in time, so the value of that model declines over time.
Correction: The capabilities and knowledge of that model can be improved via self-distillation, so the value of that model increases over time.
This is where I think self-distillation is the main way forward, and probably the second best thing to ever happen to AI/LLM after the transformer.
Based on self-distillation, the value of the open weights models will increase over time for sub-specialization through post-training and fine-tuning.
Please check these very promising recent works and results from MIT/ETH, UCLA and Apple [1],[2],[3]. For example, the MIT/ETH self-distillation approach was demonstrated on a single H200 GPU. Apple's approach is so simple that it's called Simple Self-Distillation (SSD), pun intended.
> capabilities and knowledge of that model are also frozen in time
I think this matters less than you think. If the spigot turns off, open LLM research is going to have a powerful incentive to focus on post-training to refresh stale base models. And post-training, in general, is so much cheaper and faster than pre-training anyway. I was pretty surprised to learn that GLM-5.2's entire RL training (the part that makes it reliable at agentic tasks) was completed in just TWO DAYS.
NemoNobody 6 hours ago [-]
If the world ends all I have to do is power my desktop and I'll have my locals - a decent iteration of Deepseek and a few smaller models, some focused, some just older versions - having several is key tho.
They can be cross referenced to limit hallucinations and inaccurate information - this means I can confidently say that I have on my desktop - all of human history, knowledge, discoveries, maths, languages - at least in summary or truncated form (also another bonus of multiple models - will often have more comprehensive total output than one model provides) and all of those models have absolutely no restrictions other than the broadest limits allowed by current laws - so, practically no limits (I bet I could get them all to explain splitting the atom with minor effort).
I realize that my amazing tool/system of local AI is out of date - I still very much like having it and it is not at all a bad thing to have. Everyone in theory ought to have a local backup - for just in case.
The fact that people will have this in this one, albeit extreme, example - it would most definitely matter in the event of a societal collapse. Not everyone will have it - can they run those giant data centers off a few solar panels like a desktop PC?
For this one existential reason alone, I recommend everyone at least play around local enough to have a few models functional.
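The cross-referencing idea above can be sketched as a simple majority vote across local models. This is only an illustration: `cross_check` and the `models` callables are hypothetical stand-ins for whatever local runtime you use, and exact string matching is a crude assumption that only suits short factual answers.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def cross_check(
    question: str,
    models: Sequence[Callable[[str], str]],
    min_agree: int = 2,
) -> Optional[str]:
    """Ask every model the same question and keep the answer only if
    at least `min_agree` of them give the same (normalized) reply."""
    # Normalize so that " Paris " and "paris" count as the same answer.
    answers = [model(question).strip().lower() for model in models]
    if not answers:
        return None
    answer, votes = Counter(answers).most_common(1)[0]
    # No sufficiently large agreement: treat the result as unreliable.
    return answer if votes >= min_agree else None
```

Agreement between models does not prove an answer is correct, since models trained on similar data can share the same mistake; it only filters out answers that a single model made up on its own.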
charcircuit 10 hours ago [-]
The weights are not frozen in time. You can train the model on new data. It's just a matter of economics of whether you have a leading lab pay for the training or you pay for it. For the past few years having the labs do it has been the economical choice but if they stop doing so the choice will shift back to the users.
api 12 hours ago [-]
Fine tuning and updating is far cheaper than training from scratch.
CTDOCodebases 6 hours ago [-]
Is this a valid point when we live in an evolving world? Language changes, facts change etc. Or can everything just be grabbed from webpages and stored in the context window?
alfiedotwtf 7 hours ago [-]
> And the chinese labs also have incentives to keep releasing models
Not really.
c0rruptbytes 4 hours ago [-]
Deepseek isn't philanthropy, it's a hedge fund trying to short the western AI market by saying "hey we can do 90% of what they can (arguably better on a density metric) for 1/10th of the cost"
it's my theory at least, the Hindenburg Research of AI
fridder 14 hours ago [-]
We need a SETI@Home but for model training
Azantys 14 hours ago [-]
I think model training is pretty hard to do efficiently on a vastly distributed network. If the model can't fit into the VRAM of the node, your performance becomes so bad it's useless, so a distributed model could only be properly trained if the size of the model doesn't exceed the VRAM of most of the nodes. Maybe there is a different way of doing training but this would be the only way I can see. And it would still be much worse than just using a big datacenter where everything is fully interconnected. BOINC projects work great because it's usually just a lot of small compute and memory required, so every old desktop and laptop can contribute. Training a model which can compete and is not tiny requires neither low compute nor a low amount of memory. BOINC tasks usually take minutes or sometimes hours, but not weeks or months like training a model from scratch. But something like 7B or lower could maybe be trained like this. I'm not sure, but I think someone is already working on something like this; I don't remember the name of the project.
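To put rough numbers on the VRAM point, here is a back-of-the-envelope sketch. It assumes the common rule of thumb of about 16 bytes per parameter for mixed-precision training with Adam, and it ignores activations entirely, so real requirements are higher. The function names and the 24 GB card are illustrative assumptions.

```python
# Rule of thumb (an assumption, not a measurement): mixed-precision Adam
# training keeps roughly 16 bytes of state per parameter:
#   2 (fp16 weights) + 2 (fp16 gradients)
#   + 12 (fp32 master weights and two fp32 Adam moments)
BYTES_PER_PARAM_TRAINING = 2 + 2 + 12


def training_memory_gb(num_params: float) -> float:
    """Approximate weights + gradients + optimizer state, in GB (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM_TRAINING / 1e9


def fits_on_node(num_params: float, vram_gb: float) -> bool:
    """True if the model state alone fits in VRAM (activations ignored)."""
    return training_memory_gb(num_params) <= vram_gb


if __name__ == "__main__":
    for name, params in [("1B", 1e9), ("7B", 7e9), ("70B", 70e9)]:
        need = training_memory_gb(params)
        ok = fits_on_node(params, 24)
        print(f"{name}: ~{need:.0f} GB of state, fits in 24 GB: {ok}")
```

Under these assumptions even a 7B model needs sharding or memory-saving tricks before it fits on a single consumer GPU, which is the constraint described above.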
nmfisher 6 hours ago [-]
With current paradigms, yes. I'm hoping to see more focus on architectures that are more amenable to distributed training in the near future.
wuschel 13 hours ago [-]
My understanding is that in addition to your comment and the development of a method to separate the training data for distributed learning, the latency/bandwidth of systems connected on the internet is a challenge, too. Information has to be sent around before and after any hypothetical number crunching.
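The bandwidth concern can be made concrete with a small calculation. This sketch assumes one full copy of the gradients is exchanged per synchronization and ignores latency, protocol overhead and compression; `sync_seconds` is a hypothetical helper, and the link speeds are illustrative.

```python
def sync_seconds(num_params: float, bytes_per_value: int, link_mbit_per_s: float) -> float:
    """Seconds to move one full set of gradients across a link.

    Ignores latency, protocol overhead and any gradient compression.
    """
    payload_bits = num_params * bytes_per_value * 8
    return payload_bits / (link_mbit_per_s * 1e6)


if __name__ == "__main__":
    params = 7e9  # a 7B-parameter model with 2-byte (fp16/bf16) gradients
    for label, mbit in [("100 Mbit/s home uplink", 100), ("100 Gbit/s datacenter link", 100_000)]:
        print(f"{label}: {sync_seconds(params, 2, mbit):,.1f} s per sync")
```

With these numbers a single synchronization takes roughly 19 minutes on the home link versus about a second in the datacenter, which is why syncing after every training step is not practical over the internet.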
charcircuit 10 hours ago [-]
You would probably not be able to go down to the scale of a single PC, but it should be possible to train models focusing on different specialties on different nodes and then have them periodically "mix" together.
0x3f 14 hours ago [-]
Consumer hardware over the internet is not really suitable for this, AFAIK.
baby_souffle 13 hours ago [-]
There's some really early days work on making training loops robust to failure but they all have trade-offs right now.
I remain hopeful that we'll be able to democratize the entire tech stack for this tech.
calebkaiser 13 hours ago [-]
This has been a (noble) goal of lots of different projects in the community for a long time. Federated learning projects like Flower have been chipping away at it for a long time. There are many many hurdles to be cleared before anything in this area is super feasible as an alternative, but I applaud everyone who works on it.
g023 11 hours ago [-]
Slap the gpus in a car and offset the cost of ownership by supplying the grid for GPU power on the go. Either get paid in rebates or tokens. Contribute to a distributed training/inferencing network.
I don't think that's the case, it's not philanthropy, they are getting something out of it. The labs are learning from one another from the shared models.
Plus I am certain it makes financial sense. I am guessing here, but fully utilizing a subscription's limits probably costs the operator more money than the subscription revenue; that is why anthropic is making such a big stink about the chinese data harvesting. By releasing the weights, you are relieving yourself from that burden because the competition does not need to hammer your subscription service, they can just download your model and analyze it and run it all day.
Also, for the largest models it makes no sense to run them yourself unless you are a major player. Renting the hardware costs tens of thousands of dollars, ludicrously more than their subscription. And buying the hardware to run them costs hundreds of thousands of dollars.
yorwba 12 hours ago [-]
The primary benefit of releasing weights is the attention it generates. Some people have the hardware to run it, try it out because it's free, tell everyone about it, and then even people who don't have the hardware might get interested and pay the original developer. So it's a marketing expense, basically.
The most popular LLM product in China is Bytedance's Doubao. You probably haven't heard of them since they never released weights and don't benchmark particularly well, but Bytedance already had enough users on its other apps that they could directly advertise Doubao to.
bijowo1676 11 hours ago [-]
I believe we are still very very early in AI development, so it doesn't even make sense to close models.
Open source and open weights model is how you can harness the potential of all humans to continue development and improving the SOTA of your model. Literally every student on the planet wants to play and improve these models for their own use case.
Plus the ecosystem, once you have users in the ecosystem on your open weight model, this is a giant leverage point in itself
FooBarWidget 6 hours ago [-]
That's not meaningfully different from philanthropy. If Chinese AI products generate sufficient revenue with cheaper marketing strategies, then the incentives for releasing open models will go away.
Right now, there is a shortage of talented researchers, and the attention that open models generate allow them to attract good hires. But this is a fragile dynamic that can break in the future. It's not very different from commercial open source work, except it's much more capital intensive and lower volume.
jamiedborin1 2 hours ago [-]
I am the original author of the post - thanks for reading it!
I think the future of open weights models will be similar to fabless chip design companies. There will be companies that can train models and they will licence those models to inference companies that manage the APIs.
The inference companies need much less capital and the training companies don't need to divert resources from training to inference.
Some of the Chinese model training companies are already doing this and licencing their models to inference providers.
matheusmoreira 5 hours ago [-]
I wish we had some kind of distributed training capability... Like Folding@home, but for LLMs.
woctordho 5 hours ago [-]
See the recent advance of DiLoCo at Nous Research and Prime Intellect.
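For readers unfamiliar with this family of methods, the core idea is that each worker trains locally for many steps and only occasionally exchanges its parameter change, so communication happens once per round rather than once per step. The toy below illustrates that idea on a one-dimensional problem with plain gradient descent and simple averaging. It is not the published DiLoCo algorithm, which as far as I understand uses different inner and outer optimizers, and all names and hyperparameters here are made up for illustration.

```python
from typing import Sequence


def local_steps(w: float, target: float, lr: float, steps: int) -> float:
    """Run several gradient-descent steps on one worker's loss (w - target)^2."""
    for _ in range(steps):
        grad = 2.0 * (w - target)
        w -= lr * grad
    return w


def train(
    targets: Sequence[float],
    rounds: int = 20,
    inner_steps: int = 10,
    inner_lr: float = 0.05,
    outer_lr: float = 1.0,
) -> float:
    """Each worker owns one target; workers only communicate once per round."""
    w_global = 0.0
    for _ in range(rounds):
        # Every worker starts from the shared weights and trains alone.
        deltas = [
            local_steps(w_global, t, inner_lr, inner_steps) - w_global
            for t in targets
        ]
        # One communication per round: average the parameter changes.
        w_global += outer_lr * sum(deltas) / len(deltas)
    return w_global


if __name__ == "__main__":
    # The minimizer of the summed losses is the mean of the targets (3.0).
    print(train([1.0, 2.0, 6.0]))
```

Here 200 local steps per worker require only 20 exchanges, which is the property that makes this style of training more tolerant of slow links.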
Shitty-kitty 14 hours ago [-]
It's just a smart business decision that allows their models to compete and gain market-share against much pricier private models. No philanthropy there.
foxglacier 12 hours ago [-]
It depends how you define philanthropy - obviously corporations don't just donate such valuable products to the world to make it a better place, but in effect that's what they end up doing in their effort to gain market share or brand recognition. Actual human philanthropists are sometimes doing it for the similar reasons of self-promotion.
Shitty-kitty 9 hours ago [-]
Open source, Open weights, these are core business decisions.
alecco 6 hours ago [-]
> Until there's some sort of "community owned hardware"
The hardware is already available for renting at reasonable prices. We need community funding. I wish people pooled a fraction of the money they burn on local GPU rigs on funding training/testing/etc.
A big problem is like in open source: it's way too atomized. Just one competitive ground-up community LLM would require tens of millions $. But who gets to pick?
IMHO the only chance is highly specialized and smaller LLMs instead. And this is still millions to train.
And remember LLMs are competitive for only a handful of months.
recursive 14 hours ago [-]
This seems backwards. Access to Fable can be removed. I don't see how an open weight model can ever be put back into the bag though.
Smaug123 14 hours ago [-]
The model itself, sure; the comment is about the production of more advanced models (to keep open weights near the frontier).
recursive 12 hours ago [-]
The proprietary spigots can be turned off at any time also. To me, that seems more likely.
jfaat 2 hours ago [-]
I'd call 'more likely' an extremely safe take given that it's exactly what's happening right now
UncleOxidant 13 hours ago [-]
> The spigot can be turned off at any time.
True. And it's possible that this has already happened at Alibaba Qwen - at least for the smaller models that people had a chance of running at home (122B and smaller).
gunalx 13 hours ago [-]
We'll see. The qwen team has always released a few close to sota but proprietary models in between their open releases. We did get 3.6 35B and 27B so it's not all set in stone yet.
It's highly unlikely we get another open llama model though after the llama4 flop, even if their muse spark seems pretty good.
trollbridge 10 hours ago [-]
Has it though? They've been releasing free models interspersed with the "Max" models for quite some time.
Eridrus 7 hours ago [-]
I think the bigger issue is the ever increasing capital requirements, which may cause even the closed weight companies to fall away from the frontier, e.g. Google & Meta are barely hanging on. For Google it feels a bit existential to remain at the frontier, but even then they're barely there.
I hope that we find ways of continuing to improve these models besides continuing to exponentially increase capex spend until all but one of your competitors falls away.
ehsankia 6 hours ago [-]
Isn't another issue that most successful open models are distilled from closed models, but closed models are putting more and better safeguards against distillation?
Onavo 7 hours ago [-]
Google and Meta's failures are more due to mismanagement no?
disgruntledphd2 5 hours ago [-]
At times of rapid change, having a working business model can be a disadvantage.
For instance, Facebook were able to optimize their core ads product for mobile, in a way that was much more difficult for Google.
notnullorvoid 14 hours ago [-]
> Until there's some sort of "community owned hardware"
Or until some bright people figure out drastically more efficient means of training.
40four 8 hours ago [-]
We should address the elephant in the room. The problem with the future of open weight models is not that they are created as a result of philanthropy by some private org. All of the top contenders are created by the Chinese government.
I don’t think we should describe these companies as simply releasing these highly capable open weight models out of the goodness of their hearts
nmfisher 6 hours ago [-]
None of those companies are created by the Chinese government. They're obviously subject to the Chinese government, whose whims may change at any given moment, but as we're seeing at the moment, so are the American companies.
And while I don't have a very positive view of the Chinese government, last I checked, they haven't been dropping bombs on innocent schoolchildren recently.
40four 5 hours ago [-]
Hey I hear you, I’m not trying to make this a political argument of who’s dropping bombs on who, or the American government is better than or worse than the Chinese. But what I said is a matter of fact.
We can debate the semantics of whether “created by” or “subject to” means the same thing in regards to the Chinese government, but that is neither here nor there.
I’m happy to take your wording that they are obviously “subject to” the Chinese government. That logically means they are subject to carrying out the CCP’s long term strategy. And as you said “whose whims may change at any given moment”.
That directly relates to the OP’s fears, that these models could be taken away at any given moment. “The spigot can be turned off at any time” as they put it.
Or another possibility is they will never turn the spigot off, but they will engineer it in a way to best achieve their goals. My bet is that’s the more likely outcome.
I simply disagree with the OP’s description of the problem as “open weights models are the result of philanthropy by some private org”, I think the problem is much more complicated than that
nmfisher 4 hours ago [-]
What you said is not "a matter of fact" because it's simply untrue.
These companies were not "created by" the Chinese government. Specifically, I'm talking about DeepSeek, Zhipu, MiMo (Xiaomi), Kimi (Moonshot), Qwen (Alibaba). "Subject to" certainly does not mean "created by", it just means that the government ultimately has the power to tell them what to do. The US government has the exact same power, hence why none of us has access to Fable at the moment, but you wouldn't say that OpenAI or Anthropic were "created by" the US government.
There is zero evidence that open-sourcing their models is part of some grand strategy from the Chinese government. In DeepSeek's case, I think it probably is a genuine commitment to open source, for the others I think it's probably just a convenient business decision to gain market share (though Zhipu is probably more aligned, given their academic lineage from Tsinghua).
At some point in the future, the Chinese government may decide it's not in their national interest for Chinese companies to open source their frontier AI models, and DeepSeek et al will be restricted from doing so. I'm well aware of that. But until that point in time, the rest of the world is unanimously better off with open-source Chinese models. We should put as much reliance on Chinese companies long-term as we do on American companies - zero.
cheesecakegood 6 hours ago [-]
Bombs are a bit of a non sequitur here. The point is that Chinese companies are demonstrably hostile to American ones historically (and threatening in some specific structural ways to the American consumer). The presentation may be similar but to attribute American ethics to a Chinese decision is dubious.
defrost 6 hours ago [-]
Isn't the nature of capitalism such that many companies are demonstrably competitive (aka 'hostile' ?) with one another?
Chinese companies have also demonstrably pandered to the American consumer for many decades now.
To further muddy the waters, US companies have, some would argue, been openly hostile to the American consumer via monopoly practices, restricting access to purchased devices, etc.
psychoslave 7 hours ago [-]
Bhutan hasn't released any model yet as far as I know, if the level of care a government gives to people's actual happiness is what we are supposed to be concerned about here.
Among other countries that are consistently at the top for gross national happiness are Finland, Denmark, Iceland, Switzerland, and the Netherlands. Among them, the current ability to release open models is observable.
USA unfortunately continues to fall down quickly in World Happiness Report rank, and that's not because many other countries made great progresses.
gwerbin 8 hours ago [-]
Isn't this also true of a lot of FOSS software and libraries? tensorflow and pytorch for example, among many others.
slashdave 13 hours ago [-]
Training these models is not a "hardware" problem.
nomel 12 hours ago [-]
I think that simplifies it a bit. You can't train without hardware, which is why the Chinese companies are illegally importing Nvidia cards [1].
The usefulness of the smuggled NVIDIA GPUs has greatly diminished for AI purposes, because the elimination of NVIDIA as a competitor has allowed the growth of the production of domestic GPUs.
Moreover, China has just demonstrated a supercomputer faster than any US supercomputer, which unlike the US supercomputers, which need GPUs, achieves its high computational throughput with custom CPUs designed in China (implementing an Armv9-A ISA with SME, i.e. the scalable matrix extension, and with BF16/INT8 operations for AI).
The CPUs used in that supercomputer can reach both a computational throughput and a memory bandwidth sufficiently high for training any LLMs (they have fast HBM memory). Their only disadvantage in comparison with the best NVIDIA GPUs is a slightly lower energy efficiency, but China has abundant cheap energy so this is not a serious disadvantage for them.
menaerus 4 hours ago [-]
SIMD programmers must be paid very well in China then ... Jokes aside, some 2 or 3 years ago I thought it was becoming inevitable for CPU designs to become extended versions of their already quite capable vectorized execution units.
trollbridge 10 hours ago [-]
There is significant evidence they are transitioning to Huawei and other home-grown CPUs and NPUs.
0xbadcafebee 10 hours ago [-]
It was announced in April that Deepseek v4 ran at launch on Huawei Ascend chips. They then shared details of their implementation with other Chinese providers to strengthen the Chinese market against import restrictions (more people buying Huawei leads to more production, cheaper capacity)
jmyeet 13 hours ago [-]
How is this a complaint? Once you have the model, you have the model. Download DeepSeek-R1 671B and you have it. You might not get improvements in the future, just like you may not ever get a future release of an open source project. Is that an indictment of open source?
But consider the alternative. OpenAI and Anthropic can shut off your account or API key at any time for any reason. How is this better? You have way more security when you're running your own model.
girvo 6 hours ago [-]
> Download DeepSeek-R1 671B
Dunno why you'd want to though, considering v4 Pro (and even Flash) outpace it drastically
Exactly my worry. I’m optimistic that in the future the EU, the EFF, GNU, or the Linux Foundation could be the umbrella to run a LARGE open model for everyone.
It’s sad to think that Mozilla spent years and millions doing virtual reality and AI, they would have been perfect to do this but let’s face it - who knows if Mozilla will be around even 5 years from now
christina97 13 hours ago [-]
The Chinese models will not overtake the frontier US ones given the current way things are going. The US models derive their lead from incredible efforts to source more and higher quality (mostly synthetic data) via great feats (eg generating with humongous teacher models that could never feasibly serve interactive traffic). The Chinese models advance via heroic efforts to optimize models and great feats to secure more and higher quality training data from the US frontier models.
For a (Chinese) open weight model to surpass the (US lab) frontier models, this equation must flip and the Chinese labs must entirely retool from harvesting frontier model data to producing the data systems and efforts to produce novel data; as well as procuring latest generation hardware en masse for this. This does not happen easily. Also training a frontier scale model is actually not such an unimaginable feat: doing all the inference with the teacher models is where the hardware goes.
throwawayffffas 12 hours ago [-]
Unless you are working at one of these companies you don't know what they are doing.
You don't know what's happening in z.ai nor alibaba. And you don't know what's happening in anthropic and open ai.
I don't know what they are all doing, but I find it extremely unlikely that they are not all collecting data from one another. I am confident anthropic has a team going over GLM 5.2 weights even if it's just to see where the competition is.
Just because some labs are getting data from Anthropic does not mean they are not also doing their own research.
They were focused on optimization because they could not get the best hardware. The only reason their top labs are behind may be because they did not have H200s and MI350s. And now they do.
Plus you are discounting other risks, Anthropic is currently sitting on "the best" models in the world because they got in a pissing match with the US administration.
btw: This could be the case in china as well, their administration has been surprisingly open on AI exports and open weight models, that we know of. There is a very small but not trivial chance they are hogging a better version of glm 5.2 for example, but no one is allowed to talk about it. Now I am not saying that is the case, I am saying the two cases (chinese labs are 6 months behind, they are forced to suppress their best models) are indistinguishable.
andy99 13 hours ago [-]
> Chinese labs must entirely retool from harvesting frontier model data to producing the data systems and efforts to produce novel data
Even if your characterization is accurate, they could do this tomorrow and are not so myopic that they wouldn’t have thought about it. I don’t see this as a barrier, and I see a lot of the same underestimation of Asia that’s been happening for 50 years. There’s not some innate American advantage to building LLMs, and personally I think whatever head start the US has is going to be squandered on delays from the export control “too dangerous for release” LARPing we’re seeing.
ant-kinesthetic 12 hours ago [-]
Exactly. If they wanted to they could produce the same amount of data. Companies like Scale, Mercor, Surge exist for a reason, a reason that doesn't need to exist in China if they mandate Chinese enterprises to provide all their real world data (or have them work inside RL environments) to the model companies for post training. There is no real advantage that US companies have except a head start, and as Jensen said, a ton of the research advantage is skewed since a lot of the best researchers in the US are Chinese nationals. I do think the model is just one piece of the pie (not to echo Jensen too much), and hopefully we will always be able to serve these bigger frontier models in a much more efficient way as well as building out the application layer faster which actually makes them useful and/or more dangerous/powerful.
christina97 8 hours ago [-]
I am not sure which part you are interpreting as underestimation or whatever? Quite the opposite: I claim the difference arises from a difference in strategies, not from intrinsic differences in ability.
Also I was responding to a claim about what will happen in less than 6 months (that’s about the edge of what you can meaningfully say too much about in this field).
These strategies take materially different resources; it’s not an overnight decision made by leadership. I suppose there is a natural experiment ongoing at Meta regarding this, it seems they recently moved a number of people into a division to produce such data overnight. So we will find out soon how quickly they climb the leaderboards.
s1artibartfast 12 hours ago [-]
Why would those have any impact on R&D speed? Most are funded and close to cash flow positive
yorwba 12 hours ago [-]
The amount of data Anthropic has claimed was extracted for distillation is tiny in comparison to the entire internet, which is right there for the taking and holds most of the knowledge people expect models to have.
Distilling even with small amounts of data from a better model is still helpful, but not in the sense of transferring capabilities the raw internet-trained model doesn't have at all, but for identifying those capabilities that are compatible with the servile assistant persona and suppressing others that are undesirable (e.g. trolling). A primitive version of this were instruction-tuning datasets generated with ChatGPT, as used e.g. for Alpaca.
Without a clear target to emulate, competitors might have to rely more on human raters, but there are plenty of data labeling companies in China, so that's hardly a hurdle.
christina97 9 hours ago [-]
[dead]
bradishungry 13 hours ago [-]
“China can only copy the US” is a very short-sighted and uninformed opinion. There is more coming out of China than just new ways to distill models
kulahan 13 hours ago [-]
How so? You'll soon have your choice of a very old OAI model or a new Chinese model, because the USG has no interest in letting you access the newest models without explicit permission.
nomel 13 hours ago [-]
Their point is that the Chinese models will also be limited to the very old OAI models, unless things flip, as they said.
The use of US models for Chinese model training is part of the motivation of all of this.
kulahan 13 hours ago [-]
Apologies - I was too quick in my response. I was speaking from a "how the users will perceive it" point of view. China's pretty good at the internet reputation thing.
40four 7 hours ago [-]
I don’t think anyone seriously believes any of the Chinese models are ever going to “overtake” the American frontier models. I doubt that that’s even their goal.
But if they can stay on pace, within say 6 to 12 months of the bleeding edge of the American frontier models, that’s a huge problem.
If they can just piggyback on the Herculean efforts of Anthropic, OpenAI, Google etc., accept a little bit of lag, and save billions of dollars? Why wouldn’t they?
And for the end user, why would they pay a premium subscription price for something they can just wait six months for and run on their own hardware at home? In my opinion, this is the cat and mouse game that’s being played right now. And I suspect it’s intentional on the side of the open weight models. I would bet they are playing a war of attrition
CuriouslyC 13 hours ago [-]
Coding is a case where it's possible to programmatically generate large amounts of data relatively cheaply. China could realistically surpass the US in coding while still being behind in many other areas.
yogthos 9 hours ago [-]
Also worth noting that China has more data to work with in general having a much bigger population.
danny_codes 10 hours ago [-]
This seems wildly naive. This entire field is like 4 years old. We have quite frankly no idea about what things will look like in 4 more years.
christina97 9 hours ago [-]
The article makes a very specific claim with a clear deadline less than 6 months away. I do not underestimate the Chinese labs and their capabilities, if they wish they can retool to start overtaking the US labs with a different strategy. My comment shouldn’t be read as a permanent impossibility statement, just an observation on where we are right now. At the moment their strategy seems to be to produce decent quality, highly optimized models; and a pivot will take longer than 6 months to materialize into overtaking the frontier labs (that themselves do not look like they will throw the towel in in the next 6 months).
elisbce 13 hours ago [-]
Chinese frontier models don't need to catch up in every category. They just need to win in coding and that's exactly where they are going. The gap went from 12+ months to 1-2 months with the latest release of GLM 5.2 and coding is a task that you don't need heroic efforts to find rare and long-tail training data, you can just outsmart your competitor by optimizing algorithms and training recipes. This is something they can do at scale with the money and talent pool.
Octoth0rpe 12 hours ago [-]
> They just need to win in coding and that's exactly where they are going.
They don't even need to 'win' in the sense of maxing the benchmark. They can be 20% worse/50% cheaper and many of us (and our managers who approve our token budgets) will be in.
Deepseek is 30x cheaper for input/75x cheaper for output than sonnet on openrouter, and it's not a whole lot worse for many things.
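As a rough illustration of the price ratios quoted above, here is a minimal sketch of how the overall saving depends on the input/output token mix. The 30x and 75x ratios come from the comment; the reference prices and token counts are made-up placeholders, not actual OpenRouter prices.

```python
def blended_cost_ratio(in_tokens, out_tokens, ref_in_price, ref_out_price,
                       in_ratio=30.0, out_ratio=75.0):
    """Return how many times cheaper the discounted model is for a workload.

    Prices are per token for the reference (expensive) model; the cheaper
    model's prices are derived by dividing by the quoted ratios.
    """
    ref_cost = in_tokens * ref_in_price + out_tokens * ref_out_price
    cheap_cost = (in_tokens * ref_in_price / in_ratio
                  + out_tokens * ref_out_price / out_ratio)
    return ref_cost / cheap_cost


# Hypothetical workload: 1M input tokens and 200k output tokens, with
# placeholder reference prices of $3 and $15 per million tokens.
ratio = blended_cost_ratio(1_000_000, 200_000, 3e-6, 15e-6)
print(f"overall: {ratio:.1f}x cheaper")
```

Whatever the mix, the overall figure lands somewhere between the two quoted ratios; output-heavy workloads sit closer to 75x, input-heavy ones closer to 30x.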
bijowo1676 11 hours ago [-]
Anthropic/OpenAI's valuations are built on assumption of capturing most of the market and having the pricing power to jack up prices for tokens.
It is enough to kneecap their pricing power to trigger the valuation reset by an order of magnitude and humble them a bit.
Plus there are always infrastructure and hardware providers who want to keep their share of profits and will squeeze Anthropic's margins to deflate their valuation (nvidia, aws, RAM manufacturers, etc)
alfiedotwtf 7 hours ago [-]
> source more and higher quality (mostly synthetic data)
Kind of an oxymoron don’t you think.
If they could generate data that looked kind of real, why don’t they just generate that data on the fly during inference
jmyeet 12 hours ago [-]
Yeah, this is, to be perfectly blunt, cope, for several reasons:
1. It's unclear if there is a law of diminishing returns with ever-larger models. They're more expensive to run and for many applications, you'll probably find smaller models are sufficient;
2. There's an inbuilt market for local LLMs. This is an effective limit on how large models can get. Case law hasn't been established yet on, for example, if a law firm using ChatGPT breaks privilege. Specifically, chat logs may be discoverable. Medical applications have this issue too and I think you'll find that financial firms are going to be leery about this as well;
3. Better, larger models will bleed into smaller, open source models. The chat logs themselves are training data. There's a whole market in China for Claude tokens around this;
4. China has a national security interest in not being beholden to US tech giants when it comes to AI. China has a history of being able to commit to large-scale long-term projects and Anthropic just won't be able to compete with a national project by one of the world's superpowers, if it comes down to it;
5. Winning doesn't necessarily mean being the best. Often it's just being good enough;
6. As an example of a national project, China is busy replicating EUV because of the US ban on ASML and NVidia exporting their best stuff. I don't think many in the West are prepared for how rapid this will be. I'm reminded of the policy debate in 1945 when many in American policy and military circles thought the USSR would never catch up with the atomic bomb or, if they did, it would take 20+ years. It took 4 years. For the hydrogen bomb, it took 1. The US hardware advantage is a lot more tenuous than many realize.
zkmon 1 hours ago [-]
What matters might not be the gap itself. For the bulk of AI users, the sufficiency of a model's capabilities is all that matters. If an open-weight model meets their requirements and is far cheaper than a closed-weights model, then they have no reason not to go for the open-weight model.
cedws 12 hours ago [-]
I haven’t seen it discussed anywhere that closed models can essentially cheat benchmarks right? What Anthropic or OpenAI brand as a model doesn’t necessarily have to be just weights, it can be a whole backend system that augments the model itself. With this they can score better benchmarks than an open source model that is weights alone.
jstanley 9 hours ago [-]
Sure, I think that's fine, that all counts. It counts for open source too, it's not like they're somehow running these benchmarks without any harness.
Nobody cares if your AGI is 100% made out of neural networks or if it's like 50% neural networks and 50% perl scripts.
stkdump 6 hours ago [-]
I think they mean cheat in a Dieselgate sense. You detect that you are being tested with a specific benchmark question and heuristically give the correct (manually programmed) answer. That wouldn't be AGI.
snthpy 9 hours ago [-]
Good point
kuchta 1 hours ago [-]
Are there really any Open Source models there? Open Weights yes, but Open Source requires open-sourcing all the training data too; otherwise you can't reproduce the weights in the same manner as you would reproduce binaries from Open Source code.
jacobgold 14 hours ago [-]
It would be interesting to know how much of a boost the closed models companies are giving the open models.
If the closed models stop improving will the progress of open models slow?
amunozo 14 hours ago [-]
Why are we assuming only American labs can innovate? DeepSeek already innovated a lot in efficiency, for example.
Schiendelman 13 hours ago [-]
It's really unclear how much innovation DeepSeek has actually done, vs training on frontier model conversations.
girvo 6 hours ago [-]
Then you have no understanding of what DeepSeek has actually done. They publish their work openly: go have a look! Their architectural improvements are fascinating.
adrian_b 12 hours ago [-]
For now, "training on frontier model conversations" are just allegations for which no evidence has been provided, while their research publications are certain evidence about their innovations.
Americans should wake up to reality: the idea, repeated continuously across Internet media, that the Chinese merely copy US technology and so will never be able to surpass it was true many years ago, but it has been false for years now, and there are many domains where the USA would have to copy Chinese technology if it does not want to fall behind.
Among other "sanctions", USA has forbidden the export to China of high-performance computing devices, but this has backfired as China has just demonstrated a supercomputer that is faster than any US supercomputer and which uses custom CPUs designed in China, apparently by Huawei, the company that was the main target of the US efforts to sabotage the Chinese competitors.
The US "sanctions" have hurt China for a few years, but they have convinced them that they must allocate resources to become able to make themselves everything that they previously bought from USA. The result is that now China has become stronger and USA weaker.
The USA should never have sold technology to China a quarter of a century ago, and then the power relationship between the 2 countries would have been very different. But even 5 years ago it was already too late for any US "sanctions" to have lasting effects. Nowadays any hopes that US "sanctions" will keep China in the dark ages are pathetic.
With the kind of policies that are promoted by the US government, the chances that USA will keep its leading position in AI are minimal.
slopinthebag 13 hours ago [-]
Wym it's unclear? They publish their research...
gwerbin 8 hours ago [-]
That's innovative
amluto 14 hours ago [-]
> It would be interesting to know how much of the "distillation" boost is helping the open weight models keep up.
Some people in China surely know.
> Like if the closed models stop improving will all the closed models also stop improving?
Seems extremely unlikely, unless the models all hit some kind of wall soon. The Chinese companies may be behind the US in compute capacity, but they have excellent researchers [0] who are probably approximately as good as their US counterparts at the kind of problem generation and RL that is currently working so well.
I would be very surprised, though, if the models cannot continue to be improved rapidly in any area that allows a tight feedback loop like programming, at least up to the point where we puny humans lose the ability to define objective functions.
(And, conversely, I don’t expect magic in fields where the feedback is slow or expensive. A model is not about to reliably invent a wonderful medicine for the same reason that a large and extremely competent pharma company cannot: the evaluation process is extremely slow and it’s so expensive that the kind of utterly enormous corpus that is driving the current progress in coding is simply not available. Running RL on m iterations of n medication-development trajectories each is going to cost n*m times $10-100 million and take m years if it’s even possible at all.)
[0] The US advantage in this space will likely decline, since the brain drain from the rest of the world via the US university system to US labs is drying up.
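To put numbers on the n*m estimate in the parenthetical above, here is a small sketch. The $10-100 million per-trajectory range and the one-year-per-iteration pace are taken from the comment; the values chosen for n and m are purely hypothetical.

```python
def rl_campaign_estimate(n_trajectories, m_iterations,
                         cost_low=10e6, cost_high=100e6,
                         years_per_iteration=1.0):
    """Rough cost range (USD) and duration (years) for the scheme described.

    Every iteration runs n trajectories, each costing between cost_low and
    cost_high, and iterations happen one after another.
    """
    runs = n_trajectories * m_iterations
    return runs * cost_low, runs * cost_high, m_iterations * years_per_iteration


# Hypothetical campaign: 100 trajectories per iteration, 10 iterations.
low, high, years = rl_campaign_estimate(n_trajectories=100, m_iterations=10)
print(f"${low:,.0f} to ${high:,.0f} over about {years:.0f} years")
```

Even this tiny campaign, far smaller than the rollout counts used for coding RL, lands in the tens of billions of dollars, which is the point the comment is making.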
typs 13 hours ago [-]
Perhaps. RL env companies based in the U.S. sell to Chinese labs quite a bit too though (though at a discount, once they're no longer on the frontier)! And it would make sense that a lot of these problems which are based on work in the U.S. enterprise economy would be coming from the U.S.
gehsty 14 hours ago [-]
Interesting to consider this in line with recent US export bans: could the US be squandering its lead by letting the open source, largely Chinese labs catch up (in terms of model quality available to the masses)? Will US labs be able to maintain the lead without users being able to use their latest models?
ggm 12 hours ago [-]
Why do you think this matters? Not that it does or doesn't but what quality does "US WINS" or "CHINA WINS" bring to the table?
gehsty 2 hours ago [-]
I think the issue matters (not US vs China) due to the investment and exposure normal people have to the valuations of the AI companies. It feels like the US govt could make this the pin that pops the bubble. If these companies lose their lead their value drops and the stock market tanks.
knowaveragejoe 9 hours ago [-]
I think the unspoken fear is that if we assume one or the other will "win" in reaching AGI(or whatever threshold of capability), the rest of the world will sooner or later live under their system of rule as a consequence
ggm 7 hours ago [-]
I very much doubt the primary reason nation states are lining up to permit or forbid access to these systems is 'fear of future AGI dominance'
I think it's much more immediate/present: the weights and the information breach significant strategic controls on national data and posture, which can be back-derived from the models. If you can analyse a model, you can infer what structural inputs dictate it.
knowaveragejoe 6 hours ago [-]
Can you expand on that reasoning more? Claude etc. have national secrets hidden somewhere in their weights?
mft_ 3 hours ago [-]
If the belief that open-weight/Chinese models depend significantly on distillation of the latest frontier models is correct, then presumably the gap will stabilise to the minimum time required for extraction of meaningful data (from the latest frontier model) plus finalisation of training of the latest dependent model. This gap can be minimised by increasing the process efficiency, but can't be eliminated entirely. (Attempts to hinder distillation from Anthropic/OpenAI may shift the balance too.)
linzhangrun 10 hours ago [-]
USA, a country that known for the land of freedom, is now restricting frontier models to the point where non-Americans cannot even use them.
China, an "authoritarian state" country, "the antonym of freedom", with a software industry that is especially capitalist, has produced all the competitive open-weight models.
It really is IRONIC.
Disclosure: I am Chinese, and I understand this strategy comes from being behind, using open source as an asymmetric way to compete and make up for missing compute by sharing the burden, etc. But still, very ironically.
mft_ 3 hours ago [-]
Your comparison falls apart in the first few words:
> USA, a country that known for the land of freedom
The US might say it's the land of freedom, but it's been playing the game of economic protectionism for centuries. This is just the latest example.
tzs 12 hours ago [-]
I wonder if a lot of the companies and governments that seem to think it is essential to be on the forefront of applying leading edge LLMs to the point of starting to become dependent on them are going to find themselves in a situation like that from the Arthur C. Clarke short story "Superiority"? [1] [2].
Article confuses open source models with open weights models.
Not the same thing.
It’s used right in the article’s body, but the title is misleading.
NitpickLawyer 14 hours ago [-]
Literally no one cares. There are "full" open certified GMO free grass fed training data blah blah models. Apertus, Olmo, etc. No one cares. For all intents and purposes people use the term to describe a model that you can run locally and are allowed to modify and re-release. The rest is useless semantics. No one can "rEpRoDuCe" a model anyway.
judge2020 13 hours ago [-]
open source vs source-available. Companies taking an extremely cautious approach to AI can't use source data that is potentially a violation of copyright (pending worldwide court decisions and/or regulation on said topic). Although that cat is already out of the bag for basically every stock-traded company using LLMs trained on non-licensed data, so I don't see there being much actual risk in using them.
throwuxiytayq 14 hours ago [-]
No-one cares to quit social media or stop using Windows, but it’s a goal worthy of discussion all the same.
The name is bad, doesn’t even make any fucking sense and it gives open source a bad rep.
komadori 14 hours ago [-]
I wouldn't say that no one cares, but obviously many fewer people care when the cost of "recompiling" a model from its open source training pipeline is so high. Also, if you only have the weights, you can still use it to generate training data for a new model (i.e. distillation) so it's inherently less locked down than closed source binaries were.
reinitctxoffset 14 hours ago [-]
I was advocating for "available weight" as a value neutral term for a while.
I gave up. No one cares. And no one will ever tell the truth about the training anyways.
Substantial and growing freedom beats zero freedom ever again.
dabinat 13 hours ago [-]
I believe the open model party will eventually end. Perhaps because companies realize it’s too much of a commercial advantage, countries don’t want to give other countries commercial or military help, or maybe even an outright ban after someone uses an open model to guide them through how to make a bomb.
taffydavid 2 hours ago [-]
If we were going to ban technology because it helped people make bombs we wouldn't have access to much anymore.
stkdump 6 hours ago [-]
Possible. Though I think open source innovation by hobbyists is currently hampered because of open weights releases by Chinese labs. I believe once the labs stop doing this, a globally coordinated open source ecosystem will fill the gap.
sinuhe69 3 hours ago [-]
At this point, I think open weights vs proprietary models is a misnomer.
First, we cannot be sure the next release will remain open weights, as Qwen 3.7 has shown.
And second, they are all Chinese models. So instead of open weights, perhaps Chinese AI models is a better word choice.
_pdp_ 13 hours ago [-]
Frankly it does not matter if there is gap because for most practical use-cases the end user can barely perceive the difference in intelligence.
On paper frontier models will be ahead of the curve but I think hardly anyone will be able to tell if a piece of work, say a landing page, is created with Fable or GLM and that is the point. The perceptible intelligence will reach a point beyond which it is no longer considered, except for some narrow use-case.
nomel 12 hours ago [-]
> except for some narrow use-case.
I think it's entirely the opposite. For narrow use cases, like web pages and crud/GUI, the open source models don't show much of a difference.
mft_ 3 hours ago [-]
100% agree.
My impression is that the open-weight models have been drawing close-to-level at coding tasks, while Anthropic and OpenAI have been putting large amounts of effort into developing their models' abilities in other domains: legal, biomedical/science, etc. Anthropic (especially?) has also been putting more obvious resource behind optimising their harnesses - from Code to Cowork (which is kinda Code for normies), Design, etc.
_pdp_ 2 hours ago [-]
GLM 5.2 has replaced "normie" agentic workflows previously backed by Sonnet and Opus. So I don't know. From my end it seems to me they are perfectly capable of working agentically.
taffydavid 2 hours ago [-]
You think Web pages, crud and gui are a narrow use case?
doctoboggan 13 hours ago [-]
If the Chinese government is as involved in LLM development strategy as many people claim, wouldn't you expect them to immediately cease releasing open weight models and restrict access as soon as they start producing the frontier models? I am assuming this is what the USG thinks and is why they are trying to cut off the flow to foreign nationals ASAP.
LLMs are an undeniably valuable tool, and governments like to control those.
eunos 11 hours ago [-]
Xi Jinping isn't as AGI-pilled as the US govt. CapEx in the US is significantly focused on AI-related things like chips and data centers. It's more diversified in China, as they also invest hugely in renewables, EVs, BESS, etc.
nicce 12 hours ago [-]
How do you know that Chinese don’t have powerful private models already? Maybe they just allow opening the ”bad” models…
psychoslave 6 hours ago [-]
As far as we don't know, they might also have operational stargates large enough to let their starships pass through. Actually, every country might have that.
But what is impactless to the wider world will always be as significant as something that never existed.
sdesol 12 hours ago [-]
I talked about this before but China would be in a much better position if LLMs turn into a commodity. Where they can dominate is in hardware, as fast and cheap inference is probably going to be the moat.
verdverm 11 hours ago [-]
My futurology is that most of us will end up on unlimited token plans like we are for mobile data. We don't need the very best model for most tasks and the trend in computing has always been towards cheaper and more efficient unit economics. I do not see this ending any time soon.
JumpCrisscross 14 hours ago [-]
Now let’s look at the economics of buying versus renting. I’ve seen a lot of attention given to hardware capital costs. But a comment the other day got me thinking about power costs, too—at what performance differential do these factors intersect to make on-prem economically competitive with datacenters for businesses?
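One way to frame the buy-versus-rent question is as a break-even calculation. The sketch below is only a starting point under loud assumptions: every number is hypothetical, the hardware runs around the clock, and depreciation, cooling, staffing, and any performance differential between the two options are ignored.

```python
def breakeven_months(capex, power_kw, price_per_kwh, rental_per_hour,
                     hours_per_month=720.0):
    """Months until buying beats renting, or None if it never does.

    Owning costs capex up front plus electricity; renting costs a flat
    hourly rate for the same number of hours.
    """
    own_monthly = power_kw * hours_per_month * price_per_kwh
    rent_monthly = rental_per_hour * hours_per_month
    saving = rent_monthly - own_monthly
    if saving <= 0:
        return None  # power alone costs at least as much as renting
    return capex / saving


# Hypothetical box: $30k up front, 1 kW draw, $0.15/kWh, vs. $2/hour rental.
months = breakeven_months(capex=30_000, power_kw=1.0,
                          price_per_kwh=0.15, rental_per_hour=2.0)
print(f"break-even after about {months:.1f} months")
```

A performance differential could be folded in by scaling `rental_per_hour` to the rate for equivalent throughput; with these placeholder numbers power is a small fraction of the rental price, so the answer is dominated by capital cost and utilization.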
jackconsidine 14 hours ago [-]
Achilles and the tortoise [0] is usually a fallacy. If the tortoise has a head start, then Achilles will never catch it because in the time it takes Achilles to reach the tortoise's location the tortoise has moved some degree further, ad infinitum. Obviously not real because Achilles will pass the tortoise -- I think a fallacy because the framing creates a fake asymptote (they will both pass the point where they're approaching a tie).
In this case it may actually apply though, no? Open models get better from closed model distillation?
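For reference, the reason the classical paradox fails can be written out as a geometric series. The symbols are introduced here for illustration only: Achilles runs at speed $v_A$, the tortoise at $v_T$, with head start $d_0$.

```latex
% Time for Achilles to cover the k-th remaining gap:
t_k = \frac{d_0}{v_A}\left(\frac{v_T}{v_A}\right)^{k}, \qquad k = 0, 1, 2, \dots

% Infinitely many steps, but a finite total whenever v_T < v_A:
\sum_{k=0}^{\infty} t_k
  = \frac{d_0}{v_A}\cdot\frac{1}{1 - v_T/v_A}
  = \frac{d_0}{v_A - v_T}
```

The sum is finite only when $v_T < v_A$. If the follower's speed is capped by the leader's, which is one way to read the distillation argument (an assumption, not an established fact), then $v_A \le v_T$, the series diverges, and the gap never closes.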
I just hope CCP doesn't follow the US government and won't pull the plug before their companies release something on-par with the US frontier models. The question is whether US models not available to the general public will count.
The question is not whether they'll prohibit open-weight models better than the US ones, because we all know the obvious answer.
justindotdev 14 hours ago [-]
at first glance, these graphs are confusing
taffydavid 2 hours ago [-]
Totally agree. That first chart is just four vertical lines, I can't figure out what the hell I'm supposed to be looking at without reading the article.
nsingh2 13 hours ago [-]
Yea these plots are too noisy and dense. Especially that second one, lines all over the place.
gunalx 13 hours ago [-]
Utterly unreadable on mobile
maxiniol 12 hours ago [-]
Am I the only one flagging inconsistencies in the different evaluations on the 18 benchmarks?
Why is the closed frontier model sometimes grok? And then opus 4.8? Compared to GLM 5.2 once or sometimes Kimi 2.6?
This is just an example of "lying with statistics". Going by compute efficiency the gap has already closed (both in training and inference coincidentally).
StreamCtx 7 hours ago [-]
[flagged]
llmslave 14 hours ago [-]
The gap is huge and I'm tired of reading these articles constantly
Gigachad 13 hours ago [-]
Are you talking about hosted vs the ones you can easily run locally? Because there are open models that require hundreds of gb of vram which are apparently pretty close.
verdverm 11 hours ago [-]
on the Will It Mythos benchmark, small models are punching way above their weight(s)
I've tried running Qwen 3.6 locally and it felt like LLMs a year ago: you can get them to do some stuff, but the tasks have to be very small and you have to course-correct them so often that it's hard to say it's any faster than doing it all yourself.
Certainly the gap is closing but I feel it still makes more sense to pay pennies to run the full sized open models hosted on much better hardware.
> So maybe the open source apocalypse won’t happen yet.
Sorry I wasn't at the last doomer meeting, when did we decide good open source models are a harbinger for the apocalypse?
But hey, it’s open-model LLMs, the boogeyman! Can’t have that, it must be OpenAI or Anthropic safely controlling the market and calling all the shots.
The spigot can be turned off at any time.
Until there's some sort of "community owned hardware", open weights models are always at risk of being discontinued.
And there will always be incentivised parties that release models. Nvda for one has every incentive to keep the nemotron line going, as they're directly profiting from people running this. And the models aren't really far from open SotA anyway.
Goog will probably continue to release the small models, since they'll use them for browser stuff anyway, and know that they'll leak. So for them it's a win-win to release the small models and gain some dev market share.
And the chinese labs also have incentives to keep releasing models, and will likely continue to get gov support to do so (yay commercial wars between nations).
Your right to 3d print whatever you want is about to be taken away (in California).
What software you can run on your computer can already be restricted.
Absolutely everything can be taken away. The simplest way to remove open models is probably to declare them a tool that terrorists could use. Crazy? Yes, the world is totally crazy these days.
Open source and open hardware can be called illegal by a government, but, if we collectively invest our energy into open alternatives, they can't be taken away in the same sense. I can build a RepRap printer and I can use a local AI model. It's on all of us to make sure that the open alternatives are viable, maybe in the current global political reality now more than ever.
Making something illegal isn't a disincentive for everyone. When they start banning books, some of us start assembling printing presses.
* Drugs
* Media piracy
* Alcohol
* Sex work
* Unlicensed gambling
The government is not an all powerful entity with absolute control over its people. Even in countries under past and present dictatorship there are examples of people getting access to what the government deemed as illegal.
https://en.wikipedia.org/wiki/Executive_Order_6102
Of course you’ll always be able to get access but the risk can be made so high that most people won’t try it.
There are countries that have the death penalty for dealing drugs and really severe prison terms just for having a small amount of drugs. There are still people who do it, but most people are effectively deterred because it’s just not worth it.
Plus, for a certain type of person "Piracy" is more of a philosophical belief or political position. There are very proficient "Pirates", the equivalent of fundamentalists, who will under no circumstances stop and are not doing it for money. There are obviously an enormous number who are in it for the money: "big brand names" now reportedly comprise as much as 63% of the advertising on illicit piracy sites. I'm too lazy to get the link, but that sentence ought to be enough if you want to look into that bizarre reality.
I'm not certain either of those things are in the Government's direct control - both require society at large to share the belief and essentially choose not to do said activities.
(Regarding your second example, unfortunately most abusers are people children know, the Epstein Class was supposed to be just Q Anon crazy conspiracy stuff, none of this is ok in any fashion. Both exist, one local entirely beyond the government - the other appears to have incorporated people from government.)
My point is simply this - WE determine what the Government can do. What we believe matters more than anything else. Don't ever discredit The People's ability - we are pretty awesome.
It is on people to realize we have the ultimate power and oppose the overreach of government in all ways we can to keep our freedoms.
Freedom is not free, after all
Everything cannot, in fact, be taken away. Don't propagandize yourself. Some things, like information, are free. Not even China can prevent all its citizens from accessing Western internet. USGov simply does not have the resources to find and audit every hard drive and USB stick in the country for illegal files. The internet cannot be censored 100% without literally cutting every cable and confiscating every radio.
The software that runs on my computer cannot, in fact, be restricted. It can be declared illegal, but there literally is no mechanism by which it can be enforced other than a government goon standing over my shoulder 24/7.
Some freedoms really cannot be removed without utterly implausible amounts of effort. Arguing otherwise is helping to erode freedom. So stop it.
Are laws that are inherently unenforceable even laws?
Their releases so far have been kind of lackluster compared to Qwen and other Chinese models. My suspicion is that Nvidia won't be releasing models that appear to compete with frontier models because that would upset their big customers.
This is pure speculation, but I have a hunch that the Nemotron line is intended as a shot across the bow, and that's why its capabilities have been strong but not quite open-frontier level.
In theory yes, but the average person can't really run the big open models.
This is already happening: try to find a provider that still hosts older open models, especially less popular or superseded ones.
For me personally, I've been trying to access Kimi K2-0711. There seems to be only one provider left on OpenRouter (NovitaAI) and 3/4 requests error out
A model that writes code without knowledge of any language or library changes for half a decade is less useful. A 2021 era chatgpt would be quite quaint in 2026.
Right now the Chinese labs might have incentives to release their models for free, and maybe Google is happy to release open weights today, but I'm sure there are already bean counters at Google salivating at the idea of having Gemini in Chrome as part of a Google AI monthly subscription just like YouTube premium and other Google subscriptions.
Correction: The capabilities and knowledge of that model can be improved via self-distillation, so the value of that model increases over time.
This is where I think self-distillation is the main way forward, and probably the second best thing ever to happen to AI/LLMs after the transformer.
Based on self-distillation, the value of the open weights models will increase over time for sub-specialization through post-training and fine-tuning.
Please check these very promising recent works and results from MIT/ETH, UCLA and Apple [1], [2], [3]. For example, the MIT/ETH self-distillation approach was demonstrated on a single H200 GPU. Apple's approach is so simple that it's called Simple Self-Distillation (SSD), pun intended.
[1] Self-Distillation Enables Continual Learning:
https://arxiv.org/abs/2601.19897
[2] Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models:
https://arxiv.org/abs/2601.18734
[3] Embarrassingly Simple Self-Distillation Improves Code Generation:
https://arxiv.org/abs/2604.01193
I think this matters less than you think. If the spigot turns off, open LLM research is going to have a powerful incentive to focus on post-training to refresh stale base models. And post-training, in general, is so much cheaper and faster than pre-training anyway. I was pretty surprised to learn that GLM-5.2's entire RL training (the part that makes it reliable at agentic tasks) was completed in just TWO DAYS.
I realize that my amazing tool/system of local AI is out of date. I still very much like having it, and it is not at all a bad thing to have. Everyone in theory ought to have a local backup, just in case.
To take one admittedly extreme example: having this would most definitely matter in the event of a societal collapse. Not everyone will have it. Can they run those giant data centers off a few solar panels like a desktop PC?
For this one existential reason alone, I recommend everyone at least play around locally enough to have a few models functional.
Not really.
it's my theory at least, the Hindenburg Research of AI
I remain hopeful that we'll be able to democratize the entire tech stack for this tech.
And I think https://allenai.org/ has something like this, too.
Plus I am certain it makes financial sense. I am guessing here, but fully utilizing a subscription's limits probably costs the operator more money than the subscription revenue, which is why Anthropic is making such a big stink about the Chinese data harvesting. By releasing the weights, you relieve yourself of that burden: the competition does not need to hammer your subscription service, because they can just download your model, analyze it, and run it all day.
Also, for the largest models it makes no sense to run them yourself unless you are a major player. Renting the hardware costs tens of thousands of dollars, ludicrously more than their subscription, and buying the hardware to run them costs hundreds of thousands of dollars.
The most popular LLM product in China is ByteDance's Doubao. You probably haven't heard of it, since ByteDance never released weights and it doesn't benchmark particularly well, but ByteDance already had enough users on its other apps that it could directly advertise Doubao to them.
Open source and open weights models are how you harness the potential of all humans to keep developing and improving the SOTA of your model. Literally every student on the planet wants to play with and improve these models for their own use case.
Plus there's the ecosystem: once you have users in the ecosystem around your open weight model, that is a giant leverage point in itself.
Right now, there is a shortage of talented researchers, and the attention that open models generate allows labs to attract good hires. But this is a fragile dynamic that can break in the future. It's not very different from commercial open source work, except it's much more capital intensive and lower volume.
I think the future of open weights models will be similar to fabless chip design companies. There will be companies that can train models and they will licence those models to inference companies that manage the APIs.
The inference companies need much less capital and the training companies don't need to divert resources from training to inference.
Some of the Chinese model training companies are already doing this and licencing their models to inference providers.
The hardware is already available for renting at reasonable prices. We need community funding. I wish people pooled a fraction of the money they burn on local GPU rigs on funding training/testing/etc.
A big problem is the same as in open source: it's way too atomized. Just one competitive ground-up community LLM would require tens of millions of dollars. But who gets to pick?
IMHO the only chance is highly specialized, smaller LLMs instead. And those still cost millions to train.
And remember, LLMs are competitive for only a handful of months.
True. And it's possible that this has already happened at Alibaba Qwen - at least for the smaller models that people had a chance of running at home (122B and smaller).
It's highly unlikely we get another open Llama model though after the Llama 4 flop, even if their Muse Spark seems pretty good.
I hope that we find ways of continuing to improve these models besides continuing to exponentially increase capex spend until all but one of your competitors falls away.
For instance, Facebook were able to optimize their core ads product for mobile, in a way that was much more difficult for Google.
Or until some bright people figure out drastically more efficient means of training.
I don’t think we should describe these companies as simply releasing these highly capable open weight models out of the goodness of their hearts
And while I don't have a very positive view of the Chinese government, last I checked, they haven't been dropping bombs on innocent schoolchildren recently.
We can debate the semantics of whether “created by” or “subject to” means the same thing in regards to the Chinese government, but that is neither here nor there.
I’m happy to take your wording that they are obviously “subject to” the Chinese government. That logically means they are subject to carrying out the CCP’s long term strategy. And as you said “whose whims may change at any given moment”.
That directly relates to the OP’s fears, that these models could be taken away at any given moment. “The spigot can be turned off at any time” as they put it.
Or another possibility is they will never turn the spigot off, but they will engineer it in a way to best achieve their goals. My bet is that’s the more likely outcome.
I simply disagree with the OP’s description of the problem as “open weights models are the result of philanthropy by some private org”, I think the problem is much more complicated than that
These companies were not "created by" the Chinese government. Specifically, I'm talking about DeepSeek, Zhipu, MiMo (Xiaomi), Kimi (Moonshot), Qwen (Alibaba). "Subject to" certainly does not mean "created by", it just means that the government ultimately has the power to tell them what to do. The US government has the exact same power, hence why none of us has access to Fable at the moment, but you wouldn't say that OpenAI or Anthropic were "created by" the US government.
There is zero evidence that open-sourcing their models is part of some grand strategy from the Chinese government. In DeepSeek's case, I think it probably is a genuine commitment to open source, for the others I think it's probably just a convenient business decision to gain market share (though Zhipu is probably more aligned, given their academic lineage from Tsinghua).
At some point in the future, the Chinese government may decide it's not in their national interest for Chinese companies to open source their frontier AI models, and DeepSeek et al will be restricted from doing so. I'm well aware of that. But until that point in time, the rest of the world is unanimously better off with open-source Chinese models. We should put as much reliance on Chinese companies long-term as we do on American companies - zero.
Chinese companies have also demonstrably pandered to the American consumer for many decades now.
To further muddy the waters, US companies have, some would argue, been openly hostile to the American consumer via monopoly practices, restricting access to purchased devices, etc.
Among other countries that are consistently at the top on gross national happiness are Finland, Denmark, Iceland, Switzerland, and the Netherlands. Among them, the current ability to release open models is observable.
The USA unfortunately continues to fall quickly in the World Happiness Report rankings, and that's not because many other countries made great progress.
[1] https://www.theinformation.com/articles/deepseek-using-banne...
Moreover, China has just demonstrated a supercomputer faster than any US supercomputer. Unlike the US supercomputers, which need GPUs, it achieves its high computational throughput with custom CPUs designed in China (implementing an Armv9-A ISA with SME, i.e. the Scalable Matrix Extension, and with BF16/INT8 operations for AI).
The CPUs used in that supercomputer can reach both a computational throughput and a memory bandwidth sufficiently high for training any LLMs (they have fast HBM memory). Their only disadvantage in comparison with the best NVIDIA GPUs is a slightly lower energy efficiency, but China has abundant cheap energy so this is not a serious disadvantage for them.
But consider the alternative. OpenAI and Anthropic can shut off your account or API key at any time for any reason. How is this better? You have way more security when you're running your own model.
Dunno why you'd want to though, considering v4 Pro (and even Flash) outpace it drastically
It’s sad to think that Mozilla spent years and millions doing virtual reality and AI, they would have been perfect to do this but let’s face it - who knows if Mozilla will be around even 5 years from now
For an (Chinese) open weight model to surpass the (US lab) frontier models, this equation must flip and the Chinese labs must entirely retool from harvesting frontier model data to producing the data systems and efforts to produce novel data; as well as procuring latest generation hardware en masse for this. This does not happen easily. Also training a frontier scale model is actually not such an unimaginable feat: doing all the inference with the teacher models is where the hardware goes.
You don't know what's happening at Z.ai or Alibaba. And you don't know what's happening at Anthropic and OpenAI.
I don't know what they are all doing, but I find it extremely unlikely that they are not all collecting data from one another. I am confident Anthropic has a team going over the GLM 5.2 weights even if it's just to see where the competition is.
Just because some labs are getting data from Anthropic does not mean they are not also doing their own research.
They were focused on optimization because they could not get the best hardware. The only reason their top labs are behind may be that they did not have H200s and MI350s. And now they do.
Plus you are discounting other risks, Anthropic is currently sitting on "the best" models in the world because they got in a pissing match with the US administration.
btw: This could be the case in China as well. Their administration has been surprisingly open on AI exports and open weight models, as far as we know. There is a very small but non-trivial chance they are hogging a better version of GLM 5.2, for example, but no one is allowed to talk about it. Now I am not saying that is the case; I am saying the two cases (Chinese labs are 6 months behind, or they are forced to suppress their best models) are indistinguishable.
Even if your characterization is accurate, they could do this tomorrow and are not so myopic that they wouldn’t have thought about it. I don’t see this as a barrier, and I see a lot of the same underestimation of Asia that’s been happening for 50 years. There’s not some innate American advantage to building LLMs, and personally I think whatever head start the US has is going to be squandered on delays from the export control “too dangerous for release” LARPing we’re seeing.
Also I was responding to a claim about what will happen in less than 6 months (that’s about the edge of what you can meaningfully say too much about in this field).
These strategies take materially different resources; it’s not an overnight decision made by leadership. I suppose there is a natural experiment ongoing at Meta regarding this: it seems they recently moved a number of people into a division to produce such data overnight. So we will find out soon how quickly they climb the leaderboards.
Distilling from a better model, even with small amounts of data, is still helpful. It doesn't transfer capabilities the raw internet-trained model lacks entirely; rather, it identifies the capabilities that are compatible with the servile assistant persona and suppresses others that are undesirable (e.g. trolling). A primitive version of this was the instruction-tuning datasets generated with ChatGPT, as used e.g. for Alpaca.
Without a clear target to emulate, competitors might have to rely more on human raters, but there are plenty of data labeling companies in China, so that's hardly a hurdle.
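To make the "instruction-tuning dataset generated with a stronger model" idea concrete, here is a minimal sketch. It assumes records with `instruction`/`input`/`output` fields (the layout commonly associated with Alpaca-style data; treat the field names as an assumption), and `stub_teacher` is a hypothetical placeholder for a call to the better model, not a real API.

```python
import json


def build_instruction_dataset(instructions, teacher):
    """Pair each instruction with the teacher's answer in an Alpaca-style record."""
    records = []
    for instruction in instructions:
        records.append({
            "instruction": instruction,
            "input": "",  # no extra context in this toy example
            "output": teacher(instruction),
        })
    return records


def stub_teacher(instruction):
    # Hypothetical stand-in for querying a stronger model.
    return f"(teacher answer to: {instruction})"


if __name__ == "__main__":
    data = build_instruction_dataset(["Summarize this thread."], stub_teacher)
    print(json.dumps(data, indent=2))
```

Swapping `stub_teacher` for human raters gives the fallback described above: the record format stays the same, only the source of the `output` field changes.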
The use of US models for Chinese model training is part of the motivation of all of this.
But if they can stay on pace, within say 6 to 12 months of the bleeding edge of the American frontier models, that’s a huge problem.
If they can just piggyback on the Herculean efforts of Anthropic, OpenAI, Google etc., accept a little bit of lag, and save billions of dollars? Why wouldn’t they?
And for the end user, why would they pay a premium subscription price for something they can just wait six months for and run on their own hardware at home? In my opinion, this is the cat and mouse game that’s being played right now. And I suspect it’s intentional on the side of the open weight models. I would bet they are playing a war of attrition.
They don't even need to 'win' in the sense of maxing the benchmark. They can be 20% worse/50% cheaper and many of us (and our managers who approve our token budgets) will be in.
DeepSeek is 30x cheaper for input and 75x cheaper for output than Sonnet on OpenRouter, and it's not a whole lot worse for many things.
It is enough to kneecap their pricing power to trigger the valuation reset by an order of magnitude and humble them a bit.
Plus there are always infrastructure and hardware providers who want to keep their share of profits and will squeeze Anthropic's margins to deflate their valuation (nvidia, aws, RAM manufacturers, etc)
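The 30x/75x figures above only turn into a single number once you pick a token mix. A rough sketch, assuming the closed model charges 5x more per output token than per input token (`closed_output_to_input` is an illustrative assumption, not a quoted price), with all costs expressed in units of the closed model's input-token price:

```python
def blended_price_ratio(input_tokens, output_tokens,
                        input_ratio=30.0, output_ratio=75.0,
                        closed_output_to_input=5.0):
    """Return how many times cheaper the open model is for a given token mix."""
    if input_tokens + output_tokens <= 0:
        raise ValueError("need at least one token")
    # Closed-model cost, in units of its own input-token price.
    closed_cost = input_tokens + output_tokens * closed_output_to_input
    # Open-model cost: each component divided by the quoted ratio.
    open_cost = (input_tokens / input_ratio
                 + output_tokens * closed_output_to_input / output_ratio)
    return closed_cost / open_cost


if __name__ == "__main__":
    for mix in [(1000, 0), (1000, 1000), (0, 1000)]:
        print(mix, round(blended_price_ratio(*mix), 1))
```

Under these assumptions the blended ratio always lands between 30x and 75x, moving toward 75x as the workload becomes more output-heavy.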
Kind of an oxymoron don’t you think.
If they could generate data that looked kind of real, why don’t they just generate that data on the fly during inference?
1. It's unclear if there is a law of diminishing returns with ever-larger models. They're more expensive to run and for many applications, you'll probably find smaller models are sufficient;
2. There's an inbuilt market for local LLMs. This is an effective limit on how large models can get. Case law hasn't been established yet on, for example, if a law firm using ChatGPT breaks privilege. Specifically, chat logs may be discoverable. Medical applications have this issue too and I think you'll find that financial firms are going to be leery about this as well;
3. Better, larger models will bleed into smaller, open source models. The chat logs themselves are training data. There's a whole market in China for Claude tokens around this;
4. China has a national security interest in not being beholden to US tech giants when it comes to AI. China has a history of being able to commit to large-scale long-term projects and Anthropic just won't be able to compete with a national project by one of the world's superpowers, if it comes down to it;
5. Winning doesn't necessarily mean being the best. Often it's just being good enough;
6. As an example of a national project, China is busy replicating EUV because of the US ban on ASML and NVIDIA exporting their best stuff. I don't think many in the West are prepared for how rapid this will be. I'm reminded of the policy debate in 1945 when many in American policy and military circles thought the USSR would never catch up on the atomic bomb or, if they did, it would take 20+ years. It took 4 years. For the hydrogen bomb, it took 1. The US hardware advantage is a lot more tenuous than many realize.
Nobody cares if your AGI is 100% made out of neural networks or if it's like 50% neural networks and 50% perl scripts.
If the closed models stop improving will the progress of open models slow?
Americans should wake up to reality. The claim repeated continuously across Internet media, that the Chinese merely copy US technology and so will never be able to surpass it, was true many years ago, but it has been false for years now, and there are many domains where the USA would have to copy Chinese technology if it does not want to remain behind.
Among other "sanctions", USA has forbidden the export to China of high-performance computing devices, but this has backfired as China has just demonstrated a supercomputer that is faster than any US supercomputer and which uses custom CPUs designed in China, apparently by Huawei, the company that was the main target of the US efforts to sabotage the Chinese competitors.
The US "sanctions" have hurt China for a few years, but they have convinced them that they must allocate resources to become able to make themselves everything that they previously bought from USA. The result is that now China has become stronger and USA weaker.
The USA should never have sold technology to China a quarter of a century ago; then the power relationship between the 2 countries would have been very different. But even 5 years ago it was already too late for any US "sanctions" to have lasting effects. Nowadays any hopes that US "sanctions" will keep China in the dark ages are pathetic.
With the kind of policies that are promoted by the US government, the chances that USA will keep its leading position in AI are minimal.
Some people in China surely know.
> Like if the closed models stop improving will all the closed models also stop improving?
Seems extremely unlikely, unless the models all hit some kind of wall soon. The Chinese companies may be behind the US in compute capacity, but they have excellent researchers [0] who are probably approximately as good as their US counterparts at the kind of problem generation and RL that is currently working so well.
I would be very surprised, though, if the models cannot continue to be improved rapidly in any area that allows a tight feedback loop like programming, at least up to the point where we puny humans lose the ability to define objective functions.
(And, conversely, I don’t expect magic in fields where the feedback is slow or expensive. A model is not about to reliably invent a wonderful medicine for the same reason that a large and extremely competent pharma company cannot: the evaluation process is extremely slow and it’s so expensive that the kind of utterly enormous corpus that is driving the current progress in coding is simply not available. Running RL on m iterations of n medication-development trajectories each is going to cost n*m times $10-100 million and take m years if it’s even possible at all.)
[0] The US advantage in this space will likely decline, since the brain drain from the rest of the world via the US university system to US labs is drying up.
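The n*m estimate above is easy to put into numbers. A back-of-envelope sketch, assuming the $10-100 million per-trajectory range from the comment and roughly one year per RL iteration (`years_per_iteration` is an assumption implied by "take m years", with the n trajectories inside an iteration run in parallel):

```python
def rl_trial_budget(n_trajectories, m_iterations,
                    cost_low=10e6, cost_high=100e6,
                    years_per_iteration=1.0):
    """Rough cost and duration of RL over medication-development trajectories."""
    runs = n_trajectories * m_iterations
    return {
        "runs": runs,
        "cost_low_usd": runs * cost_low,
        "cost_high_usd": runs * cost_high,
        # Iterations are sequential, so wall-clock time scales with m only.
        "years": m_iterations * years_per_iteration,
    }


if __name__ == "__main__":
    print(rl_trial_budget(n_trajectories=100, m_iterations=10))
```

Even a small run of 100 trajectories over 10 iterations comes out at $10-100 billion and a decade under these assumptions, which is the point being made about slow, expensive feedback loops.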
I think it's much more immediate/present: the weights and the information breach significant strategic controls on national data and posture, which can be back-derived from the models. If you can analyse a model, you can infer what structural inputs dictate it.
China, an "authoritarian state", "the antonym of freedom", with a software industry that is especially capitalist, has produced all the competitive open-weight models.
It really is IRONIC.
Disclosure: I am Chinese, and I understand this strategy comes from being behind, using open source as an asymmetric way to compete and make up for missing compute by sharing the burden, etc. But still, very ironic.
> USA, a country that known for the land of freedom
The US might say it's the land of freedom, but it's been playing the game of economic protectionism for centuries. This is just the latest example.
[1] The story: https://nob.cs.ucdavis.edu/classes/ecs153-2019-04/readings/s...
[2] Wikipedia: https://en.wikipedia.org/wiki/Superiority_(short_story)
Not the same thing.
It’s used right in the article’s body, but the title is misleading.
The name is bad, doesn’t even make any fucking sense and it gives open source a bad rep.
I gave up. No one cares. And no one will ever tell the truth about the training anyways.
Substantial and growing freedom beats zero freedom ever again.
First, we cannot be sure the next release will remain open weights, as Qwen 3.7 has shown.
And second, they are all Chinese models. So instead of open weights, perhaps Chinese AI models is a better word choice.
On paper frontier models will be ahead of the curve, but I don't think anyone will be able to tell whether a piece of work, say a landing page, was created with Fable or GLM, and that is the point. The perceptible intelligence will reach a point beyond which it is no longer considered, except for some narrow use cases.
I think it's entirely the opposite. For narrow use cases, like web pages and crud/GUI, the open source models don't show much of a difference.
My impression is that the open-weight models have been drawing close-to-level at coding tasks, while Anthropic and OpenAI have been putting large amounts of effort into developing their models' abilities in other domains: legal, biomedical/science, etc. Anthropic (especially?) has also been putting more obvious resource behind optimising their harnesses - from Code to Cowork (which is kinda Code for normies), Design, etc.
LLMs are an undeniably valuable tool, and governments like to control those.
But what is impactless to the wider world will always be as significant as something that never existed.
[0] https://en.wikipedia.org/wiki/Zeno%27s_paradoxes
The unbearable cheapness of open weight models
https://news.ycombinator.com/item?id=48668255
gemma4-26B (#7)
qwen-3.6-27B (#9)
https://news.ycombinator.com/item?id=48640196