Hacker News | ismaeel_bashir's comments

Thanks, the security point is valid, so let me be specific about how deployment works for us!

There's no telemetry egress. Deployments run inside the customer's own environment: their VPC, or their own air-gapped hardware. We don't ship telemetry out to a SaaS backend to reverse-engineer; the data never leaves their environment, and for on-prem/air-gapped customers there's zero egress and full audit logging. We do all this because finance, biotech, and national-scale customers are our design target - we all worked in the space and understand what security measures need to be in place for this to work.

For example, the "open MongoDB" failure you mentioned isn't something that would concern us, because there's no central store of their data to leak.

On "SaaS being in the critical path": we agree, and that's why we're not in it. We're not a scheduler or a runtime. Our daemon is passive, and if it falls over, jobs still submit and run exactly as they do today. We sit alongside as a prediction/recommendation layer, not in the path that has to be up for the cluster to work.
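To make "not in the critical path" concrete, here is a minimal sketch of a fail-open submission wrapper. The `submit` and `advise` callables and the idea of merging a hint into the job spec are hypothetical stand-ins for illustration, not the product's actual interface.

```python
from typing import Callable, Optional


def submit_with_advice(
    job: dict,
    submit: Callable[[dict], str],
    advise: Callable[[dict], Optional[dict]],
) -> str:
    """Submit a job, applying a recommendation only if the advisor answers.

    `submit` stands in for the existing scheduler call and `advise` for the
    passive daemon; both names are assumptions made for this sketch.
    """
    try:
        hint = advise(job)
    except Exception:
        # Advisor is down or unreachable: fall through and submit unchanged.
        hint = None
    final = {**job, **hint} if hint else job
    return submit(final)
```

If `advise` raises or returns nothing, the job reaches the scheduler exactly as the user wrote it, which is the property being claimed.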

On upgrades and expansions as utilisation increases: most large-scale compute users are capacity-constrained and growing faster than they can buy GPUs. If anything, we are delaying the expansion, not killing it. In terms of unit economics, being able to serve more users with tighter user allocations is a net positive for cloud providers and is something they actively try to pursue :)


Probably the most helpful advice I can give you is pointing out that I wrote my comment after reading your homepage and docs. :)

I used to run security for building-sized computers, if you want any feedback. My email is in my profile.


Yes :)

This is actually a really cool feature of the platform. We ingest DCGM, CUPTI, and cgroups to give users granular telemetry of what exactly is going on in the hardware they allocated when running jobs on it.

We also have a profiler with single-digit overhead that correlates stack frames with hardware metrics. This means that not only can you see whether your job was compute-bound or memory-bound at time x, you can also correlate that with areas in your code [currently only supported in Python - other languages coming soon :) ]
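As a rough illustration of that correlation step (not the actual profiler), the sketch below joins timestamped stack samples to the most recent hardware sample and labels each one. The input shapes, the metric names `sm_util` and `mem_bw_util`, and the 0.7 thresholds are all assumptions chosen for the example.

```python
from bisect import bisect_right
from collections import Counter


def classify(sm_util: float, mem_bw_util: float) -> str:
    """Crude label for one hardware sample; thresholds are illustrative."""
    if sm_util >= 0.7 and sm_util >= mem_bw_util:
        return "compute-bound"
    if mem_bw_util >= 0.7:
        return "memory-bound"
    return "underutilised"


def attribute(metrics, stacks):
    """Count (frame, label) pairs.

    metrics: list of (timestamp, sm_util, mem_bw_util), sorted by timestamp.
    stacks:  list of (timestamp, frame) samples from the code profiler.
    """
    times = [m[0] for m in metrics]
    counts = Counter()
    for t, frame in stacks:
        i = bisect_right(times, t) - 1  # latest metric sample at or before t
        if i < 0:
            continue  # stack sample predates all hardware samples
        _, sm, bw = metrics[i]
        counts[(frame, classify(sm, bw))] += 1
    return counts
```

Grouping the counts by frame then shows which parts of the code were running while the hardware was memory-bound versus compute-bound.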

Would love to show you a demo of this live. Feel free to email me at ismaeel@expanse.org.uk


Sure, would be happy to :)

I’ll send you an email, good luck with the book!


Nope :) the core model isn’t an LLM. It’s a custom architecture built from the ground up. We natively accept multimodal inputs such as source code, submission scripts and hardware topologies. The LLMs in the post are the baselines we beat.

This is also why fine-tuning matters for us. We train a cluster-specific model that gets better as more jobs run on your cluster, because the same code behaves differently on different topologies. An LLM reasons about code/scripts in a vacuum, with no native sense of how your nodes actually perform.


What kind of non-LLM machine learning is applied to source code? Number of lines and other facets?

Without language modeling (rendering it an xLM), how does one process computer language files (source code)? Or are you saying it's an SLM, not an LLM?


I see, very interesting, thanks!

Good point - people do set capacity aside, reserving it for later.

But our utilisation measurements are of waste within a user's allocation. It’s waste in what users are actually requesting and running, not in any reserved idle capacity.
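One way to read "waste within an allocation" as arithmetic, sketched under the assumption that each job record carries the GPUs requested, the mean utilisation observed while it ran, and its duration (these fields and the function name are hypothetical):

```python
def allocation_waste(jobs):
    """Fraction of requested GPU-hours that went unused.

    jobs: list of (gpus_requested, mean_gpu_utilisation, hours) tuples.
    Capacity that nobody requested never appears in the input, so reserved
    idle capacity does not count as waste here.
    """
    requested = sum(g * h for g, _, h in jobs)
    used = sum(g * u * h for g, u, h in jobs)
    return 0.0 if requested == 0 else 1.0 - used / requested
```

For example, an 8-GPU job at 25% utilisation and a 4-GPU job at 50%, each running 10 hours, request 120 GPU-hours and use 40, so two thirds of the allocation is waste.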

For now we sit only on the prediction/intelligence layer; we don’t do any scheduling. We don’t grant or sell capacity, we just tell the scheduler (and user) what a job actually needs.

