Hacker News
From Go on EC2 to Fly.io (benhoyt.com)
346 points by nalgeon on Feb 27, 2023 | 200 comments


My main complaint about flyio is instability, which you won't notice in your first couple weeks.

A couple of times per month, Redis (and occasionally Postgres) times out.

My TLS cert did not renew, resulting in 8 hours of downtime.

You get what you pay for.


Yep. Fly isn't fully baked. Deployed a service earlier this month and got bitten in a few ways:

- It took several hours to get a 1 GB disk provisioned (their API would time out and I'd get left with an allocated but unusable disk that couldn't be deleted without deleting the entire app it was attached to.) The Fly forums have numerous reports of the same issue, going back months, across multiple data centers. All resolved with the moral equivalent of "oops, try again later?"

- The fly.toml reference is incomplete and internally inconsistent, documenting keys which are not valid, while omitting valid keys that appear elsewhere in the docs.

- Volume snapshots are retained for either 5 or 7 days, depending on which of the multiple conflicting docs pages you believe.

- I wanted to let them know about that last one, so I emailed the address listed at https://fly.io/docs/about/support/, only to get an autoresponse saying that they don't read that mailbox and that I should pound sand unless I'm already paying them to read my emails.

Reading between the lines of their recent blog posts, it seems like a lot of the infrastructure pains are pinned on Nomad, with the hope that moving to Fly Machines / Apps V2 will obviate a lot of those issues.

I hope they're right, because what they're trying to deliver is exactly what I want.


Well Nomad certainly isn't at fault for our docs.

Volume provisioning issues have been a game of whackamole, largely as the result of scaling pains. We've grown like 4x in the last 3 months (thanks, Heroku) and it's put a strain on pretty much everything.


I got lucky and hit all of these problems in my first few hours of trying Fly (much to their detriment, unfortunately). I wanted to love it, but I encountered nothing but bugs when I played with it over a weekend a month or so ago. Internal DNS worked for one but not the other of my nodes; Dashboard accepted my credit card but didn't actually upgrade my plan; SSH console would hang, requiring killing the client process with SIGKILL; my builder hung with a Dockerfile that was literally just "FROM ubuntu" and "apt update"; this failure resulted in `flyctl machines list` hanging and `flyctl machines kill` hanging. Maybe it was just a bad weekend at Fly, but their status page was all green, too.

So many problems between me and even giving fly a serious try. Unfortunate, but maybe in a year it will be better.


It'll definitely be better in a year. It sounds like you hit some of our capacity issues. In a year we'll either be ahead of those, or not growing anymore due to ongoing capacity issues. I'm hoping for the former.

The SSH and Builder issues are interesting. Those sound like something related to our wireguard stack.


My account got marked as "high possibility of fraud", even though I validated my email and added a credit card they even charged. The error message said that I had to contact support@fly.io but that address only accepts email from paid subscriptions.

With no way to evaluate the service, I opted to delete my account.


Exactly what happened to me as well. In my case, I was (fortunately?) messaged by someone from Fly.io after asking around the community forum how to remedy the issue. I think the person I talked to was implying that the system got tripped because I was using my own domain for my email address.

It's since been corrected, and I still have a handful of services running with them, but getting started with the platform definitely was a rougher experience than I was expecting it to be.


So they will regard any large company using their service as fraud? Insane


Large company? Literally every company I ever worked for, the smallest of which was 5 people, had its own domain for emails. And they were not all in IT.


So we agree it's insane to cut off every company with a domain? Good.


I don't know what merchant system they are using but I will say it is very common for them to have their own anti fraud detection and rejection on charges. Your specific case could be related to a huge list of possibilities (even many of them a rollup of many other interactions on other sites that happen to use the same merchant or foundational fraud data).

Thinking this is just because of a domain name is silly.

Also Fly.io -- You may want to clearly accept and manage trial issues (or at least a subset) via a support path. It seems silly to effectively bounce users with the impression that you have no support because they are not YET paying customers while in the trial phase.


That would be insane, yes. It isn’t the case, though. I signed up with my own domain just fine.


No, we don't flag accounts as high risk based on email domain.


as a counter-point, i use custom domain for email, and haven't had a problem with fly.


It may not be custom domain but rather self-hosted mail server.


That's worse. A self-hosted mail server implies a significant cost investment that probably wouldn't be required for ye olde hotmail account.


We don't charge credit cards to try and remove the fraud flag. We do a preauth, which is immediately canceled. Sometimes this looks like a charge.


> We do a preauth

Probably that was it, but my point is, there is no process you (as a customer) can follow to remove the fraud flag.

Also the flag was still there long after the preauth.


After trying several platforms (fly.io, render, railway, northflank), I think the problem with fly.io is that the web console is not fully functional, it depends too much on the CLI, and application response times are unstable. The problem with render is that builds are too slow. Northflank is similar to railway, and the experience is good, but although the Northflank console is very comprehensive, it is too complicated and overwhelming. By contrast, railway's interface is simple yet comprehensive, and the UX is great. (Wappalyzer shows it is built on Next.js, very nice.)


I tried railway and northflank. Really like the UI northflank has; I think anyone familiar with Kubernetes should feel right at home. I also agree with the sentiment that railway ui is simpler.


I get up to 20secs of downtime when redeploying on railway... I was using the sample golang repo from fly : https://github.com/fly-apps/go-example

That's just unacceptable imo.


Hey there fyzix! Angelo here, Support Engineer from Railway. I was here procrastinating from work when I saw this comment.

The downtime you see is our proxy taking time to cutover- however, we have https://docs.railway.app/deploy/healthchecks that will only cutover once we have a 200 from your API. This way we can keep your old deploy live and serving requests if and only if it's live.

There's more we can do to make it magical, but this should help in the meantime.


Thanks for the reply, Issue resolved:)

This could be done automatically by pinging the root and searching for a header set by railway's default page. If it doesn't exist then the service is live.


We used to do that by default, but it led to issues with load on our proxy. (Imagine n * m pings against your host for a shit-tonne (official measurement) of builds + deploys a day.)

We are hiring Network Engineers for exactly this reason :')


More context: The UI shows that the service is available but when I visit the live url, I get a 404.

What I think is happening is that the UI shows the status of the underlying container and there is some fault with the reverse proxy they are using to expose the container to the internet.

Fly.io doesn't have this issue but I am leery of all the complaints here.


>The problem with render is that it is too slow to build

This is only true for the free tier, and (hopefully) won't be true for much longer.


Yes, but I think the difference between the free tier and paid users should be in capacity and scalability. A rough first experience like this may reduce user conversion.


Echo this, and I _did_ notice it in my first two weeks.

3 separate issues prevented me deploying. One was my fault, but hard to diagnose because error messages contain no useful information.

The other two were the platform falling over, with the support forum having no resolution but plenty of other people in the same boat.

Lots to love about fly. The latency is noticeably better than anything I’ve used except cloudflare. Prices are good. Hopefully they’ll get the growing pains sorted and we’ll have another contender.


These deploy issues are all kinds of ulcer inducing. They're not strictly Nomad's fault (we're holding it wrong), but the complexity of flyctl -> nomad -> global hosts makes it difficult to bubble errors back up to you.

If you end up trying again, give the machines based apps a shot. It's much simpler infrastructure: https://community.fly.io/t/fly-apps-on-machine-prerelease/10...


The TLS cert renewal was a big failure on our part. I know that's, like, a one strike and you're out problem, but we've improved that part of our stack substantially. It should be much more resilient.

Redis/Postgres timeouts sound like they might be a problem we can help with, though. Both those services work through a load balancer with a TCP idle timeout. If you get timeouts after a period of inactivity, it's usually a simple config tweak to have a driver handle that seamlessly.

The timeout issues surprised me: more drivers than we expected can't handle a DB connection timing out when it's idle.


My main gripe is the wild variation in response time.

My server response time seemed to vary from <100ms to >300ms for the same operation.


fwiw, for non-html workloads we serve (nodejs) with shared-cpu (several 100 requests per second), the latencies (p50, p75) remain consistent.


Is this due to the network i.e. you get routed to a different server or due to CPU sharing?


I thought it could have been because of a shared CPU at first as well, but I upgraded to dedicated CPUs and it didn't make a difference.

I don't know whether this is a platform issue with Fly or something else, but it was kind of annoying to have such a high variation with seemingly no way to fix it.


Isn't the whole selling point of Fly edge servers, which are supposed to reduce latency?


Sure, but what's edge? The world is big. Can't have servers everywhere (ok, maybe Cloudflare is close). It's possible due to routing decisions that you get sent to a "different edge" at different times.


So what's the point then? 250ms is enough to literally send traffic to the other side of the world. So why bother with edges if you can just have a single server where most of your customers are and get the same latencies?


I'd spend some time reading community.fly.io before committing to it. I recently helped a startup that has been on fly.io and there was a lot of frustration about things not working.

It was very frustrating to see unanswered problem reports and fly.io down for what seemed to be many users, while at the same time seeing a fly.io co-founder arguing about politics all day long here on HN...


In fact, when you enter the fly.io console, you will feel that this is an unfinished construction site, especially when you start to deploy a new app, and the frame text icons in the options are simply truncated, which feels rough.


Yeah, that would certainly have slowed me down too if I'd experienced that. I haven't had any of these issues. Fingers crossed! And hopefully they're working on stability.


There’s an edge case in which you can lose access to your account as well when switching from github to email login with the same email. It’s an edge case but quite a bad one.


I had (a couple years ago) some weird websocket throttling or debouncing behavior. I tried to put together evidence, but it was quite a hassle so I just ignored it.

I think their culture is too engineering-driven. The target is devs, sure, but they could use some product guys.


Venture funded stuff like this should be fast, good, cheap, pick 3! Until they need to make a profit or shut down.


I enjoyed this article, but I'm noticing more and more that everything I read about fly is either a fun/low cost side project, or it's a (very impressive) article about fly.io internals. There's very clearly a capable tech team, but I'm confused about their target market. What customer success stories am I missing?


i don't work at fly but i do work in devrel and this is pretty common. it's basically a joint result of the fact that most content marketing is either about lowering the barriers to adoption or about telling an impressive technical story for recruiting and sales; that most big companies require a lot of hoop jumping to post official case studies; and that fly is still a newish paradigm that most companies haven't yet adopted.

i have no doubt they'll come along if fly continues owning mindshare at this level. when its obvious on HN but not yet obvious on the marketing sites you just have to wait a couple years (with some exceptions to the rule, e.g. people often bring up the failure of light table which got significant HN pull and seems to have died out https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...)


It's much simpler here: we just like writing, and we especially like writing about the parts of the system we like or find interesting. Other than having a feel for how to write for this particular audience, there's no strategy to it.

Case in point, the last big thing we had run here was, theoretically, part of a series of recruiting challenges --- for a role we've paused hiring for. If we had any strategy, we'd have held off on it for a couple months and ran it when we were actually hiring. But Ben really liked the piece, and so did the rest of us, so up it went.


Insightful

> i have no doubt they'll come along if fly continues owning mindshare at this level. when its obvious on HN but not yet obvious on the marketing sites you just have to wait a couple years

Seems like an accurate observation.

Now please elaborate as to why? And how does one shorten this?

I am unskilled at devrel and wish to benefit from your experience further. Cheers. :)


I think they’re copying digital ocean’s strategy. DO had great tutorials on hosting applications 3-5 years ago. This brought in SEO and social credibility.


nothing really to do with devrel skill level, just a noted pattern others before me have already observed. hn is a community of technical early adopters that care more than the average social network about substance. so likely to be earlier than most other signals in recognizing up and coming success particularly in the tech domain.

as for shortening... i'm writing https://dx.tips/ and have a couple of related posts coming up on that


Well, looking at this post's comments, the target seems to be "people that don't care their shit isn't working"...


You probably don't hear much about success, only when things go wrong. Wasn't fly stood up in response to the downfall of Heroku? It felt like a knee-jerk thing to me, but overall I feel like the sentiment is positive toward fly.io. Growing pains hurt, I'm rooting for them.


Nope, they predate the Heroku changes by a number of years.

I did have occasional outages (lost db connection). They seem to usually resolve themselves. I ended up shutting down the (unused) app since it was under attack.


> you probably don't hear much about success only when things go wrong

Which is exactly why companies hire marketing, PR, DevRel, business development, salespeople, account executives. Hiring a bunch of smart engineers and hoping everything will automatically turn out great is a losing strategy, but that is the one fly.io seems to be focused on.


I wish I could say I love fly.io, but it really is not ready for production yet. A few months back we migrated a non-essential production app to fly.io and we've had 3 major downtimes since, where we didn't know at all what was going on.

One thing should have been learned from Heroku in 2021: communicate before your clients learn about the issue. Communicate openly.

I hope they improve in the next year


I'm pretty happy with fly.io, except for one thing: I moved over two apps from Heroku, lured by the free tier. But you do have to enter your credit card details whether or not you intend to pay. And then they do charge me a few bucks per month for an almost completely unused app.

It's probably all in line with the ToS and everything, but I find it a bit shady that I have to investigate this, rather than having a free offering that doesn't suddenly charge me.


It's not a free offering; there is a free allowance. If you exceed that you get charged:

- Up to 3 shared-cpu-1x 256mb VMs

- 3GB persistent volume storage (total)

- 160GB outbound data transfer

If you spin up, say, 4 VMs you will pay for it.


I probably have to fiddle with the settings to stay within 3 CPUs, but since it was heavily recommended as an alternative to Heroku's free tier, I'm a bit disappointed I have to. Maybe they never intended to be recommended for that, don't know.


Right now the most generous free tier is from Oracle (24GB RAM machines feels unreal). But you can find other free tier offerings here: https://github.com/ripienaar/free-for-dev


They will yeet your machines without any info tho. I had a test VM (the smallest in the free tier) that just got destroyed one day.

But that is not all

My account is now "inactive", with no option to activate it and with no option to even pay and upgrade to paid tier.

The account is in some broken state where half of the dialogs produce errors like "Authorization failed or requested resource not found", as if I had an account with permissions denied to everything.

Even the "upgrade to paid tier" option gives

> You do not have access to perform this operation. Please contact your cloud account administrator for additional access.

Utter fucking disgrace


I tried to look at OCI but was immediately bamboozled with a slow as heck web site (locations table took 15 seconds to load!), a tonne of management BS about "aligning strategic objectives to operational constraints in a dynamic environment" and then was expected to immediately book a call with a salesperson.

No thanks Oracle even for 24GB free VMs!


Go to Oracle’s subreddit. It’s full of people whose free trial accounts were terminated.


"zombiefied" would be better description. Account exists but you don't have permission to do anything and you don't even have permission(lol) to upgrade to paid account


And you don't even have access to your data to backup or migrate.


On top of what other people said: capacity for the free tier machines is very limited. I wanted one to fool around with and it took multiple weeks of trying daily at varying times before I was able to actually allocate a VM.


True, Heroku allowed as many dynos as you want when their free tier existed, however, they had a limit of 1,000 hours per month. That pretty much meant you couldn't run more than 1 VM (~744 hours/month) for free. Fly.io is more generous as those 3 free VMs can be up and running all the time (but VMs have 256MB of RAM instead of 512).


The free tier is 256mb per vm (up to 3 VMs). You can't use or increase VM's memory to 512 or you will be charged.


I haven't used it, but from what I know it's not a very good analogue to Heroku (other than being an easy-ish/dev-focused host). Their pricing model is pretty different


OTOH they don't even charge you if your bill is less than $5/month. I've had an app do nothing for a few months now, and my bills are always just thrown away due to them being under $5.


My concern (for non-paying or low-paying side projects) is not so much the difference between $5 and $10 for monthly hosting - that's like the difference between one or two cups of coffee - but what safeguards are in place to prevent me from inadvertently exceeding an expected amount due to a high traffic spike, misconfigured cloud function or whatever.

So I don't really care if one service costs x2 if the difference is between 5 and 10, but I care a lot if I could accidentally end up with a $2000 bill one day (and warnings don't count - not much good if I get a warning if I'm away from my laptop in a log cabin or something).


I guess that's the modus operandi of the cloud providers: feign concern with a useless warning and hope that no one reacts before they rack up the bill.

That's why I stay away from e.g. AWS for side projects: too many stories of a stupid mistake that cost quite a bit of money, when all that is needed is a breaker that trips once usage exceeds a set point.


That's something I wonder about looking at AWS. My personal account has about $0.7 monthly spend. Doesn't it cost more to charge a credit card?


A few months ago, I found out I had an AWS account from college that was racking up $0.50/month in charges. After 40 months, I finally got an email from AWS warning me to pay the $20 or they'd just close my account. I paid it, and closed my account, but I guess I could've just left it if I wanted to.


I am guessing that 50cents per month is a Route53 hosted zone from one of your experiments with hosting a website?

You may want to check if you also have a domain name purchased through Route53 that is set for automatic renewal every 3 years or whatever


That's a good point, and you're spot-on about the Route53 zone. I did close my AWS account already. I'm hoping it just kind of doesn't renew that domain name.


Good to know. I'm paying anyway because I have other accounts there with bigger charges and I actually want the service, including the small personal account (which really shows as a separate transaction on a bank statement).


Yes. But building an exception into the billing system and possibly accumulating charges below $x over a longer period might just not be worth the effort.


AWS regularly cancels my invoices if they're under 1EUR, I assume precisely because it costs them more to do the transaction.


We don't suddenly charge people. We charge per VM you have running. Setting these up is intentionally manual. You can opt in to autoscaling, which will create new VMs for you, but you have to set that up after you've deployed your app.

If you're getting charged you should run `fly apps list` and see if you have more than two apps running.

One thing that trips people up is Postgres. Postgres is just another fly app. So if you moved two apps with DBs, you're probably running four total apps. We prompt you for database info when you do this, though, it's not done in the background.

One problem we've come across is expectations from Heroku users. We are very different (like the Postgres thing, there's no shared tenant postgres). People expect us to be the same. We haven't figured out how to expose those differences to people in a way that sticks.


> One thing that trips people up is Postgres. Postgres is just another fly app.

Sounds like there may be room for improvement in your documentation.


Perhaps because I'm not used to Heroku and such so I have different expectations, but I found it quite clear. There's a list as soon as you open the dashboard that looks like

    Apps
    -----------------------------------------
     myappname                shared-cpu-1x
     myappname-db         1GB shared-cpu-1x



Huh? I had a VM running and a postgres running without entering my credit card. I only needed it when I had to temporarily scale up the RAM on my VM


Same, then my coworker signed up and had to enter a credit card. No idea on the discrepancy!


Hm do they itemize the charges? what are they for?


CPU - I would assume it's not a mistake, maybe if I dig some more I can figure out how to keep within the allowance, but on Heroku I never had to deal with this kind of thing for free apps.


I agree, as an individual user I much prefer to be locked in to a fixed rate. And not just for the free tier- I don't want surprises even if I'm already paying

As a company with a cushion, hosting a profitable app, it might be worth doing the usage-based pricing to try and save some money. But the risk is just not worth it for smaller projects


If only Fly.io would focus on offering proper managed PostgreSQL and MySQL as a service, they would be the default choice for hosting many applications.


Yes, I feel this. I have a production app running in Fly with their Postgres offering. It would be good if it was fully managed


This is what put me off even trying them out. In 2023 managed DBs should be table stakes for a PAAS.


I'm really happy with using docker and compose for everything. You really can't beat how well it works on a single host.

Worrying about having to configure systemd? "restart: always" in compose. Caddy updates? Caddy provides an up to date docker image. You can't get around caddy's config, but tbh, you do it once, and then you're done. It's way better than nginx. Deploying a new version of your go application is just "docker compose up -d".


Any issues with persistence, like using Postgres? I heard that can be quite funky.

Just wondering about having a "native" NGINX reverse-proxy + multiple apps run via docker-compose on my VPS, seems the most hassle free. Currently using it for integration tests and it works great.

Did you maybe consider k8s, maybe as a learning experience?


I looked at Fly.io pricing, and boy that's expensive. $558/mo for a virtual machine with 8 dedicated CPU cores and 64GB of RAM. I pay ~$120/mo for a Zen 2 12-core dedicated server with 128GB of RAM. It just doesn't make sense to migrate anything that needs some chunkier resources.


Cloud pricing is always terrible compared to "just a fucking machine somewhere in a rack" or even more traditional VM providers.


Right, but they add quite a bit of margin from what I see. Correct me if this comparison is not fair:

Fly.io, 8 vCPUs, 32GB RAM (dedicated-cpu-8x): $398/mo

AWS, 8 vCPUs, 32GB RAM (t3.2xlarge, us-east-2): $240/mo

So, 65% more expensive. Quite steep.
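
For anyone who wants to redo this comparison with other instance types, the arithmetic is a one-liner. This sketch uses only the two list prices quoted above; with those inputs the premium comes out at about 65.8%, in line with the figure given.

```go
package main

import "fmt"

// premiumPct returns how much more expensive price a is than price b, in percent.
func premiumPct(a, b float64) float64 {
	return (a/b - 1) * 100
}

func main() {
	// Monthly prices as quoted in the comment: Fly.io dedicated-cpu-8x vs AWS t3.2xlarge.
	fmt.Printf("%.1f%%\n", premiumPct(398, 240)) // prints 65.8%
}
```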


That's not entirely fair. The AWS t family aren't dedicated CPUs. They're burstable, there is a token system that decides if you get to use them to full capacity and for how long:

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstabl...


Thanks. Do you know which EC2 instance would be most comparable to what Fly is offering with their dedicated-cpu instances?


Short answer: no. Long answer:

When you consider ARM vs AMD vs Intel vs current-gen vs last-gen, AWS sells several 8-core 32-GB machines that have wildly varying performance characteristics.

It used to be that the manner of virtualization impacted performance, too. I think that might be less the case now.

To know if you're getting good value, you really need to try a few different vendors and benchmark them, either with real workloads or synthetic ones.


> When you consider ARM vs AMD vs Intel vs current-gen vs last-gen, AWS sells several 8-core 32-GB machines that have wildly varying performance characteristics.

This is actually a benefit of AWS and other large public clouds. You have a choice due to their size.

If AWS's investment in Graviton design works well for your workload, it's a ~30% cost savings. They're able to invest in designing and producing the chips due to their size. A company like fly.io is limited in selection and in how much heavy R&D it can do.


Sure, I personally like the Graviton instances. For workloads that care more about throughput than latency, they offer excellent performance-per-price.

The OP was asking what was "most comparable" to Fly's offering. I assumed they meant "what would similar performance cost elsewhere".

By switching from Intel to ARM, they're buying something fundamentally different. Maybe it's a better fit overall! But I didn't understand that to be what they were asking.


I think we're on the same page. I only wanted to further contrast for readers the difference between something like fly.io and AWS. There are a lot of pros and cons, and one of them is selection.

I wasn't sure what the dedicated instances for fly.io were running on (can't find it on this page: https://fly.io/docs/about/pricing/#plans). To truly do an apples / apples compare there's networking capabilities, storage, and other considerations also.


Anything not these:

> The EC2 burstable instances consist of T4g, T3a and T3 instance types, and the previous generation T2 instance types.

Your point still stands.

Here's a m6g.2xlarge with 32GiB of memory and 8 vCPUs (AWS Graviton2 Processor / arm64):

https://instances.vantage.sh/aws/ec2/m6g.2xlarge?min_vcpus=8...

Monthly cost: $224.84


But then Gravitons are cheaper than x86. Though apparently the x86 variants are still only around $252.29/mo.

Fly compares better on the low end of scale.

AWS t3a.nano 512MB (2vcpu but 1h12min burst only, you can normally only consume 5% of that!) is $3.431/mo

Fly shared-cpu-1x 512MB 1vcpu $3.19/mo
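
The "1h12min" and "5%" figures fit together if you assume the usual AWS credit model: one CPU credit is one vCPU at 100% for one minute, credits accrue at the baseline rate, and the balance caps at 24 hours of earnings. Those three rules are my reading of the burstable-instance docs, so treat this as a sketch rather than a billing reference.

```go
package main

import "fmt"

// burstable describes a burstable instance under the credit model assumed above.
type burstable struct {
	vcpus       int
	baselinePct int // sustained share of each vCPU, in percent
}

// creditsPerHour is the earn rate: one credit = one vCPU at 100% for one minute.
func (b burstable) creditsPerHour() int {
	return b.vcpus * b.baselinePct * 60 / 100
}

// maxCredits assumes the balance caps at 24 hours of earnings.
func (b burstable) maxCredits() int {
	return b.creditsPerHour() * 24
}

// fullBurstMinutes is how long every vCPU can run at 100% from a full balance,
// ignoring the small number of credits earned during the burst itself.
func (b burstable) fullBurstMinutes() int {
	return b.maxCredits() / b.vcpus
}

func main() {
	nano := burstable{vcpus: 2, baselinePct: 5} // t3a.nano figures from the comment
	// prints 6 144 72
	fmt.Println(nano.creditsPerHour(), nano.maxCredits(), nano.fullBurstMinutes())
}
```

72 minutes is the 1h12min above: the burst budget is just a day's worth of the 5% baseline saved up.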


PaaS is also always more expensive than IaaS (which as GP said is always more expensive than I not-aaS).

They're charging you more for the convenience of `flyctl region add lhr` or whatever it is vs figuring out the routing & VPCs for example in AWS yourself.

It's not all roses though, I find the docs a bit lacking (good on networking - bad on security & storage) and the price of that convenience (aside from literal price) is that it is limiting - as soon as you think 'well I just want a very simple firewall/security group wall' or 'what's the equivalent of allowing only this x to y with an IAM policy', nope, can't do it.


I know that, but I've never understood the PaaS model of just slapping X% on top of the underlying cloud cost. It just invites customers to migrate to the underlying cloud once they're big enough to hire and sustain a dedicated devops team and setup something like k8s.

Surely the amortized costs of the Fly.io platform don't scale linearly with the power of the VMs you rent from them. A 64GiB server won't be twice as expensive to maintain as a 32GiB one. So why not create a model that e.g. consists of a cost per VM/autoscaling group/etc plus whatever the VMs cost?

I'm genuinely interested, I could very well be missing something.


The costs of the VMs scale linearly. The bigger they get, the more additional capacity we need to be able to launch on demand ones.

It's not just RAM, though, our physical servers have a consistent ratio of RAM/CPU/disk available. You're basically just buying slivers of hardware we've already paid for.

That said, our pricing isn't far off of AWS. They have much better economies of scale than we do, and can afford to do Graviton, but you get pretty close horsepower per dollar for comparable instances.


Also, just for the record, I'm the sort of person that really dislikes magic command lines, I don't want docs that tell me do fly this or that. I want it to be possible in the web console (almost nothing is) and via the Terraform provider (glad it exists but frankly it's an alpha/WIP) and for parity vs the CLI in docs.

I used it in above comment as a shorthand for the fact that such an API/functionality exists that you can just say to Fly.io (whether via CLI or ~better~ other means) 'hey stick this in a second region too'.


This phrasing of command-lines as magic is weird to me, with an obviously different background. I'd consider a command-line example more explicit in what it is doing than either a 5-step walkthrough of the web UI, or Terraform.

flyctl is a reasonably simple GraphQL client. They haven't really documented the GraphQL API (see e.g. https://fly.io/docs/app-guides/custom-domains-with-fly/#grap... for one slice of it), but that's all flyctl does.


A comparable would probably be m6a.2xlarge, still only $250. Larger burstables have higher baseline cpu so they’re more like dedicated CPU, hence the low price difference.


fly.io dedicated cores probably are vCPU aka Hyperthreads anyways. So the gap is even wider.


The only good thing about EC2 is reliability.

You can get equivalent hardware for much cheaper elsewhere and save way more than $9. Add some load balancing across cheap hosts and you can achieve good reliability.

I run things on Digital Ocean, hetzner, ovh, linode, fly: they all cause downtime from time to time. I have clients on azure / Aws / Google and it's a much rarer occurrence.

Fly is definitely among the unreliable ones, so you're paying for a simplified deployment / ops experience here - the heroku model.

Even at that, configuration is not the easiest and there are gotchas you need to know. Documentation is also pretty sparse and the example images were stale. Getting a build to work was a huge pain.

The free offering is nice, but I'm not recommending it for production.


EC2 has Graviton for now, which is "exclusive". Graviton 3 can be nice if you can make it work with your workload.


Fly.io is excellent for testing, small personal projects, and similar, but it's NOT for production!!! Instances can silently fail anytime, and sometimes there are no easy ways to recover. Once I had to go through the process of updating domain DNS to point to the new instance.


I am so confused about what you're describing here.

Apps get static IPs, at no point would you need to make a DNS change to route traffic to a new instance.

VMs can definitely fail silently. When apps fail, we're not very good at notifying you of what's going on. I'm hoping that improves this year.


My domain DNS records were pointed to the Fly app IP. When the app silently died and support wasn't able to help, they suggested I launch a new app, which meant a new IP for DNS.


I was about to demo my app to someone and was quite shocked that my app was down. Though, it's super easy to restart with a single fly command.


That’s what I get from reading all these comments here.


Fly actually has rudimentary static file hosting: you can use a [[statics]] block in fly.toml and any exact filename match (no index.html etc) will be served directly from the router. I'm not sure this would end up being useful for OP's exact use case; I'm pretty sure it doesn't set ETags.

https://fly.io/docs/reference/configuration/#the-statics-sec...


Great post!

Not sure why giftyweddings.com needs to serve static files with an ETag in the HTTP response header: a hash of the file content is already included in the URL, which means it will be invalidated when the content changes, just based on the path, regardless of the ETag.


Yeah, this is a good point. I've previously just used the "hash in the URL" approach as it seemed more obvious than ETag. I asked Ben Johnson why his hashfs library uses both, and he said "Yeah, that's a good point. I can't remember why I added both. I suppose some proxies may use ETag? I dunno."


Haha. Fair enough. Thanks :)


What stops me from using such platforms is the risk that one day for any reason my account can be suspended and I will have to migrate my apps and data.

Do you have a strategy for that situation? Your old Ansible branch hopefully is still there and compatible with your recent production code changes. And your sqlite DB is replicated to an external storage and your data is reachable even if you have lost control over “your” PaaS account.


> What stops me from using such platforms is the risk that one day for any reason my account can be suspended and I will have to migrate my apps and data.

You have found a hosting provider where this is not a risk?


With the standard tools you have a whole spectrum of options:

a) Cloud providers; b) Hosting providers; c) Colocation; d) On-prem;


All of which have a counter-party who can suspend your service if they believe you're abusing the service: A&B your cloud / hosting provider, C&D the transit provider - all will have language in their contracts allowing for suspension or termination if they suspect abuse.

The only things that help protect you against this are (1) having named contacts on your account who will vouch internally at a supplier; (2) negotiating these clauses in contracts to prevent instant suspension in all but the most serious cases; and (3) utilising multiple suppliers.


This is also about the number of suppliers. Clouds - a dozen. Hosting providers, colocation - thousands.

I agree that in all cases, you depend on someone, but you can lower the risk to the point where other dangers that threaten your business become more probable.


If you're on AWS or GCP or Azure, and they terminate your account, and you use only the "standard" services (raw instances, k8s, database, redis, queues, email, etc.) it's generally relatively easy to migrate to one of the other two.

Fly is different enough that moving to or from it takes much more time and effort.


> Fly is different enough that moving to or from it takes much more time and effort.

It is? Fly seems to be only docker containers. You can't get much more standard than that these days!


> Fly is different enough that moving to or from it takes much more time and effort.

Have you actually used fly? It's pretty vanilla. Having migrated to and from fly I don't think your assessment is accurate.

It uses Docker files or Heroku style procfiles - not particularly exotic. Your apps live in a private network, which is managed for you.


Yes, this is my approach, as long as you stick to standard interfaces, you can migrate rather quickly.

Your level of abstraction may vary. I stick to VMs after realising how much overhead I had with the cloud as a one-person shop.

Teams can interface with k8s, which is also standard nowadays, assuming you can spawn a cluster fast.


One nice thing fly has going for it is that you’re just deploying standard OCI containers, so there’s pretty minimal lock-in if you need to get off in a hurry. Due to some edge cases I recently transitioned a workload from Fly to AWS in 2 hours or so.


I skipped over fly.io because I don't want to run a command to deploy an update to an app. I much prefer Render.com and DigitalOcean's setup where you select a repo and then it takes care of the rest with auto update on git push.

This seems like a trivial thing to add. If I wanted to use a domain specific configuration file then that's always an option, later.


They don't really need to add anything, because you can just use a GitHub Action that runs "flyctl deploy". Here is Fly.io's guide to the 15 lines of YAML you can use to do that: https://fly.io/docs/app-guides/continuous-deployment-with-gi...


If any Fly.io people are reading: this page has Git merge conflicts in it currently. Search for <<< on the page.


What's really needed for this to be good is API tokens scoped to something significantly less than "anything my account can do".

Fly devs have confirmed this is in the works, think macaroons.

The only defense here is that Fly isn't the only party being incredibly sloppy with their API keys. Cloudflare is just as bad.


>I much prefer Render.com and DigitalOcean's setup where you select a repo and then it takes care of the rest with auto update on git push.

Doesn't that force you to use your hosting provider as your CI provider?

I've always avoided the automatic "deploy on git push" solutions because I want to keep my CI provider uncoupled from my hosting provider. That way, I can switch hosts without having to rewrite all my CI code.


To keep CI and hosting decoupled, your CI can push fresh images to a docker registry, and your deployment can poll for updated images. An easy way to achieve this is to add the Watchtower container to your compose files: https://containrrr.dev/watchtower/running-multiple-instances...


I don't think that's a good solution in general unless you're confident that all of your deployments will use Docker Compose.

About 1/3 of my deployments are just static sites, and the rest all fit into a single Docker container, so integrating Compose would be a big increase in complexity.


I'd say Compose simplifies even your single-container deployments, as the compose file is the single place to set your env variables, port forwards etc.

That being said, you can use Watchtower without Compose. It's enough to run

  docker run -d -v /var/run/docker.sock:/var/run/docker.sock containrrr/watchtower


Fly did send an email linking to one of their articles on how to set it up for GitHub, maybe a few hours after the initial setup. I think it took me like 5 minutes to set up deploy on git push to main (or your deploy branch).


I do the same except I use Litestream (litestream.io/) for backing up SQLite. So far using fly.io was nothing but pleasant and allowed me to stop paying ~5EUR for a Digital Ocean droplet.


Maybe I'm missing something, but 500 qps for serving static files seems supremely unimpressive. Fair play about it being more fun, or saving some money from AWS, but I am not convinced that this setup is a good achievement in terms of performance.


Only 1 out of 4 URLs in the test was static. The others are rendered HTML templates (using Go's html/template -- I'll update the article to make that clearer), and one page does two SQL queries.

Note that my load test wasn't to see how much I could squeeze out of this, but to determine that it's "more than good enough" for this use case. The typical qps I get from real customers on the site is about 0.5 qps, so I think I'm good. :-)


He mentioned one of the pages was querying a db


How is the privacy and compliance in Fly.io vs EC2 if I build an app serving European users?

Edit: apparently you can select the region, and you can possibly get a pre made DPA document template.


Yes. You get a Data Processing Agreement (DPA) for free. That is what you need in the EU for GDPR purposes.

Customers with HIPAA compliance needs may need a Business Associate Agreement (BAA). That BAA thing is only included in paid plans.


The only problem I have with Fly is that it is VC backed. It's receiving a lot of developer love now, but wonder what happens when it eventually gets acquired by a behemoth in a few years and the same fate as Heroku is bestowed upon it.


We aren't super interested in selling. I want to work on Fly forever and selling is a quick way to end that.

But I also don't think you can build a from-scratch cloud without VC money. It's very expensive.


Appreciate the honesty and transparency, but sadly VC isn't chasing the profits from being a cloud- or app-server-provider. Looking at the profiles of the VCs which participated in funding rounds, most of them are exit-oriented. A16z's portfolio proudly opens with "exits". Intel Capital is also very proud of their ~50% IPO or acquisition stat. Hopefully you can work on Fly forever, but history has taught us that probably won't be the case - and that's not a judgement.


You'll notice that the ones they feature most are IPOs. Which isn't the same as getting acquired. I'm not opposed to an IPO, I am opposed to getting subsumed by a larger company.


> But I also don't think you can build a from-scratch cloud without VC money. It's very expensive.

Linode did it 20 years ago (somehow). Dunno if there's a more recent example though. :)


> It might be “quite trivial”, but it’s still too much work for lazy developers, let alone non-technical people.

I don't think you have to be "lazy" to take advantage of the value proposition of saving time or complexity. I think Dropbox and similar are successful because they have put a higher emphasis on their target audience's time than other competitors in the past.


Very interesting.

I have a question regarding the change from an external periodic process (author mentions two: backup database, and send post-wedding email to couple) to a goroutine:

- if fly.io ever needs to scale your app up to > 1 instance, won't couples receive duplicate emails? (similarly, any other background process like database backup would be run in multiplicity)


We don't autoscale apps by default, you have to opt in to that. Many apps don't need or want that functionality.
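
If an app did opt in to more than one instance, one common guard against duplicate background work is to have each instance atomically "claim" a job before doing it. The Go sketch below only simulates that idea: the claims type is a hypothetical in-memory stand-in for storage that every instance can reach (for example a "sent" flag in a shared database), and none of it is taken from the article's code.

```go
package main

import (
	"fmt"
	"sync"
)

// claims is a hypothetical stand-in for shared storage that every
// instance can reach. In a real deployment this would not be in-memory.
type claims struct {
	mu   sync.Mutex
	done map[string]bool
}

// tryClaim returns true for exactly one caller per job ID.
func (c *claims) tryClaim(jobID string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.done[jobID] {
		return false
	}
	c.done[jobID] = true
	return true
}

// runInstances simulates n app instances that all wake up for the same
// job, and returns how many of them actually sent the email.
func runInstances(n int, c *claims, jobID string) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	sent := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if c.tryClaim(jobID) {
				mu.Lock()
				sent++ // the real email send would happen here
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return sent
}

func main() {
	c := &claims{done: map[string]bool{}}
	// Three instances race for the same (made-up) job ID.
	fmt.Println(runInstances(3, c, "post-wedding-email:42"))
}
```

This prints 1: only the instance that wins the claim sends. Without the tryClaim check, all three simulated instances would send.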


Fly.io is so good, I really like it. Started using it to host my backends for hobby projects while all my FEs are at Vercel. The free version keeps my project running and it is very generous as well!

Because of their generous free tier, I'm willing to pay more for my personal projects and literally host everything at fly lol.


Why not just use digital ocean or hetzner?


Do you mean VPSes (virtual private servers)? Because there you need to take care of the maintenance of the OS, system upgrades, initial setup, etc. With a system like Fly.io you just push your app and the OS is abstracted away from you. That is exactly what the article is about.


How hard is it to do apt update && apt upgrade -y every once in a while?


Say I want to write an app, deploy it, and never think about it again. What's the easiest way to do this?

Fly.io?

AWS Lambda?


I have a couple of notes for node.js on AWS Lambda. If you're at all security conscious you'll want to regularly bump the runtime version and update your package.json. Some automation with GitHub Actions or similar should be able to do that. Probably add some basic testing to make sure things still work before you deploy. I haven't automated the ones I manage, and I've updated them maybe 1-2 times a year for security reasons.

As an example, node 12 was released in 2019 [1] and will begin deprecation phases in another month [2]. So it looks like a 4 year support cycle, although old runtime versions will still run. If you're using any of the AWS security scanning tools, these will flag any vulnerable dependencies and then you'd update them. Those seem fairly common in the node ecosystem, so other languages may see lower toil.

[1] https://nodejs.org/en/blog/release/v12.0.0/

[2] https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes...


Depends on your app of course, but if you can live within the confines of something like Cloudflare Pages/Workers you can get pretty close to "never think about it again"

https://pages.cloudflare.com/


Hetzner Ubuntu Server, a good firewall configuration and automated backups. I can't think of anything easier. I tried a lot of stuff, but this just works. For internal applications I basically block all traffic outside of our office network, activate auto updates for the OS and just stick to one version of the app for as long as I can.

Edit: and only one app per server. Also only the Hetzner Cloud offering with managed backups and managed firewall.


How does this meet the condition "write an app, deploy it, and never think about it again"?

As I'm reading it, your setup is the exact opposite of that.


Why do you think you need to touch them after deployment? You just pay the monthly bill. The thing is, you usually can't just deploy and rest, as there are often security concerns, your provider could go out of service, or various system failures could require restoring a healthy state or similar. Even if you serve just static files, you might need to adjust your DNS because the provider changed something after 5 years, or change some other provider-related configuration. You really want to take care of all those things at your first deploy if you don't want to care about them long-term. The Hetzner firewall makes frequent security updates of the app mostly unnecessary, Ubuntu auto updates take care of the security of the OS, and the backups make recovery easy. If Hetzner goes out of service, you can restore your system wherever you want, as it is just an Ubuntu server. By keeping only one app per server you also reduce the complexity of the system and separate the systems, so that a failure can't spread.

Nowadays, with offerings like Hetzner, managing a server is the easiest way to run apps which just need to work and don't need to scale exponentially over time. Why add anything on top of it? It's really done within 5 minutes if you know what you're doing.

If you need something which scales and doesn't require file storage / databases, then I'd go for the provider-agnostic Serverless framework. If you need storage and/or databases then it really depends, as Serverless gets costly and more complex for these things, and there is more provider lock-in. Containers may make more sense in such cases.

At our company we use these two systems to reduce maintaining basically to zero.


Google Cloud Run is very easy.


If it's a web app, Heroku is a smooth experience; it lets you deploy by pushing a git repo.


I'm literally using that same function

  func getEnvOrDefault(name, defaultValue string) string {
      value, ok := os.LookupEnv(name)
      if !ok {
          value = defaultValue
      }
      return value
  }
for some of my projects


Heh, that's funny. I distinctly remember typing it in from scratch just the other day (no Copilot), and I guess it's short enough it sorta writes itself: the function name is kind of obvious, the name of "defaultValue" is necessary because "default" is a keyword, and so on.

I normally do early returns, so it's somewhat surprising I wrote "value = defaultValue" instead of "return defaultValue", but whatever.


Curious, is this coincidence related to GitHub Copilot?


Even after all this time, Heroku still doesn't support mounting volumes.

Is volume mounting the hardest problem in cloud computing ?

All the buzzwords so far feel scammy, in the sense that none of it makes sense when you can't cache your data on the local file system.


"Is volume mounting the hardest problem in cloud computing ?"

It kind of is.

If you don't provide a mutable filesystem, you get a whole bunch of benefits:

Need to scale to handle more traffic? Start another instance and add it to the load balancer.

App crashes for any reason at all (including a hardware failure)? Fire up a fresh instance on another physical server and load balance there instead.

Want to provide zero-downtime deployments? Start a new instance running with the new code, switch the load balancer over to serving from that instead, then shut down the old instance.

Each of these becomes a LOT more complicated if you offer read-write volume support.


This. The DO App Platform doesn't support mounting their volumes either (I've been waiting for 3 years now, and it will probably never happen). I wonder how render.com makes it look trivial while the others just assume you wanted block storage .. nope.


I understand that people pay for such sites to avoid maintenance. But if the scale is not too big, I think it is always more logical to rent a VPS and set up your own system. Unless of course you don't want to get your hands dirty.


Why should HTML files be embedded within the Go binary bloating its size ? And then needing to fiddle and hack around with Go's embedded FileServer to support basic HTTP caching is painful.

Is there no facility for a CDN in fly.io ?


I assume there is. But that’s not the point I think.

The point of embedding is similar to making a jar from an ops perspective. You “bundle” everything into one thing and you deploy that thing as single binary/archive. Less moving parts, less complexity.

Secondly, providing etags/modified headers for serving static assets is not a hack. It’s just not built-in. But it’s perfectly fine to serve those directly from your app server with the appropriate headers, so proxies can serve them directly.

In fact I would ask the reverse: why use a CDN when you can use canonical features for deployment (Go embed, Java jar etc) and serving (HTTP cache headers)?

Personally I wouldn’t necessarily use a 3rd party library for this but to each his own.
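
As a rough illustration of serving an asset with an ETag using only the standard library, here is a minimal Go sketch. The etagHandler and status helpers are hypothetical names, and the asset is an in-memory byte slice standing in for an embedded file. It relies on http.ServeContent honouring an ETag header that the handler has already set.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// etagHandler serves one fixed asset and sets a strong ETag derived from
// its content. The in-memory content stands in for an embedded file.
func etagHandler(name string, content []byte) http.Handler {
	sum := sha256.Sum256(content)
	etag := `"` + hex.EncodeToString(sum[:8]) + `"`
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("ETag", etag)
		// ServeContent compares the ETag set above with If-None-Match
		// and replies 304 Not Modified when they match.
		http.ServeContent(w, r, name, time.Time{}, bytes.NewReader(content))
	})
}

// status sends one GET to h and returns the status code and ETag header.
func status(h http.Handler, ifNoneMatch string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, "/site.css", nil)
	if ifNoneMatch != "" {
		req.Header.Set("If-None-Match", ifNoneMatch)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("ETag")
}

func main() {
	h := etagHandler("site.css", []byte("body { color: black }"))
	code, etag := status(h, "") // first visit: full response
	fmt.Println(code, etag)
	code, _ = status(h, etag) // revalidation with the same ETag
	fmt.Println(code)
}
```

The first request returns 200 with the ETag, and the second, which sends that ETag back in If-None-Match, returns 304 without a body. A proxy in front of the app can use the same header to revalidate its cached copy.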


> why use a CDN when you can use canonical features for deployment (Go embed, Java jar etc) and serving (HTTP cache headers)?

* different work profiles (streaming large amounts of compressed data vs small JSON requests/replies) means that different software is more optimal for each -- an application thread usually is a lot more powerful and does a lot more (and thus is slower in the general case) than an nginx thread

* you're using your own compute on your own dime to do what your service provider is probably offering to do for cheaper

* bloats the image size -- there is a productivity difference between working with a 150MB image and a 1GB one (debugging, easily diffing changes)

Admittedly, these aren't problems for a one man startup or a very low traffic website, but these things do eventually matter as your system grows in size.


As for the first two: I would assume that you’d put nginx or some other reverse proxy (or a network of proxies) in front of your application server anyway.

With the appropriate headers and configuration you would have the same kind of performance characteristics and resource utilization. Am I missing something here?

As for the last point: you have 1GB of stuff in either case. And all of that stuff needs to be looked at. The difference is really the deployment: do you prefer to move everything in a single piece or would you rather make individual, fine grained deployments? Either has advantages and disadvantages.

I’m not sure how Go lets you inspect embedded files in a binary though. There might be gotchas. An archive is probably more transparent and more general tools can be used.


At some point the image size does start to matter. Both for local convenience and for the time of deployment.

Also serving only the current version's assets will cause failures during a restart. If you have more than one container, there's a chance the page request will hit the new one, but the resource request will hit the old version - without the needed asset.

And once you hash your assets, you normally only deploy a couple of extra files, not the whole 1gb. I don't get why people would want to keep them embedded (apart from convenience for a third party who will run it)


Author said in a different comment[0] that some pages are dynamic, so it's probably convenience for their setup.

[0]: https://news.ycombinator.com/item?id=34953567


Yeah fly does static/cdn content just fine


So what's the state of the company behind fly.io? Because if it happens to be going through some stage of VC fueled bought growth, all talk about price comparison isn't worth the air exhaled for speaking.


I currently rent AMD 16 core server with 128GB RAM and 2x2TB NVME drives for around $100 / month from Hetzner. Price for equivalent hardware on Fly (without mentioning drives) is $1,126 / month.


Well yes of course. But then you have to manage everything yourself. The whole purpose of fly.io, heroku, raleway, etc, is that you don't have to care (much) about ops & sec. Handling it yourself (or hiring someone to do it) can cost way more than $1000/mo. When you grow big enough it might not be worth it anymore, but for smaller projects, it can be really useful. And for the record, at the moment, I host most of my services on my own server.


This response would have made more sense if fly.io provided fully managed db services.


There's no anycast support or ability to have servers around the world.


But what would it cost to run what you are actually using?


The author doesn't seem to explain the before and after cost difference. It used to be $9, and now it's what? Is the 10 cents talking about the AWS S3 pricing or the Fly.io pricing?


Sorry if that wasn't clear. This was meant to answer it:

> Fly.io looked more geeky and command line-oriented, which suited me, and their prices are also ridiculously low: free for up to three small virtual machines (I only need two), and $2/month for small VMs after that.

Their free allowance allows up to 3 small VMs and 3 GB of permanent volume space. I'm under that (2 VMs and 2 x 1GB volumes) so I get both my apps for free, and I could add one more. Even if I had to pay the regular rates they charge after the allowance, it'd be only $4.18 per month.

In any case, I was paying AWS $9/mo, now I'm paying Fly.io $0/mo, so that's what I saved.


The author saved $9 from the change.


One could consider using https://github.com/caarlos0/env instead of the hand-woven env var handling.


I had the same experience with them. Very simple to deploy when I moved over from Heroku. Was up and running within a couple of hours and only used UI to deploy.


Compared to Code + Docker + AWS Copilot CLI, am I going to see a 10x improvement in my workflow?


Could someone help me understand what makes fly.io different from a regular cloud service?


Depends what you mean by "regular cloud service", but broadly Fly.io and Heroku are examples of Platform-As-A-Service, versus AWS, GCP, Azure, and Digital Ocean which are more Infrastructure-As-A-Service. https://www.redhat.com/en/topics/cloud-computing/iaas-vs-paa...


don't know much about fly.io, but i would have definitely deployed my go binary + caddy w/config in a container, rather than set up a systemd service on the host. just me?


Good for them. Looking superficially at the application they're talking about, it's also possible that a more cloudy implementation of said software would have lowered the costs on the AWS side too. Like using lambda functions instead of EC2.


Tip: Use cloudflare R2 for images and backups

Saves you traffic costs


cool transition and fun writeup!

for low, intermittent traffic sites, go on lambda might be a better comparison:

https://github.com/nathants/libaws/tree/master/examples/simp...


There are stories from the late 1800s/early 1900s of people creating private mail delivery services as competition to the US Postal Service.

There is even a story of the Federal government forcing these companies to make sure that the letters had postage even though they were NOT being delivered by the USPS. Customers would actually pay a premium over the postage to the private company because the overall experience with private company was better than the USPS.

I am wondering if something similar is going to happen with AWS: more and more private companies will charge a premium for services built on top of AWS (and therefore reliable) but much easier to use.

(I imagine this is probably already happening and I'm sure HN folks will point out examples).

PS In the end, these private mail companies were deemed illegal and went out of business.


Fly.io is not built on top of AWS. see https://fly.io/docs/reference/regions/


Not sure if still the case, but remember watching a talk (around 2021) where they mentioned they were using different providers. Among them AWS for some regions.



