Backblaze B2 Cloud Storage Now Has S3 Compatible APIs (backblaze.com)
641 points by pyprism on May 4, 2020 | 279 comments


Backblaze is also a founding member of the Bandwidth Alliance, meaning serving B2 data via Cloudflare is essentially free.

So you are only paying for storage. (Correct me if I am wrong on this one.)

I wonder why ALL the non-hyperscale cloud vendors, like Linode and DO, don't provide one-click third-party backup to B2. You should always store an offsite backup somewhere, and B2 is perfect for that.


Yev from Backblaze here -> That's correct. We're a founding member of the Bandwidth Alliance and you can find more info about that on our Blog (https://www.backblaze.com/blog/backblaze-and-cloudflare-part...) or FAQ (https://help.backblaze.com/hc/en-us/articles/217666928-Using...). May the 4th be with you ;-)


Hey, I've been wondering this: are the transaction costs also waived, or is it just the bandwidth fees? I feel like if both were gone it'd be hard to make a profit...


For that integration it's just the download fees. The transactions do still get billed, but many people can stay within the free daily allotment.


What's the point of the Alliance in this case?

If I use an against-the-alliance corp, I will just pay Backblaze for storage + download, the same as if I used an in-bed-with-the-alliance corp.

Or will the alliance ensure I only get charged for one download per file per billing cycle, even if the CDN cache can't hold all the data and they download it many times? Does the alliance ensure that (and, more importantly, who monitors it)?


Hey, good question. It's essentially a peered connection, so you're not paying for the transfer between providers. You can learn more here -> https://www.backblaze.com/blog/backblaze-and-cloudflare-part...


Where can I find the pricing info for this allotment?


All of our pricing is available on the site (no need to call anyone) - B2 Cloud Storage is listed here -> https://www.backblaze.com/b2/cloud-storage-pricing.html. Just scroll down for the transactions!


                                    Storage        Download
    BackBlaze                       .5 cents/GB    1 cent/GB
    S3 (one zone infrequent access) 1 cent/GB      9 cents/GB
I'll transfer my 250GB of videos and images from S3 to BackBlaze some day.


Heh, we make it easy! Check out this Flexify partnership -> https://www.backblaze.com/blog/supercharged-data-migration-w...


What does "Charged for any portion of a GB." mean?


It means 1.01 GB, 1.1 GB, and 2.0 GB are all charged the same.


What about 0.1 GB or 0.5 GB? I guess that's my question: is there a minimum charge per download?


Yev here -> No, there are no minimum charges for downloads. It's just $0.01/GB down and $0.005/GB/month for storage, and everything is calculated at a byte-hour level :)
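
Roughly, byte-hour proration works out like this (a back-of-the-envelope sketch, not actual billing code):

    // Sketch only: $0.005 per GB stored for a full month, prorated by the hour.
    const DOLLARS_PER_GB_MONTH = 0.005;
    const HOURS_PER_MONTH = 24 * 30;

    function storageCost(gigabytes: number, hoursStored: number): number {
      return gigabytes * (hoursStored / HOURS_PER_MONTH) * DOLLARS_PER_GB_MONTH;
    }

    // 250 GB kept for a whole month ~= $1.25; the same 250 GB kept for one day ~= $0.04
    console.log(storageCost(250, HOURS_PER_MONTH), storageCost(250, 24));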


Thanks! That's what I was curious about.


Isn't this incompatible with Net Neutrality?

I think Cloudflare already was against net neutrality, but people who believe in that principle might need to avoid Backblaze as well if that is the case.


No, this has nothing to do with net neutrality. Your ISP isn't discriminating what data you consume, whether it's from cloudflare or another CDN.

This is about the fact that bandwidth is usually bought by capacity between networks rather than usage and they're passing those savings on to you instead of charging expensive per-GB costs like the major clouds.


I'm referring to Cloudflare as the network provider. It seems not-net-neutral to be charged different amounts depending on the relationship between the storage provider and the network provider.

If the argument is a semantic one, that net neutrality is so narrowly defined that it doesn't apply to them as a CDN rather than a conventional ISP, it still stands that the situation is not-net-neutral if we generalize that to include CDNs, which it ought to, since that's what the N stands for.


Network neutrality is a difficult concept to apply because its core concept doesn't map well to reality.

The core concept, in my mind, is that all packets should be treated equally, and should cost the customer the same amount per byte.

The problem is that different packets take different network paths and have different costs to your provider. The bandwidth alliance partners are exchanging traffic via peering links, so there's only the fixed costs of equipment and any port charges/interconnection fees to the peering facility. Packets sent to other destinations may pass through a paid transit link and cost the provider. If your provider charges you the same amount for both types of packets, they're not being transparent about their network costs which is bad for you, and is also bad for them, because you may be able to adjust your traffic so more is settlement free and less is paid --- but you won't do that without an incentive.

Of course, if you don't like how Backblaze manages its network, there's a bunch of other network storage vendors who you could switch to (many of which are members of the Bandwidth Alliance). Or, it's not too hard to build a storage box or two and get them into a colocation somewhere.

For your residential ISP though, for most people, if you don't like the network policies, you might have another option, but they will likely have similar policies. You don't have meaningful choice in a single residence, and moving residences to get better choices isn't a meaningful option either. So, network neutrality is a policy to (attempt to) regulate residential ISP behavior, in order to provide a reasonable policy for users. It's still a problem that it doesn't match reality, and it doesn't provide user choice, but it has gotten a lot of support.

I would rather see mandatory line sharing for residential ISPs. It's a lot easier to define, with proper regulation it offers a path towards consumer choice, and it gives people a way to take action if their provider has poor network policies --- if your provider runs an acceptable last-mile service but provides poor interconnection to the rest of the world, you could become a line-sharing partner and provide better interconnection to the world without having to build out an overlay last-mile network (which is incredibly capital intensive and difficult).


Is it really that difficult a concept? Comcast charging Netflix for access to Netflix's customers, when Netflix's customers are paying Comcast for Internet access, is bullshit.

Reword it to be generic, and dress it up in more polite language, but that's what it is.

Neither Comcast nor any other ISP gets to shake down Netflix or Google or Facebook or any other service provider, big or small, on the Internet.

Sure, behind the scenes there are backbone providers and peering arrangements and BGP, but those are technical details. Network neutrality means as a consumer I pay an ISP and get connected to the Internet.

And since Comcast is a cable TV company, these games of consumer chicken between companies are not theoretical, e.g.

https://www.thedenverchannel.com/news/local-news/why-you-sti...

https://mynbc15.com/news/local/att-and-directv-have-removed-...

https://www.washingtonpost.com/business/cbs-blackout-on-dire...


> Is it really that difficult a concept? Comcast charging Netflix for access to Netflix's customers, when Netflix's customers are paying Comcast for Internet access, is bullshit.

I agree, that it's bullshit, but please consider:

A telephone company charges me to have a phone line and per minute to place and receive calls, and charges whomever I'm conversing with to have a phone line, and to call me.

If Comcast wasn't large enough for Netflix to want peering, Netflix would probably use paid transit to get there (they might even use paid transit that Comcast would also have to pay for).

So, the reason it's bullshit can't be because Netflix has to pay for a service that Comcast is providing; because paying for network services is the norm. The reason also can't be because Comcast is charging both ends of the connection for the service it is providing to both, namely connecting the two parties; because both sides paying for connectivity is the norm in telecommunications.

There's some other reason it's bullshit. A big factor, of course, is that internet video services compete with the ISPs video services, and there's some implicit unfairness there. I would say it's because the residential customer is stuck with a very limited set of choices, and can't pick an ISP that distinguishes itself with open peering. It might be because the norm for internet peering is if there is a significant imbalance in traffic, the party sending the most pays; so the residential ISPs should be paying their residential customers; this one doesn't quite work, because it's not actually a peering relationship, and the norm is transit customers pay the transit provider for traffic in either direction.

> Sure, behind the scenes there are backbone providers and peering arrangements and BGP, but those are technical details. Network neutrality means as a consumer I pay an ISP and get connected to the Internet.

The problem here is there is no "the Internet" to connect to. An ISP needs to connect to substantially all of the other networks; and while a small ISP may simply use a single upstream ISP, where it would be easy for the small ISP to be neutral (bits in = bits out), a larger ISP is going to have a diversity of connections, and that's going to lead to defacto non-neutrality as some connections are bigger than others, and some connections are longer than others, and some connections are more expensive than others.

I fully understand the desire to restrain residential ISPs (and mobile ISPs) from anti-competitive behavior; I guess network neutrality might work for that.

But, when you apply it to other networks, it doesn't seem to make sense. In this case, you have a group of networks that would like to lower costs, both for themselves, and their customers, and they've agreed to settlement free peering, and to also not charge customers for bandwidth on the settlement free links. Network neutrality doesn't really speak towards the interconnection, but says that customers should be charged the same rate for all bandwidth, even when the underlying costs are different. I don't understand why that's a good thing in this case?


Mandatory line sharing is an interesting idea. We've tried it before, back in the DSL era (2000s). Not saying we shouldn't try it again, but it only sorta worked. Whether the instructions came down from AT&T corporate offices, or it was malice or incompetence on the part of locally contracted installation technicians, the shared boxes were a tragedy of the commons.

If a neighbor got service from a rival DSL provider, your Internet was liable to go out until your ISP was able to get a truck out to fix the damage that the competition's technicians did to your connection.

Those laws are still around, but unfortunately, (A)DSL tops out at single-digit megabits. An upgrade from dial-up, but not competitive with today's broadband market for those with other options.


I'm pretty aware of the issues with past implementations of line sharing; a competent regulator would be required to iron out these issues. Of course, network neutrality requires a competent regulator as well.

Pricing was a big issue in the past --- where the 'wholesale' per-line tariff charged to the competitive carrier was more than the retail price charged to incumbent customers.

I think, in most cases, the new-install-breaks-another-user type of issue is related to poor records of which line is used by which customer; that happens within the incumbent as well though (when I got ADSL2 installed, one of my neighbor's connections went out, and pretty soon there were three AT&T trucks on the street to work everything out). That particular issue could probably be solved by allowing the incumbent to manage all the installs and monitoring for install-time customer steering.

The federal mandate no longer covers line sharing, it only covers line leasing for copper telephone service; wherever your premises is directly connected to, if there's space for competitive equipment, the incumbent must make it available. Of course, most telephone companies have moved customers to remote terminals for better speeds, and there's no room for competitive equipment in the remote terminals; and a lot of telephone companies are replacing copper networks with optical networks, and those aren't covered either. The whole concept got submarined by the insistence that it apply to telephone networks and not cable or other "new technology" networks, and then deciding not to apply it anywhere in light of court cases that applying to one and not the other was unfair.


Cloudflare doesn't charge for bandwidth, it's as neutral as it gets.

The charges are from the cloud (storage) providers. And most of them charge for bandwidth usage, but they don't discriminate based on what data you send. It doesn't matter if it's source code, photos, or Linux binaries, so there's no net-neutrality issue here.

You can potentially argue that Backblaze is discounting traffic based on the upstream network but that's not active discrimination. There are hard costs to internet transit and some companies can offer better pricing depending on where the traffic goes. For example, downloading from Australia is more expensive than downloading in America because of the infrastructure costs, and this isn't considered a net-neutrality violation.


Cloudflare is not charging different amounts for different storage providers; some storage providers are waiving the fee when the upstream is Cloudflare.

disclaimer: ex-cloudflare


So, Backblaze is not-net-neutral.


As far as our service is concerned everyone pays the same price for B2 Cloud Storage, $0.005/GB. We're not a network provider and in this case neither is Cloudflare. We are both service providers. The partnership between us simply allows people to move data between us freely, but you still need to pay Backblaze B2 for the storage of the data, and you need to pay Cloudflare for the distribution!


You are ignoring the responses to your original point. This is not relevant to net neutrality.


I think I addressed that comment, actually!


Yev here -> unsure how those are related, but you certainly don't NEED to use Cloudflare if you're using B2 Cloud Storage as the data origination point, it's just an option that was brought up in the thread!


It sounded like the price would be different if you used a certain network provider, which is not-net-neutral.


I believe in this case Cloudflare and Backblaze would be considered service providers, not network providers (like Comcast or AT&T, etc...). And you'd still need to pay each service individually for the service they're providing. It's an alternative to, say, Amazon S3 and their own CDN, which also does not charge for egress between the two services (since they own both). But in our case, you pay Backblaze B2 Cloud Storage for the storage, and you pay Cloudflare for their CDN capabilities; the partnership simply makes the transfer between our storage and their CDN free.


Transit does cost different amounts of money depending on who you negotiate transit with. That's not part of net neutrality.


Airlines have alliances; that does not stop you from taking any airline you want for your route, but costs may vary.

It is just a property of networks. As an end user you still get to use all providers, but two of them may use a dedicated connection to reduce internal prices (the price benefit of which they may or may not give back to the user).

Neutrality from the point of the user is not affected when it comes to service being available.

Now if one airline alliance said you are not allowed to get on their flight if you hopped off a rival alliance's flight, then we would have an issue.


This is so mind-blowingly crazy and awesome. I've always loved the idea of Backblaze but never tried it. But this combination is so insanely good and valuable. When an efficient market/economy works correctly, the results are just amazing: Cloudflare and Backblaze essentially got their unit costs for their respective products so low that for just a small fee, I can do something that would have been prohibitively expensive maybe just a decade ago.


On the note of unit cost, I doubt it will be much lower in the near future. Cost/GB isn't falling much, if at all, and their load balancers are already eating into their margin, which means any further reduction in unit cost only gains back what they took on with the S3 API. Hopefully the new S3-compatible API means they can make all that back with volume.

But this is also a sad realisation that storage tech isn't getting any cheaper.

(Of course, Backblaze can always prove me wrong :D)


Wasn't there a startup on HN some weeks ago that used a crypto-mining approach to server management and was much cheaper?


Disclaimer - Backblaze employee here, but just speaking for myself:

It was.. sort of cheaper. They didn't actually build the servers, and as described the server wouldn't work (onboard SATA didn't support port multipliers, lack of ECC would probably cause problems in practice, bit hand-wavey on power/space/network/manpower costs, etc). The goal of the article was to get other people to build cheap storage and put it up for rent on their network. They do have some amount of storage space available for very cheap on the network now, but personally I suspect it's people who figured "what the heck, I'll give it a try!" as opposed to people actually building storage servers and making a profit renting them out.

I was honestly pretty disappointed - I'd hoped they'd found a cheap motherboard with ECC and support for port multipliers, but nope.


With the client responsible for chunking, redundancy, and error correction, I don't think the lack of ECC really matters for their use case. The rest of the issues are more important.

Edit: The motherboard they picked does support ECC memory, anyway. In general ASRock models do.


I haven't dug too deeply, but it's unclear whether anything but the Asrock Rack boards have full validated ECC implementations for Ryzen (vs just using the memory but ignoring ECC), and I think that may depend some on which Ryzen CPU is used (maybe PRO-only). I'd love a source of better info there, though.

You're right though - since the client's doing the work and they have a lot of redundancy/diversity in the storage it's not as big of a deal for them as it would be for us. I'd be a bit wary because the client-only verification does mean that there's no verification-with-ECC step in the entire chain, but I'm not sure that's significantly worse in terms of actual risk.


How far off do you reckon Backblaze is from designing your own motherboards + electronics, with just the pieces you need? :)


Pretty far. Since we cram so many drives into each server, our total server count is actually relatively low for the amount of storage we have. I'm not sure exactly how many units you need to amortize the design costs across to make it worth it for a custom ODM design, but I suspect it's in the tens of thousands.


Yes, and it had little redundancy, with limited uptime and speed guarantees, and no S3 API compatibility. And if I remember correctly even that was $0.003 per GB, with lots of unit costs that a lot of people, including me, thought were miscalculated, and that it should be closer to $0.4 for it to be somewhat break-even.

So maybe it is cheaper, but not really comparable.


How does that work? Can I just dump two terabytes of video into Backblaze B2, set up a Cloudflare account, and have people watch those videos with it costing me only $10 a month? Because that doesn't sound right.


The Cloudflare ToS explicitly exclude that use-case.

2.8 Limitation on Serving Non-HTML Content The Service is offered primarily as a platform to cache and serve web pages and websites. Unless explicitly included as a part of a Paid Service purchased by you, you agree to use the Service solely for the purpose of serving web pages as viewed through a web browser or other functionally equivalent applications and rendering Hypertext Markup Language (HTML) or other functional equivalents. Use of the Service for serving video (unless purchased separately as a Paid Service) or a disproportionate percentage of pictures, audio files, or other non-HTML content, is prohibited.


In a previous HN post [1] where someone wrote up how to use B2 as an image host, the Cloudflare CEO chimed in and addressed rule 2.8 specifically, saying if Cloudflare workers are used for URL prettifying and redirecting, a different ToS is applied and that use-case would be fine.

Does that mean your video use-case would also be fine? I have no idea. An HN comment from the CEO doesn't seem like it would hold up if Cloudflare suddenly shut down your free account.

I'd love for Cloudflare to officially clarify the limits of the Cloudflare/B2 alliance in terms of external traffic. The confounding issue here is that B2, as a storage service, is not really intended for "serving web pages and websites" — it's for larger files, binaries, etc. — and therefore any traffic from B2 going through Cloudflare is sort of de facto in violation of 2.8.

[1]: https://news.ycombinator.com/item?id=20790857


Disclaimer: I work at Backblaze so I'm biased. :-)

> B2, as a storage service, is not really intended for "serving web pages and websites" — it's for larger files, binaries, etc

It might be missing a couple features (which is a pet peeve of mine) but we SURELY intend for it to be used for serving web pages. That's one of the largest differences between "Backblaze Personal Backup" (our original product line) and Backblaze B2. The largest parts of the redesign/refit when we originally did B2 were around the concept of what we call "Friendly URLs" (web page names, folder names) instead of just ugly 82 character hexadecimal file names like Backblaze Personal Backup stores all your files in.

For full disclosure, Backblaze B2 isn't a great "hosting" solution for something like WordPress because we lack two or three things, one of which is comically easy to fix and I keep trying to convince everyone to do it. The issue is URLs that end in a "/" (trailing slash) basically need to "guess" that after that is an ".html" or ".php" or whatever. So the URL: https://f001.backblazeb2.com/file/ski-epic-c/full/2015_scotl... does not work, but the URL: https://f001.backblazeb2.com/file/ski-epic-c/full/2015_scotl... does work. All modern web servers do this automatically filling in of the "index.html", but it is missing from Backblaze B2 currently. And it would take just a day or two for one of our developers to fix it. And dang it, I'm going to get it done one of these days.


S3's use of separate servers for website hosting is actually very sensible. Options like usage of index.html and error.html only apply on the website servers and won't cause any surprises for people using the service as a key-object store.

That said, I would absolutely not consider using B2 without support for index.html, error.html, and Website-Redirect-Location.


You can trivially use Cloudflare Workers to implement that functionality on top of B2.
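
For instance, a minimal Worker sketch (the bucket's friendly-URL base and the index.html convention here are illustrative assumptions, not anything B2-specific):

    // Rewrite trailing-slash URLs to index.html before pulling from B2.
    // B2_BASE is a placeholder for your bucket's friendly URL.
    const B2_BASE = 'https://f001.backblazeb2.com/file/my-bucket';

    addEventListener('fetch', (event: FetchEvent) => {
      event.respondWith(handle(event.request));
    });

    async function handle(request: Request): Promise<Response> {
      const url = new URL(request.url);
      const path = url.pathname.endsWith('/')
        ? url.pathname + 'index.html'   // fill in the missing "index.html"
        : url.pathname;
      return fetch(B2_BASE + path);
    }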


Please take inspiration from how Netlify handles URL rewriting through their netlify.toml file, if that's possible. :-)


Sounds like workers are fine in general, especially on the non-free tier that only increases the budget by $5 a month.

That covers 10 million requests, which could each be 10MB chunks, or 512MB cached files, or possibly larger.

(A raw mp4 needs about 3 requests to start plus one per seek.)

But I wouldn't be surprised if that doesn't scale to enormous amounts of data.


To clarify, the key line is "a disproportionate amount of non-HTML content". Serving videos isn't prohibited but CF still has to prevent free plan websites from becoming huge loss-leaders.

Another big asterisk is that this applies to all proxied (DDOS-protected) content, not just the content that goes through the CF cache. CF pays for all of their uplink bandwidth out-of-pocket regardless of whether it's cached. You can see what happens when you proxy multiple terabytes of content on the free plan in this thread[0] (again, all proxied bandwidth costs CF, which is why this user had their zone unproxied).

0: https://community.cloudflare.com/t/the-way-you-handle-bandwi...


From what I can tell that's more about their caching service. It mentions exceptions for hosting from a paid service, and correct me if I'm wrong, but wouldn't Backblaze be that paid service?


They are talking about their own paid service, obviously. I don't think they care whether you are paying for someone else's product or not.


Perhaps they are talking about a paid service that they offer, like Cloudflare Stream https://www.cloudflare.com/products/cloudflare-stream/


This is huge if true - folks keep saying Cloudflare can replace CloudFront for free. If we could put 20TB of binary content for software distros / videos etc. on B2, then stream it via Cloudflare, I'd actually believe the claims.

Every time we've tried this with other "free" providers there always seems to be "fine print" when you start pushing huge bandwidth on these "unlimited" plans. It seriously is not worth the time at some point.


Cloudflare's main service excludes video.


This is key for many.


The way I understand it, you can do this with images; eastdakota confirmed that here earlier: https://news.ycombinator.com/item?id=20791660. Not sure about videos.



I believe you still pay Cloudflare costs, but the traffic between Cloudflare and B2 is free on both platforms. But it might be worth double-checking the fine print.


Isn't the Cloudflare CDN included in the free plan as well?


It is, but they cap the file size they will cache on the free plan at 512MB. So you would need to chunk up your videos to get free bandwidth.


> So you would need to chunk up your videos to get free bandwidth.

Having a functional video player on your site or in your app (e.g. one where you can skip to arbitrary times without requiring the video be buffered up to that point; or where a video can be "resumed" from the middle if you leave it and come back) already requires that you use MPEG-DASH or HLS; which in turn implies/necessitates pre-chunking, no?

Is there some use-case where people are currently serving 400MB contiguous video files from a CDN? I can't think of one. YouTube doesn't. Netflix doesn't. Even porn sites don't.

I guess Archive.org has some large video files on there in various places, that can be direct-downloaded; but the recommendation Archive.org itself makes, is to consume those via BitTorrent. Presumably they don't have a CDN partner willing to handle their unique workload for cheap.


You don't need DASH or HLS to seek to an arbitrary point quickly. The mp4 container format has an index which stores playhead time -> byte offset. This index is either at the end of the file or the beginning, and tools like qt-faststart move the index to the beginning, which makes videos start much quicker when served over HTTP. Browsers will use the index to issue range GET requests and be able to seek just fine.

Serving a 400MB video via a CDN is highly dependent on the CDN. Some will construct a cache entry from a slew of range GET requests, translating them to fetch the missing pieces, and work brilliantly; other CDNs should be avoided.


ffmpeg.exe -i input.mp4 -c copy -movflags faststart -y output.mp4


> Having a functional video player on your site or in your app (e.g. one where you can skip to arbitrary times without requiring the video be buffered up to that point; or where a video can be "resumed" from the middle if you leave it and come back) already requires that you use MPEG-DASH or HLS; which in turn implies/necessitates pre-chunking, no?

Browsers are smart. They only buffer a few megabytes at a time and can seek around pretty efficiently.


That requires the video to be encoded in a way where you can just start reading the stream from any random byte offset, and everything will still work. Video files are not usually encoded this way (any more.) Resume an MP4 or MKV video half-way through, without reading the TOC-ish stuff from the first chunk, and you'll get garbage that maybe resyncs after 20 seconds.

It's totally possible to "encode for streaming", but it usually results in both an increase in overhead [more keyframes] and a decrease in quality [inability to use predictive interpolation, instead relying only on forward-interpolation.]

Mind you, this streaming-enabled encoding is how things were done on the web, before the advent of MPEG-DASH/HLS; and it's still how e.g. the MP2 encoding of digital cable/satellite video works. But we don't really want to go back to those days. They kind of sucked.

Jumping to random byte offsets in a video also tends to screw with any embedded data streams like subtitles or thumbnails, which tend to just be stored in most media container formats as a single chunk at the beginning/end of the file, rather than being spread or copied across the stream. Again, the kind of captioning done back in the MP2 days is immune to this, but it kind of sucked as well (e.g. it wouldn't trigger if you happened to skip to the millisecond after the instruction for it appeared in the stream, often leaving you with ~30 seconds of untranslated audio.)


I don't think this is quite right. If you serve an mp4 with H.264 video statically on any basic webserver, it will just work in the browser through plain HTTP, without the need for MPEG-DASH/HLS. Every widely used media player/browser just downloads the nearest chunk (keyframe point) behind the time that was seeked to and resumes playback from there. This point is found through an index stored in the container format. For basically every video format these days (say, at least as new as H.264), regular settings make this only a few more seconds of video to download and decode before the seeked point, and it basically happens instantly for normal online consumption. In H.264, forward prediction (through two-pass encoding) will play back fine too.

I think what you're saying applies more to a setting where the video is being streamed live, so that you cannot access the start of the file to get keyframe metadata. In that case HLS and MPEG-DASH help.


That's odd. I can't recall ever having a problem with browsers playing mp4s in a vanilla <video> tag, as long as I encode them in the main h264 profile, AAC audio, and MOOV atom at the front (see [0] for ffmpeg command). Obviously the server has to support Range: byte requests.

My impression is DASH/HLS are mostly useful for adjusting bitrate on the fly.

[0]: https://superuser.com/a/438471/402047


When the parent talks about jumping to random byte offsets, they mean you don't have the first part of the file at all. You just have an arbitrary 512-MB chunk out of the middle.


But they were claiming that a single monolithic file would break too, which is not the case. The browser does a range request to get the first part, then a range request to get the part you're playing, and it works.


Right. Like tuning into a digital-cable signal "in the middle"†. You just get bytes of the stream starting from an arbitrary offset, without having seen/processed anything before that (and without even being able to request anything before that), and you need to resynchronize from what you've got.

† I mean, a digital-cable video stream is always "in the middle" unless you're just starting a VOD stream, but still.


The browser just reads the TOC-ish stuff from the first chunk. Trust me, it works. I regularly load plain old multi-hundred-megabyte mp4s in my browser, off a web server, and skip around without problems. The default keyframe interval from x264 is fine. You don't have to do any horrible things to the encoding, you just have to start loading a few seconds before the seek point. Which the browser does automatically.

Do browsers even support embedded subtitles?


Cloudflare's free plan can and does end at any moment; I wouldn't rely on it for any serious application.


Curious as well.


Does AWS (and GCP and Azure) not do the exact same thing when transferring data out via Cloudfront (or their respective CDN)? Is Backblaze really special in that regard?


(backblaze ceo here) AWS is free to Cloudfront because that's an AWS service. AWS/GCP/Azure are not free (and quite expensive) to Cloudflare. Backblaze is free to Cloudflare.


And Cloudflare is also the winning reverse proxy caching provider, the one that people tend to use even despite transit charges to AWS/GCP/Azure.


Bit confusing. Need more clarity. Read 10 times. Pls recheck mr. ceo :)


Cloudfront costs a non-trivial amount of money per GB: https://aws.amazon.com/cloudfront/pricing/


How does that compare to Cloudflare? Is Cloudflare completely free for CDN? I can't find a clear pricing page.


https://www.cloudflare.com/plans/

Cloudflare don't charge per GB. They are more of a per-"feature" CDN.


Just a year ago B2 couldn't do server-side file copying.[1] If you wanted to rename or move a file you had to re-upload the whole thing (not great for large multi-gigabyte files)! That ruled them out of consideration for storing my personal backups.

Glad to see they've since fixed that, and with this update are clearly continuing to improve ergonomics. I'll have to give B2 a fresh look.

[1]: https://github.com/Backblaze/B2_Command_Line_Tool/issues/175


Yev here -> thanks! We're constantly working on making the platform better, and copy-file was definitely a widely requested feature! That plus S3 compatibility, for folks who wanted to integrate with B2 Cloud Storage but didn't have the resources to write code to our B2 Native API.


I've been using B2 to disrupt a bunch of ugly & entrenched vendors in the price sensitive K12 market. Thanks for building it. :)


Ah that's awesome! I'd love to know more! How are you using B2 in general, and does this make it easier for you? If you want, leave a note here, or you can send it to: b2feedback@backblaze.com!


Are there any plans to host public datasets, like AWS PDS?


Can you share what you do with B2? (Are you hosting educational content? Something else?)


I just finished writing a new feature for internal use using S3. I hadn't even considered Backblaze at the time, but now that I am seeing this news we may end up using it, considering there is virtually no cost (to switch) and we haven't deployed yet.
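
If it helps anyone in the same position: the switch is essentially just pointing your existing S3 client at B2's endpoint with a B2 application key. A rough sketch with the AWS SDK (the endpoint, region, and bucket names below are illustrative; use whatever the B2 console shows for your bucket):

    import fs from 'fs';
    import AWS from 'aws-sdk';

    // Placeholders: the B2 console shows the exact S3 endpoint for your bucket,
    // and a B2 application key ID / application key stand in for the usual
    // access key / secret key pair.
    const s3 = new AWS.S3({
      endpoint: 'https://s3.us-west-002.backblazeb2.com',
      region: 'us-west-002',
      accessKeyId: process.env.B2_KEY_ID,
      secretAccessKey: process.env.B2_APP_KEY,
    });

    async function upload(): Promise<void> {
      await s3.putObject({
        Bucket: 'my-backups',
        Key: 'backups/example.tar.gz',
        Body: fs.readFileSync('example.tar.gz'),
      }).promise();
    }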


For backups, mapping one file to one object rarely works well. Tools that use this strategy come with a long list of scenarios requiring expensive operations. For instance renaming a directory or changing a few bytes in a large file.

On the other hand, tools that don't view the object storage as a file system have far fewer gotchas.

In my experience B2 + restic works really well.


That's a fair point but this is for my family photos and videos; in the event I die I'd rather the S3 bucket I hand over to my wife/kids in my last wishes look like a real filesystem rather than a bazillion blobs that require a special tool with a programmer's expertise to reassemble


" ... in the event I die I'd rather the S3 bucket I hand over to my wife/kids in my last wishes look like a real filesystem rather than a bazillion blobs that require a special tool ..."

You'd need a cloud storage provider that just gave you a plain old UNIX filesystem to do whatever you want with.

It's too bad nobody does that ...


I was just about to pull out my rsync.net pitchfork before I saw the username. You should throwaway next time so as to not deprive me of the pleasure.


You'd need a cloud storage provider that just gave you a plain old UNIX filesystem to do whatever you want with.

Doesn't iCloud Drive fit the bill?

Access to it is slightly obscure, i.e. ~/Library/Mobile Documents/com~apple~CloudDocs/ but wouldn't that work?

It's free for me to use. That's because I'm already paying Apple $10/mo to backup the family's iPhones. We're only using a little over 200 GB out of the 2000 GB we have. (I'm sure that Apple is counting on most people not using their full amount).

I've only put a few files out there, so maybe there are a lot of potential pitfalls. But it doesn't get much simpler than using cp or mv.

In reality it's most emphatically not a "plain old UNIX filesystem". Apple is doing some magic and storing blobs out in Amazon S3 or in their own datacenters. But to me it has the appearance of a Unix (Posix?) filesystem.

I realize that rsync.net couldn't survive with a business model that limits users to 2000 GB, which is Apple's maximum. But I thought I'd mention it, since it just might be the perfect "free" solution for a lot of people.


Wouldn't that require his wife to install and configure an SFTP client or something? B2 lets you use any browser.


It's too bad you guys cost ~2x as much for storage as S3 when I evaluated you in 2018... ($0.04/GB vs $0.023/GB) ;) Glad to see you're beating S3 in $/GB now!


May I ask which solution you ended up going for in this situation? (I'm trying to solve for myself too.)


rclone is a nice way to sync files to B2. It also has a mount option that quite literally mounts the cloud storage as a filesystem, though this requires Linux and a little bit of know how. However, in a pinch B2 has a serviceable web interface, and Backblaze will even ship you a drive of your files if you request it, so I think it would be pretty usable by just about anyone.


I have a local ZFS pool of hard drives and a script that `rsync -avz`s my iPhone's photos+videos to it. Then a separate cron script that periodically syncs from the ZFS pool to my S3 bucket using `aws s3 sync`. The S3 bucket has versioning turned on so it's effectively append-only.

I used to be able to trigger my iPhone -> ZFS script when I plugged my iPhone into my Ubuntu desktop using udev (also had to wrap it in flock[1] because it would trigger multiple times for some reason), but at some point that stopped working and I've been too lazy to figure out why.

It's far from perfect but for me it works alright. In this scenario I prefer straightforward and slightly kludgy compared to something with hidden complexity that could go wrong in so many ways. Could you imagine if you used a tool like restic or borg and the pack encoding format changed, or if the tool sources are simply gone when your relatives have to figure out how to get at the files in 10, 15, 20 years - I don't want my relatives playing code detective or archaeologist!

Which reminds me of a downside to using tools like restic, borg, and the like that I forgot to mention. When I evaluated them for my hundreds of GB of family pics + videos, there was a "dedupe" step that all these tools want to perform. When I tested them a couple years ago they were dog slow for my files, because pics + video are already highly compressed and there is very little "deduping" you're going to wring out of them unless you have multiple copies of the same files. IIRC borg took several hours to run and at the end it reported 0.01% or less deduping efficiency. Also, as I recall, there was no way to opt out of the dedupe step due to the way borg stores "packs". Very annoying!

[1]: https://stackoverflow.com/a/169969/215168


> For backups, mapping one file to one object rarely works well.

Same for archiving. At my former workplace, however, we figured that out only after the horse had left the barn.


Here is the reasoning why they didn't have "s3 compatibility" before: https://www.backblaze.com/blog/design-thinking-b2-apis-the-h...


>It requires Amazon to have a massive and expensive choke point in their network: load balancers. When a customer tries to upload to S3, she is given a single upload URL to use. For instance, s3.amazonaws.com/<bucketname>.

Now that Amazon has deprecated the single URL version and replaced it with region-specific URLs (e.g. s3.dualstack.us-east-1.amazonaws.com) and tooling has been mostly updated, this huge reason for not supporting the S3 API is gone.


Even though they are using region-specific URLs, they would still have to load balance all of that traffic. Backblaze avoided this by having a two-part request for a file. You would make a request to a centralized URL, and that would return a URL that connected you directly to a server that had that file.
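
Roughly, the native upload flow looks like this (a sketch based on my reading of the B2 docs; error handling omitted and names are placeholders):

    // B2's two-step native upload: ask the central API for an upload URL,
    // then POST the file straight to the storage server that URL points at.
    import crypto from 'crypto';

    async function uploadToB2(keyId: string, appKey: string, bucketId: string,
                              fileName: string, body: Buffer): Promise<void> {
      // Authorize against the central API endpoint.
      const auth = await fetch('https://api.backblazeb2.com/b2api/v2/b2_authorize_account', {
        headers: { Authorization: 'Basic ' + Buffer.from(`${keyId}:${appKey}`).toString('base64') },
      }).then(r => r.json());

      // Step 1: the central API hands back a URL pointing at a specific storage server.
      const target = await fetch(`${auth.apiUrl}/b2api/v2/b2_get_upload_url`, {
        method: 'POST',
        headers: { Authorization: auth.authorizationToken },
        body: JSON.stringify({ bucketId }),
      }).then(r => r.json());

      // Step 2: upload directly to that server, bypassing any central load balancer.
      await fetch(target.uploadUrl, {
        method: 'POST',
        headers: {
          Authorization: target.authorizationToken,
          'X-Bz-File-Name': encodeURIComponent(fileName),
          'Content-Type': 'b2/x-auto',
          'X-Bz-Content-Sha1': crypto.createHash('sha1').update(body).digest('hex'),
        },
        body,
      });
    }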


> You would make a request to a centralized url and then that would return a url that connected you directly to a server that had that file.

Heh, kinda like how FTP worked. That's funny to see again.


So how did they manage to get rid of those hidden costs? Or is the new S3 compatible API more expensive?


Disclaimer: I work at Backblaze.

> So how did they manage to get rid of those hidden costs? Or is the new S3 compatible API more expensive?

The new S3 compatible APIs are the same cost as the original native B2 APIs.

I'm the author of that original blog post, and we were able to get rid of SOME of the internal costs, but in the end we ate the cost of the load balancer. Internally I voted that we externalize that cost, but it's in the spirit of our "no-sales-friction" pricing model to get rid of decision points for customers and just let them get their stuff done.

The internal math is that we believe the additional storage that (hopefully) will result from supporting the S3 API will help make up the cost of the upload balancers. It eats into our margin, but not by enough to justify a "friction pain point" that might reduce sales as customers struggle to decide which API to use. When you do the math over thousands of customers, we make the vast majority of money simply from renting the storage. Many many many customers upload things to Backblaze, and let them sit for long enough, that the cost of the load balancers becomes pretty small as a percentage.

One of our goals of the pricing of Backblaze Personal Backup (flat fee of $6/month regardless of how much data you back up) and not charging much for transactions is we aren't trying to nickel-and-dime customers. We just want to make our margin and provide a solid service that makes customers happy.


Are you guys apprehensive at all that supporting S3 might bring in more lower margin customers (ie higher transfer GB/store GB ratio)?

In any case, thanks for B2 (happy customer) and good luck. Sounds like an exciting time.


(backblaze ceo here) We're happy to take customers regardless of which API they choose. Sure, we make a bit more on our B2 Native APIs, but ultimately I'd rather we make it easy for them to use us how they wish.


It's possible they're eating the cost of the load balancing necessary to multiplex from their backends to client requests, which should theoretically stoke an increase in business due to reduced switching costs (and what timing, with an economic contraction likely pushing cost reductions at those needing cloud storage).

Disclaimer: Happy Backblaze Mac client and B2 customer, no other affiliation.

EDIT: @yev: I took the signal out after the sibling reply :) Appreciate the responses as always. Please stay awesome.


Yev here -> saw my Yev-signal. That's right, Gleb actually answered that question on our blog (https://www.backblaze.com/blog/backblaze-b2-s3-compatible-ap...) but we're eating the cost. Om nom nom.


Interesting. Implementation wise, is it some form of Minio gateway + hardware?


No gateway - native code written from scratch and optimized for our environment...sitting on top of regular, inexpensive hardware.


When you say regular, I think you mean awesome Storage Pods that were lovingly purpose-built for affordable and efficient data storage (https://www.backblaze.com/b2/storage-pod.html).

*Edit -> Well, those plus the load balancing servers =D


From Gleb (CEO) comment on the blog. "Yes - B2 is still a more cost efficient API as it allows customers to connect directly to the final storage location. However, we have built a highly cost efficient load balancing system as we always have - using software to optimize inexpensive hardware - and are swallowing the additional costs for our customers."


Swank. One of the reasons I'm not using Backblaze is because I couldn't find a way to generate a private url which allowed secure upload from the browser. It only allowed (so far as I can tell) a private url that had access to an entire bucket. If they've got an S3 compatibility layer now, this problem is solved. I'm gonna invest some time on this tomorrow.


This is huge because it means you can use things like S3 Fuse to mount your storage. Which means you can use it to extend your local disk, or run your own backups, or whatever.

Amusingly the price to store 1.2TB of data is the same as the cost of their backup plan, so if your disk is smaller than that, you could save a few bucks running your own backups. Until you have to restore (from what I can tell restores are free on their backup plans but would cost money on the S3 plan).


> Which means you can use it to .... or run your own backups

You could, but if i read correctly (s3fs-fuse limitations): "random writes or appends to files require rewriting the entire file".

So changing 1 bit of a 10GB file, means re-uploading 10GB.

https://github.com/s3fs-fuse/s3fs-fuse#limitations


This changed in 1.86 and I updated the README as follows:

> random writes or appends to files require rewriting the entire object, optimized with multi-part upload copy

Now changing one bit means re-uploading 5 MB, the minimum S3 part size.
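
For reference, the trick behind that (as I understand it) is S3's multipart copy: unchanged ranges of the existing object are copied server-side and only the changed part crosses the wire. A simplified sketch with the AWS SDK (names and the fixed part size are illustrative, and the final part would need its range clamped to the object size):

    import AWS from 'aws-sdk';

    const s3 = new AWS.S3();  // point at your S3-compatible endpoint as usual

    async function rewriteOnePart(bucket: string, key: string, partSize: number,
                                  totalParts: number, changedPartNumber: number,
                                  changedPart: Buffer): Promise<void> {
      const mpu = await s3.createMultipartUpload({ Bucket: bucket, Key: key }).promise();
      const parts: AWS.S3.CompletedPart[] = [];

      for (let n = 1; n <= totalParts; n++) {
        if (n === changedPartNumber) {
          // Only this part (>= 5 MB) is actually uploaded.
          const res = await s3.uploadPart({
            Bucket: bucket, Key: key, UploadId: mpu.UploadId!,
            PartNumber: n, Body: changedPart,
          }).promise();
          parts.push({ PartNumber: n, ETag: res.ETag });
        } else {
          // Every other part is copied server-side from the existing object.
          const res = await s3.uploadPartCopy({
            Bucket: bucket, Key: key, UploadId: mpu.UploadId!,
            PartNumber: n, CopySource: `${bucket}/${key}`,
            CopySourceRange: `bytes=${(n - 1) * partSize}-${n * partSize - 1}`,
          }).promise();
          parts.push({ PartNumber: n, ETag: res.CopyPartResult?.ETag });
        }
      }

      await s3.completeMultipartUpload({
        Bucket: bucket, Key: key, UploadId: mpu.UploadId!,
        MultipartUpload: { Parts: parts },
      }).promise();
    }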


Only if the Backblaze implementation supports put byte range... Not supported by default.


rclone has long had a Fuse mount feature using the original B2 API.


Sure, but S3 Fuse is more mature, more stable, more feature complete, has a lot more usage and therefore a lot more visibility for possible bugs, especially corruption bugs.


I migrated a client from Cloudinary ($1k+/mo) to B2, a Go + ImageMagick program running on DigitalOcean, and the Cloudflare CDN, for a total of $60/mo. It's been running for two years now; B2 has been incredibly reliable.


Yev here -> That's awesome to hear! Glad we can make things more affordable for you and that it's working great!


As a developer that supports B2 (I write ExpanDrive) I think it’s great that they are moving on from an API that doesn’t expose any extra value.

That being said, I wish B2 performance was better. Throughput is dramatically slower than S3.


B2 pricing is sooo attractive ... But I have been stopped by the much lower performance compared to S3 ... So unfortunately I can confirm this point :(


When you say performance, do you mean upload, download, or both? I can deal with poorer upload performance, and mitigate poorer download performance by leveraging a decent CDN like Cloudflare.


What region are you moving from/to? Last I checked, b2 only exists in datacenters on the US west coast.


(backblaze ceo here) We also have a region in Europe: https://www.backblaze.com/blog/announcing-our-first-european...


Please consider an Asia/Pacific data center. I am from India and my company was not able to use B2 due to high response times even from European DC. Even a DC in Singapore will be helpful for us.

- Thankful Personal Backup Customer


Bandwidth in asiapac is very expensive for non-incumbents and India is no exception.

I find it bizarre how in India you can get 100GB of LTE for a few dollars but cdn bandwidth can cost content providers more than that - which is absurd.


Mobile broadband is seeing intense competition to grab customers as millions of rural Indians come online. This started with a petrochemical billionaire launching the Jio network and giving away free unlimited 4G data for a year (his company has 300 million subscribers now).

Already 4 networks have exited the market, and the 3rd and 4th largest networks (Vodafone and Idea) have merged due to a cash crunch. Airtel (previously the largest) has been raising outside money in the hope that it can survive the low prices. So there are only 4 networks remaining, and only recently have they started increasing prices.

That billionaire is also going into fibre (he purchased his bankrupt brother's company's infrastructure); maybe we'll see that competition extend to DCs and interconnects.



It remains the case even if you’re only a few ms away.


Used B2 heavily until recently as an origin server for a CDN. A few weeks ago we saw a spike in 502/504 responses.

When I contacted their customer support, I was pointed to the following URL, where they explain in detail how they handle these errors: https://www.backblaze.com/blog/b2-503-500-server-error/

Essentially these are not considered errors, and they expect the client to retry loading the file. That approach won't work in our use case.


You can definitely get read errors, and your application should be aware of this and handle this case. Amazon's own S3 is not immune to this: after years of running Spark jobs which shard output across thousands of files on S3 (and load from the same), you'll see these underlying HTTP errors on a daily basis, even intra-region.

Even if you're doing multi-region S3 replication, you'll run into this for external clients semi-occasionally.


If you are using Cloudflare you could use a Worker script to automatically retry the origin pull and only cache it on success.
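
Something along these lines, roughly (the retry count and backoff are arbitrary; whether the result gets cached still follows your normal cache rules):

    // Retry transient 5xx responses from the B2 origin before returning
    // anything to the visitor, so the CDN isn't left serving an error.
    addEventListener('fetch', (event: FetchEvent) => {
      event.respondWith(fetchWithRetry(event.request));
    });

    async function fetchWithRetry(request: Request, attempts = 3): Promise<Response> {
      let response = await fetch(request);
      for (let i = 1; i < attempts && response.status >= 500; i++) {
        await new Promise((resolve) => setTimeout(resolve, 200 * i));  // crude backoff
        response = await fetch(request);
      }
      return response;
    }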


Retrying a failed network request seems like a normal thing to do -- whether its a network timeout, server error, or whatever random hiccup happened.


So, you're relying on the API being 100% reliable? no errors?


I'm not expecting 100% reliability.

But when we get a 503 response it should be considered an error, and acknowledged by the provider as an error.

In my use case, we were using a CDN which was configured to pull files from B2. When B2 responds with a 503/500 I have no control over the retry mechanism.

The error rate was around 5-10%.


Huh, I thought it already had this! Must have mixed it up with a different object storage service (maybe DigitalOcean?).

I've been using B2 for backup storage for some personal projects. It doesn't necessarily do anything "better" than S3 from what I've seen, but never having to log into AWS's dashboard is a reward enough on its own.

They do have a command-line client that's a quick pip install, so you can do something like:

  b2 upload-file bucket-name /path/to/file remote-filename
Which is, of course, nice for backups.


I really wish the B2 client supported uploading a file from a Unix pipe. It would be nice to be able to archive a huge directory into a tar.bz archive and directly pipe the result into the B2 client without having to save the archive to disk first.

Currently I have to save the tar.bz archive to disk before uploading to Backblaze. That takes several hours (huge spinning-disk array, not as fast as an SSD), while uploading to B2 is blazing fast. Saving the archive to a ramdrive essentially solved this, but as the data grows I don't have enough memory to spare anymore for a ramdrive that can fit the whole archive.


Can you use process substitution?

    b2 upload_file bucket <(tar -cjf - huge-directory) archive.tar.bz2
The argument the command sees will be something like "/dev/fd/42", and the shell will provide the output of tar through that file.


So apparently process substitution doesn't work. The b2 client is probably trying to read the file size or something; it keeps failing with "ERROR: Invalid upload source: /dev/fd/xx". Maybe their API requires knowing the file size upfront instead of allowing a "streaming" upload.


Does process substitution actually write the content to disk first or not? The information I found on the internet seems to be conflicting on this. If it actually writes the data to disk first, then it probably won't solve my problem (limited disk I/O). AFAIK writing to a pipe won't result in saving the data to disk temporarily. I guess the only way to know is to try it out on my system and see how it performs.


If process substitution doesn't work, shouldn't /dev/stdin work? I haven't tried it, but as long as b2 doesn't try to check the file size before uploading I don't see why it wouldn't work: b2 upload_file bucket /dev/stdin < file


It definitely doesn't write it to disk first. It's basically a pipe() under the hood, but exposed as a file descriptor. Downside is that seeking doesn't work, but that shouldn't affect your case.


Thanks for the idea! I'm going to try this.


>Huh, I thought it already had this!

Same; it seems to be on the new-storage-service launch checklist right after "buy hard drives".


Ooo, even more reason to set up a Nextcloud instance now! Previously, it wasn't really practical to set up B2 as external storage because you'd need to also set up a compat layer.


I set this up yesterday, and it was a breeze.

Just had to be sure to omit the B2 external storage folder from the backups on my Nextcloud server.

Now if only Virtualmin (YC '08) supported virtual server backups to S3-compatible B2 cloud storage… There's an open ticket for this at https://www.virtualmin.com/node/65024


S3 is now the standard for cloud storage APIs? Not sure if that's good or bad. I guess competitors have to reduce switching costs.


I think it essentially always has been. It was there first, [GCP copied it](https://cloud.google.com/storage/docs/xml-api/overview), [OpenStack has compatibility middleware](https://docs.openstack.org/swift/latest/middleware.html#modu...).


Yev here -> It's not so much a standard, though S3 is generally the most often used suite of APIs. 100s of integrations exist with our B2 Native APIs (https://www.backblaze.com/b2/integrations.html), but a lot of folks only know how to write to S3 Compatible APIs and don't have the resources to write to multiple API suites, so this makes integration easier for them!


That's how something becomes a standard... You are just responding to market realities.


Digital Ocean, OVH and others seem to use the S3 API. It does give you the benefit of clients written against S3 can now be used for your platform, and unlike a lot of AWS the API contract seems pretty reasonable and well kept.

Edit: Confused Hetzner and OVH, sorry


Hetzner offers S3 compatible storage? I didn't find anything about that...


Woops, looks like I mixed up their storage offering with OVH. Sorry about that!


That's a good question to ask. S3 has become so common that it's heading down that road.

It's like brand names that become so common that people no longer use the generic name, just the brand.


It took me quite some time to realize that TRX is actually a brand and not some abbreviation for the trainers :)


I don't see Google, MS Azure or Rackspace switching their API anytime soon, but yes, smaller players and new ones do.


(backblaze ceo here) Google and Azure continue to offer their own API as do we. They also offer S3 compatibility options to support customers who want to use S3-compatible products...as now do we ;-)


Now that we're talking about B2: has anyone used, or is anyone using, them for latency-sensitive small-file object storage? I'm about to take the plunge and set up benchmarks. My use case is that I want to store and serve ~500k small files (30 bytes to 1 MB) per day to website visitors. So far B2 support has told me that it shouldn't be a problem, and early benchmarking indicates the same; just curious if anyone has stories from the trenches.


We use B2 to store images on Vintage Aerial (https://vintageaerial.com), both high res scans and all kinds of thumbnail sizes.

It is... a little slower than I'd like, but with Cloudfront in front it has been manageable. I'd love tips from Backblaze on how to increase performance there beyond caching to CF.


Can you go into detail what you mean by "slower than I'd like"? Are you talking about TTFB (Time-To-First-Byte) or sustained read or concurrent read performance? Are you using the API or the HTTP endpoints from a public bucket?


CloudFront or Cloudflare?


I'm seeing horrible TTFB from Europe. Most files are fast but sometimes a request is stuck for 10s or more...


Is that to their US datacenter or the European (Amsterdam) one? So far the European one has been pretty snappy for me, ping is ~20ms from my home ISP (Germany) and TTFB is good enough that I can instantly saturate my 300MBit home ISP line with concurrency of ~ 500 downloading small objects (5b-1MB), but I haven't looked at min/max/avg/median yet.

BIG gotcha btw: You have to choose between US and EU when you create your B2 account! You can't have buckets in the other location, so that means you'll need two accounts if you want to do that.


(backblaze ceo here) Just fyi, you can put both accounts into a single Group for easier management: https://help.backblaze.com/hc/en-us/articles/115000014914


It usually works, but we have had some issues under high load; please check my other reply in this thread.


Does Backblaze offer strong consistency for files?

The killer feature of Google Cloud Storage in my eyes is its ability to be strongly consistent, if you set the right HTTP headers. This is not possible for Amazon S3, which is always eventually consistent and makes it unusable for many use cases where you need to be able to guarantee that customers will always see the newest version of a file.


Nilay from Backblaze here.

Yes - B2 is strongly consistent. When you upload an object using either the B2 Native or S3 API - the object is persisted to the final resting place before the upload completes. Therefore, you can list/download the file immediately after your upload completes.


I know that the underlying Backblaze B2 is strongly consistent because of how they shard it. You get a different download endpoint depending on which data centre your bucket lives in. Not sure how they implemented their S3 endpoint though, so that will be interesting.


As a Synology user, please let this mean that Hyper Backup can work with B2 now (or at least soon).


Nilay from Backblaze here.

Synology Hyper Backup absolutely works. Details are here: https://help.backblaze.com/hc/en-us/articles/360047171594

(If you saw my old answer, ignore it. I was misinformed.)


I was trying to see if this was now better than Glacier, and aside from the SLAs being much better in terms of retrieval, I am not sure they make sense for a backup use case, where you are only really planning on downloading that data back down in a worst-case scenario. It may depend on what your incremental backups look like as well; mine are negligible: a dump of a few GB of photos after holidays, other records are tiny.

Glacier pricing in us-east is $0.004/GB/month vs $0.005 for B2. There is always pricing obfuscation with cloud, but AFAICT, there is no need to move off Glacier for a backup use-case.


My two cents is that there is no reason to _use_ Glacier as a backup strategy. Glacier's costs come from restoration: the more you restore and the faster you want to restore it, the more quickly costs rise. It's far better suited for a collection where you're pretty sure most of it will never be restored, but you don't know in advance which parts you'll need. Think video, art, music assets for projects. B2's retrieval is far, far lower cost and completely immediate for restoring an entire backup back to a server. If you're not careful, that extra .5 cents you save will really cost you on a full restore.


If you can wait a few hours, the better comparison is probably Glacier Deep Archive, which is not $4/TB/month but $1/TB/month.

Amazon wants to charge you $90/TB* to get data to the outside world, compared to B2's $10, but you can mitigate it in various ways. At the low end that's using a lightsail instance as a VPN, depending whether you think the TOS allows that. At the high end it's paying flexify.io $40 to move your data to B2, then paying B2 $10.

There might be other ways to improve S3 egress costs. It's a very hard thing to search for. I only learned about flexify from this post.

So if you have to restore less than half of your data each year, Glacier Deep Archive will save you money. It's worth considering, unlike normal Glacier, which is almost entirely downside.

* There's also a $2.50/TB fee to get things out of Glacier, but that's dwarfed by the other costs.


Thanks for making me aware of this- I missed the Deep Glacier announcement last year it seems. Glacier is already cost effective as is, but this will make it even cheaper!

I use this as an offsite backup- that as long as disaster does not strike, I will never use, and even if does, I can be patient about restoring.


With backups, my assumption is that you will never need to restore except in real "oh shit" moments- fire, electrical system zapping, etc- and if those moments happen, you are willing to pay. I have been using synology for 13 years now, and have thankfully never had to resort to using backups.

This is for home use, which is maybe what I was wrongly assuming most Synology users are doing. In a business context you might have a more routine need to recover backups, and those costs become more tangible.


It means exactly that, congratulations!


Great. How about rsync.net-compatible (i.e. bog-standard, vendor-neutral) "APIs"?


Yev here -> Anyone can write to the B2 Native API or our S3 Compatible API, we have tons of integrations that do it, here's a list -> https://www.backblaze.com/b2/integrations.html


I don't think that's what gp meant. Why not use a standard protocol like SFTP?


I don't see how you could cram all of S3's functionality into sftp? How would you configure a lifecycle policy for a file for example? Or generate a signed URL?

It seems to me you would only get a very narrow subset of the functionality.
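
For illustration, a signed URL is a one-liner against the S3 API. A minimal boto3 sketch (bucket, key and endpoint here are made-up placeholders, not anything from this thread):

  import boto3

  # Credentials come from your environment/config as usual;
  # endpoint_url points at whichever S3-compatible provider you use.
  s3 = boto3.client("s3", endpoint_url="https://s3.example.com")

  # Time-limited download link, valid for one hour.
  url = s3.generate_presigned_url(
      "get_object",
      Params={"Bucket": "my-bucket", "Key": "reports/2020-05.pdf"},
      ExpiresIn=3600,
  )
  print(url)

There's no obvious SFTP equivalent for that, or for lifecycle rules.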


I see, I suppose there wasn't a free standard before and S3's API just unofficially became one.


Because the S3 API has become a more popular standard.


what python blob storage libraries are using "SFTP"? How is that the "standard"?

I literally have NEVER seen SFTP being used for blob storage in any python project - is this a real thing somewhere?


I've looked at B2 from time to time, but doing database blob storage over S3 or to disk, and backing up database and files over rsync, made us stick to our existing technology (e.g. Transip cloud storage, which also charges 10 EUR/month per 2TB). One thing we didn't look forward to was having to reimplement cataloging and garbage collection for all of disk, S3 and B2, so we just stuck to a rsync hardlinking solution (which makes incremental backups painless).

Having access to primary storage and cheap backup storage using the same S3 API will make us reconsider that and will probably make it worth the effort to dump our rsync-based solution for B2.


I would absolutely love to replace my use of S3 with B2 as a backup for data stored elsewhere. Personally, I would much rather give this storage to a service that only does storage, rather than everything else that AWS does, so I don't have to worry about anything strange happening in a cloud service I don't use every day.

When they first launched B2, I inquired about ability to enter into a BAA (Business Associates Agreement) for HIPAA compliance and was told that it wasn't "on the roadmap". It sounds like B2 has come a long way on the compliance side. Would be great if they were open to this.


Yev here -> Double good news for you this morning: we're now signing BAAs for B2 Cloud Storage ;-) Just contact sales and they can get you sorted out!


That's great to hear! I'll definitely be reaching out.


Actually excited by this. I was benchmarking S3 vs B2 vs others 2 years ago and I had to give up on B2 because its implementation for performance was so much more difficult. (88 lines vs 36 lines for all others in Ruby)


So how many times a month do you have to implement this to be reasonable compared to the cost of the storage?


This is not an implementation cost issue.

It was just super hard to make the code perform well. You have to manage client sessions on your side and choose optimizations on your side; you have to spread things manually, which is hard to do. Whereas S3 maximises your bandwidth with no custom code required. It's not really that S3 compatibility was needed; it's that the B2 API wasn't good.


HashBackup (author here) was one of the 1st if not the first B2 integration. I didn't find the B2 API any more difficult to use than the S3 API. It has the same functionality with similar kinds of API requests. The only significant difference is that you have to request an upload URL and download URL, and requests can sometimes return a code to get a new URL when a vault is full or overloaded.

There is a price/performance trade-off: B2 has higher request latency than S3, no matter where you are (my experience), but they also are 5x cheaper on storage costs, 10x cheaper on download bandwidth, and have no price gimmicks like minimum object sizes or minimum object lifetimes like many other services (S3 IA for example).

To make up for B2's request latency it is more important to issue requests from multiple threads, especially for short-running requests like removing files.

Another key difference is that B2 always uses SSL whereas S3 can be accessed without SSL with little security impact because each S3 request is individually signed with a secret key. Setting up an SSL connection is more overhead, so another key to performance is to reuse connections.

Both of these suggestions apply to S3 as well, just more to B2 because of the latency difference.
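
To make the multi-threading point concrete, here is a rough Python sketch of issuing short requests from a small thread pool (boto3 against an S3-compatible endpoint; the endpoint and bucket names are placeholders, and the same idea applies to the B2 native API):

  import boto3
  from concurrent.futures import ThreadPoolExecutor

  # Credentials come from the usual environment/config; endpoint is a placeholder.
  s3 = boto3.client("s3", endpoint_url="https://s3.example.com")
  keys = ["old/obj-%04d" % i for i in range(1000)]

  def delete_one(key):
      # One short-lived request; per-request latency dominates its runtime.
      s3.delete_object(Bucket="my-bucket", Key=key)

  # Keeping many requests in flight at once hides the per-request latency.
  with ThreadPoolExecutor(max_workers=32) as pool:
      list(pool.map(delete_one, keys))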


Yev from Backblaze here -> Glad that you took a look at us in the past, excited that this announcement will warrant another look! :)


Can Amazon actually patent their API (per the Google vs Oracle case), basically preventing other vendors from providing S3 APIs so that Amazon can lock in users?

I am not a lawyer. So this is a genuine / dumb question.


Disclaimer: I work at Backblaze. I'm also not a lawyer. :-)

> Can Amazon actually patent their API (per the google vs oracle case) - basically like prevent other vendors to provide S3 APIs so that Amazon can lock in users.

Most likely yes. Backblaze plans going forward are to fully, uncompromisingly maintain our original native B2 APIs for a few reasons including this concern.

It's probably up to Amazon whether they want to boot all 3rd parties off their S3 API. Backblaze has a viable fallback if that occurs. I hope for customers' sake Amazon doesn't declare war in that fashion.

If Amazon decides on this path, internally at Backblaze we have discussed immediately doing the opposite - declaring for all of time anybody can copy our B2 APIs. Remember, our APIs are technically superior to the S3 APIs. They are lower cost to implement, and are shockingly easier to use for developers. They don't make all the mistakes S3 made. We had the luxury of learning from all their mistakes over the years. :-)


This is kind of scary, that companies can patent an interface.

So Google Cloud is presumably also exposed to this potential legal time bomb?

Amazon can sue you and retroactively force you to pay them, right? So all they need to do is wait for the alternatives to become popular.


That's very short-term thinking though. You win the battle but lose the war by being so partner-hostile. Amazon has thousands of partners pay to join it at re:Invent for a reason.


I suspect a move like that would trigger a cloudpocalypse that would actually be beneficial in the mid- and long-term.


It reminds me of a moment three years ago, when I asked Dropbox to make their API similar to Google Drive's, as they basically provide the same service. https://github.com/dropbox/dropbox-api-spec/issues/3#issueco...

It is just awful to see how everyone tries to reinvent the wheel rather than be compatible with anyone else.


Fear of lawsuits related to copying APIs may also be a factor.

See https://en.wikipedia.org/wiki/Google_v._Oracle_America


Backblaze is clearly violating Oracle's copyrighted copy of Amazon's S3 API: https://docs.cloud.oracle.com/en-us/iaas/Content/Object/Task...


This is great news... there are lots more good clients for S3 than B2, and implementing one is not exactly trivial because of some special considerations B2 had in the beginning (namely: uploading directly to a pod).

I see this isn't available for old buckets, is there a straightforward way to duplicate a bucket to make it compatible or do you have to use something like rclone?


(backblaze ceo here) Yes, easy to move the data to a compatible bucket using our B2 CLI or Transmit: https://help.backblaze.com/hc/en-us/articles/360047120614-Ho...


I'm curious what their load balancing layer looks like. There are a lot of interesting options. (Disclaimer: I've worked in the CDN and the storage space in the past)

If their load balancer is smart enough it can call the dispatcher, and make use of something like https://zaiste.net/nginx_x_accel_header/ to figure out where to forward the request. Unfortunately this still requires uploads be proxied through the dispatcher.

You could get crazy and involve a CDN (akamai or cloudflare or fastly) that could do some smart logic, especially if you can emit your dispatcher as a lookup table that's updated frequently. I don't know what bandwidth costs would be for that though. Probably high.

It's an interesting problem space and I'd love to talk to these folks about it.


Hi! Backblaze employee who did some of the LB stuff here. It's relatively standard/straightforward. There's a L4 load balancing layer using IPVS and ECMP-via-BGP, then a custom application that does the actual proxying/forwarding to the appropriate vault.


This is great. Their current API requires you to identify a unique host to send data to, so you're constantly performing a metric ton of DNS queries. Until I whitelisted the base domain it was the #1 client of my Pi-hole installation by multiple orders of magnitude.


The DNS resolver library of your client is allowed to cache the IP address for a given hostname for up to TTL. If it does so, the cost should be negligible.


Disclaimer: I work at Backblaze.

> The DNS resolver library of your client is allowed to cache the IP address for a given hostname for up to TTL

Not only that, but one mistake a lot of developers made early on was asking for a location to upload for every upload. That was NEVER the intention. In fact that annoys our servers also.

Developers are supposed to request a location to upload ONCE, and then upload to that location for hours, or even DAYS. Unless you have a bug in that software, it really shouldn't come anywhere close to being a high runner in DNS. We're talking 9 or 10 requests per day, at most, if you are unlucky. Feel free to reach out to our support if you aren't seeing that!


Although that implies that you need to cache that location to upload somewhere and request it from your own cache. Also, you need to write error-handling code (which will only fire rarely, so is hard to test) to deal with redirects or fall back to re-fetching the location if it changes.

These things are not particularly difficult, but they require additional mind-space to accommodate. Most developers will just do the simplest thing that works, performance be damned. If the simplest path is slow, they'll just remember "B2 is slow", no matter how unfair that is.


If you can use one of these arbitrary domain names for hours or days ... why wouldn't you just handle it on your end and provide the public with a single domain?


> why wouldn't you just handle it on your end and provide the public with a single domain?

That is what we did for the S3 protocol. It adds cost via a load balancer.

The whole original storage design was based on the fact that in our original product line (Backblaze Personal Backup) we owned both ends of the protocol - our servers on the back end, and our client on the customer laptop. We were able to eliminate all load balancers from our datacenter by putting a little tiny bit more intelligence in the client application (maybe 50 lines of code).

The client asks the central server where there is some free space. The server tells it. Then the client "hangs up" and calls the storage vault directly, no load balancer required! Then the client uploads as long as that storage vault does not fill up or crash. If the storage vault crashes, or is taken offline, or fills up all the spare space it has, the client is responsible for going back and asking the central server for a NEW location. This fault tolerance step in the client ENTIRELY eliminates load balancers!

Normally you need an array of servers and a load balancer to accept uploads, because what if one of the array of servers crashed, had a bad power supply, or needed to update the OS? The load balancer "fixes that" for you by load balancing to another server. Pushing the intelligence down into the client saved us money. Nobody ever noticed or cared because our programmers could write the extra 50 lines of code, to save the $1 million worth of F5 load balancers (or whatever solution Amazon S3 has).

We based our original B2 API protocols on this cost savings and higher reliability, but it does push the 50 lines of code logic down to the client. It caused a lot of developers extreme angst. They just couldn't imagine a world where their code had to handle upload failures and retries. They would ask us "how many retries should we try before we just fail to backup"? Should I try 2 retries, or 3, before the backup entirely fails and the customer loses data? Our client guys had a whole different approach: since it was a computer, we just went ahead and retried FOREVER. Never endingly, until the end of time, in an automated fashion. A couple times a year one client gets unlucky and it requires several round trips before getting a vault to upload to, but who cares? It's a computer, it can retry forever. It never gets tired, never gives up.

But S3 never figured this out, and they require the one upload point to have "high availability". It saves app developers about 50 lines of code and a lot of angst, but then we (Backblaze) have to purchase a big expensive load balancer, or build our own. We mostly built our own.
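
(Not our actual client code, but for anyone curious what those "50 lines" boil down to, here is a heavily simplified Python sketch of the pattern: fetch an upload location once, reuse it, and go back for a new one whenever an upload fails. The helper names are illustrative wrappers around the B2 native calls, not a real library.)

  import time

  class TransientUploadError(Exception):
      """Stand-in for 'vault full / offline / busy' responses."""

  def upload_with_retry(b2, local_path, file_name):
      # Ask the central server ONCE where there is free space...
      target = b2.get_upload_url()
      while True:  # ...then keep retrying, forever if necessary.
          try:
              return b2.upload_file(target, local_path, file_name)
          except TransientUploadError:
              time.sleep(1)  # brief pause, then ask for a NEW location
              target = b2.get_upload_url()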


(I work at Google on Cloud Storage.)

Developers working with cloud storage APIs generally need to get used to the idea that not everything is going to work all of the time. Retries and proper status code/error handling are critical to making your application work properly in real-world conditions, and as "events" occur. Every major cloud storage provider has circumstances under which developers must retry to create reliable applications; Backblaze is no different. For GCS, we document truncated exponential backoff as the preferred strategy [1].

Google has its Global Service Load Balancer (GSLB) [2], which handles...let's just say an enormous amount of traffic. GSLB is just part of the ecosystem at Google.

It's hard to design a storage system that's "all things to all people"! There are a series of tradeoffs that need to be made. Backblaze optimizes for keeping storage costs as low as possible for large objects. There are other dimensions that customers are willing to pay for.

[1] https://cloud.google.com/storage/docs/exponential-backoff [2] https://landing.google.com/sre/sre-book/chapters/production-...
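
As a rough illustration of [1], a truncated exponential backoff loop with jitter might look like this (the request callable and the decision about which errors are retryable are placeholders for whatever client you're using):

  import random
  import time

  def with_backoff(do_request, max_tries=8, cap_seconds=32.0):
      # do_request is any callable that performs one attempt of the operation.
      for attempt in range(max_tries):
          try:
              return do_request()
          except Exception:  # in real code, only retry retryable errors
              if attempt == max_tries - 1:
                  raise
              # Delay doubles each attempt, capped, plus random jitter.
              time.sleep(min(cap_seconds, 2 ** attempt) + random.random())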


(I work at Google on Cloud Storage).

"Developers are supposed to" is a scary way to start a sentence. ;)


Yev here -> Glad you're excited and that this will make things a bit easier for you!


That's awesome, but I really want to see lightning fast response times and TTFB... Second pain point is the number of retries needed for uploading a large batch of small files. Those are the main reasons I'm still considering migrating away. I really wish I didn't have to, as otherwise I love the pricing and the philosophy.

Edit: also think DigitalOcean Spaces and B2 might be better off merging together, or Spaces being a whitelabel B2 in disguise (both are part of BWA).


Disclaimer: I work at Backblaze.

> I really want to see lightning fast response times and TTFB (Time To First Byte Served)

If a file is "cold" (nobody has requested it in the last 24 hours) then it needs to be reconstructed from the Backblaze Vaults and there is a little delay. After that, it should serve pretty fast for the following requests (off of a caching layer with SSDs).

In the end, Backblaze B2 is a good solution for some customers, and not ideal for others. If your application requires blinding speed, like sub 1 millisecond serve times, Backblaze B2 may not be perfect for you. But how often is that the case? Certainly not when fetching a web page, or storing a backup for a year, right? In those cases a small delay is FINE. This is an example web page served by Backblaze B2 here, how does it load for you? https://f001.backblazeb2.com/file/ski-epic-c/full/2015_scotl... Fast? Slow? How is it?

For comparison, my regular hosting provider serving the same web page here: https://www.ski-epic.com/2015_scotland_will_macdonald_birthd...

Personally I can't tell any difference. I still look silly in a kilt in both versions. :-)

> Second pain point is the number of retries needed for uploading a large batch of small files.

It really shouldn't take any retries, or geez, at VERY MOST something like less than 1% - why is that an issue? Software should handle the tiny failure rate. I'm honestly curious, we want to know why people aren't choosing our solution!!


I understand you're probably not in a position to say anything about it, but I'd love to see the "little delay" when reconstructing a file qualified somewhat. Are we talking < 5s or < 10s? What do the percentiles for restore latency look like? How does file size play into it? This, for me, is one of the biggest unknowns right now since it's not easy to create a test benchmark for this case (i.e. upload a bunch of stuff and let it sit idle for at least 24 hours, hoping it will be expired from the caching layer).


> I'd love to see the "little delay" when reconstructing a file qualified somewhat. Are we talking < 5s or < 10s?

I asked the engineers that work on that code, and they pulled a random sample from the logs (we time all of this) and said for files less than 1 MByte, it averaged around 250 milliseconds to reconstruct the file from the Backblaze Vault and get it onto the cache servers where it is then served up. 95% of requests completed within 900 milliseconds, but there were a few up over 1 second (1.2 seconds was the highest they found). Those are live production numbers so it includes all the load on those Vaults.

A couple other notes just to add color. Any one Backblaze account is bound for life to what we call a "cluster", for example there is one cluster in Europe so all files are stored in Europe for any account in Europe. There is a load balanced array of "cache servers" in front of all the vaults specific to that cluster (the caching servers are physically located close to the vaults for latency reasons), and our biggest cluster has something like 20 of these SSD based caching servers.

Ok, so the cache layer is not "shared", meaning each cache server only pulls directly from the Backblaze Vault. So if you were serving a file, and 20 separate customers got amazingly unlucky, the file would get the 250 millisecond lag every time for those first 20 fetches. The cool part of this architecture is that you then have 20 populated caches that are completely unrelated to each other, so you have 20x the bandwidth available to serve it up (and a rack of 20 really fast servers to serve it). Plus they are all totally independent so they can crash or be brought offline to upgrade the software without any downtime.

We can add these cache machines as we need them, they are these 1U units and we have "warm spares" for a variety of things. When we have had spikes in load in the past we toss some hardware at it pretty fast.


When I tested B2 a month or so ago, I was seeing pretty frequent failure rates. Granted, I was on a free tier, and it was probably around the time that this whole COVID thing started spiking traffic everywhere...


> Second pain point is the number of retries needed for uploading a large batch of small files.

We upload a few hundred GiB to B2 daily and have this issue as well. Really annoying…


Okay dumb it down for Monday Me. Does this mean I can read from and write to my B2 storage using AWS S3 libraries (like the CLI, Python, and Node libs)?


Yes.
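
For example, with boto3 it should just be a matter of pointing the client at the S3 endpoint B2 shows for your bucket and using an application key as the credentials (the endpoint/region below is an example; use the one displayed in your account):

  import boto3

  s3 = boto3.client(
      "s3",
      endpoint_url="https://s3.us-west-002.backblazeb2.com",  # example endpoint
      aws_access_key_id="<keyID>",
      aws_secret_access_key="<applicationKey>",
  )
  s3.upload_file("backup.tar.gz", "my-bucket", "backups/backup.tar.gz")
  for obj in s3.list_objects_v2(Bucket="my-bucket").get("Contents", []):
      print(obj["Key"])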


Would S3cmd cli work as well?


I tried s3cmd according to their blog post without success. Just kept complaining that the access key was invalid. Which is a shame because `s3cmd` is much easier to use than `b2`.


Yev here -> make sure you ping our b2feedback@backblaze.com address and let us know about that experience, we're writing it all down and keeping tabs on what's not working as intended.


Thanks for following up! I eventually got it working - I'd followed the example too closely and had a rogue `us-west-002` left in the config which broke things because I'm apparently on `us-west-000`. But I'll drop an email anyway because I can't see an easy way to see what region you're in other than visually parsing the endpoint URL.


I have a decent sized music collection consisting of a lot of lossless vinyl rips that I've made from my record collection. It totals around 200 GB at the moment but is growing weekly. I've been looking for somewhere to back this all up in the cloud and Backblaze is looking most promising at the moment. Anyone here have any thoughts on where I should go with this?


There are lots of tools you can use to back up to Backblaze, including their consumer backup service.

If you want to sync to B2 specifically with a lightweight tool, check out https://rclone.org/


Rclone is really a fantastic tool; its configurability based on backends allows for amazing combinations!

You can configure any cloud storage backend (B2, S3, GCS ...) and combine it with other utility storage backends, like "crypt" [1], "cache" [2] and "chunker" [3]. I highly recommend it to anyone searching for a backup solution.

The only feature I miss from Rclone is automatic directory monitoring and mirroring, which I solved using Syncthing (but that forces me to host an additional server for it).

[1] https://rclone.org/crypt/ [2] https://rclone.org/cache/ [3] https://rclone.org/chunker/
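
If it helps, the basic flow with a B2 remote plus a "crypt" remote layered on top is roughly the following (remote and folder names are whatever you chose during rclone config):

  rclone config                                        # one-time interactive setup
  rclone sync /data/photos b2crypt:photos --progress --transfers 16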


Any good tips on a DIY backup setup using something like a home NAS and B2? In particular, something I can use on a load of devices including phones?


restic (if you need deduplication, snapshots, encryption, etc, which it doesn't sound like you do), or rclone, to BackBlaze or rsync.net. I use the latter and am very happy, I've never used the former.


I would recommend restic with B2. Really simple to set up. I use it for nightly homedir backups for multiple machines. Currently at ~400GB and it's costing me < $3/month.
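
In case it's useful to anyone, setup is roughly the following (bucket and path are examples; restic reads the B2 credentials from those two environment variables):

  export B2_ACCOUNT_ID="<keyID>"
  export B2_ACCOUNT_KEY="<applicationKey>"
  restic -r b2:my-backup-bucket:homedirs init       # create the repo once
  restic -r b2:my-backup-bucket:homedirs backup ~   # run nightly via cron/systemd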


Do you automate your process somehow? I'm trying to develop a strategy where if I open my laptop after a certain time it backs up in the background. My go-to for this kind of thing was Hammerspoon, but it won't execute the shell task in the background thread and completely locks my machine. I guess I could cron it?


I use a systemd user service. It sounds like you're on OS X, so you won't have systemd. Cron should work.

Here are my systemd files for reference.

Service:

  [Unit]
  Wants=network-online.target
  After=network-online.target

  [Service]
  Type=simple
  ExecStart=/bin/bash -c "/usr/bin/restic unlock && /usr/bin/restic backup --verbose --one-file-system --exclude-caches --exclude=$XDG_CACHE_HOME %h && /usr/bin/restic forget --keep-last 5 --prune && /usr/bin/restic cache --cleanup --max-age 30"

  [Install]
  WantedBy=default.target
Environment (fill these in with your keys):

  [Service]
  Environment="B2_ACCOUNT_ID="
  Environment="B2_ACCOUNT_KEY="
  Environment="RESTIC_REPOSITORY="
  Environment="RESTIC_PASSWORD="
Timer:

  [Timer]
  OnCalendar=*-*-* 00:30:00
  RandomizedDelaySec=20min

  [Install]
  WantedBy=timers.target


I also use restic and I have a cron script which runs early Monday morning every week. Restic supports auto-pruning, so I have it set up to backup a new snapshot then purge all except the most recent 100 years, 12 months, and 5 weeks.
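
That retention policy maps to restic's forget flags, roughly:

  restic forget --keep-yearly 100 --keep-monthly 12 --keep-weekly 5 --prune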


I've made systemd services and timers to do restic backups. It's really clean and works well. Also much more testable than cron. It will most likely also work for more complex scenarios, like the one you're mentioning.


Any chance you could clue me in on the scripts? Or point me to some resources to learn systemd? I've really disliked cron since having to use it for some auto push CI/CD stuff.


I just set this up last week, and it works well:

https://fedoramagazine.org/automate-backups-with-restic-and-...

edit: one gotcha is that using "user" systemd units, they will only run when the user is logged in. So, for a personal device like a laptop, this is fine, but for a server, it might not do what you're thinking. For the server use case, you probably want to enable linger for that utility user, so that units will run even with no active logged in sessions:

# loginctl enable-linger username


I used Backblaze for years (redundant DB backups) and never once had a problem with it or its API. It's just a great service.


I have been using Backblaze to back up our Synology NAS with lots of personal data, mainly photos and videos.

I have been restoring the occasional file easily. Have not had to do any major restore yet (hopefully never).

I found them really good and cheap. I can only recommend them.


My current setup is Freenas VM with 2 3TB ZFS mirrored WD reds. I run restic from those to a 3rd 3TB red over USB. Then I rclone the encrypted restic backup to B2, so it's e2e encrypted from Backblaze's point of view.

I'm considering ditching the VM and using ZFS on linux, and just doing straight rclone to both the USB disk and Backblaze, for the following reasons:

TL;DR - my threat model is essentially me accidentally deleting something, and bitrot, and I think this setup is overkill for that.

* All operations mentioned take forever. Cutting the encryption might, I think, speed things up a lot (not sure about this; the key question is whether the encryption is causing a lot of extra IO operations).

* I'm afraid of losing my restic encryption key. I have multiple copies of my keepass file but if syncthing decides to delete them, both of my backups become useless.

* Backblaze has earned my trust, and in the unlikely event someone hacks my data, there's not much valuable there. I would probably switch to encrypting a single directory with the few things I care about being exposed. Even if I keep encrypting the whole thing, I would use something more boring than restic.

* I'm not sure I've ever needed to restore an old version of a file with restic (there may have been one time early on). I think deduplication is overkill for my needs as well. It's cool tech, I just don't think I need the complexity.


This is wonderful news for me. I host a video on demand site Codemy.net and all the original source videos are on backblaze. Originally I had to write a library to connect to the backblaze api. Now I look forward to using the existing aws client libraries, one less thing I have to maintain.


Yev here -> That's awesome to hear! Ease of use is one of the things that we strive for at Backblaze and I'm glad that the S3 Compatible APIs are going to unlock some use-cases and make things easier for people that don't have the bandwidth to maintain different codepaths!


I remember when they had all their servers in one room and the redundancy boiled down to erasure encoding in single servers.

They've been doing incredible work in the open (storage server design, hardware reliability data, etc) and I'm really happy they've grown to where they are today.


I was actually looking at B2 vs S3 literally 2 days ago and went with S3 for the universal API. Luckily, it was a personal project and I can probably migrate everything very quickly. This is a killer feature, and I bet this will convince a lot of people to move to Backblaze.


Given all the Cloudflare discussion - Cloudflare webinar with Backblaze coming up next week: https://www.brighttalk.com/webcast/14807/405472


Just switched to Wasabi last week for better pricing and S3 interface... but great to see this.


Great news. Only a few days ago I was trying to figure out ways to use MinIO to expose Backblaze as Mattermost cloud storage, which needs to be S3 compatible. I expect that will work directly now. Has anyone already tried this integration?


Is there an open source S3-compatible component that I could rollout on prem?


I think this is what you're looking for: https://min.io/


Minio is open source and S3 compatible https://minio.io


Thanks, looks like spinning up a cluster is AGPL, so it's a no go.


Every day I have to use something like 4 GB of data to let Backblaze sync. This is despite the fact that I might only have created/changed 100 MB worth of files since the previous day's sync.


Is there a cheap S3-compatible service that is less reliable? I don't want to pay for redundancy. E.g. it's my backups; I can handle a 3% chance that my data is lost, as long as I find out about it.


If you want super cheap, use DreamObjects by DreamHost.


Doesn't look cheap at 5x the price of Backblaze. https://www.backblaze.com/b2/cloud-storage-pricing.html


Does this mean I can use awscli to interact with b2 by specifying some backblaze server with --endpoint-url? What is the endpoint I would use?


When you create the bucket it shows a url along with the keys.

Not sure how unique that URL is; looking at the structure it could depend on what data centre your bucket gets created in.
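
With awscli it should be along the lines of the following, substituting the endpoint shown for your own bucket (the region below is just an example):

  aws s3 ls s3://my-bucket --endpoint-url https://s3.us-west-002.backblazeb2.com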


Just signed up to try it out - would really like to see some form of two factor auth that isn't SMS based. TOTP and/or FIDO U2F.


I've been using TOTP with B2 for ages. I think SMS is just to set up the account.


Ah, the copy next to "Turn on Two Factor" sure sounded like SMS only, but sure enough, it gives you the option to use TOTP later on. Thanks!


Is B2 suitable for streaming video files? Can I stream the same file to 1,000 viewers at the same time?


(backblaze ceo here) B2 is a great origin store for your video files. If you're streaming to lots of viewers, using a CDN with Backblaze B2 is optimal. We partnered with Cloudflare as a founding member of the Bandwidth Alliance so you can store your videos with B2 and transit them for free to Cloudflare, which can serve to your viewers.


Please correct me if I'm wrong, but my understanding was that Cloudflare should not be used to deliver video files unless using Cloudflare's "stream" product, IE specifically this [1].

[1]: https://community.cloudflare.com/t/cloudflare-how-not-to-vio...


Would love to confirm this.


Did somebody actually try to use the S3 API? It doesn't seem to be working for me.


Fantastic news. B2 Storage is one of the most requested backup storage backends for us.


Happy customer of Backblaze. I love how transparent they are with everything (especially the hard disk statistics), and the fact that the CEO takes time to respond to a lot of questions only confirms how down to earth they are.


Any chance of adding Azure storage compatible APIs in the future?


There is no mention of the durability guarantees that S3 has.


Disclaimer: I work at Backblaze.

> There is no mention of the durability guarantees that s3 has.

I wrote this blog post doing some of our math around this: https://www.backblaze.com/blog/cloud-storage-durability/

But here is the thing: if you value your data, like if you will really go out of business if you lose it, then you should store three copies with AT LEAST two separate vendors. No matter how reliable any one vendor is, "stuff can happen" like your credit card is declined and the vendor deletes all of it.

I would recommend you use two separate vendors like Amazon S3 and Backblaze B2, and use two separate credit cards that expire on different cycles. I believe the credentials for login should be different on those two accounts, and the same one employee shouldn't have the credentials to both. Because one disgruntled employee should NOT have the ability to put you out of business. If you want some other thoughts, here is a blog post Backblaze wrote called the "3-2-1 Backup Strategy": https://www.backblaze.com/blog/the-3-2-1-backup-strategy/


Their durability is eleven 9's - 99.999999999% That's the same as Amazon S3. https://help.backblaze.com/hc/en-us/articles/218485257-B2-Re...


Another step towards Amazon acquiring them.


Oh man, I hope not... I enjoy my independent B2.


Any plans for an Azure Blob API?



