Backblaze B2 Cloud Storage Now Has S3 Compatible APIs (backblaze.com)
641 points by pyprism on May 4, 2020 | 279 comments


Backblaze is also a founding member of the Bandwidth Alliance, meaning serving B2 data via Cloudflare is essentially free.

So you are only paying for storage. (Correct me if I am wrong on this one.)

I wonder why ALL the non-hyperscale cloud vendors, like Linode and DO, don't provide one-click third-party backup to B2. You should always store an offsite backup somewhere, and B2 is perfect for that.


Yev from Backblaze here -> That's correct. We're a founding member of the Bandwidth Alliance and you can find more info about that on our Blog (https://www.backblaze.com/blog/backblaze-and-cloudflare-part...) or FAQ (https://help.backblaze.com/hc/en-us/articles/217666928-Using...). May the 4th be with you ;-)


Hey, I've been wondering this: are the transaction costs also waived, or is it just the bandwidth fees? I feel like if both were gone it'd be hard to make a profit...


For that integration it's just the download fees. The transactions do still get billed, but many people can stay within the free daily allotment.


What's the point of the Alliance in this case?

If I use an against-the-alliance corp, I will just pay Backblaze for storage + download, the same as if I used an in-bed-with-the-alliance corp.

Or will the alliance ensure I only get charged for one download per file per billing cycle, even if the CDN cache can't hold all the data and they download it many times? Does the alliance ensure that (and, more importantly, who monitors it)?


Hey, good question. It's essentially a peered connection, so you're not paying for the transfer between providers. You can learn more here -> https://www.backblaze.com/blog/backblaze-and-cloudflare-part...


Where can I find the pricing info for this allotment?


All of our pricing is available on the site (no need to call anyone) - B2 Cloud Storage is listed here -> https://www.backblaze.com/b2/cloud-storage-pricing.html. Just scroll down for the transactions!


                                    Storage        Download
    BackBlaze                       .5 cents/GB    1 cent/GB
    S3 (one zone infrequent access) 1 cent/GB      9 cents/GB
I'll transfer my 250GB of videos and images from S3 to BackBlaze some day.


Heh, we make it easy! Check out this Flexify partnership -> https://www.backblaze.com/blog/supercharged-data-migration-w...


What does "Charged for any portion of a GB." mean?


It means 1.01 GB, 1.1 GB, and 2.0 GB are all charged the same.


What about 0.1 GB or 0.5 GB? I guess that's my question: is there a minimum charge per download?


Yev here -> No, there are no minimum charges for downloads. It's just $0.01/GB down and $0.005/GB/month for storage, and everything is calculated at a byte-hour level :)
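
Roughly, byte-hour proration works out like this (a back-of-the-envelope sketch, not actual billing code):

    // Sketch only: $0.005 per GB stored for a full month, prorated by the hour.
    const DOLLARS_PER_GB_MONTH = 0.005;
    const HOURS_PER_MONTH = 24 * 30;

    function storageCost(gigabytes: number, hoursStored: number): number {
      return gigabytes * (hoursStored / HOURS_PER_MONTH) * DOLLARS_PER_GB_MONTH;
    }

    // 250 GB kept for a whole month ~= $1.25; the same 250 GB kept for one day ~= $0.04
    console.log(storageCost(250, HOURS_PER_MONTH), storageCost(250, 24));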


Thanks! That's what I was curious about.


Isn't this incompatible with Net Neutrality?

I think Cloudflare already was against net neutrality, but people who believe in that principle might need to avoid Backblaze as well if that is the case.


No, this has nothing to do with net neutrality. Your ISP isn't discriminating what data you consume, whether it's from cloudflare or another CDN.

This is about the fact that bandwidth is usually bought by capacity between networks rather than usage and they're passing those savings on to you instead of charging expensive per-GB costs like the major clouds.


I'm referring to Cloudflare as the network provider. It seems not-net-neutral to be charged different amounts depending on the relationship between the storage provider and the network provider.

If the argument is a semantic one, that net neutrality is so narrowly defined that it doesn't apply to them as a CDN rather than a conventional ISP, it still stands that the situation is not-net-neutral if we generalize that to include CDNs, which it ought to, since that's what the N stands for.


Network neutrality is a difficult concept to apply because its core concept doesn't map well to reality.

The core concept, in my mind, is that all packets should be treated equally, and should cost the customer the same amount per byte.

The problem is that different packets take different network paths and have different costs to your provider. The bandwidth alliance partners are exchanging traffic via peering links, so there's only the fixed costs of equipment and any port charges/interconnection fees to the peering facility. Packets sent to other destinations may pass through a paid transit link and cost the provider. If your provider charges you the same amount for both types of packets, they're not being transparent about their network costs which is bad for you, and is also bad for them, because you may be able to adjust your traffic so more is settlement free and less is paid --- but you won't do that without an incentive.

Of course, if you don't like how Backblaze manages its network, there's a bunch of other network storage vendors who you could switch to (many of which are members of the Bandwidth Alliance). Or, it's not too hard to build a storage box or two and get them into a colocation somewhere.

For your residential ISP though, for most people, if you don't like the network policies, you might have another option, but they will likely have similar policies. You don't have meaningful choice in a single residence, and moving residences to get better choices isn't a meaningful option either. So, network neutrality is a policy to (attempt to) regulate residential ISP behavior, in order to provide a reasonable policy for users. It's still a problem that it doesn't match reality, and it doesn't provide user choice, but it has gotten a lot of support.

I would rather see mandatory line sharing for residential ISPs. It's a lot easier to define, with proper regulation it offers a path towards consumer choice, and it gives people a way to take action if their provider has poor network policies --- if your provider runs an acceptable last-mile service but provides poor interconnection to the rest of the world, you could become a line-sharing partner and provide better interconnection to the world without having to build out an overlay last-mile network (which is incredibly capital intensive and difficult).


Is it really that difficult a concept? Comcast charging Netflix for access to Netflix's customers, when Netflix's customers are paying Comcast for Internet access, is bullshit.

Reword it to be generic, and dress it up in more polite language, but that's what it is.

Neither Comcast nor any other ISP gets to shake down Netflix or Google or Facebook or any other service provider, big or small, on the Internet.

Sure, behind the scenes there are backbone providers and peering arrangements and BGP, but those are technical details. Network neutrality means as a consumer I pay an ISP and get connected to the Internet.

And since Comcast is a cable TV company, these games of consumer chicken between companies are not theoretical, e.g.

https://www.thedenverchannel.com/news/local-news/why-you-sti...

https://mynbc15.com/news/local/att-and-directv-have-removed-...

https://www.washingtonpost.com/business/cbs-blackout-on-dire...


> Is it really that difficult a concept? Comcast charging Netflix for access to Netflix's customers, when Netflix's customers are paying Comcast for Internet access, is bullshit.

I agree, that it's bullshit, but please consider:

A telephone company charges me to have a phone line and per minute to place and receive calls, and charges whomever I'm conversing with to have a phone line, and to call me.

If Comcast wasn't large enough for Netflix to want peering, Netflix would probably use paid transit to get there (they might even use paid transit that Comcast would also have to pay for).

So, the reason it's bullshit can't be because Netflix has to pay for a service that Comcast is providing; because paying for network services is the norm. The reason also can't be because Comcast is charging both ends of the connection for the service it is providing to both, namely connecting the two parties; because both sides paying for connectivity is the norm in telecommunications.

There's some other reason it's bullshit. A big factor, of course, is that internet video services compete with the ISPs video services, and there's some implicit unfairness there. I would say it's because the residential customer is stuck with a very limited set of choices, and can't pick an ISP that distinguishes itself with open peering. It might be because the norm for internet peering is if there is a significant imbalance in traffic, the party sending the most pays; so the residential ISPs should be paying their residential customers; this one doesn't quite work, because it's not actually a peering relationship, and the norm is transit customers pay the transit provider for traffic in either direction.

> Sure, behind the scenes there are backbone providers and peering arrangements and BGP, but those are technical details. Network neutrality means as a consumer I pay an ISP and get connected to the Internet.

The problem here is there is no "the Internet" to connect to. An ISP needs to connect to substantially all of the other networks; and while a small ISP may simply use a single upstream ISP, where it would be easy for the small ISP to be neutral (bits in = bits out), a larger ISP is going to have a diversity of connections, and that's going to lead to defacto non-neutrality as some connections are bigger than others, and some connections are longer than others, and some connections are more expensive than others.

I fully understand the desire to restrain residential ISPs (and mobile ISPs) from anti-competitive behavior; I guess network neutrality might work for that.

But, when you apply it to other networks, it doesn't seem to make sense. In this case, you have a group of networks that would like to lower costs, both for themselves, and their customers, and they've agreed to settlement free peering, and to also not charge customers for bandwidth on the settlement free links. Network neutrality doesn't really speak towards the interconnection, but says that customers should be charged the same rate for all bandwidth, even when the underlying costs are different. I don't understand why that's a good thing in this case?


Mandatory line sharing is an interesting idea. We've tried it before, back in the DSL era (2000s). Not saying we shouldn't try it again, but it only sorta worked. Whether the instructions came down from AT&T corporate offices, or it was malice or incompetence on the part of locally contracted installation technicians, the shared boxes were a tragedy of the commons.

If a neighbor got service from a rival DSL provider, your Internet was liable to go out until your ISP was able to get a truck out to fix the damage that the competition's technicians did to your connection.

Those laws are still around, but unfortunately, (A)DSL tops out at single-digit megabits. An upgrade from dial-up, but not competitive with today's broadband market for those with other options.


I'm pretty aware of the issues with past implementations of line sharing; a competent regulator would be required to iron out these issues. Of course, network neutrality requires a competent regulator as well.

Pricing was a big issue in the past --- where the 'wholesale' per-line tariff charged to the competitive carrier was more than the retail price charged to incumbent customers.

I think, in most cases, the new-install-breaks-another-user type of issue is related to poor records of which line is used by which customer; that happens within the incumbent as well though (when I got ADSL2 installed, one of my neighbor's connections went out, and pretty soon there were three AT&T trucks on the street to work everything out). That particular issue could probably be solved by allowing the incumbent to manage all the installs and monitoring for install-time customer steering.

The federal mandate no longer covers line sharing, it only covers line leasing for copper telephone service; wherever your premises is directly connected to, if there's space for competitive equipment, the incumbent must make it available. Of course, most telephone companies have moved customers to remote terminals for better speeds, and there's no room for competitive equipment in the remote terminals; and a lot of telephone companies are replacing copper networks with optical networks, and those aren't covered either. The whole concept got submarined by the insistence that it apply to telephone networks and not cable or other "new technology" networks, and then deciding not to apply it anywhere in light of court cases that applying to one and not the other was unfair.


Cloudflare doesn't charge for bandwidth, it's as neutral as it gets.

The charges are from the cloud (storage) providers. And most of them charge for bandwidth usage, but they don't discriminate based on what data you send. It doesn't matter if it's source code, photos, or Linux binaries, so there's no net-neutrality issue here.

You can potentially argue that Backblaze is discounting traffic based on the upstream network but that's not active discrimination. There are hard costs to internet transit and some companies can offer better pricing depending on where the traffic goes. For example, downloading from Australia is more expensive than downloading in America because of the infrastructure costs, and this isn't considered a net-neutrality violation.


Cloudflare is not charging different amounts for different storage providers; some storage providers are waiving the fee when the upstream is Cloudflare.

disclaimer: ex-cloudflare


So, Backblaze is not-net-neutral.


As far as our service is concerned everyone pays the same price for B2 Cloud Storage, $0.005/GB. We're not a network provider and in this case neither is Cloudflare. We are both service providers. The partnership between us simply allows people to move data between us freely, but you still need to pay Backblaze B2 for the storage of the data, and you need to pay Cloudflare for the distribution!


You are ignoring the responses to your original point. This is not relevant to net neutrality.


I think I addressed that comment, actually!


Yev here -> unsure how those are related, but you certainly don't NEED to use Cloudflare if you're using B2 Cloud Storage as the data origination point, it's just an option that was brought up in the thread!


It sounded like the price would be different if you used a certain network provider, which is not-net-neutral.


I believe in this case Cloudflare and Backblaze would be considered service providers, not network providers (like Comcast or AT&T, etc...). And you'd still need to pay each service individually for the service they're providing. It's an alternative to, say, Amazon S3 and their own CDN, which also does not charge for egress between the two services (since they own both). But in our case, you pay Backblaze B2 Cloud Storage for the storage, and you pay Cloudflare for their CDN capabilities; the partnership simply makes the transfer between our storage and their CDN free.


Transit does cost different amounts of money depending on who you negotiate transit with. That's not part of net neutrality.


Airlines have alliances; that does not stop you from taking any airline you want for your route, but costs may vary.

It is just a property of networks. As an end user you still get to use all providers, but two of them may use a dedicated connection to reduce internal prices (the price benefit of which they may or may not give back to the user).

Neutrality from the point of the user is not affected when it comes to service being available.

Now if one airline alliance said you are not allowed to get on their flight if you hopped off a rival alliance's flight, then we would have an issue.


This is so mind-blowingly crazy and awesome. I've always loved the idea of Backblaze but never tried it. But this combination is so insanely good and valuable. When an efficient market/economy works correctly, the results are just amazing: Cloudflare and Backblaze essentially got their unit costs for their respective products so low that for just a small fee, I can do something that would have been prohibitively expensive maybe just a decade ago.


On the note of unit cost, I doubt it will be much lower in the near future. Cost/GB isn't falling much, if at all, and their load balancers are already eating into their margin, which means any further reduction in unit cost only gains back what they took on with the S3 API. Hopefully the new S3-compatible API means they can make all that back with volume.

But this is also a sad realisation that storage tech isn't getting any cheaper.

(Of course, Backblaze can always prove me wrong :D)


Wasn't there a startup on HN some weeks ago that used a crypto-mining approach to server management and was much cheaper?


Disclaimer - Backblaze employee here, but just speaking for myself:

It was.. sort of cheaper. They didn't actually build the servers, and as described the server wouldn't work (onboard SATA didn't support port multipliers, lack of ECC would probably cause problems in practice, bit hand-wavey on power/space/network/manpower costs, etc). The goal of the article was to get other people to build cheap storage and put it up for rent on their network. They do have some amount of storage space available for very cheap on the network now, but personally I suspect it's people who figured "what the heck, I'll give it a try!" as opposed to people actually building storage servers and making a profit renting them out.

I was honestly pretty disappointed - I'd hoped they'd found a cheap motherboard with ECC and support for port multipliers, but nope.


With the client responsible for chunking, redundancy, and error correction, I don't think the lack of ECC really matters for their use case. The rest of the issues are more important.

Edit: The motherboard they picked does support ECC memory, anyway. In general ASRock models do.


I haven't dug too deeply, but it's unclear whether anything but the Asrock Rack boards have full validated ECC implementations for Ryzen (vs just using the memory but ignoring ECC), and I think that may depend some on which Ryzen CPU is used (maybe PRO-only). I'd love a source of better info there, though.

You're right though - since the client's doing the work and they have a lot of redundancy/diversity in the storage it's not as big of a deal for them as it would be for us. I'd be a bit wary because the client-only verification does mean that there's no verification-with-ECC step in the entire chain, but I'm not sure that's significantly worse in terms of actual risk.


How far off do you reckon Backblaze is from designing your own motherboards + electronics, with just the pieces you need? :)


Pretty far. Since we cram so many drives into each server, our total server count is actually relatively low for the amount of storage we have. I'm not sure exactly how many units you need to amortize the design costs across to make it worth it for a custom ODM design, but I suspect it's in the tens of thousands.


Yes, and it had little redundancy, with limited uptime and speed guarantees, and no S3 API compatibility. And if I remember correctly even that was $0.003 per GB, with lots of unit costs that a lot of people, including me, thought were miscalculated, and that it should be closer to $0.4 for it to be somewhat break-even.

So maybe it is cheaper, but not really comparable.


How does that work? Can I just dump two terabytes of video into Backblaze B2, set up a Cloudflare account, and have people watch those videos with it costing me only $10 a month? Because that doesn't sound right.


The Cloudflare ToS explicitly exclude that use-case.

2.8 Limitation on Serving Non-HTML Content The Service is offered primarily as a platform to cache and serve web pages and websites. Unless explicitly included as a part of a Paid Service purchased by you, you agree to use the Service solely for the purpose of serving web pages as viewed through a web browser or other functionally equivalent applications and rendering Hypertext Markup Language (HTML) or other functional equivalents. Use of the Service for serving video (unless purchased separately as a Paid Service) or a disproportionate percentage of pictures, audio files, or other non-HTML content, is prohibited.


In a previous HN post [1] where someone wrote up how to use B2 as an image host, the Cloudflare CEO chimed in and addressed rule 2.8 specifically, saying if Cloudflare workers are used for URL prettifying and redirecting, a different ToS is applied and that use-case would be fine.

Does that mean your video use-case would also be fine? I have no idea. An HN comment from the CEO doesn't seem like it would hold up if Cloudflare suddenly shut down your free account.

I'd love for Cloudflare to officially clarify the limits of the Cloudflare/B2 alliance in terms of external traffic. The confounding issue here is that B2, as a storage service, is not really intended for "serving web pages and websites" — it's for larger files, binaries, etc. — and therefore any traffic from B2 going through Cloudflare is sort of de facto in violation of 2.8.

[1]: https://news.ycombinator.com/item?id=20790857


Disclaimer: I work at Backblaze so I'm biased. :-)

> B2, as a storage service, is not really intended for "serving web pages and websites" — it's for larger files, binaries, etc

It might be missing a couple features (which is a pet peeve of mine) but we SURELY intend for it to be used for serving web pages. That's one of the largest differences between "Backblaze Personal Backup" (our original product line) and Backblaze B2. The largest parts of the redesign/refit when we originally did B2 were around the concept of what we call "Friendly URLs" (web page names, folder names) instead of just ugly 82 character hexadecimal file names like Backblaze Personal Backup stores all your files in.

For full disclosure, Backblaze B2 isn't a great "hosting" solution for something like WordPress because we lack two or three things, one of which is comically easy to fix and I keep trying to convince everyone to do it. The issue is URLs that end in a "/" (trailing slash) basically need to "guess" that after that is an ".html" or ".php" or whatever. So the URL: https://f001.backblazeb2.com/file/ski-epic-c/full/2015_scotl... does not work, but the URL: https://f001.backblazeb2.com/file/ski-epic-c/full/2015_scotl... does work. All modern web servers do this automatically filling in of the "index.html", but it is missing from Backblaze B2 currently. And it would take just a day or two for one of our developers to fix it. And dang it, I'm going to get it done one of these days.


S3's use of separate servers for website hosting is actually very sensible. Options like usage of index.html and error.html only apply on the website servers and won't cause any surprises for people using the service as a key-object store.

That said, I would absolutely not consider using B2 without support for index.html, error.html, and Website-Redirect-Location.


You can trivially use Cloudflare Workers to implement that functionality on top of B2.
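
For instance, a minimal Worker sketch (the bucket's friendly-URL base and the index.html convention here are illustrative assumptions, not anything B2-specific):

    // Rewrite trailing-slash URLs to index.html before pulling from B2.
    // B2_BASE is a placeholder for your bucket's friendly URL.
    const B2_BASE = 'https://f001.backblazeb2.com/file/my-bucket';

    addEventListener('fetch', (event: FetchEvent) => {
      event.respondWith(handle(event.request));
    });

    async function handle(request: Request): Promise<Response> {
      const url = new URL(request.url);
      const path = url.pathname.endsWith('/')
        ? url.pathname + 'index.html'   // fill in the missing "index.html"
        : url.pathname;
      return fetch(B2_BASE + path);
    }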


Please take inspiration from how Netlify handles URL rewriting through their netlify.toml file, if that's possible. :-)


Sounds like workers are fine in general, especially on the non-free tier that only increases the budget by $5 a month.

That covers 10 million requests, which could each be 10MB chunks, or 512MB cached files, or possibly larger.

(A raw mp4 needs about 3 requests to start plus one per seek.)

But I wouldn't be surprised if that doesn't scale to enormous amounts of data.


To clarify, the key line is "a disproportionate amount of non-HTML content". Serving videos isn't prohibited but CF still has to prevent free plan websites from becoming huge loss-leaders.

Another big asterisk is that this applies to all proxied (DDOS-protected) content, not just the content that goes through the CF cache. CF pays for all of their uplink bandwidth out-of-pocket regardless of whether it's cached. You can see what happens when you proxy multiple terabytes of content on the free plan in this thread[0] (again, all proxied bandwidth costs CF, which is why this user had their zone unproxied).

0: https://community.cloudflare.com/t/the-way-you-handle-bandwi...


From what I can tell that's more about their caching service. It mentions exceptions for hosting from a paid service, and correct me if I'm wrong, but wouldn't Backblaze be that paid service?


They are talking about their own paid service, obviously. I don't think they care whether you are paying for someone else's product or not.


Perhaps they are talking about a paid service that they offer, like Cloudflare Stream https://www.cloudflare.com/products/cloudflare-stream/


This is huge if true - folks keep saying Cloudflare can replace CloudFront for free. If we could put 20TB of binary content for software distros / videos etc. on B2, then stream it via Cloudflare, I'd actually believe the claims.

Every time we've tried this with other "free" providers there always seems to be "fine print" when you start pushing huge bandwidth on these "unlimited" plans. It seriously is not worth the time at some point.


Cloudflare's main service excludes video.


This is key for many.


The way I understand it, you can do this with images; eastdakota confirmed that here earlier: https://news.ycombinator.com/item?id=20791660. Not sure about videos.



I believe you still pay Cloudflare costs, but the traffic between Cloudflare and B2 is free on both platforms. But it might be worth double-checking the fine print.


Isn't the Cloudflare CDN included in the free plan as well?


It is, but they cap the file size they will cache on the free plan at 512MB. So you would need to chunk up your videos to get free bandwidth.


> So you would need to chunk up your videos to get free bandwidth.

Having a functional video player on your site or in your app (e.g. one where you can skip to arbitrary times without requiring the video be buffered up to that point; or where a video can be "resumed" from the middle if you leave it and come back) already requires that you use MPEG-DASH or HLS; which in turn implies/necessitates pre-chunking, no?

Is there some use-case where people are currently serving 400MB contiguous video files from a CDN? I can't think of one. YouTube doesn't. Netflix doesn't. Even porn sites don't.

I guess Archive.org has some large video files on there in various places, that can be direct-downloaded; but the recommendation Archive.org itself makes, is to consume those via BitTorrent. Presumably they don't have a CDN partner willing to handle their unique workload for cheap.


You don't need DASH or HLS to seek to an arbitrary point quickly. The mp4 container format has an index which stores playhead time -> byte offset. This index is either at the end of the file or the beginning, and tools like qt-faststart move the index to the beginning, which makes videos start much quicker when served over HTTP. Browsers will use the index to issue range GET requests and be able to seek just fine.

Serving a 400MB video via a CDN is highly dependent on the CDN. Some will construct a cache entry from a slew of range GET requests, translating them to fetch the missing pieces, and work brilliantly; other CDNs should be avoided.


ffmpeg.exe -i input.mp4 -c copy -movflags faststart -y output.mp4


> Having a functional video player on your site or in your app (e.g. one where you can skip to arbitrary times without requiring the video be buffered up to that point; or where a video can be "resumed" from the middle if you leave it and come back) already requires that you use MPEG-DASH or HLS; which in turn implies/necessitates pre-chunking, no?

Browsers are smart. They only buffer a few megabytes at a time and can seek around pretty efficiently.


That requires the video to be encoded in a way where you can just start reading the stream from any random byte offset, and everything will still work. Video files are not usually encoded this way (any more.) Resume an MP4 or MKV video half-way through, without reading the TOC-ish stuff from the first chunk, and you'll get garbage that maybe resyncs after 20 seconds.

It's totally possible to "encode for streaming", but it usually results in both an increase in overhead [more keyframes] and a decrease in quality [inability to use predictive interpolation, instead relying only on forward-interpolation.]

Mind you, this streaming-enabled encoding is how things were done on the web, before the advent of MPEG-DASH/HLS; and it's still how e.g. the MP2 encoding of digital cable/satellite video works. But we don't really want to go back to those days. They kind of sucked.

Jumping to random byte offsets in a video also tends to screw with any embedded data streams like subtitles or thumbnails, which tend to just be stored in most media container formats as a single chunk at the beginning/end of the file, rather than being spread or copied across the stream. Again, the kind of captioning done back in the MP2 days is immune to this, but it kind of sucked as well (e.g. it wouldn't trigger if you happened to skip to the millisecond after the instruction for it appeared in the stream, often leaving you with ~30 seconds of untranslated audio.)


I don't think this is quite right. If you serve an mp4 with H.264 video statically on any basic webserver, it will just work in the browser through plain HTTP, without the need for MPEG-DASH/HLS. Every widely used media player/browser just downloads the nearest chunk (keyframe point) behind the time that was seeked to and resumes playback from there. This point is found through an index stored in the container format. For basically every video format these days (say, at least as new as H.264), regular settings make this only a few more seconds of video to download and decode before the seeked point, and it basically happens instantly for normal online consumption. In H.264, forward prediction (through two-pass encoding) will play back fine too.

I think what you're saying applies more to a setting where the video is being streamed live, so that you cannot access the start of the file to get keyframe metadata. In that case HLS and MPEG-DASH help.


That's odd. I can't recall ever having a problem with browsers playing mp4s in a vanilla <video> tag, as long as I encode them in the main h264 profile, AAC audio, and MOOV atom at the front (see [0] for ffmpeg command). Obviously the server has to support Range: byte requests.

My impression is DASH/HLS are mostly useful for adjusting bitrate on the fly.

[0]: https://superuser.com/a/438471/402047


When the parent talks about jumping to random byte offsets, they mean you don't have the first part of the file at all. You just have an arbitrary 512-MB chunk out of the middle.


But they were claiming that a single monolithic file would break too, which is not the case. The browser does a range request to get the first part, then a range request to get the part you're playing, and it works.


Right. Like tuning into a digital-cable signal "in the middle"†. You just get bytes of the stream starting from an arbitrary offset, without having seen/processed anything before that (and without even being able to request anything before that), and you need to resynchronize from what you've got.

† I mean, a digital-cable video stream is always "in the middle" unless you're just starting a VOD stream, but still.


The browser just reads the TOC-ish stuff from the first chunk. Trust me, it works. I regularly load plain old multi-hundred-megabyte mp4s in my browser, off a web server, and skip around without problems. The default keyframe interval from x264 is fine. You don't have to do any horrible things to the encoding, you just have to start loading a few seconds before the seek point. Which the browser does automatically.

Do browsers even support embedded subtitles?


Cloudflare's free plan can and does end at any moment; I wouldn't rely on it for any serious application.


Curious as well.


Does AWS (and GCP and Azure) not do the exact same thing when transferring data out via Cloudfront (or their respective CDN)? Is Backblaze really special in that regard?


(backblaze ceo here) AWS is free to Cloudfront because that's an AWS service. AWS/GCP/Azure are not free (and quite expensive) to Cloudflare. Backblaze is free to Cloudflare.


And Cloudflare is also the winning reverse proxy caching provider, the one that people tend to use even despite transit charges to AWS/GCP/Azure.


Bit confusing. Need more clarity. Read 10 times. Pls recheck mr. ceo :)


Cloudfront costs a non-trivial amount of money per GB: https://aws.amazon.com/cloudfront/pricing/


How does that compare to Cloudflare? Is Cloudflare completely free for CDN? I can't find a clear pricing page.


https://www.cloudflare.com/plans/

Cloudflare don't charge per GB. They are more of a per-"feature" CDN.


Just a year ago B2 couldn't do server-side file copying.[1] If you wanted to rename or move a file you had to re-upload the whole thing (not great for large multi-gigabyte files)! That ruled them out of consideration for storing my personal backups.

Glad to see they've since fixed that, and with this update are clearly continuing to improve ergonomics. I'll have to give B2 a fresh look.

[1]: https://github.com/Backblaze/B2_Command_Line_Tool/issues/175


Yev here -> thanks! We're constantly working on making the platform better, and copy-file was definitely a widely requested feature! That plus S3 compatibility, for folks who wanted to integrate with B2 Cloud Storage but didn't have the resources to write code to our B2 Native API.


I've been using B2 to disrupt a bunch of ugly & entrenched vendors in the price sensitive K12 market. Thanks for building it. :)


Ah that's awesome! I'd love to know more! How are you using B2 in general, and does this make it easier for you? If you want, leave a note here, or you can send it to: b2feedback@backblaze.com!


Are there any plans to host public datasets, like AWS PDS?


Can you share what you do with B2? (Are you hosting educational content? Something else?)


I just finished writing a new feature for internal use using S3. I hadn't even considered Backblaze at the time, but now that I am seeing this news we may end up using it, considering there is virtually no cost (to switch) and we haven't deployed yet.
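
If it helps anyone in the same position: the switch is essentially just pointing your existing S3 client at B2's endpoint with a B2 application key. A rough sketch with the AWS SDK (the endpoint, region, and bucket names below are illustrative; use whatever the B2 console shows for your bucket):

    import fs from 'fs';
    import AWS from 'aws-sdk';

    // Placeholders: the B2 console shows the exact S3 endpoint for your bucket,
    // and a B2 application key ID / application key stand in for the usual
    // access key / secret key pair.
    const s3 = new AWS.S3({
      endpoint: 'https://s3.us-west-002.backblazeb2.com',
      region: 'us-west-002',
      accessKeyId: process.env.B2_KEY_ID,
      secretAccessKey: process.env.B2_APP_KEY,
    });

    async function upload(): Promise<void> {
      await s3.putObject({
        Bucket: 'my-backups',
        Key: 'backups/example.tar.gz',
        Body: fs.readFileSync('example.tar.gz'),
      }).promise();
    }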


For backups, mapping one file to one object rarely works well. Tools that use this strategy come with a long list of scenarios requiring expensive operations. For instance renaming a directory or changing a few bytes in a large file.

On the other hand, tools that don't view the object storage as a file system have far fewer gotchas.

In my experience B2 + restic works really well.


That's a fair point but this is for my family photos and videos; in the event I die I'd rather the S3 bucket I hand over to my wife/kids in my last wishes look like a real filesystem rather than a bazillion blobs that require a special tool with a programmer's expertise to reassemble


" ... in the event I die I'd rather the S3 bucket I hand over to my wife/kids in my last wishes look like a real filesystem rather than a bazillion blobs that require a special tool ..."

You'd need a cloud storage provider that just gave you a plain old UNIX filesystem to do whatever you want with.

It's too bad nobody does that ...


I was just about to pull out my rsync.net pitchfork before I saw the username. You should throwaway next time so as to not deprive me of the pleasure.


You'd need a cloud storage provider that just gave you a plain old UNIX filesystem to do whatever you want with.

Doesn't iCloud Drive fit the bill?

Access to it is slightly obscure, i.e. ~/Library/Mobile Documents/com~apple~CloudDocs/ but wouldn't that work?

It's free for me to use. That's because I'm already paying Apple $10/mo to backup the family's iPhones. We're only using a little over 200 GB out of the 2000 GB we have. (I'm sure that Apple is counting on most people not using their full amount).

I've only put a few files out there, so maybe there are a lot of potential pitfalls. But it doesn't get much simpler than using cp or mv.

In reality it's most emphatically not a "plain old UNIX filesystem". Apple is doing some magic and storing blobs out in Amazon S3 or in their own datacenters. But to me it has the appearance of a Unix (Posix?) filesystem.

I realize that rsync.net couldn't survive with a business model that limits users to 2000 GB, which is Apple's maximum. But I thought I'd mention it, since it just might be the perfect "free" solution for a lot of people.


Wouldn't that require his wife to install and configure an SFTP client or something? B2 lets you use any browser.


It's too bad you guys cost ~2x as much for storage as S3 when I evaluated you in 2018... ($0.04/GB vs $0.023/GB) ;) Glad to see you're beating S3 in $/GB now!


May I ask which solution you ended up going for in this situation? (I'm trying to solve for myself too.)


rclone is a nice way to sync files to B2. It also has a mount option that quite literally mounts the cloud storage as a filesystem, though this requires Linux and a little bit of know how. However, in a pinch B2 has a serviceable web interface, and Backblaze will even ship you a drive of your files if you request it, so I think it would be pretty usable by just about anyone.


I have a local ZFS pool of hard drives and a script that `rsync -avz`s my iPhone's photos+videos to it. Then a separate cron script that periodically syncs from the ZFS pool to my S3 bucket using `aws s3 sync`. The S3 bucket has versioning turned on so it's effectively append-only.

I used to be able to trigger my iPhone -> ZFS script when I plugged my iPhone into my Ubuntu desktop using udev (also had to wrap it in flock[1] because it would trigger multiple times for some reason), but at some point that stopped working and I've been too lazy to figure out why.

It's far from perfect but for me it works alright. In this scenario I prefer straightforward and slightly kludgy compared to something with hidden complexity that could go wrong in so many ways. Could you imagine if you used a tool like restic or borg and the pack encoding format changed, or if the tool sources are simply gone when your relatives have to figure out how to get at the files in 10, 15, 20 years - I don't want my relatives playing code detective or archaeologist!

Which reminds me of a downside to using tools like restic, borg, and the like that I forgot to mention. When I evaluated them for my hundreds of GB of family pics + videos, there was a "dedupe" step that all these tools want to perform. When I tested them a couple years ago they were dog slow for my files, because pics + video are already highly compressed and there is very little "deduping" you're going to wring out of them unless you have multiple copies of the same files. IIRC borg took several hours to run and at the end it reported 0.01% or less deduping efficiency. Also, as I recall, there was no way to opt out of the dedupe step due to the way borg stores "packs". Very annoying!

[1]: https://stackoverflow.com/a/169969/215168


> For backups, mapping one file to one object rarely works well.

Same for archiving. At my former workplace, however, we figured that out only after the horse had left the barn.


Here is the reasoning why they didn't have "s3 compatibility" before: https://www.backblaze.com/blog/design-thinking-b2-apis-the-h...


>It requires Amazon to have a massive and expensive choke point in their network: load balancers. When a customer tries to upload to S3, she is given a single upload URL to use. For instance, s3.amazonaws.com/<bucketname>.

Now that Amazon has deprecated the single URL version and replaced it with region-specific URLs (e.g. s3.dualstack.us-east-1.amazonaws.com) and tooling has been mostly updated, this huge reason for not supporting the S3 API is gone.


Even though they are using region-specific URLs, they would still have to load balance all of that traffic. Backblaze avoided this by having a two-part request for a file. You would make a request to a centralized URL, and that would return a URL that connected you directly to a server that had that file.
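
Roughly, the native upload flow looks like this (a sketch based on my reading of the B2 docs; error handling omitted and names are placeholders):

    // B2's two-step native upload: ask the central API for an upload URL,
    // then POST the file straight to the storage server that URL points at.
    import crypto from 'crypto';

    async function uploadToB2(keyId: string, appKey: string, bucketId: string,
                              fileName: string, body: Buffer): Promise<void> {
      // Authorize against the central API endpoint.
      const auth = await fetch('https://api.backblazeb2.com/b2api/v2/b2_authorize_account', {
        headers: { Authorization: 'Basic ' + Buffer.from(`${keyId}:${appKey}`).toString('base64') },
      }).then(r => r.json());

      // Step 1: the central API hands back a URL pointing at a specific storage server.
      const target = await fetch(`${auth.apiUrl}/b2api/v2/b2_get_upload_url`, {
        method: 'POST',
        headers: { Authorization: auth.authorizationToken },
        body: JSON.stringify({ bucketId }),
      }).then(r => r.json());

      // Step 2: upload directly to that server, bypassing any central load balancer.
      await fetch(target.uploadUrl, {
        method: 'POST',
        headers: {
          Authorization: target.authorizationToken,
          'X-Bz-File-Name': encodeURIComponent(fileName),
          'Content-Type': 'b2/x-auto',
          'X-Bz-Content-Sha1': crypto.createHash('sha1').update(body).digest('hex'),
        },
        body,
      });
    }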


> You would make a request to a centralized url and then that would return a url that connected you directly to a server that had that file.

Heh, kinda like how FTP worked. That's funny to see again.


So how did they manage to get rid of those hidden costs? Or is the new S3 compatible API more expensive?


Disclaimer: I work at Backblaze.

> So how did they manage to get rid of those hidden costs? Or is the new S3 compatible API more expensive?

The new S3 compatible APIs are the same cost as the original native B2 APIs.

I'm the author of that original blog post, and we were able to get rid of SOME of the internal costs, but in the end we ate the cost of the load balancer. Internally I voted that we externalize that cost, but it's in the spirit of our "no-sales-friction" pricing model to get rid of decision points for customers and just let them get their stuff done.

The internal math is that we believe the additional storage that (hopefully) will result from supporting the S3 API will help make up the cost of the upload balancers. It eats into our margin, but not by enough to justify a "friction pain point" that might reduce sales as customers struggle to decide which API to use. When you do the math over thousands of customers, we make the vast majority of money simply from renting the storage. Many many many customers upload things to Backblaze, and let them sit for long enough, that the cost of the load balancers becomes pretty small as a percentage.

One of our goals of the pricing of Backblaze Personal Backup (flat fee of $6/month regardless of how much data you back up) and not charging much for transactions is we aren't trying to nickel-and-dime customers. We just want to make our margin and provide a solid service that makes customers happy.


Are you guys apprehensive at all that supporting S3 might bring in more lower margin customers (ie higher transfer GB/store GB ratio)?

In any case, thanks for B2 (happy customer) and good luck. Sounds like an exciting time.


(backblaze ceo here) We're happy to take customers regardless of which API they choose. Sure, we make a bit more on our B2 Native APIs, but ultimately I'd rather we make it easy for them to use us how they wish.


It's possible they're eating the cost of the load balancing necessary to multiplex from their backends to client requests, which should theoretically stoke an increase in business due to reduced switching costs (and what timing, with an economic contraction likely pushing cost reductions at those needing cloud storage).

Disclaimer: Happy Backblaze Mac client and B2 customer, no other affiliation.

EDIT: @yev: I took the signal out after the sibling reply :) Appreciate the responses as always. Please stay awesome.


Yev here -> saw my Yev-signal. That's right, Gleb actually answered that question on our blog (https://www.backblaze.com/blog/backblaze-b2-s3-compatible-ap...) but we're eating the cost. Om nom nom.


Interesting. Implementation wise, is it some form of Minio gateway + hardware?


No gateway - native code written from scratch and optimized for our environment...sitting on top of regular, inexpensive hardware.


When you say regular, I think you mean awesome Storage Pods that were lovingly purpose-built for affordable and efficient data storage (https://www.backblaze.com/b2/storage-pod.html).

*Edit -> Well, those plus the load balancing servers =D


From Gleb (CEO) comment on the blog. "Yes - B2 is still a more cost efficient API as it allows customers to connect directly to the final storage location. However, we have built a highly cost efficient load balancing system as we always have - using software to optimize inexpensive hardware - and are swallowing the additional costs for our customers."


Swank. One of the reasons I'm not using Backblaze is because I couldn't find a way to generate a private url which allowed secure upload from the browser. It only allowed (so far as I can tell) a private url that had access to an entire bucket. If they've got an S3 compatibility layer now, this problem is solved. I'm gonna invest some time on this tomorrow.


This is huge because it means you can use things like S3 Fuse to mount your storage. Which means you can use it to extend your local disk, or run your own backups, or whatever.

Amusingly the price to store 1.2TB of data is the same as the cost of their backup plan, so if your disk is smaller than that, you could save a few bucks running your own backups. Until you have to restore (from what I can tell restores are free on their backup plans but would cost money on the S3 plan).


> Which means you can use it to .... or run your own backups

You could, but if i read correctly (s3fs-fuse limitations): "random writes or appends to files require rewriting the entire file".

So changing 1 bit of a 10GB file, means re-uploading 10GB.

https://github.com/s3fs-fuse/s3fs-fuse#limitations


This changed in 1.86 and I updated the README as follows:

> random writes or appends to files require rewriting the entire object, optimized with multi-part upload copy

Now changing one bit means re-uploading 5 MB, the minimum S3 part size.
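
For reference, the trick behind that (as I understand it) is S3's multipart copy: unchanged ranges of the existing object are copied server-side and only the changed part crosses the wire. A simplified sketch with the AWS SDK (names and the fixed part size are illustrative, and the final part would need its range clamped to the object size):

    import AWS from 'aws-sdk';

    const s3 = new AWS.S3();  // point at your S3-compatible endpoint as usual

    async function rewriteOnePart(bucket: string, key: string, partSize: number,
                                  totalParts: number, changedPartNumber: number,
                                  changedPart: Buffer): Promise<void> {
      const mpu = await s3.createMultipartUpload({ Bucket: bucket, Key: key }).promise();
      const parts: AWS.S3.CompletedPart[] = [];

      for (let n = 1; n <= totalParts; n++) {
        if (n === changedPartNumber) {
          // Only this part (>= 5 MB) is actually uploaded.
          const res = await s3.uploadPart({
            Bucket: bucket, Key: key, UploadId: mpu.UploadId!,
            PartNumber: n, Body: changedPart,
          }).promise();
          parts.push({ PartNumber: n, ETag: res.ETag });
        } else {
          // Every other part is copied server-side from the existing object.
          const res = await s3.uploadPartCopy({
            Bucket: bucket, Key: key, UploadId: mpu.UploadId!,
            PartNumber: n, CopySource: `${bucket}/${key}`,
            CopySourceRange: `bytes=${(n - 1) * partSize}-${n * partSize - 1}`,
          }).promise();
          parts.push({ PartNumber: n, ETag: res.CopyPartResult?.ETag });
        }
      }

      await s3.completeMultipartUpload({
        Bucket: bucket, Key: key, UploadId: mpu.UploadId!,
        MultipartUpload: { Parts: parts },
      }).promise();
    }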


Only if the Backblaze implementation supports put byte range... Not supported by default.


rclone has long had a Fuse mount feature using the original B2 API.


Sure, but S3 Fuse is more mature, more stable, more feature complete, has a lot more usage and therefore a lot more visibility for possible bugs, especially corruption bugs.


I migrated a client from Cloudinary ($1k+/mo) to B2, a Go + ImageMagick program running on DigitalOcean, and the Cloudflare CDN, for a total of $60/mo. It's been running for two years now; B2 has been incredibly reliable.


Yev here -> That's awesome to hear! Glad we can make things more affordable for you and that it's working great!


As a developer that supports B2 (I write ExpanDrive) I think it’s great that they are moving on from an API that doesn’t expose any extra value.

That being said, I wish B2 performance was better. Throughput is dramatically slower than S3.


B2 pricing is sooo attractive ... But I have been stopped by the much lower performance compared to S3 ... So unfortunately I can confirm this point :(


When you say performance, do you mean upload, download, or both? I can deal with poorer upload performance, and mitigate poorer download performance by leveraging a decent CDN like Cloudflare.


What region are you moving from/to? Last I checked, b2 only exists in datacenters on the US west coast.


(backblaze ceo here) We also have a region in Europe: https://www.backblaze.com/blog/announcing-our-first-european...


Please consider an Asia/Pacific data center. I am from India and my company was not able to use B2 due to high response times even from European DC. Even a DC in Singapore will be helpful for us.

- Thankful Personal Backup Customer


Bandwidth in asiapac is very expensive for non-incumbents and India is no exception.

I find it bizarre how in India you can get 100GB of LTE for a few dollars but cdn bandwidth can cost content providers more than that - which is absurd.


Mobile broadband is seeing intense competition to grab customers as millions of rural Indians come online. This started with a petrochemical billionaire launching the Jio network and giving away free unlimited 4G data for a year (his company has 300 million subscribers now).

Already 4 networks have exited the market, and the 3rd and 4th largest networks (Vodafone and Idea) have merged due to a cash crunch. Airtel (previously the largest) has been raising outside money in the hope that it can survive the low prices. So there are only 4 networks remaining, and only recently have they started increasing prices.

That billionaire is also going into fibre (he purchased his bankrupt brother's company's infrastructure); maybe we'll see that competition extend to DCs and interconnects.



It remains the case even if you’re only a few ms away.


Used B2 heavily until recently as an origin server for a CDN. A few weeks ago we saw a spike in 502/504 responses.

When I contacted their customer support, I was pointed to the following URL, where they explain in detail how they handle these errors: https://www.backblaze.com/blog/b2-503-500-server-error/

Essentially these are not considered errors, and they expect the client to retry loading the file. That approach won't work in our use case.


You can definitely get read errors, and your application should be aware of this and handle this case. Amazon's own S3 is not immune to this: after years of running Spark jobs which shard output across thousands of files on S3 (and load from the same), you'll see these underlying HTTP errors on a daily basis, even intra-region.

Even if you're doing multi-region S3 replication, you'll run into this for external clients semi-occasionally.


If you are using Cloudflare you could use a Worker script to automatically retry the origin pull and only cache it on success.
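
Something along these lines, roughly (the retry count and backoff are arbitrary; whether the result gets cached still follows your normal cache rules):

    // Retry transient 5xx responses from the B2 origin before returning
    // anything to the visitor, so the CDN isn't left serving an error.
    addEventListener('fetch', (event: FetchEvent) => {
      event.respondWith(fetchWithRetry(event.request));
    });

    async function fetchWithRetry(request: Request, attempts = 3): Promise<Response> {
      let response = await fetch(request);
      for (let i = 1; i < attempts && response.status >= 500; i++) {
        await new Promise((resolve) => setTimeout(resolve, 200 * i));  // crude backoff
        response = await fetch(request);
      }
      return response;
    }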


Retrying a failed network request seems like a normal thing to do -- whether its a network timeout, server error, or whatever random hiccup happened.


So, you're relying on the API being 100% reliable? no errors?


I'm not expecting 100% reliability.

But when we get a 503 response it should be considered an error, and acknowledged by the provider as an error.

In my use case, we were using a CDN which was configured to pull files from B2. When B2 responds with a 503/500 I have no control over the retry mechanism.

The error rate was around 5-10%.


Huh, I thought it already had this! Must have mixed it up with a different object storage service (maybe DigitalOcean?).

I've been using B2 for backup storage for some personal projects. It doesn't necessarily do anything "better" than S3 from what I've seen, but never having to log into AWS's dashboard is a reward enough on its own.

They do have a command-line client that's a quick pip install, so you can do something like:

  b2 upload-file bucket-name /path/to/file remote-filename
Which is, of course, nice for backups.


I really wish the B2 client supported uploading a file from a Unix pipe. It would be nice to be able to archive a huge directory into a tar.bz archive and directly pipe the result into the B2 client without having to save the archive to disk first.

Currently I have to save the tar.bz archive to disk before uploading to Backblaze. That takes several hours (huge spinning-disk array, not as fast as an SSD), while uploading to B2 is blazing fast. Saving the archive to a ramdrive essentially solved this, but as the data grows I don't have enough memory to spare anymore for a ramdrive that can fit the whole archive.


Can you use process substitution?

    b2 upload_file bucket <(tar -cjf - huge-directory) archive.tar.bz2
The argument the command sees will be something like "/dev/fd/42", and the shell will provide the output of tar through that file.


So apparently process substitution doesn't work. The b2 client is probably trying to read the file size or something; it keeps failing with "ERROR: Invalid upload source: /dev/fd/xx". Maybe their API requires knowing the file size upfront instead of allowing a "streaming" upload.


Does process substitution actually write the content to disk first or not? The information I found on the internet seems to be conflicting on this. If it actually writes the data to disk first, then it probably won't solve my problem (limited disk I/O). AFAIK writing to a pipe won't result in saving the data to disk temporarily. I guess the only way to know is to try it out on my system and see how it performs.


If process substitution doesn't work, shouldn't /dev/stdin work? I haven't tried it, but as long as b2 doesn't try to check the file size before uploading I don't see why it wouldn't work: b2 upload_file bucket /dev/stdin < file


It definitely doesn't write it to disk first. It's basically a pipe() under the hood, but exposed as a file descriptor. Downside is that seeking doesn't work, but that shouldn't affect your case.


Thanks for the idea! I'm going to try this.


>Huh, I thought it already had this!

Same; it seems to be on the new-storage-service launch checklist right after "buy hard drives".


Ooo, even more reason to set up a Nextcloud instance now! Previously, it wasn't really practical to set up B2 as external storage because you'd need to also set up a compat layer.


I set this up yesterday, and it was a breeze.

Just had to be sure to omit the B2 external storage folder from the backups on my Nextcloud server.

Now if only Virtualmin (YC '08) supported virtual server backups to S3-compatible B2 cloud storage… There's an open ticket for this at https://www.virtualmin.com/node/65024


S3 is now the standard for cloud storage APIs? Not sure if that's good or bad. I guess competitors have to reduce switching costs.


I think it essentially always has been. It was there first, [GCP copied it](https://cloud.google.com/storage/docs/xml-api/overview), [OpenStack has compatibility middleware](https://docs.openstack.org/swift/latest/middleware.html#modu...).


Yev here -> It's not so much a standard, though S3 is generally the most often used suite of APIs. 100s of integrations exist with our B2 Native APIs (https://www.backblaze.com/b2/integrations.html), but a lot of folks only know how to write to S3 Compatible APIs and don't have the resources to write to multiple API suites, so this makes integration easier for them!


That's how something becomes a standard... You are just responding to market realities.


Digital Ocean, OVH and others seem to use the S3 API. It does give you the benefit of clients written against S3 can now be used for your platform, and unlike a lot of AWS the API contract seems pretty reasonable and well kept.

Edit: Confused Hetzner and OVH, sorry


Hetzner offers S3 compatible storage? I didn't find anything about that...


Woops, looks like I mixed up their storage offering with OVH. Sorry about that!


That's a good question to ask. S3 has become so common that it's heading down that road.

It's like brand names that become so common that people no longer use the generic name, just the brand.


It took me quite some time to realize that TRX is actually a brand and not some abbreviation for the trainers :)


I don't see Google, MS Azure or Rackspace switching their API anytime soon, but yes, smaller players and new ones do.


(backblaze ceo here) Google and Azure continue to offer their own API as do we. They also offer S3 compatibility options to support customers who want to use S3-compatible products...as now do we ;-)


Now that we're talking about B2: has anyone used, or is anyone using, them for latency-sensitive small-file object storage? I'm about to take the plunge and set up benchmarks. My use case is that I want to store and serve ~500k small files (30 bytes to 1 MB) per day to website visitors. So far B2 support has told me that it shouldn't be a problem, and early benchmarking indicates the same; just curious if anyone has stories from the trenches.


We use B2 to store images on Vintage Aerial (https://vintageaerial.com), both high res scans and all kinds of thumbnail sizes.

It is... a little slower than I'd like, but with Cloudfront in front it has been manageable. I'd love tips from Backblaze on how to increase performance there beyond caching to CF.


Can you go into detail what you mean by "slower than I'd like"? Are you talking about TTFB (Time-To-First-Byte) or sustained read or concurrent read performance? Are you using the API or the HTTP endpoints from a public bucket?


CloudFront or Cloudflare?


I'm seeing horrible TTFB from Europe. Most files are fast but sometimes a request is stuck for 10s or more...


Is that to their US datacenter or the European (Amsterdam) one? So far the European one has been pretty snappy for me, ping is ~20ms from my home ISP (Germany) and TTFB is good enough that I can instantly saturate my 300MBit home ISP line with concurrency of ~ 500 downloading small objects (5b-1MB), but I haven't looked at min/max/avg/median yet.

BIG gotcha btw: You have to choose between US and EU when you create your B2 account! You can't have buckets in the other location, so that means you'll need two accounts if you want to do that.


(backblaze ceo here) Just fyi, you can put both accounts into a single Group for easier management: https://help.backblaze.com/hc/en-us/articles/115000014914


It usually works, but we have had some issues under high load; please check my other reply in this thread.


Does Backblaze offer strong consistency for files?

The killer feature of Google Cloud Storage in my eyes is its ability to be strongly consistent, if you set the right HTTP headers. This is not possible for Amazon S3, which is always eventually consistent and makes it unusable for many use cases where you need to be able to guarantee that customers will always see the newest version of a file.


Nilay from Backblaze here.

Yes - B2 is strongly consistent. When you upload an object using either the B2 Native or S3 API - the object is persisted to the final resting place before the upload completes. Therefore, you can list/download the file immediately after your upload completes.


I know that the underlying Backblaze B2 is strongly consistent because of how they shard it. You get a different download endpoint depending on which data centre your bucket lives in. Not sure how they implemented their S3 endpoint though, so that will be interesting.


As a Synology user, please let this mean that Hyper Backup can work with B2 now (or at least soon).


Nilay from Backblaze here.

Synology Hyper Backup absolutely works. Details are here: https://help.backblaze.com/hc/en-us/articles/360047171594

(If you saw my old answer, ignore it. I was misinformed.)


I was trying to see if this was now better than Glacier, and aside from the SLAs being much better in terms of retrieval, I am not sure they make sense for a backup use case, where you are only really planning on downloading that data back down in a worst-case scenario. It may depend on what your incremental backups look like as well; mine are negligible: a dump of a few GB of photos after holidays, other records are tiny.

Glacier pricing in us-east is $0.004/GB/month vs $0.005 for B2. There is always pricing obfuscation with cloud, but AFAICT, there is no need to move off Glacier for a backup use-case.


My two cents is that there is no reason to _use_ Glacier as a backup strategy. Glacier's costs come from restoration: the more you restore and the faster you want to restore it, the more quickly costs rise. It's far better suited for a collection where you're pretty sure most of it will never be restored, but you don't know in advance which parts you'll need. Think video, art, music assets for projects. B2's retrieval is far, far lower cost and completely immediate for restoring an entire backup back to a server. If you're not careful, that extra .5 cents you save will really cost you on a full restore.


If you can wait a few hours, the better comparison is probably Glacier Deep Archive, which is not $4/TB/month but $1/TB/month.

Amazon wants to charge you $90/TB* to get data to the outside world, compared to B2's $10, but you can mitigate it in various ways. At the low end that's using a lightsail instance as a VPN, depending whether you think the TOS allows that. At the high end it's paying flexify.io $40 to move your data to B2, then paying B2 $10.

There might be other ways to improve S3 egress costs. It's a very hard thing to search for. I only learned about flexify from this post.

So if you have to restore less than half of your data each year, Glacier Deep Archive will save you money. It's worth considering, unlike normal Glacier, which is almost entirely downside.

* There's also a $2.50/TB fee to get things out of Glacier, but that's dwarfed by the other costs.


Thanks for making me aware of this- I missed the Deep Glacier announcement last year it seems. Glacier is already cost effective as is, but this will make it even cheaper!

I use this as an offsite backup- that as long as disaster does not strike, I will never use, and even if does, I can be patient about restoring.


With backups, my assumption is that you will never need to restore except in real "oh shit" moments- fire, electrical system zapping, etc- and if those moments happen, you are willing to pay. I have been using synology for 13 years now, and have thankfully never had to resort to using backups.

This is for home use, which is maybe what I was wrongly assuming most Synology users are doing. In a business context you might have a more routine need to recover backups, and those costs become more tangible.


It means exactly that, congratulations!


Great. How about rsync.net-compatible (i.e. bog-standard, vendor-neutral) "APIs"?


Yev here -> Anyone can write to the B2 Native API or our S3 Compatible API, we have tons of integrations that do it, here's a list -> https://www.backblaze.com/b2/integrations.html


I don't think that's what gp meant. Why not use a standard protocol like SFTP?


I don't see how you could cram all of S3's functionality into sftp? How would you configure a lifecycle policy for a file for example? Or generate a signed URL?

It seems to me you would only get a very narrow subset of the functionality.
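
For illustration, a signed URL is a one-liner against the S3 API. A minimal boto3 sketch (bucket, key and endpoint here are made-up placeholders, not anything from this thread):

  import boto3

  # Credentials come from your environment/config as usual;
  # endpoint_url points at whichever S3-compatible provider you use.
  s3 = boto3.client("s3", endpoint_url="https://s3.example.com")

  # Time-limited download link, valid for one hour.
  url = s3.generate_presigned_url(
      "get_object",
      Params={"Bucket": "my-bucket", "Key": "reports/2020-05.pdf"},
      ExpiresIn=3600,
  )
  print(url)

There's no obvious SFTP equivalent for that, or for lifecycle rules.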


I see, I suppose there wasn't a free standard before and S3's API just unofficially became one.


Because the S3 API has become a more popular standard.


what python blob storage libraries are using "SFTP"? How is that the "standard"?

I literally have NEVER seen SFTP being used for blob storage in any python project - is this a real thing somewhere?


I've looked at B2 from time to time, but doing database blob storage over S3 or to disk, and backing up database and files over rsync, made us stick to our existing technology (e.g. Transip cloud storage, which also charges 10 EUR/month per 2TB). One thing we didn't look forward to was having to reimplement cataloging and garbage collection for all of disk, S3 and B2, so we just stuck to a rsync hardlinking solution (which makes incremental backups painless).

Having access to primary storage and cheap backup storage using the same S3 API will make us reconsider that and will probably make it worth the effort to dump our rsync-based solution for B2.


I would absolutely love to replace my use of S3 with B2 as a backup for data stored elsewhere. Personally, I would much rather give this storage to a service that only does storage, rather than everything else that AWS does, so I don't have to worry about anything strange happening in a cloud service I don't use every day.

When they first launched B2, I inquired about ability to enter into a BAA (Business Associates Agreement) for HIPAA compliance and was told that it wasn't "on the roadmap". It sounds like B2 has come a long way on the compliance side. Would be great if they were open to this.


Yev here -> Double good news for you this morning: we're now signing BAAs for B2 Cloud Storage ;-) Just contact sales and they can get you sorted out!


That's great to hear! I'll definitely be reaching out.


Actually excited by this. I was benchmarking S3 vs B2 vs others 2 years ago and I had to give up on B2 because its implementation for performance was so much more difficult. (88 lines vs 36 lines for all others in Ruby)


So how many times a month do you have to implement this to be reasonable compared to the cost of the storage?


This is not an implementation cost issue.

It was just super hard to make the code perform well. You have to manage client sessions on your side and choose optimizations on your side; you have to spread things manually, which is hard to do. Whereas S3 maximises your bandwidth with no custom code required. It's not really that S3 compatibility was needed; it's that the B2 API wasn't good.


HashBackup (author here) was one of the 1st if not the first B2 integration. I didn't find the B2 API any more difficult to use than the S3 API. It has the same functionality with similar kinds of API requests. The only significant difference is that you have to request an upload URL and download URL, and requests can sometimes return a code to get a new URL when a vault is full or overloaded.

There is a price/performance trade-off: B2 has higher request latency than S3, no matter where you are (my experience), but they also are 5x cheaper on storage costs, 10x cheaper on download bandwidth, and have no price gimmicks like minimum object sizes or minimum object lifetimes like many other services (S3 IA for example).

To make up for B2's request latency it is more important to issue requests from multiple threads, especially for short-running requests like removing files.

Another key difference is that B2 always uses SSL whereas S3 can be accessed without SSL with little security impact because each S3 request is individually signed with a secret key. Setting up an SSL connection is more overhead, so another key to performance is to reuse connections.

Both of these suggestions apply to S3 as well, just more to B2 because of the latency difference.
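
To make the multi-threading point concrete, here is a rough Python sketch of issuing short requests from a small thread pool (boto3 against an S3-compatible endpoint; the endpoint and bucket names are placeholders, and the same idea applies to the B2 native API):

  import boto3
  from concurrent.futures import ThreadPoolExecutor

  # Credentials come from the usual environment/config; endpoint is a placeholder.
  s3 = boto3.client("s3", endpoint_url="https://s3.example.com")
  keys = ["old/obj-%04d" % i for i in range(1000)]

  def delete_one(key):
      # One short-lived request; per-request latency dominates its runtime.
      s3.delete_object(Bucket="my-bucket", Key=key)

  # Keeping many requests in flight at once hides the per-request latency.
  with ThreadPoolExecutor(max_workers=32) as pool:
      list(pool.map(delete_one, keys))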


Yev from Backblaze here -> Glad that you took a look at us in the past, excited that this announcement will warrant another look! :)


Can Amazon actually patent their API (per the Google vs Oracle case), basically preventing other vendors from providing S3 APIs so that Amazon can lock in users?

I am not a lawyer. So this is a genuine / dumb question.


Disclaimer: I work at Backblaze. I'm also not a lawyer. :-)

> Can Amazon actually patent their API (per the google vs oracle case) - basically like prevent other vendors to provide S3 APIs so that Amazon can lock in users.

Most likely yes. Backblaze plans going forward are to fully, uncompromisingly maintain our original native B2 APIs for a few reasons including this concern.

It's probably up to Amazon whether they want to boot all 3rd parties off their S3 API. Backblaze has a viable fallback if that occurs. I hope for customers' sake Amazon doesn't declare war in that fashion.

If Amazon decides on this path, internally at Backblaze we have discussed immediately doing the opposite - declaring for all of time anybody can copy our B2 APIs. Remember, our APIs are technically superior to the S3 APIs. They are lower cost to implement, and are shockingly easier to use for developers. They don't make all the mistakes S3 made. We had the luxury of learning from all their mistakes over the years. :-)


This is kind of scary, that companies can patent an interface.

So Google Cloud is presumably also exposed to this potential legal time bomb?

Amazon can sue you and retroactively force you to pay them, right? So all they need to do is wait for the alternatives to become popular.


That's very short-term thinking though. You win the battle but lose the war by being so partner-hostile. Amazon has thousands of partners pay to join it at re:Invent for a reason.


I suspect a move like that would trigger a cloudpocalypse that would actually be beneficial in the mid- and long-term.


It reminds me of a moment three years ago, when I asked Dropbox to make their API similar to Google Drive's, as they basically provide the same service. https://github.com/dropbox/dropbox-api-spec/issues/3#issueco...

It is just awful to see how everyone tries to reinvent the wheel rather than be compatible with anyone else.


Fear of lawsuits related to copying APIs may also be a factor.

See https://en.wikipedia.org/wiki/Google_v._Oracle_America


Backblaze is clearly violating Oracle's copyrighted copy of Amazon's S3 API: https://docs.cloud.oracle.com/en-us/iaas/Content/Object/Task...


This is great news... there are lots more good clients for S3 than B2, and implementing one is not exactly trivial because of some special considerations B2 had in the beginning (namely: uploading directly to a pod).

I see this isn't available for old buckets, is there a straightforward way to duplicate a bucket to make it compatible or do you have to use something like rclone?


(backblaze ceo here) Yes, easy to move the data to a compatible bucket using our B2 CLI or Transmit: https://help.backblaze.com/hc/en-us/articles/360047120614-Ho...


I'm curious what their load balancing layer looks like. There are a lot of interesting options. (Disclaimer: I've worked in the CDN and the storage space in the past)

If their load balancer is smart enough it can call the dispatcher, and make use of something like https://zaiste.net/nginx_x_accel_header/ to figure out where to forward the request. Unfortunately this still requires uploads be proxied through the dispatcher.

You could get crazy and involve a CDN (akamai or cloudflare or fastly) that could do some smart logic, especially if you can emit your dispatcher as a lookup table that's updated frequently. I don't know what bandwidth costs would be for that though. Probably high.

It's an interesting problem space and I'd love to talk to these folks about it.


Hi! Backblaze employee who did some of the LB stuff here. It's relatively standard/straightforward. There's a L4 load balancing layer using IPVS and ECMP-via-BGP, then a custom application that does the actual proxying/forwarding to the appropriate vault.


This is great. Their current API requires you to identify a unique host to send data to, so you're constantly performing a metric ton of DNS queries. Until I whitelisted the base domain it was the #1 client of my Pi-hole installation by multiple orders of magnitude.


The DNS resolver library of your client is allowed to cache the IP address for a given hostname for up to TTL. If it does so, the cost should be negligible.


Disclaimer: I work at Backblaze.

> The DNS resolver library of your client is allowed to cache the IP address for a given hostname for up to TTL

Not only that, but one mistake a lot of developers made early on was asking for a location to upload for every upload. That was NEVER the intention. In fact that annoys our servers also.

Developers are supposed to request a location to upload ONCE, and then upload to that location for hours, or even DAYS. Unless you have a bug in that software, it really shouldn't come anywhere close to being a high runner in DNS. We're talking 9 or 10 requests per day, at most, if you are unlucky. Feel free to reach out to our support if you aren't seeing that!


Although that implies that you need to cache that location to upload somewhere and request it from your own cache. Also, you need to write error-handling code (which will only fire rarely, so is hard to test) to deal with redirects or fall back to re-fetching the location if it changes.

These things are not particularly difficult, but they require additional mind-space to accommodate. Most developers will just do the simplest thing that works, performance be damned. If the simplest path is slow, they'll just remember "B2 is slow", no matter how unfair that is.


If you can use one of these arbitrary domain names for hours or days ... why wouldn't you just handle it on your end and provide the public with a single domain?


> why wouldn't you just handle it on your end and provide the public with a single domain?

That is what we did for the S3 protocol. It adds cost via a load balancer.

The whole original storage design was based on the fact that in our original product line (Backblaze Personal Backup) we owned both ends of the protocol - our servers on the back end, and our client on the customer laptop. We were able to eliminate all load balancers from our datacenter by putting a little tiny bit more intelligence in the client application (maybe 50 lines of code).

The client asks the central server where there is some free space. The server tells it. Then the client "hangs up" and calls the storage vault directly, no load balancer required! Then the client uploads as long as that storage vault does not fill up or crash. If the storage vault crashes, or is taken offline, or fills up all the spare space it has, the client is responsible for going back and asking the central server for a NEW location. This fault tolerance step in the client ENTIRELY eliminates load balancers!

Normally you need an array of servers and a load balancer to accept uploads, because what if one of the array of servers crashed, had a bad power supply, or needed to update the OS? The load balancer "fixes that" for you by load balancing to another server. Pushing the intelligence down into the client saved us money. Nobody ever noticed or cared because our programmers could write the extra 50 lines of code, to save the $1 million worth of F5 load balancers (or whatever solution Amazon S3 has).

We based our original B2 API protocols on this cost savings and higher reliability, but it does push the 50 lines of code logic down to the client. It caused a lot of developers extreme angst. They just couldn't imagine a world where their code had to handle upload failures and retries. They would ask us "how many retries should we try before we just fail to backup"? Should I try 2 retries, or 3, before the backup entirely fails and the customer loses data? Our client guys had a whole different approach: since it was a computer, we just went ahead and retried FOREVER. Never endingly, until the end of time, in an automated fashion. A couple times a year one client gets unlucky and it requires several round trips before getting a vault to upload to, but who cares? It's a computer, it can retry forever. It never gets tired, never gives up.

But S3 never figured this out, and they require the one upload point to have "high availability". It saves app developers about 50 lines of code and a lot of angst, but then we (Backblaze) have to purchase a big expensive load balancer, or build our own. We mostly built our own.
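
(Not our actual client code, but for anyone curious what those "50 lines" boil down to, here is a heavily simplified Python sketch of the pattern: fetch an upload location once, reuse it, and go back for a new one whenever an upload fails. The helper names are illustrative wrappers around the B2 native calls, not a real library.)

  import time

  class TransientUploadError(Exception):
      """Stand-in for 'vault full / offline / busy' responses."""

  def upload_with_retry(b2, local_path, file_name):
      # Ask the central server ONCE where there is free space...
      target = b2.get_upload_url()
      while True:  # ...then keep retrying, forever if necessary.
          try:
              return b2.upload_file(target, local_path, file_name)
          except TransientUploadError:
              time.sleep(1)  # brief pause, then ask for a NEW location
              target = b2.get_upload_url()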


(I work at Google on Cloud Storage.)

Developers working with cloud storage APIs generally need to get used to the idea that not everything is going to work all of the time. Retries and proper status code/error handling are critical to making your application work properly in real-world conditions, and as "events" occur. Every major cloud storage provider has circumstances under which developers must retry to create reliable applications; Backblaze is no different. For GCS, we document truncated exponential backoff as the preferred strategy [1].

Google has its Global Service Load Balancer (GSLB) [2], which handles...let's just say an enormous amount of traffic. GSLB is just part of the ecosystem at Google.

It's hard to design a storage system that's "all things to all people"! There are a series of tradeoffs that need to be made. Backblaze optimizes for keeping storage costs as low as possible for large objects. There are other dimensions that customers are willing to pay for.

[1] https://cloud.google.com/storage/docs/exponential-backoff [2] https://landing.google.com/sre/sre-book/chapters/production-...
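
As a rough illustration of [1], a truncated exponential backoff loop with jitter might look like this (the request callable and the decision about which errors are retryable are placeholders for whatever client you're using):

  import random
  import time

  def with_backoff(do_request, max_tries=8, cap_seconds=32.0):
      # do_request is any callable that performs one attempt of the operation.
      for attempt in range(max_tries):
          try:
              return do_request()
          except Exception:  # in real code, only retry retryable errors
              if attempt == max_tries - 1:
                  raise
              # Delay doubles each attempt, capped, plus random jitter.
              time.sleep(min(cap_seconds, 2 ** attempt) + random.random())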


(I work at Google on Cloud Storage).

"Developers are supposed to" is a scary way to start a sentence. ;)


Yev here -> Glad you're excited and that this will make things a bit easier for you!


That's awesome, but I really want to see lightning fast response times and TTFB... Second pain point is the number of retries needed for uploading a large batch of small files. Those are the main reasons I'm still considering migrating away. I really wish I didn't have to, as otherwise I love the pricing and the philosophy.

Edit: also think DigitalOcean Spaces and B2 might be better off merging together, or Spaces being a whitelabel B2 in disguise (both are part of BWA).


Disclaimer: I work at Backblaze.

> I really want to see lightning fast response times and TTFB (Time To First Byte Served)

If a file is "cold" (nobody has requested it in the last 24 hours) then it needs to be reconstructed from the Backblaze Vaults and there is a little delay. After that, it should serve pretty fast for the following requests (off of a caching layer with SSDs).

In the end, Backblaze B2 is a good solution for some customers, and not ideal for others. If your application requires blinding speed, like sub 1 millisecond serve times, Backblaze B2 may not be perfect for you. But how often is that the case? Certainly not when fetching a web page, or storing a backup for a year, right? In those cases a small delay is FINE. This is an example web page served by Backblaze B2 here, how does it load for you? https://f001.backblazeb2.com/file/ski-epic-c/full/2015_scotl... Fast? Slow? How is it?

For comparison, my regular hosting provider serving the same web page here: https://www.ski-epic.com/2015_scotland_will_macdonald_birthd...

Personally I can't tell any difference. I still look silly in a kilt in both versions. :-)

> Second pain point is the number of retries needed for uploading a large batch of small files.

It really shouldn't take any retries, or geez, at VERY MOST something like less than 1% - why is that an issue? Software should handle the tiny failure rate. I'm honestly curious, we want to know why people aren't choosing our solution!!


I understand you're probably not in a position to say anything about it, but I'd love to see the "little delay" when reconstructing a file qualified somewhat. Are we talking < 5s or < 10s? What do the percentiles for restore latency look like? How does file size play into it? This, for me, is one of the biggest unknowns right now since it's not easy to create a test benchmark for this case (i.e. upload a bunch of stuff and let it sit idle for at least 24 hours, hoping it will be expired from the caching layer).


> I'd love to see the "little delay" when reconstructing a file qualified somewhat. Are we talking < 5s or < 10s?

I asked the engineers that work on that code, and they pulled a random sample from the logs (we time all of this) and said for files less than 1 MByte, it averaged around 250 milliseconds to reconstruct the file from the Backblaze Vault and get it onto the cache servers where it is then served up. 95% of requests completed within 900 milliseconds, but there were a few up over 1 second (1.2 seconds was the highest they found). Those are live production numbers so it includes all the load on those Vaults.

A couple other notes just to add color. Any one Backblaze account is bound for life to what we call a "cluster", for example there is one cluster in Europe so all files are stored in Europe for any account in Europe. There is a load balanced array of "cache servers" in front of all the vaults specific to that cluster (the caching servers are physically located close to the vaults for latency reasons), and our biggest cluster has something like 20 of these SSD based caching servers.

Ok, so the cache layer is not "shared", meaning each cache server only pulls directly from the Backblaze Vault. So if you were serving a file, and 20 separate customers got amazingly unlucky, the file would get the 250 millisecond lag every time for those first 20 fetches. The cool part of this architecture is that you then have 20 populated caches that are completely unrelated to each other, so you have 20x the bandwidth available to serve it up (and a rack of 20 really fast servers to serve it). Plus they are all totally independent so they can crash or be brought offline to upgrade the software without any downtime.

We can add these cache machines as we need them, they are these 1U units and we have "warm spares" for a variety of things. When we have had spikes in load in the past we toss some hardware at it pretty fast.


When I tested B2 a month or so ago, I was seeing pretty frequent failure rates. Granted, I was on a free tier, and it was probably around the time that this whole COVID thing started spiking traffic everywhere...


> Second pain point is the number of retries needed for uploading a large batch of small files.

We upload a few hundred GiB to B2 daily and have this issue as well. Really annoying…


Okay dumb it down for Monday Me. Does this mean I can read from and write to my B2 storage using AWS S3 libraries (like the CLI, Python, and Node libs)?


Yes.
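
For example, with boto3 it should just be a matter of pointing the client at the S3 endpoint B2 shows for your bucket and using an application key as the credentials (the endpoint/region below is an example; use the one displayed in your account):

  import boto3

  s3 = boto3.client(
      "s3",
      endpoint_url="https://s3.us-west-002.backblazeb2.com",  # example endpoint
      aws_access_key_id="<keyID>",
      aws_secret_access_key="<applicationKey>",
  )
  s3.upload_file("backup.tar.gz", "my-bucket", "backups/backup.tar.gz")
  for obj in s3.list_objects_v2(Bucket="my-bucket").get("Contents", []):
      print(obj["Key"])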


Would S3cmd cli work as well?


I tried s3cmd according to their blog post without success. Just kept complaining that the access key was invalid. Which is a shame because `s3cmd` is much easier to use than `b2`.


Yev here -> make sure you ping our b2feedback@backblaze.com address and let us know about that experience, we're writing it all down and keeping tabs on what's not working as intended.


Thanks for following up! I eventually got it working - I'd followed the example too closely and had a rogue `us-west-002` left in the config which broke things because I'm apparently on `us-west-000`. But I'll drop an email anyway because I can't see an easy way to see what region you're in other than visually parsing the endpoint URL.


I have a decent sized music collection consisting of a lot of lossless vinyl rips that I've made from my record collection. It totals around 200 GB at the moment but is growing weekly. I've been looking for somewhere to back this all up in the cloud and Backblaze is looking most promising at the moment. Anyone here have any thoughts on where I should go with this?


There are lots of tools you can use to back up to Backblaze, including their consumer backup service.

If you want to sync to B2 specifically with a lightweight tool, check out https://rclone.org/


Rclone is really a fantastic tool; its configurability based on backends allows for amazing combinations!

You can configure any cloud storage backend (B2, S3, GCS ...) and combine it with other utility storage backends, like "crypt" [1], "cache" [2] and "chunker" [3]. I highly recommend it to anyone searching for a backup solution.

The only feature I miss from Rclone is automatic directory monitoring and mirroring, which I solved using Syncthing (but that forces me to host an additional server for it).

[1] https://rclone.org/crypt/ [2] https://rclone.org/cache/ [3] https://rclone.org/chunker/
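
If it helps, the basic flow with a B2 remote plus a "crypt" remote layered on top is roughly the following (remote and folder names are whatever you chose during rclone config):

  rclone config                                        # one-time interactive setup
  rclone sync /data/photos b2crypt:photos --progress --transfers 16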


Any good tips on a DIY backup setup using something like a home NAS and B2? In particular, something I can use on a load of devices including phones?


restic (if you need deduplication, snapshots, encryption, etc, which it doesn't sound like you do), or rclone, to BackBlaze or rsync.net. I use the latter and am very happy, I've never used the former.


I would recommend restic with B2. Really simple to set up. I use it for nightly homedir backups for multiple machines. Currently at ~400GB and it's costing me < $3/month.
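
In case it's useful to anyone, setup is roughly the following (bucket and path are examples; restic reads the B2 credentials from those two environment variables):

  export B2_ACCOUNT_ID="<keyID>"
  export B2_ACCOUNT_KEY="<applicationKey>"
  restic -r b2:my-backup-bucket:homedirs init       # create the repo once
  restic -r b2:my-backup-bucket:homedirs backup ~   # run nightly via cron/systemd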


Do you automate your process somehow? I'm trying to develop a strategy where if I open my laptop after a certain time it backs up in the background. My go-to for this kind of thing was Hammerspoon, but it won't execute the shell task in the background thread and completely locks my machine. I guess I could cron it?


I use a systemd user service. It sounds like you're on OS X, so you won't have systemd. Cron should work.

Here are my systemd files for reference.

Service:

  [Unit]
  Wants=network-online.target
  After=network-online.target

  [Service]
  Type=simple
  ExecStart=/bin/bash -c "/usr/bin/restic unlock && /usr/bin/restic backup --verbose --one-file-system --exclude-caches --exclude=$XDG_CACHE_HOME %h && /usr/bin/restic forget --keep-last 5 --prune && /usr/bin/restic cache --cleanup --max-age 30"

  [Install]
  WantedBy=default.target
Environment (fill these in with your keys):

  [Service]
  Environment="B2_ACCOUNT_ID="
  Environment="B2_ACCOUNT_KEY="
  Environment="RESTIC_REPOSITORY="
  Environment="RESTIC_PASSWORD="
Timer:

  [Timer]
  OnCalendar=*-*-* 00:30:00
  RandomizedDelaySec=20min

  [Install]
  WantedBy=timers.target


I also use restic and I have a cron script which runs early Monday morning every week. Restic supports auto-pruning, so I have it set up to backup a new snapshot then purge all except the most recent 100 years, 12 months, and 5 weeks.
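
That retention policy maps to restic's forget flags, roughly:

  restic forget --keep-yearly 100 --keep-monthly 12 --keep-weekly 5 --prune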


I've made systemd services and timers to do restic backups. It's really clean and works well. Also much more testable than cron. It will most likely also work for more complex scenarios, like the one you're mentioning.


Any chance you could clue me in on the scripts? Or point me to some resources to learn systemd? I've really disliked cron since having to use it for some auto push CI/CD stuff.


I just set this up last week, and it works well:

https://fedoramagazine.org/automate-backups-with-restic-and-...

edit: one gotcha is that using "user" systemd units, they will only run when the user is logged in. So, for a personal device like a laptop, this is fine, but for a server, it might not do what you're thinking. For the server use case, you probably want to enable linger for that utility user, so that units will run even with no active logged in sessions:

# loginctl enable-linger username


I used Backblaze for years (redundant DB backups) and never once had a problem with it or its API. It's just a great service.


I have been using Backblaze to back up our Synology NAS with lots of personal data, mainly photos and videos.

I have been restoring the occasional file easily. Have not had to do any major restore yet (hopefully never).

I found them really good and cheap. I can only recommend them.


My current setup is Freenas VM with 2 3TB ZFS mirrored WD reds. I run restic from those to a 3rd 3TB red over USB. Then I rclone the encrypted restic backup to B2, so it's e2e encrypted from Backblaze's point of view.

I'm considering ditching the VM and using ZFS on linux, and just doing straight rclone to both the USB disk and Backblaze, for the following reasons:

TL;DR - my threat model is essentially me accidentally deleting something, and bitrot, and I think this setup is overkill for that.

* All operations mentioned take forever. Cutting the encryption might, I think, speed things up a lot (not sure about this; the key question is whether the encryption is causing a lot of extra IO operations).

* I'm afraid of losing my restic encryption key. I have multiple copies of my keepass file but if syncthing decides to delete them, both of my backups become useless.

* Backblaze has earned my trust, and in the unlikely event someone hacks my data, there's not much valuable there. I would probably switch to encrypting a single directory with the few things I care about being exposed. Even if I keep encrypting the whole thing, I would use something more boring than restic.

* I'm not sure I've ever needed to restore an old version of a file with restic (there may have been one time early on). I think deduplication is overkill for my needs as well. It's cool tech, I just don't think I need the complexity.


This is wonderful news for me. I host a video on demand site Codemy.net and all the original source videos are on backblaze. Originally I had to write a library to connect to the backblaze api. Now I look forward to using the existing aws client libraries, one less thing I have to maintain.


Yev here -> That's awesome to hear! Ease of use is one of the things that we strive for at Backblaze and I'm glad that the S3 Compatible APIs are going to unlock some use-cases and make things easier for people that don't have the bandwidth to maintain different codepaths!


I remember when they had all their servers in one room and the redundancy boiled down to erasure encoding in single servers.

They've been doing incredible work in the open (storage server design, hardware reliability data, etc) and I'm really happy they've grown to where they are today.


I was actually looking at B2 vs S3 literally 2 days ago and went with S3 for the universal API. Luckily, it was a personal project and I can probably migrate everything very quickly. This is a killer feature, and I bet this will convince a lot of people to move to Backblaze.


Given all the Cloudflare discussion - Cloudflare webinar with Backblaze coming up next week: https://www.brighttalk.com/webcast/14807/405472


Just switched to Wasabi last week for better pricing and S3 interface... but great to see this.


Great news. Only a few days ago I was trying to figure out ways to use MinIO to expose Backblaze as Mattermost cloud storage, which needs to be S3 compatible. I expect that will work directly now. Has anyone already tried this integration?


Is there an open source S3-compatible component that I could rollout on prem?


I think this is what you're looking for: https://min.io/


Minio is open source and S3 compatible https://minio.io


Thanks, looks like spinning up a cluster is AGPL, so it's a no go.


Every day I have to use something like 4 GB of data to let Backblaze sync. This is despite the fact that I might only have created/changed 100 MB worth of files since the previous day's sync.


Is there a cheap S3-compatible service that is less reliable? I don't want to pay for redundancy. E.g. it's my backups; I can handle a 3% chance that my data is lost, as long as I find out about it.


If you want super cheap, use DreamObjects by DreamHost.


Doesn't look cheap at 5x the price of Backblaze. https://www.backblaze.com/b2/cloud-storage-pricing.html


Does this mean I can use awscli to interact with b2 by specifying some backblaze server with --endpoint-url? What is the endpoint I would use?


When you create the bucket it shows a url along with the keys.

Not sure how unique that URL is; looking at the structure it could depend on what data centre your bucket gets created in.
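
With awscli it should be along the lines of the following, substituting the endpoint shown for your own bucket (the region below is just an example):

  aws s3 ls s3://my-bucket --endpoint-url https://s3.us-west-002.backblazeb2.com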


Just signed up to try it out - would really like to see some form of two factor auth that isn't SMS based. TOTP and/or FIDO U2F.


I've been using TOTP with B2 for ages. I think SMS is just to set up the account.


Ah, the copy next to "Turn on Two Factor" sure sounded like SMS only, but sure enough, it gives you the option to use TOTP later on. Thanks!


Is B2 suitable for streaming video files? Can I stream the same file to 1,000 viewers at the same time?


(backblaze ceo here) B2 is a great origin store for your video files. If you're streaming to lots of viewers, using a CDN with Backblaze B2 is optimal. We partnered with Cloudflare as a founding member of the Bandwidth Alliance so you can store your videos with B2 and transit them for free to Cloudflare, which can serve to your viewers.


Please correct me if I'm wrong, but my understanding was that Cloudflare should not be used to deliver video files unless using Cloudflare's "stream" product, IE specifically this [1].

[1]: https://community.cloudflare.com/t/cloudflare-how-not-to-vio...


Would love to confirm this.


Did somebody actually try to use the S3 API? It doesn't seem to be working for me.


Fantastic news. B2 Storage is one of the most requested backup storage backends for us.


Happy customer of Backblaze. I love how transparent they are with everything (especially the hard disk statistics), and the fact that the CEO takes time to respond to a lot of questions only confirms how down to earth they are.


Any chance of adding Azure storage compatible APIs in the future?


There is no mention of the durability guarantees that S3 has.


Disclaimer: I work at Backblaze.

> There is no mention of the durability guarantees that s3 has.

I wrote this blog post doing some of our math around this: https://www.backblaze.com/blog/cloud-storage-durability/

But here is the thing: if you value your data, like if you will really go out of business if you lose it, then you should store three copies with AT LEAST two separate vendors. No matter how reliable any one vendor is, "stuff can happen" like your credit card is declined and the vendor deletes all of it.

I would recommend you use two separate vendors like Amazon S3 and Backblaze B2, and use two separate credit cards that expire on different cycles. I believe the credentials for login should be different on those two accounts, and the same one employee shouldn't have the credentials to both. Because one disgruntled employee should NOT have the ability to put you out of business. If you want some other thoughts, here is a blog post Backblaze wrote called the "3-2-1 Backup Strategy": https://www.backblaze.com/blog/the-3-2-1-backup-strategy/


Their durability is eleven 9's - 99.999999999% That's the same as Amazon S3. https://help.backblaze.com/hc/en-us/articles/218485257-B2-Re...


Another step towards Amazon acquiring them.


Oh man, I hope not... I enjoy my independent B2.


Any plans for an Azure Blob API?



