I'm more interested in the lifetime of the medium. Durable backup media for consumers is still a holy grail, as I understand it. M-DISC didn't hold up to its 1000-year lifetime promise. Archival-grade DVDs aren't good enough either, as I understand it. Syylex went bankrupt. I want a consumer-grade backup medium that can provide at least 100 years of lifetime.
That said, I was able to recover 90% of the voice recordings my father made between 1959 and 1963 on reel-to-reel tapes, 60 years later. Tape can be very durable, but what I recovered was analog voice, which is very tolerant of errors. I'm not so sure about gigabytes squeezed into a square inch.
Brings back memories of a great remix contest in 2006 where the digital copies of analog tape tracks from Peter Gabriel’s “Shock the Monkey” were made available to remixers.
To my surprise, the pitch of the samples was a little lower (and varied ever so slightly over the duration of the song) than what you'd expect with A440 tuning. It baffled me, since I expected some of the early digital synths used in the original sessions to have been rock solid at 440 Hz.
And that's how I learned about "tape stretch": analog audio tape stretches just enough to make the pitch of everything a few cents lower over a long period of time.
p.s. I ended up applying digital pitch correction, so I could “jam along” with my own synths :-)
A different problem happened to me when digitizing my dad's tapes. Dad bought the player in the USA, brought it to Turkey, and made the recordings there. When I digitized them in the USA, everything sounded higher-pitched. It turned out that the difference in AC mains frequency (60 Hz vs. 50 Hz) changed the motor's rotation speed proportionally, so I slowed the recordings down to 5/6 speed, and they were perfect afterwards.
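For anyone digitizing under the same mismatch, the fix is a single resampling step. A sketch with made-up filenames, assuming sox is installed (its "speed" effect changes pitch and tempo together, just like a tape transport would):

  # 60 Hz mains played the 50 Hz-recorded tape 6/5 too fast,
  # so resample to 5/6 of the digitized speed
  sox digitized_60hz.wav corrected.wav speed 0.8333
  # sanity check: a 6/5 speedup raises pitch by 12*log2(6/5) ≈ 3.16 semitones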
Ask HN: How can I create the most reliable and durable NAS today? I have a lot of very sentimental, very important files, such as family photos and videos. And I simply like to hoard data.
I currently have 8TB of data stored on a Synology DS218+ with RAID1, and monthly data scrubs (verifying checksums). It is backed up remotely to Google Drive (in encrypted form), and I also maintain an infrequently-updated, once-per-quarter disk clone with an external HDD.
My biggest concern with my current setup is that the memory is non-ECC. Even though the files are checksummed, I am concerned that memory corruption / bit-flips could propagate into the checksums, and hence result in data corruption.
I am considering:
* Building my own FreeNAS box using AMD Ryzen (which semi-officially supports ECC memory). My concern here is the semi-official nature of the support: how do I know ECC works before a rare cosmic-ray bit-flip strikes?
* Purchasing a Synology DS1621+. This is AUD$1400, which is a tough pill to swallow for the equivalent of a first-gen Ryzen quad-core and 4GB of memory.
You'll know ECC works when it matters: when you encounter a flipped bit. With an 8TB RAID, that is likely to happen within the next 24 months.
Go with option 1, and RAID-Z2 or better. With RAID-Z2, you'll be able to not only detect but also correct a flipped bit - even if that flip happens while writing out the data.
Pay attention to the counters. Your ZFS scrubs will report how many blocks they repaired. You're likely to encounter one repair in a scrub; you're unlikely to encounter more than two. If you see more than that, check for memory errors. A single bad sector is most likely the hard drive. Even a single flipped bit is likely a transient error; it could be your memory, or your disk, or anything in between. It happens at scale, and 8TB, read repeatedly, is a lot of bits.
Look into rasdaemon and memtest86 - they're the tools you use to debug memory errors when they do happen.
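For reference, checking the counters is just a couple of commands ("tank" is a placeholder pool name):

  zpool scrub tank        # kick off a scrub; worth scheduling monthly
  zpool status -v tank    # per-device READ/WRITE/CKSUM counters plus scrub repair totals
  zpool clear tank        # reset the counters once you've investigated an error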
The other advice I can give you: don't be paranoid. Your photos are likely to acquire bit rot. You will have dozens, even hundreds of bit flips happen in your lifetime. Of the many thousands of photos you will take, the chances that you will ever notice the discoloration or bad line that a bit flip leaves in a photo are pretty small. Bit rot happens. Your photos are important to you, and you should treasure them, but treasure them for what they are: things that you protect and that are under your care, not things that must be twenty nines of correct. You can realistically achieve 10, even 12 nines of correct reads on your data. You don't need more.
"Next, you read a copy of the same block – this copy might be a redundant copy, or it might be reconstructed from parity, depending on your topology. The redundant copy is easy to visualize – you literally stored another copy of the block on another disk. Now, if your evil RAM leaves this block alone, ZFS will see that the second copy matches its checksum, and so it will overwrite the first block with the same data it had originally – no data was lost here, just a few wasted disk cycles. OK. But what if your evil RAM flips a bit in the second copy? Since it doesn’t match the checksum either, ZFS doesn’t overwrite anything. It logs an unrecoverable data error for that block, and leaves both copies untouched on disk. No data has been corrupted. A later scrub will attempt to read all copies of that block and validate them just as though the error had never happened, and if this time either copy passes, the error will be cleared and the block will be marked valid again (with any copies that don’t pass validation being overwritten from the one that did)."
(don't just read the quote, read the link)
I have been using ZFS since 2007/2008 and I have never had any issues (except with those damn Seagate 3TB "DeathStar" HDDs, where I was barely able to replace them fast enough - 3 died in 3 months - I will never buy Seagate again).
I have an ASUS microATX board with 16GB of non-ECC RAM, an additional SAS HBA, 3x3TB Toshiba drives in RAID-Z, additional 10TB HGST He3 disks, and an Ultrium 3000 (LTO-5) tape drive for backups. LTO-5 drives are quite cheap today and are designed for 24/7 operation, which is far more than I will ever subject them to; the tapes can be restored on later LTO generations if needed, and you can get cartridges on sale for peanuts. There is no way in hell I'd trust my important data (like images) to disk only, and tape is nice: you take the cartridge and store it in a drawer at your parents'/girlfriend's/workplace/...
Anyway, if I remember correctly, Google lost 150k user accounts in 201x and restored them from tape. So even for cloud-minded people, it still makes sense to shovel important data onto tape if you don't use it in everyday processing (and just an FYI: even shelved disks die).
Write the really important data for long-term storage to M-DISC BD-XL media (100GB each).
If it's pictures you care about, I'm sure it's far less than 8TB that needs the VIP treatment.
I would love your parts list! Did you go with ECC memory? Did you have any way of verifying the ECC is working and actually detecting/correcting bit flips?
The other thing I am interested in is minimizing idle power consumption. Just to be more environmentally friendly.
I don't know if you want to build the same kind of system, but at least you can get a list of parts that work together.
I use my TrueNAS box for ZFS storage, VMs, and as an NFS server for different PCs.
I bought ECC memory, as I understand it is more or less a requirement for ZFS.
I found out that FreeBSD, which TrueNAS is based on, can tell you what type of RAM is present.
The command is:
# dmidecode -t memory
According to this I have ECC RAM. :)
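For reference, the field to look for in the output is "Error Correction Type" - something like this (exact wording varies by board; "None" there would mean no ECC):

  # dmidecode -t memory | grep -i "error correction"
  Error Correction Type: Single-bit ECC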
I did a build with the following parts:
Case: SST-CS380 V2 (space for 8 3.5" drives, 2 x 5.25" bays).
Mainboard: ASRock X470D4U2-2T
Power supply: Seasonic Focus Plus 550W, 80 Plus Gold, fully modular
RAM: Kingston Server Premier, DDR4, 16 GB, KSM26ED8/5 M
CPU: AMD Ryzen 5 3600X (comes with a Wraith Spire cooler)
Cooler: Arctic Liquid Freezer II
NVMe to PCIe adapter: ASUS Hyper M.2 x16 Gen 4 (PCIe 4.0/3.0, holds 4x NVMe M.2 devices: 2242/2260/2280/22110)
I bought all this from Amazon.de. The mainboard is a server board with 10GbE networking and an out-of-band management console for flashing the BIOS and remoting into the machine - no graphics card needed. This was expensive, and you can most likely save a lot by using a consumer-grade board.
Be careful not to buy an AMD APU - these don't support ECC RAM, for some insane reason. (An APU has graphics built into the CPU.)
I use both SATA hard drives (for long-term storage) and SSDs (for speed).
I created two ZFS pools (I believe "pool" is the proper term). One for rotating disks (which can sleep most of the time) and one using SSDs for fast storage that doesn't use much power.
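In case it helps anyone, a minimal sketch of that two-pool layout (pool and device names are hypothetical; on FreeBSD/TrueNAS spinning disks typically show up as ada*, NVMe as nvd*):

  zpool create tank raidz ada0 ada1 ada2   # rotating disks: parity-protected long-term storage
  zpool create fast mirror nvd0 nvd1       # SSD mirror: low-power, fast storage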
I have 6 kW of solar panels with a battery, so I don't really care if the box uses a lot of power. During daylight, when the sun is shining, it's more or less free. I get next to nothing when selling the electricity I generate, so I'd like to use as much of it as possible locally.
There really isn't much of a market for it. You can pay Google or Apple or one of the large cloud providers a very reasonable and decreasing rate for a near-guarantee that your data stays accessible. The only real risk is the company going under, which is extraordinarily unlikely for someone like Google or Apple, and a shutdown would come with advance notice.
I realize that for the Hacker News audience there are multitudes of reasons the solution above doesn't fit your needs, but recognize that the consumer market is near nonexistent.
If nobody is "inheriting" your data (or rather—nobody cares enough to keep your data around), it seems kind of moot to ensure it hangs around. That is, if I put stuff in an S3 bucket and pre-pay for 100 years, if nobody is around to download it in 100 years then why bother?
If you wanted to make a sort of digital time capsule and didn't care who discovered it, your next best bet would probably be the Internet Archive or some other archival community.
If your data isn't appropriate for archival (i.e., can't be publicly consumed) and isn't interesting enough for your friends/family/etc. to keep around on your behalf, keeping the data is purposeless.
I absolutely take inheritance into account when planning backups meant to last a hundred years. But regardless of how uninteresting my data looks, we don't know whether today's boring data will be invaluable to science in the future. We display slippers from 5000 years ago in museums today, and they're invaluable. Consider the person who owned them, walking around on a national treasure, unaware. Maybe they didn't even like the slippers and found them boring. :)
I was thinking that DNA is a pretty robust storage medium. Perhaps we could use it in the coming years to store data for long-term survival.
Though considering these comments and the advent of mRNA/CRISPR, perhaps we could store data for future generations in our own DNA. It'd be fascinating if you could read journals, or even audio/video, of your ancestors from the biological inheritance they passed down. What if we could engineer an extra chromosome to do just that, then let the segments of memories remix and recombine so everyone's would be unique?
Just store your diaries in a line of yeast that produces tasty beer or wine. That could work. I wonder what the oldest yeast lines in use today are, and how stable their genomes are.
Or if you really want your data to survive, engineer it into a virus for your local species of cockroaches! Getting the data back could be gross, but it'll survive nuclear holocaust. ;)
The importance of those slippers is tied strongly to their rarity. So little survived from 5000 years ago that almost anything from that time is valuable.
By comparison, we'll create more data in the next ten minutes than in entire centuries of our relatively recent history. Lots of stuff is getting preserved in lots of places, with substantial redundancy, for virtually nothing. Your slippers today are likely to be more valuable than the near-infinite troves of documents and photos and whatever else.
I agree the consumer market doesn't exist, because everyone is seduced by the cloud nowadays. That said, the supposed guarantees provided by these companies should not translate into blind and complete trust.
A single bad bug or security issue can make data inaccessible or corrupt. Tapes, on the other hand, don't have that problem. IMHO, trusting all your important data to a single vendor or technology is a recipe for disaster.
The lifetime of the medium is only half the equation. The bigger problem, IMHO, is the lifetime of the device. Say I give you an old drive for 8" floppies from the seventies. Where would you connect it?
You're right, but not all media are the same. The Arctic Vault used an optical film with QR codes. You could theoretically even take a photo of it and decode it by hand if you wanted. They even added a Rosetta Stone at the entrance so that, even if all the knowledge is lost, one can hypothetically decode the data stored there. For magnetic media, you need more specialized equipment for sure.
Floppies are an interesting case because the protocols and physical specifications are all documented publicly, which means that one could literally build a drive from scratch today --- the trickiest part being the heads, but considering that they are many orders of magnitude lower density than HDDs, it would not be a big obstacle in the future.
(I believe 8" floppy drives have the same interface as 5.25" ones --- and there's no shortage of adapters from the retrocomputing community for those, some of them even open-source.)
Tape is far more closed, AFAIK most of the common formats are proprietary and the specs are behind NDAs and other walls.
I have hobby books from 30 years ago that teach you how to build magnetic heads for cassette tapes. No pictures or anything, but with enough patience you could definitely build one today, at home (even back then). Mind you, the size of the head gap is not that big of a problem if you put your mind to polishing.
Tapes are uniquely terrible at this. I'd argue it's the Achilles' heel of the medium, even more so than the actual limitations of linear tape.
First off, the drives themselves are expensive brand-new. You're going to pay thousands of dollars for a drive, and it will probably have some weird interconnect, so you'll have to spend even more money and waste a PCIe slot on an adapter in order to use it. Most common are SAS and Fibre Channel, although there's at least one company selling drives in Thunderbolt enclosures for the Mac market.
(Aside from all that, SAS is actually pretty cool for things like hard drive enclosures, since it has SATA compatibility. I have a jury-rigged disk shelf built out of an old HP SAS expander, a slightly-modified CD duplicator case, and some 5 1/4" hard drive bays.)
Second, tape formats are constantly moving. LTO and IBM 3592 come out with a new format every 2-3 years and backwards compatibility is limited. Generally speaking you can only write one generation behind on LTO and read two generations back. So, if you want a format that's got drives still being made for it, you'll need to migrate every 5-7 years. Sure, the actual tape is shelf-stable for longer, but you're going to be buying spare drives or jumping on eBay if you want to keep old tapes around that long.
(eBay is actually not a bad place to buy used tape drives, but the pricing varies wildly. It's perfect for hobbyists and small-fry IT outfits looking for cheap backup media. Absolutely terrible if you're a large outfit with reliability guarantees and support agreements to maintain.)
Third, actually using a tape drive is a nightmare. Windows hasn't shipped tape software since 2003 (I think?), so you'll be in the market for proprietary backup solutions. And if you're just writing data directly to the tape, you will shoe-shine for days. Common IT practice is to put a second disk array in front of the tapes as a write cache, with custom software that copies data to the tapes at full speed once all the slow nonsequential IO is done. Reading from tape doesn't have to worry about this as much, but the fact that you had to use custom software just to archive your files means you now have proprietary archive formats to deal with. So you can end up with tapes that depend on both access to working drives and licenses for proprietary backup utilities.
(Of course, if you had decently fast SSDs and a parallel archival utility, you could sustain decent write speeds on tape. I actually wrote this myself as an experiment: https://github.com/kmeisthax/rapidtar and it can saturate the LTO-5 drive I tested this with.)
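If you'd rather not write custom software, the classic home-grown version of that write cache is plain tar piped through a big RAM buffer. A sketch assuming Linux, mbuffer installed, and an LTO drive at /dev/nst0:

  # -m 4G: RAM buffer; -P 90: don't start the tape until the buffer is 90% full,
  # so the drive streams instead of shoe-shining; -s 512k: tape-friendly block size
  tar -cf - /data/photos | mbuffer -m 4G -P 90 -s 512k -o /dev/nst0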
That's probably not going to happen. Tapes have high longevity, and you can buy used LTO drives on eBay for a few hundred bucks, but the biggest issue in 100 years is going to be finding a device to read the tape, and finding an adapter to hook it to the USB-H 12 quantum optical port on your motherboard.
A better approach would be to use something like that to store the data for a decade or two, then copy it onto whatever the newer archival medium is (LTO drives can typically read the previous one or two generations as well). Rinse and repeat every decade; it also lets you test whether there has been any bitrot.
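A cheap way to make each migration double as a bitrot test (mount points are hypothetical):

  # once, on the current medium:
  cd /mnt/old && find . -type f ! -name SHA256SUMS -exec sha256sum {} + > SHA256SUMS
  # after copying everything (including SHA256SUMS) to the new medium:
  cd /mnt/new && sha256sum -c SHA256SUMS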
> The objective of this study was to investigate the behavior of the GlassMasterDisc of Syylex under extreme climatic conditions (90°C and 85% relative humidity) and to demonstrate the potential of this technology for digital archiving.
> The result of this study is that the GlassMasterDisc has a much longer lifetime in accelerated aging than other available DVD±R
I wouldn't draw any conclusions about the normal aging of the other tested media, though. They ran an accelerated aging test at 90°C and 85% RH, where most discs didn't survive a single test cycle (of 10 days), two discs lasted one cycle, and only the Syylex disc lasted all 4 cycles.
A quote on a brand-name DVD:
> This DVD model had the longest lifetime (i.e. 1500h) at 80°C and 85% RH. At 90°C, it is destroyed after the first cycle of 250 hours.
For an idea of what it does to the substrate:
> [for measurement] DVDs have to be taken [out] .. To prevent the formation of water droplets in the polycarbonate, it is necessary to "purify" the polycarbonate from the water that was absorbed at high temperature.
OTOH, I had CDs (Verbatim, upper-middle class), of which about 1-2 out of 50 had issues after 20 years of storage (in dark, mostly room-temperature conditions).
Yes, they did an accelerated test, and M-DISC performed only as well as archival-grade DVDs. Syylex, which promised the same lifetime as M-DISC, performed significantly better. That shows either that M-DISC didn't live up to its promise, or that Syylex and archival-grade DVDs surpassed expectations. Either is bad news for M-DISC, isn't it? What am I missing?
The "accelarated test" may not be in any way indicative of true lifetime in moderate conditions. Their own conclusion does not draw any such implications, the only other test they reference is done at 80°C (10°C lower), and the only writing on how or why this test could be indicative of archival lifetime was a generic two sentence: harsher conditions -> faster degradation (in part 4).
It was a pupose-built test to see how much of X would Syylex take. It took X better than others, none of which took X well. Tests like these are very good, if you want to go with Syylex, to make sure it's not worse in some way (X, or Y, Z), which would then suggest a need for further examination. In real aging, factor X may be completely meaningless, while Y and Z are crucial, so you cannot conclude which one will last more.
Why test 90°C and 85% RH, not 80°C, 50°C or 110°C, or bending, UV light, scratching, drop in acid .. whatever? For a proper accelerated lifetime test, you would need to identify (all) relevant degradation modes and model their behaviour (and interaction) in target vs. accelarated test conditions, and then extrapolate behaviour in target conditions. They didn't even write what type of degradation they are testing.
I'm not convinced, though I'm no expert on aging simulation. Heating a DVD to 90°C seems like it would do different things to the disc than normal aging at recommended temperatures, wouldn't it?
Given the pace at which storage capacity increases, what is the rationale for not copying your data over every 5-10 years onto the cheapest consumer mass storage of the moment? You get all your data in one place, and you don't have to deal with standards disappearing (I read that 2020's game consoles can't read CDs anymore; people should rip their CD collections right now).
The bookkeeping it requires, for one: since you don't usually buy all your backup media at once but acquire them over time, it gets unnecessarily complicated. Copying the media periodically is also riskier, as you might increase the chance of data corruption through faults in the copying process (faulty RAM, faulty software, not concentrating hard enough, etc.). You periodically expose the longevity of your backups to possible user/hardware/software errors.
Also, when others inherit the media, they may not have the proper equipment or skill to do this themselves. The goal of preservation is to get the data 100 years ahead, not necessarily to keep it always in a usable state. For example, I'd like my children to keep my backups until my grandchildren can access them 60 years later.
As for data corruption and mis-manipulation, I would be more concerned about the long-term decay of any medium than about the occasional bit flipping in RAM. That applies even to tapes, since their endurance relies on certain storage conditions; but the medium is more likely to be a hard drive, a writable DVD/Blu-ray, or something flash-based, and these do not particularly age well.
As for bookkeeping, my point is that storage media are becoming so big that you can consolidate onto a single device every time you carry the data over (you may still want to duplicate for reliability). You can buy an 18TB hard drive today; a consumer isn't going to need more than one or perhaps two of those for anything worth preserving long-term. And in 5-10 years, you will likely have 25-30TB hard drives.
The equipment problem is precisely what this addresses. You are always using the latest hardware, and the previous hardware is still supported if you stick to a 5-10 year cycle. For instance, you would have moved away from IDE drives while you could still find motherboards with both IDE and SATA ports. But if your data is stored on an IDE drive today, good luck connecting it to a computer in 2030 (assuming we haven't gone full Apple: "you can't customize your hardware and we deprecate everything very frequently").
Skills (and, I would say, mostly dedication) are still a problem. But we are talking about copying files between two media; it's not rocket science, even if you don't script it.
Every 10 to 15 years, you send in your archived (and new/interim) personal data and get it back on the current top-of-the-line storage medium. That way it's not stored in the cloud, and you can keep moving the stored data forward without having to deal with it all yourself.
Not really. You can buy an 18TB hard drive now. Even if your data is humongous and needs several of those, it will likely fit on a single drive in 5-10 years. So it takes an increasingly smaller amount of your time to replicate (excluding the copying time, which keeps the machine busy but not you).
Use https://en.wikipedia.org/wiki/Parchive and add as much redundancy as you like. It's a lot cheaper to over-provision than to create the uber-archive medium.
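With par2cmdline that's a few commands, e.g. 20% redundancy over a folder of photos (filenames hypothetical):

  par2 create -r20 photos.par2 photos/*.jpg   # create recovery data, 20% redundancy
  par2 verify photos.par2                     # later: check for corruption
  par2 repair photos.par2                     # rebuild damaged files from the recovery data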
Hell, tell me it will last 20 years and I'm okay with that; if you can guarantee at least 15 years, I can buy replacements every decade and transfer the files over...
That's right, but regardless of how "extreme" the testing was, archival-grade DVDs performed as well as M-DISC, and Syylex surpassed the rest by a huge margin. Syylex promised the same lifetime as M-DISC (unlike archival-grade DVDs). I think the results are good enough to show that either M-DISC doesn't live up to expectations or archival-grade DVDs exceed them. Either way, bad news for M-DISC. If Syylex hadn't gone bankrupt, it would of course have been the best option.
Yes, what I mean is that - setting aside the "glass" disc from Syylex - we don't know whether both M-DISCs and archival-grade DVDs suck or excel, let alone how long they actually stay readable in the real world.
IF my last guess in the comparison is correct, 100,000 hours at "normal" temperature/humidity is roughly 11 years, but it may well be that without "cooking" them at 90°C, the duration for both is 200,000 hours (or whatever)...