But for a repository with millions of files, the LIST/GET requests required would be a substantial cost (and one that has to be paid every day).
It would perhaps be useful to have a mode where a manifest of file hashes could be kept in parallel at the cloud provider, as a single file that could be downloaded every day: all the comparison logic is done locally, only the files that differ are uploaded, and then the hash manifest is updated.
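To put rough numbers on that worry, here is a back-of-envelope sketch. The prices are assumptions (approximately S3 standard us-east-1 request pricing at the time of writing; check your own region and tier), and the 10-million-object repo size is just an illustrative figure:

```python
import math

# Assumed request pricing (us-east-1, S3 Standard) -- verify for your account:
HEAD_PRICE = 0.0004 / 1000   # per HEAD/GET request
LIST_PRICE = 0.005 / 1000    # per LIST request
KEYS_PER_LIST = 1000         # objects returned per LIST call

def monthly_request_cost(n_objects, days=30):
    """Compare HEAD-per-object vs batched-LIST daily scans over a month."""
    head_cost = n_objects * HEAD_PRICE * days
    list_cost = math.ceil(n_objects / KEYS_PER_LIST) * LIST_PRICE * days
    return head_cost, list_cost

head, lst = monthly_request_cost(10_000_000)
print(f"HEAD per object: ${head:.2f}/month")   # on the order of $100+/month
print(f"batched LISTs:   ${lst:.2f}/month")    # a few dollars at most
```

The point of the sketch is the ratio, not the exact dollars: a daily per-object HEAD scan is roughly two orders of magnitude more expensive than scanning the same keyspace with batched LIST calls, which is why the flags below matter.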
If you want to keep costs down with S3, use --checksum or --size-only to avoid reading the metadata with HEAD requests.
You can also use --fast-list to make fewer transactions at the cost of memory (basically just GETs reading 1000 objects at a time).
You can also do top-up syncs, for example `rclone copy --max-age 24h --no-traverse`, which won't do listings, and then run a full `rclone sync` once a week (say), which will delete things on the remote that have been deleted locally.
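As a minimal sketch of that schedule, the two commands could be wired up as cron entries. The paths and the remote name `s3:bucket/mydata` are placeholders, not anything from rclone itself:

```shell
# crontab fragment (illustrative; /data/mydata and s3:bucket/mydata are hypothetical)

# Daily at 02:00: top-up copy of files changed in the last 24h, no remote listing
0 2 * * *  rclone copy --max-age 24h --no-traverse /data/mydata s3:bucket/mydata

# Sunday at 03:00: full sync, deletes remote files that were deleted locally
0 3 * * 0  rclone sync --fast-list --checksum /data/mydata s3:bucket/mydata
```

Note the trade-off: between full syncs, local deletions are not propagated, so the remote can temporarily hold files you have already removed.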
There is also the cache backend which does store metadata locally.
Hmm, OK, it seems like you could end up with something practical! Unfortunately any misstep will add an extra $100/month to the bill. It was stuff like this I was hoping would already be solved without extra hacks :)
BTW, how does the sync work? Does AWS, for example, expose a (free or super-cheap) way of getting the SHA1 of a file on S3?