Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The thing you're competing against in terms of performance and efficiency is:

1. JSON.stringify()

2. Diff-match-patch: https://code.google.com/p/google-diff-match-patch/

3. JSON.parse()

There are several short-comings to the above approach though, specifically, you have to explicitly trade off performance vs. efficiency depending on the JSON you're dealing with.

JSON Patch is going to give you a consistent performance vs. efficiency curve which is desirable in a lot of circumstances.



It is my understanding that the JSON encoding of objects is not specified canonically because the order of keys can be arbitrary. That is, JSON.stringify() is free to return different strings for the same JSON object.

Is my understanding correct? If it is, that would totally break your suggestion.


You are correct, but if you control the environment enough to be doing this at all, then you control the environment enough to get the serialization consistent: either by having ordered dictionaries in the language a la Python and PHP, or implementing your own, or by sorting the keys lexicographically a la bencode.

That is, `diff(sorted_json_stringify(json_parse(p)))` will be adequate if `diff(p)` is not.


Your proposed algorithm is a particularly poor solution in terms of correctness if nothing else. For text, DMP performs quite well- patches can typically be applied in arbitrary order with the same result. For complex data formats, you very quickly arrive at invalid (JSON).


This is about sending a diff over the network without having to either have the full json document on hand, or worrying about other concurrent changes that might wreck the diff-match-patch. It's also json-aware versus operating on pure text which might lead to invalid json documents.


1. Errr, to apply any delta, including these, you need the document on hand. If your goal is to store deltas, this is no better than anything else.

2. This is in no way better at handling concurrent changes than anything other diff format. In fact, the extra operations it has do nothing to help you here (it is as good and bad as copy/add, or insert/delete diffs). I'd love to know why you seem to think otherwise.

JSON awareness only helps when trying to apply partial patches. At that point, the only thing it can possibly buy you is syntactic correctness, which is worthless without semantic correctness (and in fact, sometimes worse)


> Errr, to apply any delta, including these, you need the document on hand. If your goal is to store deltas, this is no better than anything else.

Doesn't this allow the sender to send updates without knowledge of the existing values, though? With a string diff, the existing value would have to be known.

> This is in no way better at handling concurrent changes than anything other diff format. In fact, the extra operations it has do nothing to help you here (it is as good and bad as copy/add, or insert/delete diffs). I'd love to know why you seem to think otherwise.

If you're sending a regular diff, you would clobber changes to other parts of the object if they were updated between your initial retrieval and subsequent change. But yeah, it doesn't seem to help with applying concurrent changes to the same part of an object, though I don't know if the patch format should shoulder that responsibility.


> Doesn't this allow the sender to send updates without knowledge of the existing values, though?

This system requires knowledge of existing values/nodes to create the diff/patch document.

However for JSON data, it does have the advantage of not clobbering data unlike a string diff as you mentioned.


You can know keys/paths but nothing else. Or you might have user-specific parts that won't be concurrently modified, but the base JSON doc might be modified by others. You'd have to have the latest version to do a text-based patch.

A text-based diff doesn't really gain much for what it loses in use cases.


I don't see why you need to know values. The operations don't require the sender to supply them.


I think you're missing the point. If I want to patch just one node and don't care what the rest of the document is, I cannot just use a text diff.


Can you explain why?

This seems like a trivial application of any copy/add delta format, which are easily expressed in text diff format.


A simple example: if the order of the keys is different, say because of a different json implementation, semantically the document is identical, but you can't apply the patch anymore.


That's why you use a canonicalizing JSON serializer, which you'd also want to use if you were tracking changes to JSON in git, for example.


The OP was talking about the current incumbent, and the solution proposed does not work for web applications, since the JSON.stringify in most browsers do not sort keys. Sure, you could implement your own javascript version, but it will be at least a couple of orders of magnitude slower.


That's very limiting. So I'm suppose to use the same Json library on various platforms for client as well as server? And what have you really gained?


There are plenty of textual tree delta formats too.


Because you may not want to serialise your data. Because the order of a json dictionary key is undefined.


Text diff is based on surrounding lines, specifically, siblings, parents, and children in this case. In JSON Patch, only the parents matter.


So then it's a bad implementation of a textual tree diff algorithm .... which again, is a standard copy/add delta format (with a small amount of metadata).


JSON data is not always stored as texts.


How else would you store it? Javascript Object Notation is text. Other formats such as BSON are different.

Or are you simply talking about javascript style object literals?


JSON is external representation of objects. You can store the actual data on server in any form you like. Doesn't mean you have to store them in text.


But its Javascript Object Notation - Notation in this case implies text - if your storing it in something other than text it's not JSON.

From Wikipedia: JavaScript Object Notation, is an open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs.


Like you pointed out JSON is used to transmit data objects, but storage of data objects on the server or local file is entirely different story. I hope I don't have to explain further.


Yes, then its no longer json...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: