The subtitle claims that a “universal way” was found to solve the nearest-neighbour-search problem for any kind of data, but actually the result is restricted to the (rather huge, of course) set of normed spaces, i.e. spaces whose distance measures obey the triangle inequality.
Incidentally, the zone system for the Danish public transit system is not a metric space. (The ticket you need from A to B is not necessarily the same as from B to A.)
That's a kind of practical graph where you want efficient shortest path algorithms for.
The rail network in the UK is also strange. On some routes it is cheaper to split your journey across multiple tickets. On others it is cheaper to buy a ticket for a longer journey. However, this is illegal and the train companies do attempt prosecutions.
To be clearer, splitting tickets is legal, and while the second ticket may become invalid if the first train runs late, train companies now honour the second ticket if the reason you miss your train is because the incoming train has run late.
However, what can be in breach of the terms and conditions is to buy a ticket for a longer journey than desired and then get off at an intermediate station. If you read the comment to which you're replying, it's the "overlength ticket" that he is saying can be illegal.
Fair enough for split ticketing, but they prosecuted someone last year for overlong ticketing. It only failed because they got the facts (as opposed to the law) wrong.
It is illegal because the railway byelaws make it illegal. The railway doesn't just have its own law, it also has its own police force, The British Transport Police, though as far as I know they don't do the ticket enforcement.
You are not allowed to break a journey either, so a ticket inspector can spot a ticket for a longer journey at one end of the other. The multiple tickets ruse is more of a grey area.
Rules vary by train company and route. In general you cannot break your outward journey (except for necessary changes of train) but you can break your return journey.
The National Conditions of Carriage actually permit breaking an outbound journey, provided the ticket remains valid for the continuation. Outbound tickets are usually only valid on the day, so in that case you can't continue the journey the next day. But a regular standard (or first-class) ticket on a train that's not the last train of the day can be broken on the outbound journey.
For most tickets.
Some tickets, such as Advance and some Super-Off-Peak, have additional conditions. However, I do have a letter from a train company saying that they were wrong to have charged for an extra ticket, and wrong to have tried to start proceedings.
So breaking an outbound journey can be legal, and in such cases buying an over-length ticket is possible. It's circumstances like these that led in part to some of the specifics of the recent shake-up of the fare structures.
Part of the reason for all this is that the train system has been broken up into a collection of franchises. Some companies will then incentivise some journeys, and so the fare doesn't always reflect the time or distance of the journey being made. As a result you can find and exploit anomalies, but only if you're willing to spend the (sometimes considerable) time and effort.
For busses, the B stop on the way out may be in another zone as the B stop on the way back because they're geographically displaced. For busses and trains, depending on your type of ticket (2-8 zones, 8+ zones, commuter card) and whether the zone intersection is on or between stops/stations, you may have to pay for that zone one way but not the other. Those are edge cases.
But the most confusing part is the ring topology (here shown with the zone-ring center being the actual city center):
When buying 2-8 zones you don't pay for individual zones, but rather rings from your origin zone. Going one way, the zone rings look one way. Going back they look different, because the origin zone is different.
Your origin zone on the way out was adjacent to both other zones needed (needing only 1 ring), but your origin zone on the way back was not (needing 2 rings).
So they're not just different prices: The graph that spans train stations via zones in the ring topology is not metric. So getting a return ticket is not the inverse of getting the ticket out. And while you have a ticket that's valid for the timespan of your return, it may not be valid for your return path.
What the... that was painful to read and comprehend. Was this kind of complexity really necessary? Did someone get a promotion for coming up with this?
I copied and translated a quite limited subset of the official traveller guidelines.
There are several reasons why it ended up like this, and I don't know all of them.
But two reasons I can think of:
1) With the introduction of RFID-based commuter cards, the zone system was revised for traditional ticket types as well. It appears convenient that a price can always be determined by knowing the check-in point, the check-out point and the shortest path in the graph. Unfortunately, with the old ticket types, you have to perform the calculation instead.
2) It is the odd shape of zone 2 that causes some tickets to not work on the way back, even though the trip goes in a straight line. But for a large part of all trips, you only need zone 1 and 2, or zone 2 and zone (30, 31, 32, 33). Only the ones that start in zone 2, go through zone 1, go back into zone 2 and farther into another zone (or vice versa) have the anomaly. (Trips that actually bend can have this too, but it is slightly more intuitive.)
Other possibilities: Trains going in different directions don't make the same stops (express vs local), and time-based fares applying to a specific direction based on commuting patterns.
In the US, on Amtrak, a ticket is specific to an exact train, like buying a plane ticket. There's no generalized "I want a ticket from DC to NYC."
Some of the shorter ones have unreserved coach class tickets that aren't restricted to a specific train. So, while there is no generalized, "I want a ticket from DC to NYC", there is a generalized, "I want a ticket from Chicago to Milwaukee."
Another possibility is that there is more demand in one direction.
You might have a major city, A, a center of finance and law and full of corporate headquarters, and a smaller city, B, where many of the companies whose headquarters are in A have their engineering and manufacturing facilities.
Someone making a deal with one of those companies may need to visit both engineering to work out technical details first and then visit headquarters to work out business, financial, and legal details. They will travel from home to B, and then travel to A, and then when done travel directly home from A. This may happen much more than the other direction--people needing to visit headquarters first and then going to visit engineering and manufacturing, without needing to go back to headquarters.
Net result: more passengers needing B to A than A to B. This can result in A to B trips having empty seats. It might even result in some trips with no passengers at all! They can't simply cancel those trips, because they need to get the vehicles to B to handle the B to A traffic, so they sell seats cheap.
Chicago has a minor version of this, even without zone fares: The entry fee to get on a train is $2.50 everywhere except O'Hare Airport, where it's $5.
Nah. Most the people I see on that train are people commuting to/from work, and locals who live along the blue line corridor. Some tourists, but I think most of them are opting to pay a driver 8x as much to take even longer to get them downtown.
You are right, I wantonly omitted some of the important properties. Cf. the response of one of the authors in another comment on how their method tackles general metric spaces.
The method gives a data structure for any metric space, and the quality of the data structure is controlled by how well expanders embed in the metric space. This implies interesting results for metrics which are not norms, like geodesics on certain manifolds. The surprise to us was how well this approach works for any norm, however. (I am one of the authors.)
Also, a math correction: spaces which obey the triangle inequality are called metric spaces. The class of normed spaces is more restricted, but still very large.
I don't think the general property of expander graphs is needed, perhaps a relaxed property, such as the following definition:
A graph G is a distance(k)-expander if
#edges(U, V-U) > c min(#U,#(V-U)) where U is a set of diameter <=k. Now study the embeding.
I am also curious about what happens if one use SVM to classify vertices of an expander graph: Let G be an expander graph and let u1,u2 be vertices of G such that d(u1,u2) = diameter of G = an even number 2k. We classify the vertices of G like this: f(u) = 0 if d(u,u1)<=k, f(u)=1 if d(u,u1)>=1, then using SVM applied to the vertices encoded by the rows of the adjancecy matrix. I would expect the SVM algorith to have some special property but I don't know what to expect, the best or worst classification, or the support vectors can gives us some information about symmetries in the graph.
I guess the best conversion of cosine similiarity to distance would of course be d(A,B) = arcCos(Cosine_similiarity(A,B)), i.e. the angle between the 2 directions, which would have the property that subdividing a geodesic arc and calculating the sum of distances along the subdivision results in the same distance as the start and end point.
> For example, “Manhattan” distance forces you to make 90-degree turns, as if you were walking on a street grid. Using Manhattan distance, a point 5 miles away as the crow flies might require you to go across town for 3 miles and then uptown another 4 miles.
It's interesting that this is called Manhattan distance because it's only relevant in a town where everyone jaywalks... Like Manhattan. It's far from true anywhere where jaywalking is frowned upon.
Because of (the lack of) crosswalk synchronization, it's a lot faster to walk to a place 2 blocks over and 2 blocks up than it is to walk to a place 4 blocks in one direction. Because at the first two lights you have the option of crossing in either direction, at which point you may only have to wait a few moments before crossing in the other direction.
Yes, this is the value of optionality, as N.N.Taleb would say.
I had exactly the same thought about crossing streets when reading one of his books. Always choose a route where you have the choice to keep walking in some direction towards your destination, with a delayed jaywalk to cross to the side you want during a gap in traffic, rather than wait on a corner when a crossing is blocked. The reason for blockage can be a crosswalk light (if you obey them) or traffic itself - the rule applies even without any signal-controlled junctions.
Ah that's so cool. I've been reading about expander graphs lately. Their interesting properties make them pertinent to lots of questions. In particular, you can use expander graphs to cook up error correcting codes and prove the PCP theorem. There's a class of expander graphs called Ramanujan graphs which are characterized by an analogue to the Riemann hypothesis involving a zeta function that counts the prime cycles in a graph.
I really hope this leads to databases with good support for fast nearest neighbour search. This would be especially useful in word2vec cases, where you have millions of word vectors, and want to find "words similar to this one" without either having all of them in memory, or going through all of them to find out.
I wonder what this can mean for fuzzing, optimization, learning or any kind of task that has to do with tip-toeing into potentially high dimensional spaces?
There are a lot of places in practical neural nets with attention where you want softmax(queryvector · memorymatrix), where memory can be quite large. If you have a decent ANN implementation, you can approximate by only calculating the dot product for the vectors of memory that neighbor the query.
There are currently a ton of mediocre ways to do this because nothing really works very well in high dimensions, and calculating this can easily be the bottleneck in training and evaluation.
I'm curious to if this result will extend to the k-nearest neighbors (k-NN) algorithms.
Two problems that have to do with k-NN are
1. It's a non-parametric method: the number of parameters grow linearly with the size of the training set since the distance function must be calculated for all training points and the test point.
2. The curse of dimensionality: distance metrics like the Euclidean distance do not perform well in higher dimensions; points which seem "close" in 2D may be far in 3D, 4D, etc. As a result, we would need an exponential amount of more training data for every additional dimension. Locality sensitive hashing tries to combat this by reducing the dimensionality of the data.
I’m curious if it will extend to k q-flats, or other notions of points being near each other purely by being near to the same subspace, rather than pointwise nearness.
The article announces the algorithm, which isn't published yet. This first paper contains the proof that the result is possible, not yet the efficient algoithm.