How does it produce negative distances? Cosine similarity lies in [-1, 1], so add one ...

tlb · on Aug 14, 2018

Ah, I mistook the - at the beginning for a separator.

It does fail the triangle inequality, though, as the following Python snippet demonstrates:

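  import numpy as np

  def randomvec():
      # Random 4-vector with each component uniform in [-1, 1)
      return np.random.rand(4)*2-1

  def cossim(a, b):
      # Cosine similarity of a and b
      return np.dot(a,b) / np.sqrt(np.dot(a, a) * np.dot(b, b))

  def metric(a, b):
      # Proposed distance: -log((1 + cossim)/2), which is never negative
      return -np.log((1+cossim(a, b))/2)

  def main():
      for i in range(20):
          a = randomvec()
          b = randomvec()
          c = randomvec()
          a2b = metric(a, b)
          b2c = metric(b, c)
          a2c = metric(a, c)
          print(a2b, b2c, a2c)
          # Going via b should never be shorter than going directly
          if a2b + b2c < a2c:
              print('  *** fails triangle inequality')

  main()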
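
For anyone who would rather not rely on random draws, here is a minimal deterministic sketch of the same failure. The helper `unit` and the choice of angles (0, 80 and 160 degrees in the plane) are my own assumptions for illustration, not part of the snippet above. With b halfway between a and c, the two short hops come out at roughly 0.53 each, while the direct distance is roughly 3.5.

```python
import numpy as np

def cossim(a, b):
    # Cosine similarity of a and b
    return np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b))

def metric(a, b):
    # Same proposed distance as in the snippet: -log((1 + cossim)/2)
    return -np.log((1 + cossim(a, b)) / 2)

def unit(theta_deg):
    # Hypothetical helper: unit vector in the plane at the given angle
    t = np.radians(theta_deg)
    return np.array([np.cos(t), np.sin(t)])

# b sits halfway along the arc from a to c
a, b, c = unit(0), unit(80), unit(160)

a2b = metric(a, b)
b2c = metric(b, c)
a2c = metric(a, c)

# True means the detour via b is shorter than the direct distance
violates = a2b + b2c < a2c
print(a2b, b2c, a2c, violates)
```

The failure gets worse as a and c approach opposite directions, since the direct distance then grows without bound while the two half-way hops stay finite.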

DoctorOetker · on Aug 14, 2018

good catch! upvoted

DoctorOetker · on Aug 14, 2018

I guess the best conversion of cosine similarity to distance would be d(A,B) = arccos(cosine_similarity(A,B)), i.e. the angle between the two directions. It has the property that subdividing a geodesic arc and summing the distances along the subdivision gives the same distance as between the start and end points.
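
A rough sketch of the angle-based distance, to check the additivity claim numerically. The names `angular_distance` and `unit`, the sample angles, and the random test at the end are assumptions of mine for illustration. The `np.clip` call is only there to guard against rounding pushing the cosine slightly outside [-1, 1].

```python
import numpy as np

def angular_distance(a, b):
    # Angle between the directions of a and b, in radians
    cos = np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def unit(theta_deg):
    # Hypothetical helper: unit vector in the plane at the given angle
    t = np.radians(theta_deg)
    return np.array([np.cos(t), np.sin(t)])

# Subdivide an arc of 160 degrees into four steps of 40 degrees
points = [unit(t) for t in (0, 40, 80, 120, 160)]
pieces = sum(angular_distance(p, q) for p, q in zip(points, points[1:]))
whole = angular_distance(points[0], points[-1])
print(pieces, whole)  # the two values should agree up to rounding

# Look for triangle inequality violations on random 4-vectors
rng = np.random.default_rng(0)
violations = 0
for _ in range(1000):
    a, b, c = rng.uniform(-1, 1, size=(3, 4))
    direct = angular_distance(a, c)
    detour = angular_distance(a, b) + angular_distance(b, c)
    if detour < direct - 1e-9:  # small tolerance for floating point error
        violations += 1
print(violations)
```

Note that the additivity only holds while the subdivided arc is the shorter way around, i.e. spans less than 180 degrees. Past that, arccos returns the angle measured the other way.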