Thursday, April 13, 2006

A 'Line Fitting' Problem

This post came out of a discussion with N. Ramana Rao.

Statement of the Problem:
------------------------
Given a distribution of N points in 2D Euclidean plane. Describe the line which minimizes the Sum of the Orthogonal Distances of the given points from itself.

Please note that we consider abs values of the perpendicular distances themselves, NOT their squares. We could call such a line a "min-SOD line" o r perhaps the "spine" of the point distribution. The line need not be unique.

An Extension: If we are to minimize (the SOD plus the minimum length of the line so that it contains the feet of the perpendiculars from all the input points to the min-SOD line) what could be said about a line that achieves it?

Proposed answer to the main part:
--------------------------------

(This is a 'lemma' which we have tried to use to find an efficient algorithm to compute the min-SOD line).

A min-SOD line (spine) has to pass thru at least two of the input points and should be such that on neither half plane it divides the plane into, there should be more than Floor (N/2) points.

Attempted proof of our lemma:
------------------------------

Consider a line which claims to be the spine and which pass thru none of the N input points. First we note that candidate line has to have equal number of the input points on either side ( else we could shift in parallel to itself in one of the two possible directions and achieve a reduction in SOD).

Let this candidate spine be infinitesimally rotated in the plane about ANY point on itself, say P. Now, for any input point Pi the perpendicular
distance from the candidate spine thru P is given by r* sin t where t is the angle made by the line from P to Pi and the candidate spine . When the line rotates infinitesimally by angle dt, the differential change in the perpendicular distance is r* cos t dt.

We note that if the line were to rotate in the opposite direction from what considered above, the differential change in the perpendicular distance of the point Pi from the candidate spine would simply change sign (because then dt -> -dt ).

So for the entire set of input points, the overall change in the Sum of Orthogonal Distances will also have opposite signs when the candidate spine is rotated in the two possible directions.

This proves that if SOD increases in one direction overall, it necessarily decreases in the other direction - it cannot increase in both directions => the position of the candidate spine is not a minimum for the SOD. However if the candidate line passes thru one of the input points, for that point the perpendicular distance increases in both directions of rotation and then the line COULD BE be one for which the SOD is a minimum.

Special case:
------------
The SOD could be unchanged for both infinitesimal rotations (for example if the input points are arranged along two parallel lines in equal numbers and the candidate spine is parallel to both these lines and equidistant from both). Then we note that this property of unchanging SOD holds only locally and gets lost the moment the we disturb the symmetry of the distribution of the input points about the candidate spine by an infinitesimal rotation of the candidate spine. This always seems to be the case as long as the input points are a discrete set.

Thus we see that the candidate spine has to pass thru at least one input point. We now repeat the above argument with this input point as the center of rotation for the candidate spine; we see that the candidate line has to pass thru one more input point.

We thus conclude that the spine passes thru at least two input points and neither side it has more than floor (N/2) points.

Remarks on computing the spine:
------------------------------
For input points in general position, the spine is one of the Halving lines. So in order to computationally determine the spine for a given point set, we could try to find all the halving lines and compare the Sum of Orthogonal Distances (SOD) of the input points from each halving line.

From the work of Tamal Dey and others on the 'k-set problem', the number of halving lines for N points could be as high as N^4/3. Finding them all could take O(log N) time per halving line using dynamic convex hull data structures. Finding the SOD for one halving line and to recalculate it for each successive halving line could take only constant time per iteration. Thus we could have approximately an O(N^4/3 * log N) algorithm to find the spine. We are grateful to Prof. David Eppstein (UC -Irvine) for pointing out these details to us.

We are not aware of better complexity results for this problem. We also do not know if this problem has been explored before.


The extension:
---------
If we are to minimize {the SOD plus the minimum length of the spine so that it just contains the feet of the perpendiculars from all the input points to the min-SOD line} , the spine again appears to satisfy the above property - it passes thru at least two input points and has <= floor(N/2) points on either side. the same arguments appear to hold.

However, the line that minizes only the SOD and the line minimizing {SOD + 'spine length'} need not be the same. For example, if there are 3 input points at the vertices of an isosceles right triangle, the min-SOD line is the hypotenuse but if we are to minimize the SOD + spine length, we choose either of the two equal sides of the triangle as the spine.

Additional Remark:
------------------

It would seem that the lemma could be improved to at most Floor( (N-1)/2) points on each side of the line. This makes a difference when N is even.

Suppose N is even and we have a line L passing through at least 2 points, with exactly N/2 points on one side. Shift L parallel to itself towards the N/2 points (which does not change the SOD), so it is not touching any points. Suppose L is now the x-axis. Rotate it through theta around the origin.

(Remark: because there are exactly N/2 points on each side, it doesn't matter which point we choose as pivot.)

If there were N/2 points (x_i,y_i) above the axis and N/2 points (z_j, - w_j) below the axis (notice y_i and w_j are all positive), then the SOD (as a function of theta) is ( sum y_i + sum w_j) (cos theta) + ( - sum x_i + sum z_j) (sin theta). The second derivative at theta=0 is ( - sum y_i - sum w_j ) which is negative, so a small rotation in one direction or the other will decrease SOD. So we did not have the minimum. Conclusion: at most Floor((N-1)/2) points on each side of the line.

Note:
-----
For a distribution of points in Euclidean 3D, the min-SOD line NEED NOT pass thru any of the points.

No comments:

Post a Comment