Readme file for mlsfit.f

Benjamin Weiner, August 2006

Introduction:

MLSFIT is a Fortran program that fits straight lines y = A + Bx to
(X,Y) data with errors in both coordinates and allows for intrinsic
scatter in the Y-coordinate of the linear relation - scatter beyond
that attributable to observational errors.  The algorithm is derived
in the Appendix of Weiner et al. 2006 (ApJ, submitted,
astro-ph/------); see that paper for further background and
references.

For more information about least-squares fitting, please
see that Appendix, the Readme file for the LSTSQ program
that I also distribute, and you might want to look at 
Chapter 15, "Modeling of Data," in Numerical Recipes.

To compile MLSFIT:

Extract the tar file to get Readme, source, Makefile, and
Linux binaries.  On a linux system you should be able to run
the binaries, but you need to setenv PGPLOT_DIR [local-pgplot-dir]
or PGPLOT won't find its fonts.  To compile, you need to
edit the Makefile to point at your local copy of the PGPLOT 
library.

Input:

You will need an input file, a text file with one data point per
line.  The columns are X, X-error, Y, Y-error.

The file does not have to be formatted but lines must be <80 chars long.

Note that due to the way the program does convolutions, it may not
work well if your data has a steep slope of Y on X.  If this is the
case, you should exchange X and Y so that the slope is shallow.
Also, the algorithm models intrinsic scatter in the Y coordinate,
so if your data has selection limits on one coordinate, make that
coordinate X.
If your data has a lot of intrinsic scatter in a coordinate _and_
a steep slope in that coordinate, you are likely to have problems
fitting with any algorithm.  If you have selection limits in X
and a steep slope of Y on X, you are trying to do a problem that
is very likely ill-conditioned.

Running the program:

MLSFIT prompts you for a bunch of parameters, in order:

1. Zeropoint to subtract from X before fitting
2. PGPLOT graphics device
3. Name of your data file
4. Grid to search in intercept A: Amin, Amax, dA
5. Grid to search in slope B:     Bmin, Bmax, dB
6. Grid to search in scatter D:   Dmin, Dmax, dD

As an example of this, for the included sample datafile, try typing:

mlsfit
0
/xs
mlsfit.testdata
-0.5 2 0.05
-0.5 0.5 0.01
0 0.26 0.05

At the end of this you should get a line like:
  Overall minimum: 
  0.700000048 -0.0400000103  0.150000006  50  63.9507065
indicating a best fit of A = 0.70, B = -0.04, and scatter = 0.15,
on 50 datapoints, with chi-squared 63.95.

The program loops over the values of the scatter, here going from 0 to
0.25 by 0.05.  At each value of the scatter it goes over a grid in A
and B, computing the probability of that model y = A + Bx.  At each
scatter, it makes two plots.  The first is the probability contours in
the A-B plane, with the best A,B and the nominal errors on A and B.
The second plot shows the X,Y data with the best relation and the
1-sigma scatter lines overplotted.

At each value of the scatter, MLSFIT spews out a bunch of output,
which will be cleaned up and documented slightly better in future 
versions.  The most important line is probably the one that begins
"Best a,b and err" which gives best values and errorbars at each
value of scatter:
  A_best A_lower-error A_upper-error  B_best B_lower-error B_upper-error
Note that the errorbars are found by determining what part of the grid
contains 68% of the probability.  If you use a coarse grid, the errors
will not be very accurate and the error-finding routine will occasionally
fail.

At the very end, MLSFIT prints out the "Overall minimum," the best fit
over all values of the scatter.  The contents of this line are:
  A_best B_best Best_scatter N_datapoints Chi-squared-best

If the best chi-squared is significantly greater than N_datapoints,
you could rerun with a finer grid to make a more accurate localization
of the best fit.

MLSFIT as provided is a bit more cumbersome (and slower) than 
LSTSQ-with-intrinsic-scatter, but does a similar fit.  However,
MLSFIT can be modified to handle non-Gaussian errors or scatter
and other such issues.  The way to do this is by modifying the
subroutine "calcprob1".  A few modified versions are buried
in the source file.