Tutorial

Installation

This package is supported only for Julia version 1.0 and above; consequently, it uses Pkg 3.0. Currently, RAFF isn't registered in METADATA.jl, so the package has to be installed with the Julia package manager by pointing to its repository. From the Julia REPL, type ] to enter the Pkg REPL mode and run:

pkg> add https://github.com/fsobral/RAFF.jl#dev

Basic usage

Just to illustrate the potential and basic usage of RAFF, let us consider the following data set, given by an array $A$:

\[A=\left[ \begin{array}{cc} -2.0 & 5.0 \\ -1.5 & 3.25\\ -1.0 & 2.0 \\ -0.5 & 1.25\\ 0.0 & 1.0 \\ 0.5 & 1.25\\ 1.0 & 2.0 \\ 1.5 & 3.25\\ 2.0 & 5.0 \\ \end{array}\right]\]

Let's suppose the first column of $A$ contains experimental measurements whose results are given in the second column. It is easy to see that the fitting function in this case is exact and given by

\[\phi(t)=t^2 +1\]
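
For instance, this relation can be checked directly in Julia (a quick sketch, independent of RAFF):

t = collect(-2.0:0.5:2.0)   # first column of A
phi(t) = t^2 + 1            # exact fitting function
phi.(t)                     # reproduces the second column of A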

Now let's perturb one entry of the second column of $A$; for example, consider $A_{6,2}=2.55$. Assuming the fitting model is given by

\[\varphi(x,t)=x_1 t^2 +x_2 \]

we find that classical least squares gives the result x = [0.904329, 1.3039], whereas the RAFF algorithm obtains the correct answer x = [1.0, 1.0]. Moreover, RAFF also reports the number of possible outliers and which points they are.
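
The least-squares value quoted above can be reproduced with plain Julia linear algebra (a sketch for comparison only; it is not part of RAFF):

# Classical least squares for φ(x, t) = x₁ t² + x₂ via the backslash operator.
t = collect(-2.0:0.5:2.0)
y = t.^2 .+ 1.0
y[6] = 2.55                 # the perturbed measurement A₆,₂
D = [t.^2 ones(length(t))]  # design matrix with columns t² and 1
x_ls = D \ y                # ≈ [0.904329, 1.3039]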

In order to run the RAFF algorithm, we first need to load the package

julia> using RAFF

and define the dataset and model:

julia> A=[-2.0  5.0;
         -1.5  3.25;
         -1.0  2.0 ;
         -0.5  1.25;
          0.0  1.0 ;
          0.5  2.55;
          1.0  2.0 ;
          1.5  3.25;
          2.0  5.0 ;]
9×2 Array{Float64,2}:
 -2.0  5.0
 -1.5  3.25
 -1.0  2.0
 -0.5  1.25
  0.0  1.0
  0.5  2.55
  1.0  2.0
  1.5  3.25
  2.0  5.0

julia> model(x,t)=x[1]*t[1]^2+x[2]
model (generic function with 1 method)

After that, we can run raff:

julia> raff(model,A,2)
RAFFOutput(1, [0.999999, 1.0], 6, 8, 3.672268261041584e-11, [6])

The number 2 above is the number of parameters in the model. The output is of type RAFFOutput, so it is possible to access the attributes of this type. For example, to access only the solution parameters, we can use

julia> output = raff(model,A,2)
RAFFOutput(1, [0.999999, 1.0], 6, 8, 3.672268261041584e-11, [6])

julia> output.solution
2-element Array{Float64,1}:
 0.999998774479423
 1.000003442880683

Note that the RAFF algorithm detects and ignores possible outliers. In order to see which points are outliers, we can access the outliers attribute.

julia> output.outliers
1-element Array{Int64,1}:
 6
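
A common follow-up is to keep only the trusted points, for example to inspect or refit them. This is ordinary Julia indexing, not a RAFF feature (a sketch):

trusted = setdiff(1:size(A, 1), output.outliers)  # indices not flagged as outliers
A[trusted, :]                                     # data set without the detected outlier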

More details about the RAFFOutput type and other options can be found in the API section.

By default, RAFF uses automatic differentiation, more specifically the ForwardDiff.jl package. However, it is also possible to call RAFF with a user-defined gradient of the model with respect to the parameters $x$. For the example above, we have

\[\nabla \varphi(x,t)=[t^2,1].\]

Coding this gradient and running raff, we obtain

julia> gmodel(x,t,g)=begin
       g[1]=t[1]^2
       g[2]=1.0
       return g
       end
gmodel (generic function with 1 method)

julia> raff(model,gmodel,A,2)
RAFFOutput(1, [0.999999, 1.0], 6, 8, 3.672268261041584e-11, [6])

Multivariate models

RAFF also supports multivariate models, i.e., models whose independent variable $t$ has more than one dimension. To illustrate how this works, consider the following example:

julia>     data = [1.0 1.0   2.0
                   0.0 0.0   4.0
                   7.0 1.5  -4.5
                   2.0 2.0 -17.0 # outlier
                   0.0 8.6  -4.6]
5×3 Array{Float64,2}:
 1.0  1.0    2.0
 0.0  0.0    4.0
 7.0  1.5   -4.5
 2.0  2.0  -17.0
 0.0  8.6   -4.6

and the following model

julia>     model(x, t) = x[1]*t[1] + x[2] * t[2] + x[3]
model (generic function with 1 method)

Note that this model has two independent variables, $(t_1, t_2)$, and three parameters. This data set contains one outlier, data[4,:], so there are 4 trusted points. Let's run RAFF and check the answer.

julia>     output = raff(model,data, 3)
RAFFOutput(1, [-0.999999, -0.999999, 4.0], 7, 4, 2.9377653725356374e-11, [4])

The right answer is [-1.0, -1.0, 4.0]. As we can see, RAFF obtains a good fit for the data set.

Handling the output follows the same pattern as the one-dimensional case.
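
For instance, assuming the output object returned by the call above, the fitted parameters and the detected outliers are accessed just as before (a sketch; the values in the comments are approximate):

output.solution   # fitted parameters, approximately [-1.0, -1.0, 4.0]
output.outliers   # indices of the detected outliers, here [4]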

In order to reduce processing time, we can also code the gradient of the model:

julia>     gmodel(x, t,g) = begin
               g[1] = t[1]
               g[2] = t[2]
               g[3] = 1.0
               return g
           end
gmodel (generic function with 1 method)
julia>     output = raff(model, gmodel, data, 3)
RAFFOutput(1, [-0.999999, -0.999999, 4.0], 7, 4, 2.9377653725356374e-11, [4])

Changing some options

Naturally, RAFF has options, such as the precision ε used in the gradient stopping criterion and the initial guess initguess.

julia>     output = raff(model,data, 3;initguess=[0.5,0.5,0.5],ε=1.0e-4)
RAFFOutput(1, [-0.999999, -0.999999, 4.0], 7, 4, 2.9377653725356374e-11, [4])

RAFF is based on an optimization method, so it may stop at stationary points that are not global minimizers. For this reason, heuristics were implemented to search for global minimizers. Such heuristics depend on random number generation, so trying several random starting points can be a useful strategy when more reliable results are desired. In order to make RAFF try, say, 1000 different starting points, it is enough to set the keyword argument MAXMS.

julia>     output = raff(model,data, 3;MAXMS=1,initguess=[0.5,0.5,0.5],ε=1.0e-10)
RAFFOutput(1, [-1.0, -1.0, 4.0], 10, 4, 5.2480273340321425e-25, [4])
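
For example, 1000 starting points per subproblem could be requested as follows (a sketch; the output is omitted, since such a run takes longer and depends on the random number generation):

output = raff(model, data, 3; MAXMS=1000, initguess=[0.5, 0.5, 0.5], ε=1.0e-10)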

Parallel running

RAFF can be run in a parallel or distributed environment using the Distributed package and the function praff. Let's use praff to solve the same problem from the beginning of this tutorial. First, the Distributed package has to be loaded and the desired number of worker processes added. It is also possible to add the addresses of other machines.

using Distributed

addprocs(3) # Add 3 worker processes

This step can be skipped if Julia is started with the -p option:

julia -p 3

Now we have to load RAFF and define the model and gradient functions on all workers:

@everywhere using RAFF

@everywhere function gmodel!(x, t, g)
    g[1] = t[1]^2
    g[2] = 1.0
end

@everywhere function model(x, t)
   x[1] * t[1]^2 + x[2]
end

Then we call praff to solve the problem (note that we do not need to send the A matrix to all workers, since it will be automatically sent by praff).

A=[-2.0  5.0;
  -1.5  3.25;
  -1.0  2.0 ;
  -0.5  1.25;
   0.0  1.0 ;
   0.5  2.55;
   1.0  2.0 ;
   1.5  3.25;
   2.0  5.0 ;];

n = 2

output = praff(model, gmodel!, A, n)
RAFFOutput(1, [1.0, 0.999996], 6, 8, 4.0205772365906425e-11, [6])

The true effectiveness of parallelism appears when the option MAXMS is set, which changes the number of random initial points tried for each subproblem solved. Better solutions can be achieved with higher values of MAXMS:

n = 2

output = praff(model, gmodel!, A, n; MAXMS=1000)
RAFFOutput(1, [1.0, 1.0], 7, 8, 5.134133698545651e-13, [6])
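
When the distributed run is finished, the extra worker processes can be released with a standard Distributed call (not specific to RAFF):

rmprocs(workers())   # remove the worker processes added earlier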