Tutorial
Installation
This package is supported just for Julia version 1.0 and, consequently, it uses the new package manager (Pkg 3). Currently, RAFF isn't registered in Metadata.jl, so the package has to be installed directly through the Julia package manager. From the Julia REPL, type ]
to enter the Pkg REPL mode and run:
pkg> add https://github.com/fsobral/RAFF.jl#dev
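Alternatively, the same installation can be done from the ordinary Julia prompt through the Pkg API; the snippet below is just an equivalent sketch of the command above, using the same repository URL and dev branch:

using Pkg

# Same repository and branch as the Pkg REPL command above
Pkg.add(PackageSpec(url="https://github.com/fsobral/RAFF.jl", rev="dev"))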
Basic usage
Just to illustrate the potential and basic usage of RAFF, let us consider the following dataset given by an array $A$:

$$A = \begin{bmatrix} -2.0 & 5.0 \\ -1.5 & 3.25 \\ -1.0 & 2.0 \\ -0.5 & 1.25 \\ 0.0 & 1.0 \\ 0.5 & 1.25 \\ 1.0 & 2.0 \\ 1.5 & 3.25 \\ 2.0 & 5.0 \end{bmatrix}$$
Let's suppose the first column of $A$ holds the values of an experimental measure, with the corresponding results given by the second column. It is easy to see that the fitting function in this case is exact and given by

$$\phi(t) = t^2 + 1.$$
Now let's perturb one entry of the second column of $A$. For example, consider $A_{6,2} = 2.55$. Assuming the model for fitting given by

$$\varphi(x, t) = x_1 t^2 + x_2,$$
classical least squares gives the result x=[0.904329,1.3039]. But when we use the RAFF algorithm, we obtain the correct answer x=[1.0,1.0]. Moreover, we obtain the number of possible outliers and which points they are.
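Since the model is linear in the parameters $x$, the classical least-squares estimate above can be verified directly with a standard linear solve. The sketch below is independent of RAFF, and the names T and x_ls are only illustrative:

# Perturbed dataset (row 6, with A[6,2] = 2.55, is the outlier)
A = [-2.0 5.0; -1.5 3.25; -1.0 2.0; -0.5 1.25; 0.0 1.0;
     0.5 2.55; 1.0 2.0; 1.5 3.25; 2.0 5.0]

# Design matrix of the model x[1]*t^2 + x[2] and the ordinary least-squares solution
T = [A[:, 1].^2 ones(size(A, 1))]
x_ls = T \ A[:, 2]    # ≈ [0.904329, 1.3039]: pulled away from [1.0, 1.0] by the outlier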
In order to run the RAFF algorithm, we first need to load the package:
julia> using RAFF
and define the dataset and model:
julia> A=[-2.0 5.0;
-1.5 3.25;
-1.0 2.0 ;
-0.5 1.25;
0.0 1.0 ;
0.5 2.55;
1.0 2.0 ;
1.5 3.25;
2.0 5.0 ;]
9×2 Array{Float64,2}:
-2.0 5.0
-1.5 3.25
-1.0 2.0
-0.5 1.25
0.0 1.0
0.5 2.55
1.0 2.0
1.5 3.25
2.0 5.0
julia> model(x,t)=x[1]*t[1]^2+x[2]
model (generic function with 1 method)
After that, we can run raff:
julia> raff(model,A,2)
RAFFOutput(1, [0.999999, 1.0], 6, 8, 3.672268261041584e-11, [6])
The number 2
above is the number of variables of the model. The output is of the RAFFOutput type, so it is possible to work with its attributes. For example, to access only the parameters of the solution, we can use
julia> output = raff(model,A,2)
RAFFOutput(1, [0.999999, 1.0], 6, 8, 3.672268261041584e-11, [6])
julia> output.solution
2-element Array{Float64,1}:
0.999998774479423
1.000003442880683
Note that the RAFF algorithm detects and ignores possible outliers. In order to see which points are outliers, we can access the outliers attribute.
julia> output.outliers
1-element Array{Int64,1}:
6
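These attributes can be combined with ordinary Julia code. For example, the sketch below (the names kept and residuals are only illustrative) selects the trusted points and evaluates the fitted model on them:

# Indices of the points kept by RAFF: every row except the detected outliers
kept = setdiff(1:size(A, 1), output.outliers)

# Residuals of the fitted model on the trusted points (all close to zero here)
residuals = [model(output.solution, A[i, 1:1]) - A[i, 2] for i in kept]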
More details about the RAFFOutput type and other options can be found in the API section.
By default, RAFF uses automatic differentiation, more specifically the ForwardDiff package. However, it is also possible to provide RAFF with the gradient vector of the model. For the example above, the gradient with respect to $x$ is

$$\nabla_x \varphi(x, t) = [t^2, 1]^\top.$$

Programming this gradient and running raff, we have
julia> gmodel(x,t,g)=begin
g[1]=t[1]^2
g[2]=1.0
return g
end
gmodel (generic function with 1 method)
julia> raff(model,gmodel,A,2)
RAFFOutput(1, [0.999999, 1.0], 6, 8, 3.672268261041584e-11, [6])
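Since RAFF already relies on ForwardDiff for automatic differentiation, a hand-coded gradient can be checked against it, assuming the ForwardDiff package is available in your environment. The test values below are arbitrary and only illustrative:

using ForwardDiff

# Compare the hand-coded gradient with automatic differentiation at a test point
x = [2.0, 3.0]; t = [1.5]
g = zeros(2)
gmodel(x, t, g)                             # hand-coded gradient, here [2.25, 1.0]
ForwardDiff.gradient(x -> model(x, t), x)   # should match g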
Multivariate models
RAFF
also supports multivariate models, that is, models whose data points have more than one independent variable. To illustrate how this works, consider the following example:
julia> data = [1.0 1.0 2.0
0.0 0.0 4.0
7.0 1.5 -4.5
2.0 2.0 -17.0 # outlier
0.0 8.6 -4.6]
5×3 Array{Float64,2}:
1.0 1.0 2.0
0.0 0.0 4.0
7.0 1.5 -4.5
2.0 2.0 -17.0
0.0 8.6 -4.6
and the following model:
julia> model(x, t) = x[1]*t[1] + x[2] * t[2] + x[3]
model (generic function with 1 method)
Note that this model has two independent variables, $(t_1,t_2)$. By construction, this problem has one outlier (data[4,:]), so there are 4 trusted points. Let's run RAFF
and check the answer.
julia> output = raff(model,data, 3)
RAFFOutput(1, [-0.999999, -0.999999, 4.0], 7, 4, 2.9377653725356374e-11, [4])
The right answer is [-1.0, -1.0, 4.0]. As we can see, RAFF
found a good fit for the data set.
Handling the output follows the same pattern as the one-dimensional case.
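For instance, the same attributes used before are available here:

sol = output.solution   # fitted parameters, approximately [-1.0, -1.0, 4.0]
out = output.outliers   # indices of the detected outliers; here, only point 4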
In order to reduce processing time, we can also code the gradient vector of the model:
julia> gmodel(x, t,g) = begin
g[1] = t[1]
g[2] = t[2]
g[3] = 1.0
return g
end
gmodel (generic function with 1 method)
julia> output = raff(model, gmodel, data, 3)
RAFFOutput(1, [-0.999999, -0.999999, 4.0], 7, 4, 2.9377653725356374e-11, [4])
Changing some options
Naturally, RAFF
has options such as the tolerance for the gradient stopping criterion and the initial guess:
julia> output = raff(model,data, 3;initguess=[0.5,0.5,0.5],ε=1.0e-4)
RAFFOutput(1, [-0.999999, -0.999999, 4.0], 7, 4, 2.9377653725356374e-11, [4])
RAFF is based on an optimization method. As such, it may stop at stationary points that are not global minimizers. For this reason, heuristics were implemented to search for global minimizers. Such heuristics depend on random number generation, so if you want more reliable results, trying more starting points can be a useful strategy. To make RAFF try, say, 1000 different starting points, it is enough to set the keyword argument MAXMS.
julia> output = raff(model,data, 3;MAXMS=1,initguess=[0.5,0.5,0.5],ε=1.0e-10)
RAFFOutput(1, [-1.0, -1.0, 4.0], 10, 4, 5.2480273340321425e-25, [4])
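Because the heuristics draw random starting points, results can change between runs. Assuming RAFF takes its random numbers from Julia's default global generator, fixing a seed with the standard Random library makes experiments repeatable; the seed and the MAXMS value below are arbitrary choices for illustration:

using Random

Random.seed!(1234)   # arbitrary seed, only to make the run repeatable
output = raff(model, data, 3; MAXMS=100, initguess=[0.5, 0.5, 0.5])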
Parallel running
RAFF
can be run in a parallel or distributed environment using the Distributed package and the function praff
. Let's use praff
to solve the same problem from the beginning of the tutorial. First, the Distributed package has to be loaded and the number of worker processes has to be added. It is also possible to add the addresses of other machines.
using Distributed
addprocs(3) # Add 3 worker processes
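As mentioned above, addprocs can also receive the addresses of other machines instead of a plain count. The host names below are placeholders, and passwordless SSH access to those machines is assumed:

# Each entry is ("user@host", number_of_workers); requires SSH access to the hosts
addprocs([("user@machine1", 2), ("user@machine2", 2)])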
This step can be replaced by starting Julia with the -p
option:
julia -p 3
Now we have to load RAFF
and define the model and its gradient on all workers:
@everywhere using RAFF
@everywhere function gmodel!(x, t, g)
g[1] = t[1]^2
g[2] = 1.0
end
@everywhere function model(x, t)
x[1] * t[1]^2 + x[2]
end
Then, we call praff
to solve the problem (note that we do not need to send the A
matrix to all workers, since it will be sent by praff
):
A=[-2.0 5.0;
-1.5 3.25;
-1.0 2.0 ;
-0.5 1.25;
0.0 1.0 ;
0.5 2.55;
1.0 2.0 ;
1.5 3.25;
2.0 5.0 ;];
n = 2
output = praff(model, gmodel!, A, n)
RAFFOutput(1, [1.0, 0.999996], 6, 8, 4.0205772365906425e-11, [6])
The true effectiveness of parallelism occurs when the option MAXMS
is set, which changes the number of random initial points tried for each subproblem solved. Better solutions can be achieved with higher values of MAXMS:
n = 2
output = praff(model, gmodel!, A, n; MAXMS=1000)
RAFFOutput(1, [1.0, 1.0], 7, 8, 5.134133698545651e-13, [6])