API · RAFF - Robust Algebraic Fitting Function

Summary

There are four main groups of RAFF structures:

Main functions: directly called by the user;
Auxiliary functions: used internally by the main functions;
Random generation: used to generate random sets of data, in order to test RAFF;
Output type: type defined to manipulate output information.

Main functions

RAFF.lmlovo — Function.

lmlovo(model::Function [, θ::Vector{Float64} = zeros(n)], data::Array{Float64, 2},
       n::Int, p::Int [; kwargs...])

lmlovo(model::Function, gmodel!::Function [, θ::Vector{Float64} = zeros(n)],
       data::Array{Float64,2}, n::Int, p::Int [; MAXITER::Int=200,
       ε::Float64=10.0^-4])

Fit the n-parameter model given by function model to the data in matrix data. The strategy is based on the LOVO (Lower Order-Value Optimization) function, which means that only p points (0 < p <= number of rows of data) are trusted. This version implements the Levenberg-Marquardt algorithm.
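The LOVO idea can be illustrated with a small sketch: for a given θ, compute the squared residual of every row of data, sort them, and sum only the p smallest ones, so the remaining rows act as outliers. The helper lovo_value is hypothetical and not part of RAFF; it assumes the last column of data holds the observed values, as in the data layout used by this function.

```julia
# Hypothetical helper (not part of RAFF): value of the LOVO function,
# i.e. the sum of the p smallest squared residuals, plus the discarded rows.
function lovo_value(model, θ, data::AbstractMatrix{Float64}, p::Int)
    m, ncols = size(data)
    0 < p <= m || throw(ArgumentError("p must satisfy 0 < p <= size(data, 1)"))
    # Squared residual of each row: (model(x_i, θ) - y_i)^2
    r2 = [(model(view(data, i, 1:ncols-1), θ) - data[i, ncols])^2 for i in 1:m]
    # Indices of the p smallest residuals; the other rows are treated as outliers
    idx = partialsortperm(r2, 1:p)
    return sum(r2[idx]), setdiff(1:m, idx)
end

model = (x, θ) -> θ[1] * x[1] + θ[2]
data = [0.0 1.0; 1.0 3.0; 2.0 5.0; 3.0 100.0]   # the last row is an outlier
f, outliers = lovo_value(model, [2.0, 1.0], data, 3)
```

With p = 3, the large residual of the last row is simply ignored, which is why LOVO is robust to a limited number of outliers.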

Matrix data contains the data to be fit. This matrix should be in the form

x11 x12 ... x1N y1
x21 x22 ... x2N y2
:

where N is the dimension of the argument of the model (i.e. dimension of x).

If θ is provided, then it is used as the starting point.

The signature of function model should be given by

model(x::Union{Vector{Float64}, SubArray}, θ::Vector{Float64})

where x are the variables and θ is an n-dimensional vector of parameters. If the gradient of the model, gmodel!, with signature

gmodel!(g::SubArray, x::Union{Vector{Float64}, SubArray},
        θ::Vector{Float64})

is not provided, then the function ForwardDiff.gradient! is called to compute it. Note that this choice has an impact on the computational performance of the algorithm. In addition, if ForwardDiff.jl is being used, then one MUST remove the type annotation of θ from the signature of model, since ForwardDiff evaluates the model with dual numbers instead of Float64 values.
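As an illustration of the gmodel! signature, the sketch below writes a hand-coded gradient with respect to θ for the linear model θ[1] * x[1] + θ[2] into a view (a SubArray), and compares it with central finite differences. The model and the finite-difference check are illustrative assumptions, not RAFF code.

```julia
# Linear model m(x, θ) = θ[1] * x[1] + θ[2]; no type annotations, so it
# would also work with ForwardDiff
model = (x, θ) -> θ[1] * x[1] + θ[2]

# Gradient of the model with respect to θ, written in place into g
function gmodel!(g, x, θ)
    g[1] = x[1]   # ∂m/∂θ[1]
    g[2] = 1.0    # ∂m/∂θ[2]
    return g
end

J = zeros(1, 2)
g = view(J, 1, :)      # a SubArray, as in the documented signature
x = [3.0]
θ = [2.0, -1.0]
gmodel!(g, x, θ)       # fills the first row of J

# Central finite-difference approximation of the same gradient
h = 1e-6
fd = [(model(x, θ .+ h .* (1:2 .== j)) - model(x, θ .- h .* (1:2 .== j))) / (2h)
      for j in 1:2]
```

Writing into a view lets a solver fill one row of a Jacobian matrix per data point without allocating a new vector.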

The optional arguments are

MAXITER: maximum number of iterations
ε: tolerance for the gradient of the function

Returns a RAFFOutput object.
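For orientation, the core of a Levenberg-Marquardt iteration restricted to the p trusted points can be sketched with the textbook update: given the Jacobian J of the trusted residuals r and a damping parameter λ, the step d solves (J^T J + λI) d = -J^T r. This is the standard formula, shown under that assumption; the damping-update rules used inside lmlovo are not described here and are not reproduced.

```julia
using LinearAlgebra

# One textbook Levenberg-Marquardt step (not RAFF's implementation):
# solve (J'J + λI) d = -J'r for the step d.
function lm_step(J::AbstractMatrix, r::AbstractVector, λ::Real)
    return -((J' * J + λ * I) \ (J' * r))
end

# Residuals r_i(θ) = θ[1] * x_i + θ[2] - y_i for three trusted points
x = [0.0, 1.0, 2.0]
y = [1.0, 3.0, 5.0]
θ = [0.0, 0.0]
r = θ[1] .* x .+ θ[2] .- y
J = hcat(x, ones(3))             # ∂r_i/∂θ, one row per trusted point

θnew = θ + lm_step(J, r, 0.0)    # λ = 0 reduces to a Gauss-Newton step
```

A larger λ shortens the step and bends it toward the steepest-descent direction, which is what makes the method robust far from the solution.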

RAFF.gnlslovo — Function.

gnlslovo(model, gmodel!, θ, data::Array{T, 2}, n, p;
         ε::Number=1.0e-4, MAXITER=400, αls=2.0, dinc=2.0,
         MAXLSITER=100) where {T<:Float64}

gnlslovo(model, θ::Vector{Float64}, data::Array{Float64,2},
         n::Int, p::Int; kwargs...)

gnlslovo(model, gmodel!, data::Array{Float64,2}, n::Int,
         p::Int; kwargs...)

gnlslovo(model, data::Array{Float64,2}, n::Int, p::Int; kwargs...)

LOVO Gauss-Newton method with line search, as described in

R. Andreani, G. Cesar, R. M. Cesar-Jr., J. M. Martínez, and P. J. S. Silva, “Efficient curve detection using a Gauss-Newton method with applications in agriculture,” in Proc. 1st International Workshop on Computer Vision Applications for Developing Regions in Conjunction with ICCV 2007-CVDR-ICCV07, 2007.

Fit the n-parameter model given by function model to the data in matrix data. The strategy is based on the LOVO function, which means that only p points (0 < p <= number of rows of data) are trusted.

Matrix data contains the data to be fit. This matrix should be in the form

x11 x12 ... x1N y1
x21 x22 ... x2N y2
:

where N is the dimension of the argument of the model (i.e. dimension of x).

If θ is provided, then it is used as the starting point.

The signature of function model should be given by

model(x::Union{Vector{Float64}, SubArray}, θ::Vector{Float64})

where x are the variables and θ is an n-dimensional vector of parameters. If the gradient of the model, gmodel!, with signature

gmodel!(g::SubArray, x::Union{Vector{Float64}, SubArray},
        θ::Vector{Float64})

is not provided, then the function ForwardDiff.gradient! is called to compute it. Note that this choice has an impact on the computational performance of the algorithm. In addition, if ForwardDiff.jl is being used, then one MUST remove the type annotation of θ from the signature of model, since ForwardDiff evaluates the model with dual numbers instead of Float64 values.

The optional arguments are

MAXITER: maximum number of iterations
ε: tolerance for the gradient of the function
αls: factor > 1 used to increase/decrease the step parameter t in the line search
dinc: factor > 1 used to increase the diagonal of the J^T J matrix in order to escape from singularity
MAXLSITER: maximum number of diagonal increases of the linear system before exiting. It also defines the maximum number of line-search trials to satisfy the Armijo condition (but the method does not exit in this case)
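To illustrate the roles of αls and MAXLSITER, the sketch below is a generic Armijo backtracking line search: the step parameter t is divided by αls after each rejected trial, and after a maximum number of trials the search stops without raising an error. The sufficient-decrease constant c = 1e-4 and the function name are assumptions; the line search inside gnlslovo may differ in its details.

```julia
# Generic Armijo backtracking (illustrative, not gnlslovo's code).
# f: objective, θ: current point, d: descent direction, gθ: gradient at θ.
function armijo_backtracking(f, θ, d, gθ; αls = 2.0, maxlsiter = 100, c = 1e-4)
    t = 1.0
    fθ = f(θ)
    slope = sum(gθ .* d)            # directional derivative, negative for descent
    for _ in 1:maxlsiter
        # Accept t when the sufficient-decrease (Armijo) condition holds
        f(θ .+ t .* d) <= fθ + c * t * slope && return t
        t /= αls                     # otherwise shrink the step by αls
    end
    return t                         # limit reached: return the reduced step without failing
end

f = θ -> (θ[1] - 1.0)^2
θ = [3.0]
g = [2 * (θ[1] - 1.0)]               # gradient of f at θ
t = armijo_backtracking(f, θ, -g, g) # steepest-descent direction
```

Here the full step t = 1 overshoots the minimizer and gives no decrease, so it is halved once and accepted.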

Returns a RAFFOutput object.

RAFF.raff — Function.

raff(model::Function, data::Array{Float64, 2}, n::Int; kwargs...)

raff(model::Function, gmodel!::Function, data::Array{Float64, 2},
    n::Int; MAXMS::Int=1, SEEDMS::Int=123456789,
    initguess::Vector{Float64}=zeros(Float64, n),
    noutliers::Int=-1, ftrusted::Union{Float64,
    Tuple{Float64, Float64}}=0.5,
    inner_solver::Function=lmlovo, inner_solver_params...)

Robust Algebraic Fitting Function (RAFF) algorithm. This function uses a voting system to automatically find the number of trusted data points to fit the model.

model: function to fit data. Its signature should be given by
```
model(x, θ)
```
where x is the multidimensional argument and θ is the n-dimensional vector of parameters
gmodel!: gradient of the model function. Its signature should be given by
```
gmodel!(g, x, θ)
```
where x is the multidimensional argument, θ is the n-dimensional vector of parameters and the gradient is written in g.
data: data to be fit. This matrix should be in the form
```
x11 x12 ... x1N y1
x21 x22 ... x2N y2
:
```
where N is the dimension of the argument of the model (i.e. dimension of x).
n: dimension of the parameter vector in the model function
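A data matrix in this layout can be assembled from sample points with hcat: each row holds the N coordinates of one point followed by its observed value. The two-dimensional model and the values below are made-up illustrations.

```julia
model = (x, θ) -> θ[1] * x[1] + θ[2] * x[2]    # illustrative model with N = 2
θ = [1.0, 1.5]

X = [0.0 1.0;
     1.0 2.0;
     2.0 0.5]                                    # one point per row
y = [model(X[i, :], θ) for i in 1:size(X, 1)]    # observed values

data = hcat(X, y)     # each row: x_i1 x_i2 y_i

# The model is evaluated on all columns of a row except the last one
residuals = [model(view(data, i, 1:2), θ) - data[i, end] for i in 1:size(data, 1)]
```

Since y was produced by the model itself, all residuals are zero; real data would give nonzero residuals, and the outliers would show up as the largest ones.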

The optional arguments are

MAXMS: number of multistart points to be used
SEEDMS: integer seed for random multistart points
initguess: a good guess for the starting point and for generating random points in the multistart strategy
noutliers: integer describing the maximum expected number of outliers. The default is half. Deprecated.
ftrusted: float describing the minimum expected fraction of trusted points. The default is half (0.5). Can also be a Tuple of the form (fmin, fmax), with the minimum and maximum fractions of trusted points.
inner_solver: solver to be used for the least squares problems. By default, uses lmlovo. This function has the following mandatory parameters and returns a RAFFOutput object
```
inner_solver(model, gmodel!, θ, data, n, p;
             inner_solver_params...) = RAFFOutput
```
inner_solver_params...: the remaining parameters will be sent as optional arguments to the inner_solver
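Since ftrusted is a fraction, it corresponds to a number of trusted points p once it is scaled by the number of rows of data. The hypothetical helper below shows one way to map a Float64 or an (fmin, fmax) tuple to a range of candidate p values; treating a single float as (ftrusted, 1.0) and the rounding rules are assumptions, and RAFF's own handling (see check_ftrusted) may differ.

```julia
# Hypothetical helper: map ftrusted to a range of trusted-point counts.
# Assumption: a single fraction f is treated as the tuple (f, 1.0).
trusted_range(f::Float64, np::Int) = trusted_range((f, 1.0), np)

function trusted_range(f::Tuple{Float64, Float64}, np::Int)
    fmin, fmax = f
    0.0 <= fmin <= fmax <= 1.0 || throw(ArgumentError("need 0 <= fmin <= fmax <= 1"))
    pmin = max(1, ceil(Int, fmin * np))   # at least one trusted point
    pmax = floor(Int, fmax * np)
    return pmin:pmax
end

trusted_range(0.5, 10)            # candidate p values for 10 data points
trusted_range((0.25, 0.75), 20)
```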

Returns a RAFFOutput object with the best parameter found.

RAFF.praff — Function.

praff(model::Function, data::Array{Float64, 2}, n::Int; kwargs...)

praff(model::Function, gmodel!::Function, data::Array{Float64, 2},
    n::Int; MAXMS::Int=1, SEEDMS::Int=123456789, batches::Int=1,
    initguess::Vector{Float64}=zeros(Float64, n),
    noutliers::Int=-1, ftrusted::Union{Float64,
    Tuple{Float64, Float64}}=0.5,
    inner_solver::Function=lmlovo, inner_solver_params...)

Multicore distributed version of RAFF. See the description of the raff function for the main (non-optional) arguments. All the communication is performed by channels.

This function uses all available local workers to run the RAFF algorithm. Note that this function does not use Tasks, so all the parallelism is based on the Distributed package.

The optional arguments are

MAXMS: number of multistart points to be used
SEEDMS: integer seed for random multistart points
batches: size of batches to be sent to each worker
initguess: starting point to be used in the multistart procedure
noutliers: integer describing the maximum expected number of outliers. The default is half. Deprecated.
ftrusted: float describing the minimum expected fraction of trusted points. The default is half (0.5). Can also be a Tuple of the form (fmin, fmax), with the minimum and maximum fractions of trusted points.
inner_solver: solver to be used for the least squares problems. By default, uses lmlovo. This function has the following mandatory parameters and returns a RAFFOutput object
```
inner_solver(model, gmodel!, θ, data, n, p;
             inner_solver_params...) = RAFFOutput
```
inner_solver_params...: the remaining parameters will be sent as optional arguments to the inner_solver
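As a rough illustration of the batches parameter, and assuming that each batch groups consecutive multistart indices, Iterators.partition splits MAXMS starting points into chunks of at most batches items. How praff actually schedules these chunks over its channels and workers is not shown here.

```julia
MAXMS = 10      # number of multistart points
batches = 4     # size of each batch sent to a worker

# Split the multistart indices into consecutive chunks of at most `batches` items
chunks = collect(Iterators.partition(1:MAXMS, batches))
```

Larger batches mean fewer messages between the main process and the workers, at the cost of a coarser load balance.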

Returns a RAFFOutput object containing the solution.

RAFF.set_raff_output_level — Function.

set_raff_output_level(level::LogLevel)

Set the output level of the raff and praff algorithms to the desired logging level. Options are (from most verbose to errors only): Logging.Debug, Logging.Info, Logging.Warn and Logging.Error. The Logging package needs to be loaded.

Defaults to Logging.Error.

RAFF.set_lm_output_level — Function.

set_lm_output_level(level::LogLevel)

Set the output level of the lmlovo algorithm to the desired logging level. Options are (from most verbose to errors only): Logging.Debug, Logging.Info, Logging.Warn and Logging.Error. The Logging package needs to be loaded.

Defaults to Logging.Error.

Auxiliary functions

RAFF.voting_strategy
RAFF.eliminate_local_min!
RAFF.sort_fun!
RAFF.update_best
RAFF.consume_tqueue
RAFF.check_and_close
RAFF.check_ftrusted
RAFF.interval_rand!

Random generation

RAFF.generate_test_problems — Function.

generate_test_problems(datFilename::String, solFilename::String,
    model::Function, modelStr::String, n::Int, np::Int, p::Int;
    x_interval::Tuple{Float64, Float64}=(-10.0, 10.0),
    θSol::Vector{Float64}=10.0 * randn(n), std::Float64=200.0,
    out_times::Float64=7.0)

generate_test_problems(datFilename::String, solFilename::String,
    model::Function, modelStr::String, n::Int, np::Int, p::Int,
    cluster_interval::Tuple{Float64, Float64};
    x_interval::Tuple{Float64, Float64}=(-10.0, 10.0),
    θSol::Vector{Float64}=10.0 * randn(n), std::Float64=200.0,
    out_times::Float64=7.0)

Generate random data files for testing fitting problems.

datFilename and solFilename are strings with the names of the files used to store the random data and the solution, respectively.
model is the model function and modelStr is a string representing this model function, e.g.
```
 model = (x, θ) -> θ[1] * x[1] + θ[2]
 modelStr = "(x, θ) -> θ[1] * x[1] + θ[2]"
```
where vector θ represents the parameters (to be found) of the model and vector x are the variables of the model.
n is the number of parameters
np is the number of points to be generated.
p is the number of trusted points to be used in the LOVO approach.
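The string modelStr is presumably kept so that the model can be written to a file as text and rebuilt later; that purpose is an assumption. The sketch only shows that such a string can be turned back into a callable function with Meta.parse and eval, and that the result agrees with model.

```julia
model = (x, θ) -> θ[1] * x[1] + θ[2]
modelStr = "(x, θ) -> θ[1] * x[1] + θ[2]"

# Rebuild a function from its textual representation
rebuilt = eval(Meta.parse(modelStr))

x, θ = [2.0], [3.0, 1.0]
same = rebuilt(x, θ) == model(x, θ)
```

A Julia anonymous function cannot be printed back as source code, which is why the textual form has to be passed separately.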

If cluster_interval is provided, then outliers are generated only in this interval.

Additional parameters:

xMin, xMax: interval for generating points in one-dimensional tests. Deprecated.
x_interval: interval for generating points in one-dimensional tests
θSol: true solution, used for generating perturbed points
std: standard deviation
out_times: deviation for outliers will be out_times * std.

RAFF.get_unique_random_points — Function.

get_unique_random_points(np::Int, npp::Int)

Choose exactly npp unique random points from a set containing np points. This function is similar to rand(vector), but does not allow repetitions.

If npp >= np, returns all the np points. Note that this function is not very memory efficient, since the process of selecting unique elements involves creating several temporary vectors.

Return a vector with the selected points.
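The same behavior can be sketched with randperm from the Random standard library: shuffle 1:np and keep the first npp entries, returning all points when npp >= np. This is an illustrative alternative, not RAFF's implementation, and the name unique_points_sketch is hypothetical.

```julia
using Random

# Hypothetical sketch: npp distinct indices drawn from 1:np
function unique_points_sketch(np::Int, npp::Int)
    npp >= np && return collect(1:np)    # asking for too many: return all points
    return randperm(np)[1:npp]
end

Random.seed!(1)
v = unique_points_sketch(10, 4)
```

Like the original, this allocates a temporary vector of length np, so it is not memory efficient when npp is much smaller than np.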

RAFF.get_unique_random_points! — Function.

get_unique_random_points!(v::Vector{Int}, np::Int, npp::Int)

Choose exactly npp unique random points from a set containing np points. This function is similar to rand(vector), but does not allow repetitions.

If npp >= np, returns all the np points. Note that this function is not very memory efficient, since the process of selecting unique elements involves creating several temporary vectors.

Return the vector v provided as argument filled with the selected points.

RAFF.generate_noisy_data! — Function.

generate_noisy_data!(data::AbstractArray{Float64, 2},
    v::Vector{Int}, model::Function, n::Int, np::Int, p::Int;
    x_interval::Tuple{Float64, Float64}=(-10.0, 10.0),
    θSol::Vector{Float64}=10.0 * randn(Float64, n),
    std::Float64=200.0, out_times::Float64=7.0)

Randomly generate a one-dimensional data fitting problem, storing the data in matrix data and the outliers in vector v.

This function receives a model(x, θ) function, the number of parameters n, the number of points np to be generated and the number of trusted points p.

If the n-dimensional vector θSol is provided, then the exact solution will not be randomly generated. The interval x_interval (or the deprecated [xMin, xMax]) used to generate the points at which model is evaluated can also be provided.

It returns a tuple (data, θSol, outliers) where

data: (np x 3) array, where each row contains x and model(x, θSol).
θSol: n-dimensional vector with the exact solution.
outliers: the outliers of this data set
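A simplified generator in the same spirit is sketched below. Several details are assumptions: x values are drawn uniformly from x_interval, trusted points get Gaussian noise with standard deviation std, the np - p randomly chosen outliers get Gaussian noise with standard deviation out_times * std, and the third column is used as an outlier flag (by analogy with the 4th column of generate_circle). The name noisy_data_sketch is hypothetical.

```julia
using Random

# Hypothetical sketch of a 1-D noisy data generator (not RAFF's code)
function noisy_data_sketch(model, n::Int, np::Int, p::Int;
                           x_interval = (-10.0, 10.0),
                           θSol = 10.0 * randn(n), std = 200.0, out_times = 7.0)
    a, b = x_interval
    data = zeros(np, 3)
    outliers = randperm(np)[1:(np - p)]          # np - p distinct outlier rows
    for i in 1:np
        x = a + (b - a) * rand()                 # uniform in x_interval
        isout = i in outliers
        σ = isout ? out_times * std : std        # larger deviation for outliers
        # Row: x, perturbed model value, assumed outlier flag
        data[i, :] = [x, model([x], θSol) + σ * randn(), isout ? 1.0 : 0.0]
    end
    return data, θSol, outliers
end

Random.seed!(0)
data, θSol, outliers = noisy_data_sketch((x, θ) -> θ[1] * x[1] + θ[2], 2, 20, 15)
```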

RAFF.generate_noisy_data — Function.

generate_noisy_data(model::Function, n::Int, np::Int, p::Int;
    x_interval::Tuple{Float64, Float64}=(-10.0, 10.0),
    θSol::Vector{Float64}=10.0 * randn(Float64, n),
    std::Float64=200.0, out_times::Float64=7.0)

generate_noisy_data(model::Function, n::Int, np::Int, p::Int,
    x_interval::Tuple{Float64, Float64})

generate_noisy_data(model::Function, n::Int, np::Int, p::Int,
    θSol::Vector{Float64}, x_interval::Tuple{Float64, Float64})

Randomly generate a one-dimensional data fitting problem.

This function receives a model(x, θ) function, the number of parameters n, the number of points np to be generated and the number of trusted points p.

If the n-dimensional vector θSol is provided, then the exact solution will not be randomly generated. The interval x_interval (or the deprecated [xMin, xMax]) used to generate the points at which model is evaluated can also be provided.

It returns a tuple (data, θSol, outliers) where

data: (np x 3) array, where each row contains x and model(x, θSol).
θSol: n-dimensional vector with the exact solution.
outliers: the outliers of this data set

RAFF.generate_clustered_noisy_data! — Function.

generate_clustered_noisy_data!(data::Array{Float64, 2},
    v::Vector{Int}, model::Function, n::Int, np::Int, p::Int,
    x_interval::Tuple{Float64,Float64},
    cluster_interval::Tuple{Float64, Float64}; kwargs...)

Generate a test set with clustered outliers. This version overwrites the content of the (np x 3) matrix data and fills vector v with the integer indices of the outliers in data.

The arguments and optional arguments are the same as for generate_noisy_data!, with the exception of the tuple cluster_interval, which is the interval in which the clustered outliers are generated.

It returns a tuple (data, θSol, outliers) where

data: (np x 3) array, where each row contains x and model(x, θSol). The same array given as argument
θSol: n-dimensional vector with the exact solution.
outliers: the outliers of this data set. The same vector given as argument.
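Clustered outliers differ from the previous case only in where the outlier abscissas come from. Assuming that outlier x values are drawn uniformly from cluster_interval while the remaining points use x_interval, the choice of abscissas can be sketched as follows. The function name clustered_abscissas is hypothetical.

```julia
using Random

# Hypothetical sketch: abscissas for a data set with clustered outliers
function clustered_abscissas(np::Int, p::Int, x_interval, cluster_interval)
    outliers = randperm(np)[1:(np - p)]
    xs = map(1:np) do i
        a, b = i in outliers ? cluster_interval : x_interval
        a + (b - a) * rand()                  # uniform draw in the chosen interval
    end
    return xs, outliers
end

Random.seed!(2)
xs, outliers = clustered_abscissas(30, 24, (-10.0, 10.0), (2.0, 3.0))
```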

RAFF.generate_clustered_noisy_data — Function.

generate_clustered_noisy_data(model::Function, n::Int, np::Int,
    p::Int, x_interval::Tuple{Float64,Float64},
    cluster_interval::Tuple{Float64, Float64}; kwargs...)

generate_clustered_noisy_data(model::Function, n::Int,
    np::Int, p::Int, θSol::Vector{Float64},
    x_interval::Tuple{Float64,Float64},
    cluster_interval::Tuple{Float64, Float64}; kwargs...)

Generate a test set with clustered outliers.

The arguments and optional arguments are the same as for generate_noisy_data!, with the exception of the tuple cluster_interval, which is the interval in which the clustered outliers are generated.

It returns a tuple (data, θSol, outliers) where

data: (np x 3) array, where each row contains x and model(x, θSol).
θSol: n-dimensional vector with the exact solution.
outliers: the outliers of this data set.

RAFF.generate_circle — Function.

generate_circle(dat_filename::String, np::Int, p::Int;
    std::Float64=0.1, θSol::Vector{Float64}=1.0*randn(Float64, 3),
    outTimes::Float64=3.0, interval=(rand()*2.0*π for i = 1:np))

Generate perturbed points on a circle given by θSol and save them to dat_filename in RAFF format. Return the np x 4 matrix with the data (the 4th column is 0 if the point is "correct") and an integer vector of length np - p containing the points selected to be outliers.

dat_filename is a String with the name of the file to store generated data.
np is the number of points to be generated.
p is the number of trusted points to be used in the LOVO approach.

Additional configuration parameters are

std: standard deviation.
θSol: true solution, used for generating perturbed points.
out_times: deviation for outliers will be out_times * std.
interval: any iterable object containing np numbers between 0 and 2π.
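Assuming that θSol = [c1, c2, r] stores the center and the radius of the circle (an assumption about the parameterization), perturbed points can be sketched by taking each angle from interval and adding Gaussian noise with standard deviation std to both coordinates. The helper name is hypothetical, and outliers are omitted for brevity.

```julia
using Random

# Hypothetical sketch: noisy points around the circle (c1, c2, r) = θSol
function circle_points_sketch(θSol::Vector{Float64}, angles; std = 0.1)
    c1, c2, r = θSol
    pts = [[c1 + r * cos(t), c2 + r * sin(t)] .+ std .* randn(2) for t in angles]
    return permutedims(reduce(hcat, pts))    # one point per row
end

Random.seed!(3)
np = 50
angles = (rand() * 2.0 * π for i = 1:np)     # same form as the default interval
P = circle_points_sketch([1.0, -2.0, 3.0], angles; std = 0.0)
```

With std = 0.0 every point lies exactly on the circle, which makes the parameterization easy to check.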

RAFF.generate_ncircle — Function.

generate_ncircle(dat_filename::String, np::Int, p::Int;
  std::Float64=0.1, θSol::Vector{Float64}=10.0*randn(Float64, 3),
  interval=(rand()*2.0*π for i = 1:np))

Generate perturbed points on the circle given by θSol, plus uniform noise in a square containing it, and save the data to dat_filename in RAFF format. Return the np x 4 matrix with the data (the 4th column is 0 if the point is "correct") and an integer vector of length np - p containing the points selected to be outliers.

dat_filename is a String with the name of the file to store generated data.
np is the number of points to be generated.
p is the number of trusted points to be used in the LOVO approach.

Additional configuration parameters are

std: standard deviation.
θSol: true solution, used for generating perturbed points.
interval: any iterable object containing np numbers between 0 and 2π.
leftd: multiple of the circle radius used to compute the lower left corner of the square in which the random noise is generated
lngth: multiple of the circle radius used to compute the side of the square in which the random noise is generated
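One reading of leftd and lngth, assumed here, is that the lower left corner of the noise square is the circle center shifted by leftd * r in each coordinate, and that its side is lngth * r. Under that assumption, uniform noise points can be sketched as follows; the helper name and the default values are hypothetical.

```julia
using Random

# Hypothetical sketch: uniform noise in a square built from the circle θSol
function square_noise_sketch(θSol::Vector{Float64}, nnoise::Int; leftd = 2.0, lngth = 4.0)
    c1, c2, r = θSol
    corner = [c1 - leftd * r, c2 - leftd * r]   # assumed lower left corner
    side = lngth * r                            # assumed side length
    # One noise point per row, uniform in [corner, corner + side)
    return [corner[j] + side * rand() for i in 1:nnoise, j in 1:2]
end

Random.seed!(4)
N = square_noise_sketch([0.0, 0.0, 1.0], 100)
```

With leftd = 2 and lngth = 4 the square is centered at the circle center and fully contains the circle.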

RAFF.generate_image_circle — Function.

generate_image_circle(dat_filename::String, w::Int, h::Int,
    np::Int, p::Int; std=0.1,
    θSol::Vector{Float64}=10.0*randn(Float64, 3),
    interval=(rand()*2.0*π for i = 1:p), thck::Int=2,
    funcsize=min(w, h))

Generate perturbed points and uniform noise in a w x h image containing the circle given by θSol and save the data to dat_filename in RAFF format. Return the 0-1 matrix representing the generated black and white image.

dat_filename is a String with the name of the file to store generated data.
w and h are the dimensions of the image
np is the number of points to be generated.
p is the number of trusted points to be used in the LOVO approach.

Additional configuration parameters are

std: standard deviation.
θSol: true solution, used for generating perturbed points.
interval: any iterable object containing np numbers between 0 and 2π.
thck: thickness of the point in the image
funcsize: size (in pixels) that the function will use in the image.

RAFF.generate_image_noisy_data — Function.

generate_image_noisy_data(dat_filename::String,
    w::Int, h::Int, model::Function, n::Int, np::Int, p::Int;
    x_interval::Tuple{Number, Number}=(-10.0, 10.0),
    θSol::Vector{Float64}=10.0 * randn(Float64, n), std=2,
    thck::Int=2, funcsize=min(w, h))

Create a file dat_filename with data for detecting model in a w x h image containing random uniform noise. Attention: this function only works with one-dimensional models.

Return a black and white matrix representing the image.

The parameters are

dat_filename: name of the file to save data
w and h: dimension of the image
model: real-valued model given by a function model(x, θ)
n: dimension of the parameters of the model
np: number of points to be generated
p: number of trusted points that will define the correct points in the model

The function also accepts the following optional arguments:

x_interval: tuple representing the interval for the x variable
θSol: vector with the 'exact' parameters of the solution
std: error that will be added to the simulated 'correct' points
thck: thickness of the point in the image
funcsize: size (in pixels) that the function will use in the image.

RAFF.model_list — Constant.

This dictionary contains the models used in the generation of random tests. Each entry is a tuple (n, model, model_str), where

n is the number of parameters of the model
model is the model of the form m(x, θ), where x are the variables and θ are the parameters
model_str is the string representing the model, used to build random generated problems
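A dictionary with the same shape can be sketched as follows. The keys "linear" and "cubic" and their entries are hypothetical and not necessarily among RAFF's models; each value unpacks into the number of parameters, the function, and its string form.

```julia
# Hypothetical dictionary with the documented (n, model, model_str) shape
example_models = Dict(
    "linear" => (2, (x, θ) -> θ[1] * x[1] + θ[2], "(x, θ) -> θ[1] * x[1] + θ[2]"),
    "cubic"  => (4, (x, θ) -> θ[1] * x[1]^3 + θ[2] * x[1]^2 + θ[3] * x[1] + θ[4],
                 "(x, θ) -> θ[1] * x[1]^3 + θ[2] * x[1]^2 + θ[3] * x[1] + θ[4]"),
)

n, model, model_str = example_models["cubic"]
value = model([2.0], ones(n))   # 8 + 4 + 2 + 1
```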

Output type

RAFF.RAFFOutput — Type.

This type defines the output of the RAFF algorithm.

RAFFOutput(status::Int, solution::Vector{Float64}, iter::Int,
           p::Int, f::Float64, nf::Int, nj::Int, outliers::Vector{Int})

where

status: is 1 if converged and 0 if not
solution: vector with the parameters of the model
iter: number of iterations up to convergence
p: number of trusted points
f: the residual value
nf: number of function evaluations
nj: number of Jacobian evaluations
outliers: the possible outliers detected by the method, for the given p
RAFFOutput()

Creates a null version of output, equivalent to RAFFOutput(0, [], -1, 0, Inf, -1, -1, [])

RAFFOutput(p::Int)
RAFFOutput(sol::Vector{Float64}, p::Int)

Creates a null version of output for the given p and a null version with the given solution, respectively.
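The fields and the null constructors can be mirrored in a small stand-alone struct, which may help when reading RAFFOutput values. RAFFOutputSketch is hypothetical and is not the package's definition. Only the field list and the values of RAFFOutput() come from the description above. The field values chosen for the p and (sol, p) constructors are assumptions.

```julia
# Hypothetical mirror of the documented fields (not the package's definition)
struct RAFFOutputSketch
    status::Int                 # 1 if converged, 0 otherwise
    solution::Vector{Float64}   # model parameters
    iter::Int                   # iterations up to convergence
    p::Int                      # number of trusted points
    f::Float64                  # residual value
    nf::Int                     # function evaluations
    nj::Int                     # Jacobian evaluations
    outliers::Vector{Int}       # possible outliers for this p
end

# Null output, with the values listed for RAFFOutput()
RAFFOutputSketch() = RAFFOutputSketch(0, Float64[], -1, 0, Inf, -1, -1, Int[])

# Assumed null outputs for a given p, and for a given solution and p
RAFFOutputSketch(p::Int) = RAFFOutputSketch(0, Float64[], -1, p, Inf, -1, -1, Int[])
RAFFOutputSketch(sol::Vector{Float64}, p::Int) =
    RAFFOutputSketch(0, sol, -1, p, Inf, -1, -1, Int[])

out = RAFFOutputSketch([1.0, 2.0], 10)
```

Using Inf as the residual of a null output means that any real fit compares as better. This is convenient when keeping the best result over several runs.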