API

Summary

There are four main RAFF structures:

  1. Main functions: directly called by the user;
  2. Auxiliary functions: used internally by the main functions;
  3. Random generation: used to generate random data sets in order to test RAFF;
  4. Output type: type defined to manipulate output information.

Main functions

RAFF.lmlovo (Function)
lmlovo(model::Function [, θ::Vector{Float64} = zeros(n)], data::Array{Float64, 2},
       n::Int, p::Int [; kwargs...])

lmlovo(model::Function, gmodel!::Function [, θ::Vector{Float64} = zeros(n)],
       data::Array{Float64,2}, n::Int, p::Int [; MAXITER::Int=200,
       ε::Float64=10.0^-4])

Fit the n-parameter model model to the data given by matrix data. The strategy is based on the LOVO function, which means that only p (0 < p <= rows of data) points are trusted. The Levenberg-Marquardt algorithm is implemented in this version.

Matrix data is the data to be fit. This matrix should be in the form

x11 x12 ... x1N y1
x21 x22 ... x2N y2
:

where N is the dimension of the argument of the model (i.e. dimension of x).
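
As a minimal sketch of the layout above and of the LOVO idea, the snippet below builds a data matrix for a one-dimensional argument (N = 1) and evaluates the sum of the p smallest squared residuals. The helper lovo_value is hypothetical and is not part of RAFF; it only illustrates what trusting p points means.

```julia
# Hypothetical illustration; `lovo_value` is not a RAFF function.
model(x, θ) = θ[1] * x[1] + θ[2]

# Each row is (x_i1, ..., x_iN, y_i); here N = 1, so the matrix has two columns.
x = collect(1.0:5.0)
y = 2.0 .* x .+ 1.0
y[3] = 100.0                      # one gross outlier
data = hcat(x, y)                 # 5×2 Array{Float64, 2}

# Sum of the p smallest squared residuals of `model` at θ.
function lovo_value(model, θ, data, p)
    N = size(data, 2) - 1
    r2 = [(model(data[i, 1:N], θ) - data[i, end])^2 for i in 1:size(data, 1)]
    return sum(partialsort(r2, 1:p))
end

θ = [2.0, 1.0]
f_all = lovo_value(model, θ, data, 5)      # the outlier dominates the sum
f_trusted = lovo_value(model, θ, data, 4)  # the outlier is ignored
```

With p equal to the number of rows the value is the ordinary least-squares sum; lowering p drops the largest residuals from the sum.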

If θ is provided, then it is used as the starting point.

The signature of function model should be given by

model(x::Union{Vector{Float64}, SubArray}, θ::Vector{Float64})

where x are the variables and θ is a n-dimensional vector of parameters. If the gradient of the model gmodel!

gmodel! = (g::SubArray, x::Union{Vector{Float64}, SubArray},
           θ::Vector{Float64})

is not provided, then the function ForwardDiff.gradient! is called to compute it. Note that this choice has an impact on the computational performance of the algorithm. In addition, if ForwardDiff.jl is being used, then one MUST remove the type annotation of vector θ from the signature of function model (ForwardDiff.jl evaluates the model with its own dual-number type, so θ::Vector{Float64} would not match).

The optional arguments are

  • MAXITER: maximum number of iterations
  • ε: tolerance for the gradient of the function

Returns a RAFFOutput object.
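
The call with an explicit gradient can be sketched as follows, assuming the RAFF package is installed. The data, the starting point and the value of p are made up for illustration; only the argument order and the keyword names come from the signatures above.

```julia
using RAFF

# Linear model; θ is left untyped so that automatic differentiation also works.
model(x, θ) = θ[1] * x[1] + θ[2]

# Gradient of the model with respect to θ, written in place into g.
function gmodel!(g, x, θ)
    g[1] = x[1]
    g[2] = 1.0
    return g
end

x = collect(1.0:10.0)
y = 2.0 .* x .+ 1.0
y[4] = 50.0                       # one outlier (illustrative data)
data = hcat(x, y)

n = 2                             # number of parameters
p = 9                             # number of trusted points
θ0 = zeros(n)                     # starting point

out = RAFF.lmlovo(model, gmodel!, θ0, data, n, p; MAXITER=200, ε=1.0e-4)
println(out.status, " ", out.solution, " ", out.outliers)
```

The fields read from out are the ones listed for the RAFFOutput type.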

RAFF.raff (Function)
raff(model::Function, data::Array{Float64, 2}, n::Int; kwargs...)

raff(model::Function, gmodel!::Function, data::Array{Float64, 2},
    n::Int; MAXMS::Int=1, SEEDMS::Int=123456789,
    initguess::Vector{Float64}=zeros(Float64, n),
    ε::Float64=1.0e-4, noutliers::Int=-1, ftrusted::Union{Float64,
    Tuple{Float64, Float64}}=0.5)

Robust Algebraic Fitting Function (RAFF) algorithm. This function uses a voting system to automatically find the number of trusted data points to fit the model.

  • model: function to fit data. Its signature should be given by

    model(x, θ)

    where x is the multidimensional argument and θ is the n-dimensional vector of parameters

  • gmodel!: gradient of the model function. Its signature should be given by

    gmodel!(g, x, θ)

    where x is the multidimensional argument, θ is the n-dimensional vector of parameters and the gradient is written in g.

  • data: data to be fit. This matrix should be in the form

    x11 x12 ... x1N y1
    x21 x22 ... x2N y2
    :

    where N is the dimension of the argument of the model (i.e. dimension of x).

  • n: dimension of the parameter vector in the model function

The optional arguments are

  • MAXMS: number of multistart points to be used
  • SEEDMS: integer seed for random multistart points
  • initguess: a good guess for the starting point and for generating random points in the multistart strategy
  • ε: gradient stopping criterion passed to lmlovo
  • noutliers: integer describing the maximum expected number of outliers. The default is half. Deprecated.
  • ftrusted: float describing the minimum expected fraction of trusted points, given as a value between 0 and 1. The default is half (0.5). Can also be a Tuple of the form (fmin, fmax) with the minimum and maximum fractions of trusted points.

Returns a RAFFOutput object with the best parameter found.
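
A minimal usage sketch, assuming the RAFF package is installed; the data set and the keyword values are illustrative choices, not recommendations. The gradient is omitted, so the model leaves θ untyped.

```julia
using RAFF

model(x, θ) = θ[1] * x[1]^2 + θ[2]

x = collect(-2.0:0.5:2.0)
y = x .^ 2 .+ 1.0
y[6] = 2.55                       # perturbed point playing the role of an outlier
data = hcat(x, y)

# No gradient is supplied, so the documented ForwardDiff fallback is used.
out = RAFF.raff(model, data, 2; MAXMS=3, ftrusted=0.7)
println("solution: ", out.solution)
println("trusted points: ", out.p, ", outliers: ", out.outliers)
```

Passing a tuple such as ftrusted=(0.6, 0.9) would instead bound the fraction of trusted points from both sides.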

RAFF.praff (Function)
praff(model::Function, data::Array{Float64, 2}, n::Int; kwargs...)

praff(model::Function, gmodel!::Function, data::Array{Float64, 2},
    n::Int; MAXMS::Int=1, SEEDMS::Int=123456789, batches::Int=1,
    initguess::Vector{Float64}=zeros(Float64, n),
    ε::Float64=1.0e-4, noutliers::Int=-1, ftrusted::Union{Float64,
    Tuple{Float64, Float64}}=0.5)

Multicore distributed version of RAFF. See the description of the raff function for the main (non-optional) arguments. All the communication is performed by channels.

This function uses all available local workers to run the RAFF algorithm. Note that this function does not use Tasks, so all the parallelism is based on the Distributed package.

The optional arguments are

  • MAXMS: number of multistart points to be used
  • SEEDMS: integer seed for random multistart points
  • batches: size of the batches to be sent to each worker
  • initguess: starting point to be used in the multistart procedure
  • ε: stopping tolerance
  • noutliers: integer describing the maximum expected number of outliers. The default is half. Deprecated.
  • ftrusted: float describing the minimum expected fraction of trusted points, given as a value between 0 and 1. The default is half (0.5). Can also be a Tuple of the form (fmin, fmax) with the minimum and maximum fractions of trusted points.

Returns a RAFFOutput object containing the solution.
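
A sketch of a distributed run, assuming the RAFF package is installed on the machine. The number of workers, the data and the keyword values are arbitrary. The model and its gradient are defined with @everywhere so that every worker knows them.

```julia
using Distributed
addprocs(2)                       # local workers; the count is arbitrary here

@everywhere using RAFF
@everywhere model(x, θ) = θ[1] * x[1] + θ[2]
@everywhere function gmodel!(g, x, θ)
    g[1] = x[1]
    g[2] = 1.0
    return g
end

x = collect(1.0:20.0)
y = 2.0 .* x .+ 1.0
y[5] = -40.0                      # one outlier (illustrative data)
data = hcat(x, y)

out = RAFF.praff(model, gmodel!, data, 2; MAXMS=2, batches=2)
println(out.solution)

rmprocs(workers())                # release the workers when done
```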

set_raff_output_level(level::LogLevel)

Set the output level of raff and praff algorithms to the desired logging level. Options are (from highly verbose to just errors): Logging.Debug, Logging.Info, Logging.Warn and Logging.Error. The package Logging needs to be loaded.

Defaults to Logging.Error.

set_lm_output_level(level::LogLevel)

Set the output level of lmlovo algorithm to the desired logging level. Options are (from highly verbose to just errors): Logging.Debug, Logging.Info, Logging.Warn and Logging.Error. The package Logging needs to be loaded.

Defaults to Logging.Error.
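
For instance, both levels could be raised to Logging.Info as sketched below, assuming RAFF is installed. The functions are called with the RAFF. prefix so the snippet does not depend on which names the package exports.

```julia
using Logging
using RAFF

level = Logging.Info              # any of Debug, Info, Warn, Error
RAFF.set_raff_output_level(level) # affects raff and praff
RAFF.set_lm_output_level(level)   # affects lmlovo
```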


Auxiliary functions

RAFF.voting_strategy (Function)
voting_strategy(model::Function, data::Array{Float64, 2}, sols::Vector{RAFFOutput}, pliminf::Int,
                plimsup::Int)

Utility function to compute the matrix representing the voting system used by RAFF.

It first applies a filtering strategy, to eliminate obvious local minima, then it calculates a magic threshold and constructs the distance matrix. The vector sols contains the solutions s_p, for p = pliminf, ... plimsup.

eliminate_local_min!(sols::Vector{RAFFOutput})

Check whether the function value of a solution found with a smaller value of p is greater than the value found with a larger one. When this happens, it certainly indicates that a local (non-global) minimizer was found for the smaller p.
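
The comparison can be sketched on a plain vector of function values ordered by increasing p. The helper flag_local_minima is hypothetical; it only flags the suspicious entries and does not reproduce what RAFF does with them.

```julia
# Hypothetical illustration; not RAFF's implementation.
# fvals[i] is the function value found for the i-th (increasing) value of p.
function flag_local_minima(fvals::Vector{Float64})
    flagged = falses(length(fvals))
    for i in eachindex(fvals), j in (i + 1):length(fvals)
        # A smaller p cannot have a larger optimal value than a larger p.
        if fvals[i] > fvals[j]
            flagged[i] = true
        end
    end
    return flagged
end

fvals = [0.1, 5.0, 0.4, 0.9]
flags = flag_local_minima(fvals)
```

Here only the second entry is flagged, because its value exceeds the values obtained later with more trusted points.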

RAFF.sort_fun! (Function)

This function is an auxiliary function. It finds the p smallest values of vector V and brings them to the first p positions. The indexes associated with the p smallest values are stored in ind.
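
The described behaviour can be imitated with Base functions, as in the sketch below. The name p_smallest! and its argument order are hypothetical, since the signature of sort_fun! is not shown here.

```julia
# Hypothetical helper mimicking the described behaviour; not RAFF's code.
function p_smallest!(V::Vector{Float64}, ind::Vector{Int}, p::Int)
    order = partialsortperm(V, 1:p)       # positions of the p smallest values
    rest = setdiff(eachindex(V), order)   # remaining positions, original order
    perm = vcat(order, rest)
    V .= V[perm]                          # p smallest values come first
    ind .= ind[perm]                      # indices follow the same permutation
    return V, ind
end

V = [4.0, 1.0, 3.0, 0.5, 2.0]
ind = collect(1:5)
p_smallest!(V, ind, 2)
```

After the call, V[1:2] holds the two smallest values and ind[1:2] holds their original positions.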

RAFF.update_best (Function)
update_best(channel::RemoteChannel, bestx::SharedArray{Float64, 1})

Listen to a channel for results found by lmlovo. If there is an improvement for the objective function, the shared array bestx is updated.

Attention: There might be an unstable state if there is a process reading bestx while this function is updating it. This should not be a problem, since it is used as a starting point.

Attention 2: this function is currently out of use.

RAFF.consume_tqueue (Function)
consume_tqueue(bqueue::RemoteChannel, tqueue::RemoteChannel,
               squeue::RemoteChannel, model::Function, gmodel!::Function,
               data::Array{Float64, 2}, n::Int, pliminf::Int,
               plimsup::Int, MAXMS::Int, seedMS::MersenneTwister)

This function represents one worker, which runs lmlovo in a multistart fashion.

It takes a job from the RemoteChannel tqueue and runs the lmlovo function on it. It might run using a multistart strategy, if MAXMS > 1. It sends the best results found for each value obtained in tqueue to channel squeue, which will be consumed by the main process. All the other arguments are the same as for the praff function.

RAFF.check_and_close (Function)
check_and_close(bqueue::RemoteChannel, tqueue::RemoteChannel,
                squeue::RemoteChannel, futures::Vector{Future};
                secs::Float64=0.1)

Check if there is at least one worker process in the vector of futures that has not prematurely finished. If no worker is alive, close the task, solution and best queues (tqueue, squeue and bqueue, respectively).

RAFF.check_ftrusted (Function)
check_ftrusted(ftrusted::Union{Float64, Tuple{Float64, Float64}}, np::Int)

Utility function to check ftrusted parameter in raff and praff. Throws an ErrorException if the percentage of trusted points is incorrect.

RAFF.interval_rand! (Function)
interval_rand!(x::Vector{Float64},
    intervals::Vector{Tuple{Float64, Float64}})

Fill a vector x with uniformly distributed random numbers generated in the interval given by intervals. It is assumed that length(x) == length(intervals).

Throws an ErrorException if the dimension of x is smaller than the dimension of intervals or if the intervals are invalid.
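
A hypothetical re-implementation of this behaviour is sketched below. The name fill_interval_rand! and the rule used to decide that an interval is invalid (lower bound above upper bound) are assumptions made for illustration.

```julia
using Random

# Hypothetical re-implementation for illustration; not RAFF's code.
function fill_interval_rand!(x::Vector{Float64},
                             intervals::Vector{Tuple{Float64, Float64}})
    length(x) < length(intervals) && error("x is smaller than intervals")
    for (i, (a, b)) in enumerate(intervals)
        # Assumption: an interval is invalid when its bounds are reversed.
        a <= b || error("invalid interval ($a, $b)")
        x[i] = a + rand() * (b - a)   # uniform draw in [a, b)
    end
    return x
end

Random.seed!(42)
x = zeros(2)
fill_interval_rand!(x, [(-1.0, 1.0), (10.0, 20.0)])
```

The call to error raises an ErrorException, which matches the exception type mentioned above.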


Random generation

generate_test_problems(datFilename::String, solFilename::String,
    model::Function, modelStr::String, n::Int, np::Int, p::Int;
    x_interval::Tuple{Float64, Float64}=(-10.0, 10.0),
    θSol::Vector{Float64}=10.0 * randn(n), std::Float64=200.0,
    out_times::Float64=7.0)

generate_test_problems(datFilename::String, solFilename::String,
    model::Function, modelStr::String, n::Int, np::Int, p::Int,
    cluster_interval::Tuple{Float64, Float64};
    x_interval::Tuple{Float64, Float64}=(-10.0, 10.0),
    θSol::Vector{Float64}=10.0 * randn(n), std::Float64=200.0,
    out_times::Float64=7.0)

Generate random data files for testing fitting problems.

  • datFilename and solFilename are strings with the name of the files for storing the random data and solution, respectively.

  • model is the model function and modelStr is a string representing this model function, e.g.

     model = (x, θ) -> θ[1] * x[1] + θ[2]
     modelStr = "(x, θ) -> θ[1] * x[1] + θ[2]"

    where vector θ represents the parameters (to be found) of the model and vector x are the variables of the model.

  • n is the number of parameters

  • np is the number of points to be generated.

  • p is the number of trusted points to be used in the LOVO approach.

If cluster_interval is provided, then the outliers are generated only in this interval.

Additional parameters:

  • xMin, xMax: interval for generating points in one-dimensional tests (deprecated)
  • x_interval: interval for generating points in one-dimensional tests
  • θSol: true solution, used for generating perturbed points
  • std: standard deviation
  • out_times: deviation for outliers will be out_times * std.

get_unique_random_points(np::Int, npp::Int)

Choose exactly npp unique random points from a set containing np points. This function is similar to rand(vector), but does not allow repetitions.

If npp >= np, returns all the np points. Note that this function is not very memory efficient, since the process of selecting unique elements involves creating several temporary vectors.

Return a vector with the selected points.
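
The same kind of selection can be sketched with randperm from the Random standard library. The helper unique_random_points is hypothetical and RAFF's own implementation may differ; it returns at most np points, since no more distinct points exist.

```julia
using Random

# Hypothetical equivalent built on randperm; not RAFF's implementation.
function unique_random_points(np::Int, npp::Int)
    k = min(np, npp)              # cannot pick more than np distinct points
    return randperm(np)[1:k]      # first k entries of a random permutation
end

pts = unique_random_points(10, 4)
```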

get_unique_random_points!(v::Vector{Int}, np::Int, npp::Int)

Choose exactly npp unique random points from a set containing np points. This function is similar to rand(vector), but does not allow repetitions.

If npp >= np, returns all the np points. Note that this function is not very memory efficient, since the process of selecting unique elements involves creating several temporary vectors.

Return the vector v provided as argument filled with the selected points.

generate_noisy_data!(data::AbstractArray{Float64, 2},
    v::Vector{Int}, model::Function, n::Int, np::Int, p::Int;
    x_interval::Tuple{Float64, Float64}=(-10.0, 10.0),
    θSol::Vector{Float64}=10.0 * randn(Float64, n),
    std::Float64=200.0, out_times::Float64=7.0)

Randomly generate a one-dimensional data fitting problem, storing the data in matrix data and the outliers in vector v.

This function receives a model(x, θ) function, the number of parameters n, the number of points np to be generated and the number of trusted points p.

If the n-dimensional vector θSol is provided, then the exact solution will not be randomly generated. The interval [xMin, xMax] (deprecated) or x_interval for generating the values to evaluate model can also be provided.

It returns a tuple (data, θSol, outliers) where

  • data: (np x 3) array, where each row contains x and model(x, θSol).
  • θSol: n-dimensional vector with the exact solution.
  • outliers: the outliers of this data set.

generate_noisy_data(model::Function, n::Int, np::Int, p::Int;
    x_interval::Tuple{Float64, Float64}=(-10.0, 10.0),
    θSol::Vector{Float64}=10.0 * randn(Float64, n),
    std::Float64=200.0, out_times::Float64=7.0)

generate_noisy_data(model::Function, n::Int, np::Int, p::Int,
    x_interval::Tuple{Float64, Float64})

generate_noisy_data(model::Function, n::Int, np::Int, p::Int,
    θSol::Vector{Float64}, x_interval::Tuple{Float64, Float64})

Randomly generate a one-dimensional data fitting problem.

This function receives a model(x, θ) function, the number of parameters n, the number of points np to be generated and the number of trusted points p.

If the n-dimensional vector θSol is provided, then the exact solution will not be randomly generated. The interval [xMin, xMax] (deprecated) or x_interval for generating the values to evaluate model can also be provided.

It returns a tuple (data, θSol, outliers) where

  • data: (np x 3) array, where each row contains x and model(x, θSol).
  • θSol: n-dimensional vector with the exact solution.
  • outliers: the outliers of this data set.

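
A usage sketch for generate_noisy_data, assuming the RAFF package is installed. The model and the keyword values are illustrative; the keyword names are the ones from the signature above.

```julia
using RAFF

model(x, θ) = θ[1] * x[1] + θ[2]

n, np, p = 2, 20, 18              # parameters, points, trusted points
data, θSol, outliers = RAFF.generate_noisy_data(model, n, np, p;
    x_interval=(0.0, 10.0), θSol=[2.0, 1.0], std=1.0, out_times=7.0)

println(size(data))               # documented as an (np x 3) array
println(θSol, " ", outliers)
```

The returned data matrix has one more column than the matrices expected by the fitting functions, so a fit would need the appropriate columns selected first.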
generate_clustered_noisy_data!(data::Array{Float64, 2},
    v::Vector{Int}, model::Function, n::Int, np::Int, p::Int,
    x_interval::Tuple{Float64,Float64},
    cluster_interval::Tuple{Float64, Float64}; kwargs...)

Generate a test set with clustered outliers. This version overwrites the content of (np x 3) matrix data and vector v with integer indices to the position of outliers in data.

The arguments and optional arguments are the same as for generate_noisy_data!, with the exception of tuple cluster_interval, which is the interval used to generate the clustered outliers.

It returns a tuple (data, θSol, outliers) where

  • data: (np x 3) array, where each row contains x and model(x, θSol). The same array given as argument
  • θSol: n-dimensional vector with the exact solution.
  • outliers: the outliers of this data set. The same vector given as argument.

generate_clustered_noisy_data(model::Function, n::Int, np::Int,
    p::Int, x_interval::Tuple{Float64,Float64},
    cluster_interval::Tuple{Float64, Float64}; kwargs...)

generate_clustered_noisy_data(model::Function, n::Int,
    np::Int, p::Int, θSol::Vector{Float64},
    x_interval::Tuple{Float64,Float64},
    cluster_interval::Tuple{Float64, Float64}; kwargs...)

Generate a test set with clustered outliers.

The arguments and optional arguments are the same as for generate_noisy_data!, with the exception of tuple cluster_interval, which is the interval used to generate the clustered outliers.

It returns a tuple (data, θSol, outliers) where

  • data: (np x 3) array, where each row contains x and model(x, θSol).
  • θSol: n-dimensional vector with the exact solution.
  • outliers: the outliers of this data set.

RAFF.model_list (Constant)

This dictionary represents the list of models used in the generation of random tests. Each entry maps to the tuple (n, model, model_str), where

  • n is the number of parameters of the model
  • model is the model of the form m(x, θ), where x are the variables and θ are the parameters
  • model_str is the string representing the model, used to build random generated problems
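
The entries can be inspected without assuming any particular key, as in the sketch below (RAFF must be installed). Each value is unpacked into the three components described above.

```julia
using RAFF

# Iterate over every registered model and unpack its (n, model, model_str) tuple.
for (name, (n, model, model_str)) in RAFF.model_list
    println(name, ": ", n, " parameters, ", model_str)
end
```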

Output type

RAFF.RAFFOutput (Type)

This type defines the output of the RAFF algorithm.

RAFFOutput(status::Int, solution::Vector{Float64}, iter::Int,
           p::Int, f::Float64, nf::Int, nj::Int, outliers::Vector{Int})

where

  • status: is 1 if converged and 0 if not

  • solution: vector with the parameters of the model

  • iter: number of iterations up to convergence

  • p: number of trusted points

  • f: the residual value

  • nf: number of function evaluations

  • nj: number of Jacobian evaluations

  • outliers: the possible outliers detected by the method, for the given p

RAFFOutput()

Creates a null version of output, equivalent to RAFFOutput(0, [], -1, 0, Inf, -1, -1, [])

RAFFOutput(p::Int)
RAFFOutput(sol::Vector{Float64}, p::Int)

Creates a null version of output for the given p and a null version with the given solution, respectively.
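
The constructors and fields can be exercised as sketched below, assuming RAFF is installed. The numeric values passed to the full constructor are made up purely for illustration.

```julia
using RAFF

# Full constructor, with made-up values purely for illustration.
out = RAFF.RAFFOutput(1, [2.0, 1.0], 12, 9, 1.0e-6, 30, 12, [4])
println(out.status == 1 ? "converged" : "not converged")
println(out.solution, " ", out.p, " ", out.outliers)

# Null output, documented as RAFFOutput(0, [], -1, 0, Inf, -1, -1, []).
null_out = RAFF.RAFFOutput()
println(null_out.status, " ", null_out.f)
```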
