The core object coordinating a syberia engine is a
syberia_engine
R6
object.
syberia_engine_class <- R6::R6Class("syberia_engine",
portable = TRUE,
We inherit directly from the director
base class.
inherit = getFromNamespace("director_", "director"), #environment(director::director)$director_,
public = list(
Each syberia engine can include more dependent engines.
This forms a graph of engines. However, child engines are allowed
to be shared, so the final structure is usually a DAG, not a tree.
The parent node is stored in .parent
and the children
(immediate dependent engines) are stored in .engines
.
.parent = NULL,
.engines = list(),
.set_parent = function(parent) { self$.parent <<- parent },
Registering an engine is as simple adding it to the list of
.engines
. If we are mounting it, we register its parent
node as the current engine.
register_engine = function(name, engine, mount = FALSE) {
stopifnot(is(engine, "syberia_engine"))
self$.engines[[name]] <<- list(engine = engine, mount = isTRUE(mount))
if (isTRUE(mount)) engine$.set_parent(self)
},
The vanilla director
object that the syberia_engine
inherits
from has a resource
method: namely, take an R file and compile it
according to some preprocessor and parser (see the director package
for a more thorough explanation).
The syberia_engine$resource
method takes a different approach.
Since an engine may itself have more engines, resources can come
from a whole tree of engines. Typically, it would be sufficient
to check the engine itself and its children when looking for a
resource. However, Syberia takes a different approach. To make
it possible to pull out arbitrary subsets of files and turn them
into an engine, Syberia first traverses up to the parent to
see if it has overwritten any of the resources.
This leads to some multiple inheritance problems and is a tricky course to navigate, but the end result is that the user is completely transparent to the machinery that goes into finding resources under the hood. It is possible to extract a collection of controllers and resources from any top-level syberia project (as long as they are self contained and include each other's dependencies) and transform that into an engine simply by copying the files, while retaining the power provided by controllers insofar as giving each directory structure its own meaning.
resource = function(name, ..., parent. = TRUE, children. = TRUE,
exclude. = NULL, defining_environment. = parent.frame(),
engine) {
if (missing(engine)) {
# If we do not force right away, parent.frame() will be pointing to
# the wrong place. Everything in R is a promise...
force(defining_environment.)
dots <- list(...)
We use a tree traversal helper defined later in this file:
first, it checks the parent exists (if one exists). This
will recursively call its parent's resource
method. To avoid
infinite loops, we have to be careful about excluding the
current engine from traversal. If the parent engine reports
it owns the resource, we return the on_parent
action.
Otherwise, if the current engine has the resource, we return that instead. Finally, if neither the parent engine nor self own the resource, we browse through the children.
In the event that none of the engines have the resource, we trigger the self engine to resource the object so that we get an error that the resource was not found.
private$traverse_tree(parent = parent., children = children.,
exclude = exclude., exists_args = list(name),
on_parent = self$.parent$resource(name, ...,
exclude. = c(list(self$root()), exclude.),
defining_environment. = defining_environment.),
on_self = super$resource(name, ...,
defining_environment. = defining_environment.),
on_child = function(engine) {
do.call(engine$resource, c(list(name), dots, list(
parent. = FALSE, children. = TRUE,
exclude. = c(engine$root(), exclude.),
defining_environment. = defining_environment.)
))
},
otherwise = super$resource(name, ...,
defining_environment. = defining_environment.)
)
} else { # if (!missing(engine))
It is possible to “pluck” a resource explicitly out of a utility engine
using the engine
parameter (if the engine is not mounted).
stopifnot(is.character(engine))
engine_name <- engine
engine <- private$sanitize_engine(engine)
if (!engine$exists(name, ..., parent. = FALSE)) {
stop("No resource ", sQuote(name), " exists in engine ", sQuote(engine_name), ".")
}
Extract the resource directly from the utility engine.
return(engine$resource(name, ..., parent. = FALSE))
}
},
Finding resources (as opposed to compiling them) follows a similar
pattern: we first check the parent, then self, then the children.
However, unlike $resource
, the $find
method must traverse
all of them and take their union, similarly to the difference
between calling Reduce
with and without the accumulate
parameter. In some sense,
the find operation is a tree traversal pattern with the action of
“find the resources matching these arguments” and a final
reduce step.
find = function(..., parent. = FALSE, children. = FALSE, exclude. = NULL,
check_duplicates. = FALSE, tag_engine. = FALSE) {
resources <- private$traverse_tree(parent = parent., children = children.,
exclude = exclude., accumulate = TRUE,
on_parent = self$.parent$find(..., parent. = TRUE, children. = TRUE,
exclude. = c(list(self$root()), exclude.), tag_engine. = tag_engine.),
on_self = {
This is a bit of a subtle point. We tag the resources with the name of the engine so that we can later avoid some problems related to multiple inheritance
self_resources <- super$find(...)
if (is.character(tag_engine.)) {
`names<-`(self_resources, rep(tag_engine., length(self_resources)))
} else {
self_resources
}
},
on_child = function(engine, name) {
tag <- if (!identical(tag_engine., FALSE)) name else FALSE
engine$find(..., parent. = FALSE,
children. = TRUE, exclude. = exclude., tag_engine. = tag)
}
)
if (isTRUE(children.)) {
if (isTRUE(check_duplicates.)) {
If we are checking child engines, we take this opportunity to
ensure none of the mounted engines have shared resources.
For example, if two engines both have a foo/bar.R
file, this
would cause terrible conflicts related to multiple inheritance:
if the engine1 asked for foo/bar.R
, then since the tree
traversal asks the parent firsts, which asks its non-engine1
children, then engine1 would receive foo/bar.R
from engine2.
Conversely, engine2 would receive engine1's foo/bar.R
, probably
leading to all sorts of perverse bugs. In the future it may
be possible to specify canonical preferences in configuration
but for now we disable shared resources all together:
All engines must be disjoint.
(Except insofar as they share resources from a common child engine, as in a diamond graph.)
resources[[length(resources)]] <- Filter(Negate(is.null), resources[[length(resources)]])
detect_duplicate_resources(resources[[length(resources)]])
}
resources[[length(resources)]] <- unname(resources[[length(resources)]])
}
We revert the list to ensure children resources mask self
and parent resources (or, for example, we would get a conflict
in a simple diamond with two engines depending on a base
engine, which has config/application.R
and config/engines.R
).
resources <- c(recursive = TRUE, rev(resources))
Using unique
here would drop names!
resources[!duplicated(resources)]
},
exists = function(resource, ..., parent. = TRUE, children. = TRUE, exclude. = NULL) {
Check if a resource exists is a straightforward tree traversal
pattern: check the parent, then self, then the children,
and return TRUE
as soon as any find them. Otherwise, return FALSE
.
private$traverse_tree(parent = parent., children = children.,
exclude = exclude., exists_args = list(resource, ...),
on_parent = TRUE, on_self = TRUE,
on_child = function(engine) TRUE, otherwise = FALSE)
}
),
This forms a collection of private helper methods.
private = list(
has_parent = (has_parent <- function() { !is.null(self$.parent) }),
is_root = Negate(has_parent),
mounted_engines = function() {
Filter(function(e) isTRUE(e$mount), self$.engines)
},
We convert an engine from string representation to a
syberia_engine
object.
sanitize_engine = function(engine, utility = TRUE) {
if (!is.simple_string(engine)) {
stop(m("sanitize_engine_class"), call. = FALSE)
}
if (!is.element(engine, names(self$.engines))) {
stop(m("sanitize_engine_no_engine", engine = engine), call. = FALSE)
}
engine_obj <- self$.engines[[engine]]$engine
if (isTRUE(utility) && isTRUE(engine_obj$mount)) {
stop(m("sanitize_engine_mounting_conflict", engine = engine), call. = FALSE)
}
engine_obj
},
The master helper that defines $find
, $resource
, and $exists
.
A little many arguments for my usual taste, but they make sense here:
we specify whether we would like to operate on the parent
or children
first as scalar bools, followed by expressions for on_parent
and on_self
and a function for on_child
. We use a function for the latter since it
may operate on multiple children whereas the parent and self are singletons.
If the on_child
function has arity 1 it receives the syberia_engine
object as its argument. On arity 2, it also receives the name.
Finally, we can specify which arguments to pass to director$exists
when checking for resource existence as well as which engines to exclude.
If this is missing, resources will not be checked for existence prior
to running parent, self, or child actions.
Finally, if accumulate = TRUE
, we return a list of the three results
(parent, self, children) with a list of multiple results per child as
the third element.
traverse_tree = function(parent, children, on_parent, on_self, on_child, otherwise,
exists_args, exclude, accumulate = FALSE) {
record <- if (isTRUE(accumulate)) {
result <- list()
function(value, subindex) {
if (missing(subindex)) {
result[[length(result) + 1]] <<- value
} else {
result[[length(result)]][[subindex]] <<- value
}
}
} else { `return` }
check_exists <- !missing(exists_args)
if (check_exists) {
full_exists_args <- c(exists_args, exclude. = c(list(self$root())),
parent. = TRUE, children. = TRUE)
}
if (isTRUE(parent) && private$has_parent()) {
A little tricky here: If we do not check for existence, always
run the on_parent
action, otherwise only check it if the parent
engine has the resource.
if (!check_exists || do.call(self$.parent$exists, full_exists_args)) {
record(eval.parent(substitute(on_parent)))
}
}
Similarly, if we do not check for existence, always
run the on_self
action, otherwise only check it if the current
engine has the resource.
if (!check_exists || do.call(super$exists, exists_args)) {
record(eval.parent(substitute(on_self)))
}
if (isTRUE(children)) {
The third element returned by the tree traversal when
accumulate = TRUE
will be a list of children.
if (isTRUE(accumulate) && length(formals(on_child)) > 1L) {
record(list())
}
if (check_exists) {
No longer ask the parent during existence checks (or we would have infinite loops!).
full_exists_args$parent. <- FALSE
}
engines <- private$mounted_engines()
for (i in seq_along(engines)) {
name <- names(engines)[i]
engine <- engines[[i]]$engine
if (!any(vapply(exclude, should_exclude, logical(1), engine))) {
if (!check_exists || do.call(engine$exists, full_exists_args)) {
if (length(formals(on_child)) == 1L) {
record(on_child(engine))
} else {
record(on_child(engine, name), name)
}
}
}
}
}
if (isTRUE(accumulate)) {
result
} else {
Note that if accumulate = FALSE
, we should never get
here if on_parent
, on_self
, or on_child
had a return
action.
eval.parent(substitute(otherwise))
}
}
)
)
A rather technical helper function to ensure that child engines of a common parent engines do not share resources.
For example, if two engines both have a foo/bar.R
file, this
would cause terrible conflicts related to multiple inheritance:
if the engine1 asked for foo/bar.R
, then since the tree
traversal asks the parent firsts, which asks its non-engine1
children, then engine1 would receive foo/bar.R
from engine2.
Conversely, engine2 would receive engine1's foo/bar.R
, probably
leading to all sorts of perverse bugs. In the future it may
be possible to specify canonical preferences in configuration
but for now we disable shared resources all together:
All engines must be disjoint.
(Except insofar as they share resources from a common child engine, as in a diamond graph.)
detect_duplicate_resources <- function(resource_list) {
if (length(unique(c(recursive = TRUE, resource_list))) !=
sum(vapply(resource_list, length, numeric(1)))) {
# Allow for comparison of engine-resource pairs.
engined_resource_list <- lapply(resource_list, function(resources) {
setNames(Map(paste, names(resources), resources, sep = "\1"), resources)
})
pairs <- combn(length(resource_list), 2)
conflict_list <- Filter(Negate(is.null), apply(pairs, 2, function(pair) {
same_subengine <- engined_resource_list[[pair[1]]][
engined_resource_list[[pair[1]]] %in% engined_resource_list[[pair[2]]]]
common <- setdiff(intersect(resource_list[[pair[1]]], resource_list[[pair[2]]]),
names(same_subengine))
if (length(common) > 0) {
list(
engines = names(resource_list)[pair],
resources = common
)
}
}))
if (length(conflict_list) > 0L) {
conflicts <- do.call(paste, lapply(conflict_list, function(conflict) {
paste0("\n\nEngine ", crayon::red(conflict$engines[1]), " and ",
crayon::red(conflict$engines[2]), " share:\n",
paste(collapse = "\n", paste(" *",
vapply(conflict$resources, crayon::yellow, character(1)))))
}))
stop("Mounted child engines have conflicting resources. Please mount ",
"at a different root or remove the conflicts: ", conflicts, "\n\n")
}
}
}
Syberia is a framework that was crafted specifically for the R programming language. The story is similar to JavaScript: a single-purpose language meant to be used in a particular context exploded in popularity and found itself in a host of other settings.
However, R is really just LISP under the hood–a general purpose symbolic computing runtime. To prove that this makes R a viable production language for almost any purpose whatsoever, Syberia takes the first step of providing a hierarchical structure for R projects. Anything–from an empty directory, to an R package, to a collection of shiny dashboards–can be a Syberia project.
This flexibility is achieved by assuming nothing about the underlying
directory structure: in other words, fixing no conventions, but letting
the user decide what the right course of action should be with respect
to structuring their project. Other tools fare poorly at this; for example,
a package does not even let you have a hierarchical directory structure
in its R/
subdirectory; what an affront to developer sanity!
How can Syberia claim to be a convention-over-configuration framework if it fixes no conventions? The answer is that while Syberia itself, the package you are currently examining, does not fix any conventions, the de facto approach to development within Syberia is as follows:
It is possible to include multiple engines, or a whole complex potentially cyclic graph of engines, but this introduces the diamond problem and should generally be avoided until you have a good understanding of computer science foundations.
In other words, the Syberia approach is to be a meta-framework. By making no assumptions about the structure of your project, it is your choice which engine to build your project on depending on what suits your needs. This is especially helpful for machine learning projects, where the task and solution can take widely varying shapes depending on whether the problem is supervised, unsupervised, NLP, deep learning, etc.
The above 2-step approach is recursive. For example, the modeling engine,
the default engine for most projects, is built off the base engine,
which dictates that each project should have a config/routes.R
file which
links the lib/controllers
directory to the rest of the project and tells
you how R scripts in the project are to be parsed according to which
directory they reside in. This is similar to object-oriented programming,
except that it is strictly more general since it does not force you
to treat every single thing in the world as an object.
Thus, the core structural unit is an engine. In order to bootstrap
effectively, an engine should have a config/application.R
file, potentially
empty, so that Syberia can detect this is an engine, as well as a config/engines.R
file indicating which engine this one depends on, if any. This is the sole
convention and can be pretty easily deprecated in future versions of Syberia
if there is demand for more flexible configuration.
Each engine is testable by design. Syberia exports a function called
test_project
that by default looks in the test/
directory of the
engine and requires all resources have an accompanying test. If your
resource (e.g., R script) does not have a test – it fails! In general,
only well-written code is easily testable so this encourages both
separating your project out into similar components through like
directory structures as well as ensuring all inputs and outputs are
what you expect them to be. (Not to mention saving you a huge headache
of managing the complexity of a growing system!)
To construct a Syberia engine object, run syberia_project("/path/to/engine")
.
This is a syberia_engine
R6 class
and holds together everything that Syberia knows about your project.
It has methods like $find
, $resource
, and $exists
to play with
the files in your project. In general, Syberia takes a page out of
node.js's book and encourages all files
to be structured so they have a single export: the last expression
in the file. If you haven't replaced the default controller
using the config/routes
file, a topic we'll touch on later, you
should be able to “compile” your resource using
value <- project$resource("relative/path/to/resource.R")
For example, if we had
# relative/path/to/resource.R
x <- 1
y <- 2
x + y
then value
above would be 3
. However, as we will see,
the project$resource
caller is capable of much more than
sourcing a file.
#' Bootstrap a Syberia engine.
#'
#' A Syberia engine defines the core re-usable structural unit across
#' different Syberia projects. In the same way that
#' \href{http://guides.rubyonrails.org/engines.html}{Rails engines}
#' provide a modular structure for \href{Rails}{http://rubyonrails.org/}
#' projects, Syberia engines serve as the re-usable rockbed upon which
#' to construct projects that contain similar components.
#'
#' A Syberia engine is managed by a \code{\link[director]{director}} object.
#' This object ensures that the engine cannot access resources outside of
#' its domain, and allows insularity from other engines and the top-level
#' project from which the engine will be used.
#'
#' @param filepath character. The root directory of the engine.
#' If this directory does not define a (relative) \code{"config/application.R"}
#' file, the parent directories of \code{filepath} will be traversed
#' until such a file is found, or the function will error.
#' By default, the current directory.
#' @param ... Additional arguments used internally.
#' @param root. logical. Whether or not this is a root-level engine,
#' by default \code{TRUE}.
#' @export
#' @note The syberia package will maintain an internal cache of engines.
#' Therefore, calling \code{syberia_engine} twice will retrieve the
#' cached object. This cache is maintained in the \code{.syberia_env}
#' environment object in the syberia package namespace.
#' @return The \code{\link[director]{director}} object responsible for
#' managing the engine.
syberia_engine <- function(filepath = getwd(), ..., root. = TRUE) {
project <- syberia_engine_(filepath, ...)
if (isTRUE(root.)) {
.syberia_env$active_project <- project
}
project
}
#' @rdname syberia_engine
#' @export
syberia_project <- syberia_engine
#' The current active Syberia project.
#'
#' @return A \code{syberia_engine} object or \code{NULL}.
#' @export
active_project <- function() {
.syberia_env$active_project
}
syberia_engine_ <- function(filepath, ...) {
UseMethod("syberia_engine_", filepath)
}
syberia_engine_.pre_engine <- function(filepath, ...) {
build_engine(filepath)
}
syberia_engine_.character <- function(filepath, ...) {
syberia_engine_character(filepath, ...)
}
syberia_engine_character <- function(filepath, cache = TRUE) {
If a user gives ~/foo/bar/baz
as the path and the project's
root is in fact ~/foo
(in other words, if they give a file or subdirectory
in the project), this should be inferrable. We traverse
the parent directories until we hit the root of the file system
to see if we are in a syberia engine
traverse_parent_directories(normalizePath(filepath, mustWork = FALSE), function(filepath) {
If we are caching the precomputed syberia_engine
object, simply fetch
it from the .syberia_env
helper environment.
if (isTRUE(cache) && has_application_file(filepath)) {
If it is not cached, call build_engine
on the directory.
.syberia_env[[filepath]] <- .syberia_env[[filepath]] %||% build_engine(filepath)
} else if (has_application_file(filepath)) {
build_engine(filepath)
}
If we did not find a syberia engine, we just crash here.
}, error = sprintf("No syberia engine found at %s", sQuote(crayon::red(filepath))))
}
A syberia engine with root root
is, by definition, a directory of
R files containing the file root/config/application.R
or any variation
thereof (e.g., root/config/application/application.r
). This is how we
recognize the engine. (Alternatively, if it has a config/engines
resource.)
has_application_file <- function(filepath) {
extensions <- c(".R", ".r", "/_.R", "/_.r")
files <- function(...) {
c(recursive = TRUE, lapply(list(...), function(type) {
paste0(file.path(filepath, "config", type),
gsub(fixed = TRUE, "_", type, extensions))
}))
}
any(file.exists(files("application", "engines")))
}
#' Build a syberia engine.
#'
#' @param buildable character. A file path from which to build the engine.
#' @export
build_engine <- function(buildable) {
UseMethod("build_engine")
}
#' @export
build_engine.pre_engine <- function(buildable) {
dir <- engine_dir(buildable$prefix)
if (!file.exists(dir)) buildable$builder(dir)
syberia_engine(dir, cache = FALSE, root. = FALSE)
}
#' @export
build_engine.character <- function(buildable) {
To build an engine, we bootstrap an otherwise bare
syberia_engine
R6 object. Bootstrapping an engine
is explained below.
bootstrap_engine(syberia_engine_class$new(buildable))
}
engine_dir <- function(dir) {
file.path(engine_location(), dir)
}
The location where Syberia will keep copies of downloaded engines.
engine_location <- function() {
path <- engine_location_path()
if (!file.exists(path)) {
Ensure we can actually create the engine storage location.
if (!dir.create(path, FALSE, TRUE)) {
stop(m("engine_location_mismatch", path = path), call. = FALSE)
}
}
path
}
By default, Syberia will store engine code in
~/.R/.syberia/engines
.
engine_location_path <- function() {
Sys.getenv("SYBERIA_ENGINE_LOCATION") %|||%
getOption("syberia.engine_location", "~/.R/.syberia/engines")
}
To build an engine, we bootstrap an otherwise bare
syberia_engine
R6 object. Bootstrapping an engine
consists of
config/boot
and config/engines
.config/engines
.config/engines
followed by config/boot
.bootstrapped
on the engine to indicate it is built.The notion of preprocessor and parser belongs to
director. In fact, every syberia_engine
inherits from director
, so every syberia_engine
has at least the same
capabilities as a director
object. (Rails experts may notice the similarity
between Syberia engines and directors, and Rails engines and railties.)
Registering a preprocessor means we do additional stuff before sourcing
that file. For example, in config/engines
, we provide a helper function
called engine
for registering subengines. Registering a parser means we
do additional stuff after preprocessing and sourcing that file. For example,
in config/engines
we need to actually build and mount the engines referred
to in the fil with the engine
helper function.
In order to avoid the diamond problem, Syberia ensures that engines do not share resources unless they come from a common base engine. This is a technical issue that will eventually be explained in a more thorough whitepaper. (For the confused CS students asking why I couldn't have simply made engines form a DAG instead of a possibly cyclic ## graph–the short answer is that engines need to trivially modularizable while ## retaining the power to refer to parent engine resources dynamically and just-in-time. I may very well regret this decision.)
Each director
, and by proxy syberia_engine
, has an internal cache for
memoising run-time state while the R process is running (this is different
from a director
's registry, which captures persistent cross-session
state on disk). We simply set the bootstrapped
key to TRUE
so we can
check for it later.
bootstrap_engine <- function(engine) {
if (isTRUE(engine$cache_get("bootstrapped"))) return(engine)
engine$register_preprocessor("config/boot", boot_preprocessor)
engine$register_preprocessor("config/engines", engine_preprocessor)
engine$register_parser ("config/engines", engine_parser)
exists <- function(...) engine$exists(..., parent. = FALSE, children. = FALSE)
if (exists("config/engines")) engine$resource("config/engines")
if (exists("config/boot")) engine$resource("config/boot")
# Check for duplicate resources in mounted child engines.
engine$find(check_duplicates. = TRUE, children. = TRUE, tag_engine. = TRUE)
engine$cache_set("bootstrapped", TRUE)
engine
}
The boot_preprocessor
is referred to above in the boostrap_engine
function.
We are saying that the config/boot
file should have access to
the director
object, and nothing more.
boot_preprocessor <- function(source, source_env, director) {
source_env$director <- director
source()
}
The config/engines
file is an epsilon harder. Basically, we use
the engine
helper function to record the user's requested engine
mounting until we get to the parser below.
engine_preprocessor <- function(source, source_env, preprocessor_output, director) {
if (isTRUE(director$cache_get("bootstrapped"))) return()
preprocessor_output$engines <- new.env(parent = emptyenv())
source_env$engine <- function(name, ...) {
preprocessor_output$engines[[name]] <- list(...)
}
source()
}
Now that we have collected the engines to be mounted into the preprocessor_output
helper (which also came from director),
#
engine_parser <- function(director, preprocessor_output) {
if (isTRUE(director$cache_get("bootstrapped"))) return()
For each engine mentioned using the engine
helper in config/engines
,
we register the engine. This means that we establish member variables
on the respective syberia_engine
objects that allow them to understand
the topology: the parent engine knows about its children, and each child
knows about its parent. Syberia is all about family.
for (engine in ls(preprocessor_output$engines, all = TRUE)) {
register_engine(director, engine, parse_engine(preprocessor_output$engines[[engine]]),
mount = isTRUE(preprocessor_output$engines[[engine]]$mount))
}
When we use devtools::load_all
on director, it loads a symbol called
exists
; we use explicit base namespacing to prevent conflicts during development.
if (base::exists(".onAttach", envir = input, inherits = FALSE)) {
A little additional trick is that the user could have specified
an .onAttach
function in the config/engines
file. If this exists,
we store it in the syberia_engine
cache and later invoke it.
onAttach <- input$.onAttach
if (!is.function(onAttach)) {
stop(m("onattach_failure", root = director$root(), klass = class(onAttach)[1L]),
call. = FALSE)
}
We use list2env
to “inject” the director
local variable into the
scope of the onAttach
hook.
environment(onAttach) <- list2env(
list(director = director),
parent = environment(onAttach) %||% baseenv()
)
director$cache_set(".onAttach", onAttach)
}
NULL
}
Registering an engine means making the parent aware of its child and the child aware of its parent. Mounting the engine means we will be treating a collection of engines, each in potentially very different directories on the file system, as one giant project. This allows us to pull out a subset of Syberia resources and “enginify” them with ease.
Although Syberia does face a few theoretical challenges,
these have solutions, and the end result is that given a Syberia
project with N
resources, there are 2^N
possible engines extractable
from that project: one for each subset of resources. One of Syberia's goals
is to make it as easy as possible to pull out your work so others
can re-use it.
register_engine <- function(director, name, engine, mount = FALSE) {
message(crayon::green(paste("...Mounting", name, "engine.")))
# TODO: (RK) Replace with $engines private member after R6ing.
if (!director$cache_exists("engines")) {
director$cache_set("engines", new.env(parent = emptyenv()))
}
env <- director$cache_get("engines")
env[[name]] <- engine
director$register_engine(name, engine, mount = mount)
if (engine$cache_exists(".onAttach")) {
This is where we invoke the .onAttach
hook registered in
the engine_parser
.
engine$cache_get(".onAttach")(director)
}
}
parse_engine <- function(engine_parameters) {
By default, download engines from GitHub.
engine_parameters$type <- engine_parameters$type %||% "github"
if (!is.simple_string(engine_parameters$type)) {
stop(sprintf("When defining an engine, please provide a string for the %s",
sQuote("type")), call. = FALSE)
}
parser <- paste0("parse_engine.", engine_parameters$type)
When we use devtools::load_all
on director, it loads a symbol called
exists
; we use explicit base namespacing to prevent conflicts during development.
if (!base::exists(parser, envir = getNamespace("syberia"), inherits = FALSE)) {
stop(sprintf("Cannot load an engine of type %s",
sQuote(crayon::red(engine_parameters$type))))
}
syberia_engine(get(parser, envir = getNamespace("syberia"))(engine_parameters),
cache = FALSE, root. = FALSE)
}
parse_engine.github <- function(engine_parameters) {
GitHub-derived engines need to provide a repo
and version
(by default “master”).
For example, the following indicates we wish to load the “bobbleheads”
engine from GitHub “jimbob/bobbleheads.sy” off the master branch.
# config/engines.R
engine("bobbleheads", type = "github", repo = "jimbob/bobbleheads.sy")
Syberia will download and cache the engine.
repo <- engine_parameters$repo %||% engine_parameters$repository
# TODO: (RK) Checking for updates?
version <- engine_parameters$version %||% "master"
PAT <- github_pat()
if (!is.null(PAT)) {
base_url <- sprintf("https://%s@github.com/%s.git", PAT, repo)
} else {
base_url <- sprintf("https://github.com/%s.git", repo)
}
stopifnot(is.simple_string(repo))
pre_engine(prefix = file.path("github", repo, version),
builder = function(filepath) {
status <- system2("git",
c("clone", base_url, filepath,
"--branch", version, "--depth", "1", "--quiet"))
stopifnot(status == 0)
})
}
Putting the following in config/engines.R
indicates we wish
to load the “bobbleheads” engine from GitHub
“jimbob/bobbleheads.sy” off the master branch.
# config/engines.R
engine("bobbleheads", type = "local", path = "~/dev/bobbleheads")
parse_engine.local <- function(engine_parameters) {
path <- engine_parameters$path %||% stop("Please provide an engine path")
if (!file.exists(path)) stop("The path ", sQuote(path), " does not exist.")
path
}
pre_engine <- function(prefix, builder) {
Build a pre_engine
S3 object.
structure(list(prefix = prefix, builder = builder), class = "pre_engine")
}
#' Whether to exclude a syberia engine from being used for resourcing.
#'
#' @param condition logical. Some condition.
#' @param engine syberia_engine. Engine object.
#' @export
should_exclude <- function(condition, engine) {
UseMethod("should_exclude")
}
#' @export
should_exclude.syberia_engine <- function(...) { identical(...) }
#' @export
should_exclude.character <- function(condition, engine) {
identical(condition, engine$root())
}
A dictionary of messages used by the package. We separate these into its own file to avoid cluttering the R code with a multitude of strings.
messages <- list(
test_engine_type_error = c(
"Please pass a ", sQuote(crayon::yellow("syberia_engine")), " object ",
"to the ", sQuote(crayon::yellow("test_engine")), " function."
),
test_hook_no_engine = c(
"To fetch the {{{type}}} hook for a project, please pass in a syberia_engine ",
"object (the syberia_engine for the syberia project). Instead I got ",
"an object of class {{{klass}}}."
),
test_hook_invalid_format = c(
"Test {{{type}}} hooks must be a function or a list of functions.\n\nIn ",
"{{{filename}}}, ensure that ",
"you have ", sQuote(crayon::yellow("{{{type}}} <- some_function")),
" as right now it's an object of class ",
sQuote(crayon::red("{{{klass}}}"))
),
test_hook_arity_error = c(
"Test {{{type}}} hooks must all be functions that take at least one ",
"argument.\n\nThe first argument will be an environment that has one ",
"key, ", sQuote("project"), ". In {{{filename}}}",
" ensure your ", sQuote(crayon::yellow("{{{type}}}")),
" local variable meets this constraint."
),
engine_location_mismatch = c(
"Syberia needs a directory in which to place the code for ",
"dependencies. Please ensure ", sQuote(crayon::red("{{{path}}}")),
" is writable, or set a different path in the ",
sQuote(crayon::yellow("SYBERIA_ENGINE_LOCATION")), " environment variable or ",
"the ", sQuote(crayon::yellow("syberia.engine_location")), " global option (",
"using ", sQuote(crayon::magenta("options(syberia.engine_location = \"some/dir\")")), ")."
),
onattach_failure = c(
"The ", sQuote(crayon::yellow(".onAttach")), " hook in the ",
sQuote(crayon::yellow("{{{root}}}")), " engine must be a *function*. ",
"Instead, I got a {{{klass}}}."
),
sanitize_engine_class = c(
"When sourcing a resource, please pass a string to the ",
sQuote("engine"), " parameter."
),
sanitize_engine_no_engine = c("There is no engine called ", sQuote("{{{engine}}}"), "."),
sanitize_engine_mounting_conflict = c(
"Explicit engine specification during resource sourcing is ",
"only allowed on unmounted engines. The ", sQuote("{{{engine}}}"),
" engine is a mounted engine."
)
)
Cleanse the message a little after fetching it from the messages
list.
msg <- function(name) {
stopifnot(name %in% names(messages))
The gsub
will squish multiple spaces into a single space,
while the paste(collapse = "", ...)
usage will ensure we
can take vectors of characters in the above messages
list.
paste(collapse = "", gsub("[ ]+", " ", messages[[name]]))
}
We use the whisker templating engine to inject any additional values into the message string. For example,
m("parse_mungepiece_dual_error", error = "Bloop")
would return the appropriate error with the string “Bloop” injected in the appropriate place.
m <- function(name, ...) {
Note the use of do.call
,
a very handy R metaprogramming tool when we do not know exactly which
arguments we will pass.
do.call(whisker::whisker.render, list(msg(name), list(...)))
}
#' Syberia is a development meta-framework for R.
#'
#' In its original formulation, in conjunction with
#' \href{http://github.com/syberia/modeling.sy}{the modeling engine},
#' Syberia serves as a machine learning classifier development framework.
#'
#' The modeling engine provides an opinionated unified framework for
#' fast iteration on classifier development and deployment. It is
#' founded on convention over configuration and aims to solve the
#' problems of classifier-specific data preparation and
#' classifier-specific modeling parameters.
#'
#' The more general vision for Syberia is still in progress, but aims
#' to unify the currently disparate realms of R packages, script codebases,
#' Shiny dashboards, R web apps, and reproducible analysis. In the
#' viewpoint of the author, R is syntactic sugar around LISP, which
#' enables arbitrary computation; Syberia is an attempt to support this
#' conjecture by allowing the construction of arbitrary software
#' projects within the R programming language, thereby finally outgrowing
#' its long-overdue misconception as a statistical tool.
#'
#' @name syberia;
#' @docType package
#' @author Robert Krzyzanowski
#' @import testthat stagerunner director R6 whisker
NULL
An environment used for caching some Syberia-managed objects, like directors.
.syberia_env <- new.env(parent = emptyenv())
Like any good development framework, Syberia offers built-in support for testing. The primary unit of development in Syberia is the engine. All Syberia projects are composed of a collection of engines (each potentially depending on vanilla R packages).
This function allows one to run all the tests associated with the engine.
By default, the files in the test
directory of the engine are considered
tests, and all other files are non-tests.
The convention is straightforward:
if you have a Syberia resource a/b/c
off the root of the project, you
should have an accompanying test in test/a/b/c
. Note we leave out the
.R
extension as the resource may be an idempotent or non-idempotent
resource.
Idempotent resources are .R
files in a directory with the same name
as the file without extension. For example, test/a/b/c/c.R
would be an
idempotent resource, and helper files like test/a/b/c/helper.R
would be
invisible to the Syberia engine: this encourages clean design and separation
into helper files as your resource becomes more complex.
Here is an example test. You can look at the accompanying helper project
# calculations/pi.R
# Compute pi using a series: http://functions.wolfram.com/Constants/Pi/06/01/01/
Reduce(`+`, 4 * vapply(seq_len(1000) - 1, function(k) {
(-1)^k / (2 * k + 1)
}, double(1)))
# test/calculations/pi.R
test_that("the calculation is close to pi", {
# Note we have access to the testthat package.
expect_less_than(abs(resource() - 3.1415926), 1e-3)
})
We can execute the test using test_engine(root)
where root
is the string
representing the directory the above files are contained in.
Note that Syberia provides the resource
helper to fetch the current resource
being tested. You could pass a first argument to fetch another resource,
but if you leave it empty, the default will always be the resource corresponding
to the tested resources. Thus, if you are in test/calculations/pi
, calling
resource()
will build you calculations/pi
.
It is possible to add test setup and teardown hooks. This means that before
the test suite runs, you can add additional conditions to ensure your
project is working as intended. For example, the author has found it useful
to add checks for README.md
files in all directories to encourage
the team to always add documentation (or else your test suite breaks!).
To add a test setup hook, simply place a function or a list of functions
into the setup
local variable in the file config/environments/test.R
(note this is controlled by the config
parameter to test_engine
.
Additional hooks are teardown
(for when the test suite finishes),
single_setup
(every time a test runs), and single_teardown
(every
time a test finishes). Any functions or lists of functions passed to
these locals should have only one argument: the received value will be
an environment containing the key resource
, a string representing the
current resource being tested, and the key project
, pointing to the
syberia_engine
object for the project (or engine).
For example, one could include the following configuration file in a syberia project.
# config/environments/test.R
setup <- function(env) { cat("Beginning test suite!\n") }
teardown <- function(env) { cat("Ending test suite!\n") }
single_setup <- function(env) {
cat(paste("Testing resource", env$resource, "\n")) }
single_teardown <- function(env) {
cat(paste("Tested resource", env$resource, "\n")) }
Running test_engine(...)
on this project will cause the first message
to be printed, then the last two to alternate between each test,
before finally wrapping up with the second message.
#' Run all tests in a syberia project or engine.
#'
#' The tests that will be run are all those in the \code{test} subdirectory
#' of the root of the syberia engine, unless otherwise specified.
#'
#' It is possible to introduce additional behavior prior to and after tests.
#' This can be used to perform additional testing not covered by sourcing
#' all files in the "test/" directory of the syberia engine.
#'
#' To provide a setup or teardown hook, simply place a function or list of
#' functions in a local variable \code{setup} or \code{teardown}, respectively,
#' in \code{config/environments/test} relative to the root of the syberia engine,
#' or pass the relevant \code{setup} or \code{teardown} parameters to this function.
#'
#' For example, creating a file \code{config/environments/test.R} with the code
#' \code{setup <- function(env) cat("Running all tests.")} will print a message
#' before all the tests are run. The one parameter the function must take is an
#' environment which will contain a single key, \code{director}, pointing to the
#' object returned by calling \code{\link{syberia_engine}}.
#'
#' @param engine syberia_engine. The syberia engine to test.
#' If a \code{character}, it will be passed to \code{\link{syberia_engine}} first.
#' @param base character. Any subdirectory to test specifically. By default,
#' \code{"test"}.
#' @param config character. The relative path to the configuration resource,
#' by default \code{"config/environments/test"}.
#' @param ignored_tests character. The list of tests to ignore, by default
#' the local variable \code{ignored_tests} extracted from the configuration
#' resource specific by the \code{config} parameter.
#' @param optional_tests character. The list of tests to ignore, by default
#' the local variable \code{optional_tests} extracted from the configuration
#' resource specific by the \code{config} parameter.
#' @param required logical. Whether or not all tests are required to have resources,
#' by default \code{TRUE}. If \code{TRUE}, the \code{ignored_tests}
#' resources will not be required to have an accompanying test. It is highly
#' recommended that all your projects have full test coverage.
#' @param reporter character. The testthat package test reporter to use. The
#' options are \code{c("check", "list", "summary", "minimal", "multi", "rstudio",
#' "silent", "stop", "tap", "teamcity")}, with the default being \code{"summary"}.
#' @param error_on_failure logical. Whether or not to raise an error
#' if there are any failures. By default, \code{TRUE}.
#' @seealso \code{\link{syberia_engine}}
#' @export
#' @return A list of \code{testthat_results} objects giving the details for
#' the tests executed on each tested resource. If \code{error_on_failure}
#' is \code{TRUE}, error instead if there are any failures.
test_engine <- function(engine = syberia_engine(), base = "test",
config = file.path("config", "environments", "test"),
ignored_tests = ignored_tests_from_config(engine, base, config),
optional_tests = optional_tests_from_config(engine, base, config),
required = TRUE, reporter = c("summary", "check", "list",
"minimal", "multi", "rstudio", "silent", "stop", "tap", "teamcity")[1L],
error_on_failure = TRUE) {
if (is.character(engine)) {
engine <- syberia_engine(engine)
}
if (!is(engine, "syberia_engine")) {
stop(m("test_engine_type_error"))
}
force(ignored_tests)
We will try to identify the resources to test by grabbing all
non-test (i.e., not in the test
directory) resources from the engine
and also excluding those given by ignored_tests
.
tests <- find_tests(engine, base, ignored_tests)
If testing is required for the engine (i.e., every resource needs an accompanying test) fail unless the condition is satisfied.
if (isTRUE(required)) {
ensure_resources_are_tested(engine, tests, optional_tests, base)
}
results <- test_resources(engine, tests$active, config, reporter = reporter)
if (isTRUE(error_on_failure)) {
if (!all(vapply(results, getFromNamespace("all_passed", "testthat"), logical(1)))) {
stop("Test failures", call. = FALSE)
}
}
invisible(results)
}
#' Run the tests on a given set of resources.
#'
#' @param engine syberia_engine. The engine to run the tests on.
#' @param tests character. The character vector of resources to test.
#' @param ... Additional arguments to pass to \code{find_test_hook}.
#' @param reporter character. The test reporter to use.
#' @return The testthat result summary for this one test run.
test_resources <- function(engine, tests, ..., reporter) {
We ensure testthat and testthatsomemore are installed.
ensure_test_packages()
The testthat package does not export find_reporter
, so we
grab it with a trick.
reporter <- getNamespace("testthat")$find_reporter(reporter)
We are mimicking testthat's test_files
.
reporter$start_reporter()
results <- NULL
We would like to error if any global options have been modified, or global variables introduced.
Syberia resources should be stateless and not modify global options!
setup_hook <- find_test_hook(engine, type = "setup", ...)
if (!is.null(setup_hook)) setup_hook$run()
single_setup <- find_test_hook(engine, type = "single_setup", ...)
single_teardown <- find_test_hook(engine, type = "single_teardown", ...)
ensure_no_global_variable_pollution(check_options = TRUE, {
results <- lapply(tests, test_resource, engine = engine, reporter = reporter,
setup = single_setup, teardown = single_teardown)
})
teardown_hook <- find_test_hook(engine, type = "teardown", ...)
if (!is.null(teardown_hook)) teardown_hook$run()
reporter$end_reporter()
Again mimicking testthat.
invisible(getNamespace("testthat")$testthat_results(results))
}
ensure_test_packages <- function() {
ensure_installed("testthat")
ensure_testthatsomemore()
}
#' Run the tests for a single resource.
#'
#' @param engine syberia_engine. The engine to run the test on.
#' @param resource character. The resource to test.
#' @param setup stageRunner. A \code{\link[stagerunner]{stageRunner}} to
#' execute setup hooks for this test.
#' @param teardown stageRunner. A \code{\link[stagerunner]{stageRunner}} to
#' execute teardown hooks for this test.
#' @param reporter reporter. A testthat reporter object.
#' @return The testthat result summary for this one test run.
test_resource <- function(engine, resource, setup, teardown, reporter) {
result <- NULL
Each resource itself should not add global variables or pollute options.
ensure_no_global_variable_pollution(check_options = TRUE, {
if (!missing(setup) && !is.null(setup)) {
setup$.context$resource <- resource
setup$run()
}
call_args <- list(resource, recompile = TRUE, recompile. = TRUE)
if (!missing(reporter)) {
call_args$reporter <- reporter
}
result <- suppressMessages(do.call(engine$resource, call_args))
if (!missing(teardown) && !is.null(teardown)) {
teardown$.context$resource <- resource
teardown$run()
}
The desc
parameters allows us to be specific when we fail:
it will inform the user the name of the resource that created a global
variable or modified a global option.
}, desc = paste("running", resource))
result
}
#' Fetch the test setup or teardown hook, if any exists.
#'
#' The resource \code{config/environments/test} should contain a local variable
#' \code{setup} or \code{teardown} that has a function or list of functions to
#' be incorporated into a stageRunner that will run the actual test setup
#' or teardown.
#'
#' The seed environment for the stageRunner will contain the director object
#' of the relevant project in the key \code{director}.
#'
#' @param engine syberia_engine. The director for the syberia project.
#' @param type character. Must be \code{'setup'} or \code{'teardown'}, the former
#' being the default.
#' @param config character. The resource used to fetch configuration.
#' @seealso \code{\link{test_engine}}
#' @return a stageRunner that will run the relevant setup or teardown hook(s).
find_test_hook <- function(engine, type = "setup", config) {
if (!is(engine, "syberia_engine")) {
stop(m("test_hook_not_engine", type = type, klass = class(engine)[1L]), call. = FALSE)
}
hooks <- value_from_config(engine, config, type)
if (is.null(hooks)) return(NULL)
# TODO: (RK) Maybe replace this with a new stageRunner method to check
# argument validity? In the future, stageRunner could maybe do more!
colored_filename <- sQuote(crayon::blue(config))
if (!is.list(hooks) && !is.function(hooks) && !stagerunner::is.stagerunner(hooks)) {
stop(m("test_hook_invalid_format", type = type, filename = colored_filename,
klass = class(hooks)[1L]), call. = FALSE)
}
if (!is.list(hooks)) { hooks <- list(hooks) }
all_have_correct_arity <- stagerunner::is.stagerunner(hooks) ||
all(rapply(hooks, how = "unlist", function(hook) {
is.function(hook) && length(formals(hook)) > 0
}))
if (!all_have_correct_arity) {
stop(m("test_hook_arity_error", type = type, filename = colored_filename,
type = type), call. = FALSE)
}
# Do not give access to global environment to ensure modularity.
hook_env <- list2env(list(project = engine), parent = parent.env(globalenv()))
stagerunner::stageRunner(hook_env, hooks)
}
#' Check that all mandatory tested resources have tests.
#'
#' @param engine syberia_engine. The engine to check.
#' @param tests character. The tests to check. Must be a list with keys
#' \code{"active"} and \code{"ignored"}.
#' @param optional character. A character vector of optional tests.
#' @param base character. The directory containing tests in the project, by
#' default \code{"test"}.
#' @return Nothing, but error if not all resources have tests.
ensure_resources_are_tested <- function(engine, tests, optional, base = "test") {
without_builtin_resources <- function(resources) {
We exclude the config
and test
directories.
resources[substring(resources, 1, 7) != "config/" &
substring(resources, 1, 5) != paste0(base, "/")]
}
without_optional_resources <- function(resources) {
optional
resources are actually substring matches!
Filter(function(resource) !any_is_substring_of(resource, optional), resources)
}
all_resources <- without_optional_resources(without_builtin_resources(engine$find()))
# Error if any resources don't have tests.
necessary_tests <- file.path(base, all_resources)
missing_tests <- setdiff(necessary_tests, c(tests$active, tests$ignored))
if (length(missing_tests) > 0L) {
stop(call. = FALSE, "Tests are missing for the following resources:\n\n",
crayon::red(paste(gsub(paste0("^", base, "/"), "", missing_tests), collapse = "\n")))
}
}
find_tests <- function(engine, base, ignored_tests) {
all_tests <- engine$find(children. = FALSE, base = gsub("\\/$", "", base))
tests <- Filter(function(x) !any_is_substring_of(x, ignored_tests), all_tests)
ignored_test_paths <- setdiff(all_tests, tests)
list(
active = tests,
ignored = setdiff(all_tests, tests)
)
}
By default, we will fetch the test configuration from
config/environments/test
, but the location of the configuration file
itself is configurable (see the config
parameter to test_engine
).
test_environment_configuration <-
function(engine, path = file.path("config", "environments", "test")) {
if (!engine$exists(path, children. = FALSE)) {
list()
} else {
engine$resource(path, children. = FALSE)
}
}
If the configuration file, usually config/environments/test.R
,
has a local variable ignored_tests
, the tests of the
character vector of resource names will not be executed.
This is useful when certain tests are broken or under development.
ignored_tests_from_config <- function(engine, base, config) {
file.path(base,
value_from_config(engine, config, "ignored_tests") %||% character(0)
)
}
If the configuration file, usually config/environments/test.R
,
has a local variable optional_tests
, the character vector of resource
names will not be executed. This is useful when certain resources
were never expected to have tests, such as those representing network
connections, constants, or other things that don't really make sense to test.
Use with caution!
optional_tests_from_config <- function(engine, base, config) {
value_from_config(engine, config, "optional_tests") %||% character(0)
}
value_from_config <- function(engine, config, value) {
test_environment_configuration(engine, config)[[value]]
}
#' Non-null selection operator.
#'
#' @name infix_or
#' @param x ANY. An R object. If \code{NULL}, return \code{y}.
#' @param y ANY. An R object. If x is \code{NULL}, return \code{y}.
#' @export
`%||%` <- function(x, y) if (is.null(x)) y else x
`%|||%` <- function(x, y) if (is.falsy(x)) y else x
is.falsy <- function(x) {
identical(x, NULL) || identical(x, FALSE) || identical(x, "") ||
length(x) == 0 || identical(x, 0)
}
#' Retrieve Github personal access token.
#'
#' Borrowed from \url{https://github.com/r-pkgs/remotes/blob/master/R/github.R#L23}
#' A github personal access token
#' Looks in env var \code{GITHUB_PAT}
#'
#' @keywords internal
#' @noRd
github_pat <- function() {
pat <- Sys.getenv('GITHUB_PAT')
if (!nzchar(pat)) { return(NULL) }
message("Using github PAT from envvar GITHUB_PAT")
pat
}
#' Fetch the current Syberia version.
#' @export
#' @return The version of the syberia package as \code{\link{package_version}}
#' object.
syberia_version <- function() {
utils::packageVersion("syberia")
}
package_exists <- function(name) {
is.element(name, utils::installed.packages()[, 1])
}
ensure_installed <- function(package_name) {
Using requireNamespace
is the de facto accepted approach here.
if (!package_exists(package_name)) {
stop("Please install ", crayon::yellow(package_name), ":\n\n",
crayon::green(paste0("install.packages('", package_name, "')")), "\n", call. = FALSE)
}
}
Testthatsomemore is an auxiliary package used for some testing utilities.
ensure_testthatsomemore <- function() {
if (package_exists("testthatsomemore")) return()
ensure_installed("devtools")
message("The package ", crayon::yellow("testthatsomemore"),
" is not installed; installing from http://github.com/robertzk/testthatsomemore")
withCallingHandlers({
We install it from GitHub if the user does not have it installed.
devtools::install_github("robertzk/testthatsomemore")
requireNamespace("testthatsomemore", quietly = TRUE)
}, error = function(e) {
stop("The ", crayon::red("testthatsomemore"), " package failed to install. ",
"Try manually: \n\n",
crayon::green('devtools::install_github("robertzk/testthatsomemore")'), "\n\n",
"The error was: ", paste(as.character(e), collapse = "\n"), call. = FALSE)
})
}
as.list.environment <- function(env) {
out <- base::as.list.environment(env)
lapply(out, function(x) if (is.environment(x) && !is(x, "R6")) as.list(x) else x)
}
#' Ensure no global variables are polluted during an expression.
#'
#' If any global variables are removed or created, it will
#' give a descriptive error.
#'
#' @param expr expression. The R expression to evaluate
#' @param desc character. A string to add to "you modified global
#' @param check_options logical. Whether to check if any global options were changed.
#' variables while [\code{desc} goes here]".
#' @return the output of the \code{expr}.
ensure_no_global_variable_pollution <- function(expr, desc, check_options = FALSE) {
if (isTRUE(check_options)) old_options <- options()
before <- ls(globalenv())
out <- eval.parent(substitute(expr))
after <- ls(globalenv())
missing_desc <- missing(desc)
shorten <- function(vars) if (length(vars) > 5) c(vars[1:5], "...") else vars
message <- function(vars, type = "variables", action = "removed") {
msg <- paste("Some global", type, "were", action)
if (!eval.parent(quote(missing_desc))) msg <- paste(msg, "while", desc)
msg <- paste0(msg, ": ", crayon::red(paste(vars, collapse = ", ")))
}
check_before_after <- function(before, after, type) {
if (length(bads <- setdiff(before, after)) > 0) stop(message(bads, type = type))
else if (length(bads <- setdiff(after, before)) > 0)
stop(message(bads, type = type, action = "added"))
}
check_before_after(before, after, "variables")
if (isTRUE(check_options) && !identical(new_options <- options(), old_options)) {
before <- ls(old_options); after <- ls(new_options)
check_before_after(before, after, "options")
diffs <- vapply(before,
function(name)! identical(old_options[[name]], new_options[[name]]), logical(1))
stop("Some global options were modified: ",
crayon::red(paste(names(which(diffs)), collapse = ", ")))
}
out
}
#' Perform an action repeatedly on parent directories until success or error.
#'
#' Given a \code{fn}, we may wish to run it on a \code{filepath}, determine
#' its success, and try again with the parent directory of \code{filepath},
#' until we obtain result that is not \code{NULL}. If this does not occur for
#' any parent directory, we halt with the string \code{error}.
#'
#' @param filepath character. The filepath to traverse along. The \code{fn}
#' function will be called with \code{filepath} and its parent directories
#' until it returns a result other than \code{NULL}.
#' @param fn function. A one-argument function called on \code{filepath} or
#' its successive parent directories until a result other than \code{NULL}
#' is returned, which will be the final return value.
#' @param error character or function. A string to error if \code{fn} returns
#' \code{NULL} on all parent directories, or a one-argument function to
#' execute (the argument received will be the initial \code{filepath})
#' @return The result of \code{fn} on the first parent directory of
#' \code{filepath} on which it is not \code{NULL}.
traverse_parent_directories <- function(filepath, fn, error) {
stopifnot(is.character(filepath), length(filepath) == 1, !is.na(filepath))
stopifnot(is.function(fn), length(formals(fn)) >= 1)
stopifnot(is.character(error) || is.function(error))
path <- normalizePath(filepath, mustWork = FALSE)
As long as we have not hit the root directory, keep trying.
while (!identical(dirname(path), path)) {
result <- fn(path)
if (!is.null(result)) return(result)
path <- dirname(path)
}
if (is.character(error)) stop(error)
else error(filepath)
}
order_by_key <- function(list) {
list[order(names(list))]
}
is.simple_string <- function(obj) {
is.character(obj) && length(obj) == 1 && !is.na(obj) && nzchar(obj)
}
#' Whether or not any substring of a string is any of a set of strings.
#'
#' @param string character.
#' @param set_of_strings character.
#' @return logical
#' @examples
#' stopifnot(syberia:::any_is_substring_of('test', c('blah', 'te', 'woo'))) # TRUE
#' stopifnot(!syberia:::any_is_substring_of('test', c('blah', 'woo'))) # FALSE
any_is_substring_of <- function(string, set_of_strings) {
any(vapply(set_of_strings,
function(x) substring(string, 1, nchar(x)) == x, logical(1)))
}
.onAttach <- function(...) {
if (!isTRUE(getOption("syberia.silent"))) {
packageStartupMessage(paste0("Loading ", crayon::red("Syberia"), "...\n"))
}
# Load better trace.
if (isTRUE(getOption("syberia.autoload_bettertrace", TRUE)) &&
!identical(Sys.getenv("CI"), "TRUE")) {
if (!is.element("devtools", utils::installed.packages()[, 1])) {
packageStartupMessage(crayon::yellow(" ...Installing devtools\n"))
utils::install.packages("devtools")
}
if (!requireNamespace("bettertrace", quietly = TRUE)) {
packageStartupMessage(crayon::yellow(" ...Installing github.com/robertzk/bettertrace\n"))
devtools::install_github("robertzk/bettertrace")
}
library(bettertrace)
}
makeActiveBinding("project", env = globalenv(),
function() getFromNamespace("active_project", "syberia")())
makeActiveBinding("resource", env = globalenv(),
function() getFromNamespace("active_project", "syberia")()$resource)
# We want to initialize a Syberia project in the current working directory
# because 9 times out of 10 this is what the user wants.
#
# However, this hook doesn't work on install, because install is done from
# the working directory of the package, not the user.
#
# But when the user calls library(syberia), this will work.
try({
syberia_engine()
packageStartupMessage(crayon::yellow(
"Loaded syberia project ", sQuote(active_project()$root()), "...\n"
))
}, silent = TRUE)
}
.onDetach <- function(...) {
try(silent = TRUE, detach("syberia:shims"))
# TODO: (RK) Detach any environments attached to the search path
# by the active project.
}
.onUnload <- function(...) {
try(silent = TRUE, detach("syberia:shims"))
}