engine_R6.R

The core object coordinating a syberia engine is a syberia_engine R6 object.

              syberia_engine_class <- R6::R6Class("syberia_engine",
  portable = TRUE,

We inherit directly from the director base class.

                inherit = getFromNamespace("director_", "director"), #environment(director::director)$director_,
  public = list(

Each syberia engine can include more dependent engines. This forms a graph of engines. However, child engines are allowed to be shared, so the final structure is usually a DAG, not a tree. The parent node is stored in .parent and the children (immediate dependent engines) are stored in .engines.

                  .parent  = NULL,
    .engines = list(),
    .set_parent = function(parent) { self$.parent <<- parent },

Registering an engine is as simple adding it to the list of .engines. If we are mounting it, we register its parent node as the current engine.

                  register_engine = function(name, engine, mount = FALSE) {
      stopifnot(is(engine, "syberia_engine"))
      self$.engines[[name]] <<- list(engine = engine, mount = isTRUE(mount))
      if (isTRUE(mount)) engine$.set_parent(self)
    },

The vanilla director object that the syberia_engine inherits from has a resource method: namely, take an R file and compile it according to some preprocessor and parser (see the director package for a more thorough explanation).

The syberia_engine$resource method takes a different approach. Since an engine may itself have more engines, resources can come from a whole tree of engines. Typically, it would be sufficient to check the engine itself and its children when looking for a resource. However, Syberia takes a different approach. To make it possible to pull out arbitrary subsets of files and turn them into an engine, Syberia first traverses up to the parent to see if it has overwritten any of the resources.

This leads to some multiple inheritance problems and is a tricky course to navigate, but the end result is that the user is completely transparent to the machinery that goes into finding resources under the hood. It is possible to extract a collection of controllers and resources from any top-level syberia project (as long as they are self contained and include each other's dependencies) and transform that into an engine simply by copying the files, while retaining the power provided by controllers insofar as giving each directory structure its own meaning.

                  resource = function(name, ..., parent. = TRUE, children. = TRUE,
                        exclude. = NULL, defining_environment. = parent.frame(),
                        engine) {

      if (missing(engine)) {
        # If we do not force right away, parent.frame() will be pointing to
        # the wrong place. Everything in R is a promise...
        force(defining_environment.)
        dots <- list(...)

We use a tree traversal helper defined later in this file: first, it checks the parent exists (if one exists). This will recursively call its parent's resource method. To avoid infinite loops, we have to be careful about excluding the current engine from traversal. If the parent engine reports it owns the resource, we return the on_parent action.

Otherwise, if the current engine has the resource, we return that instead. Finally, if neither the parent engine nor self own the resource, we browse through the children.

In the event that none of the engines have the resource, we trigger the self engine to resource the object so that we get an error that the resource was not found.

                      private$traverse_tree(parent = parent., children = children.,
          exclude = exclude., exists_args = list(name),
          on_parent = self$.parent$resource(name, ...,
                        exclude. = c(list(self$root()), exclude.),
                        defining_environment. = defining_environment.),
          on_self = super$resource(name, ...,
                                   defining_environment. = defining_environment.),
          on_child = function(engine) {
            do.call(engine$resource, c(list(name), dots, list(
              parent. = FALSE, children. = TRUE,
              exclude. = c(engine$root(), exclude.),
              defining_environment. = defining_environment.)
            ))
          },
          otherwise = super$resource(name, ...,
                                     defining_environment. = defining_environment.)
        )
      } else { # if (!missing(engine))

It is possible to “pluck” a resource explicitly out of a utility engine using the engine parameter (if the engine is not mounted).

                      stopifnot(is.character(engine))
        engine_name <- engine
        engine <- private$sanitize_engine(engine)
        if (!engine$exists(name, ..., parent. = FALSE)) {
          stop("No resource ", sQuote(name), " exists in engine ", sQuote(engine_name), ".")
        }

Extract the resource directly from the utility engine.

                      return(engine$resource(name, ..., parent. = FALSE))
      }
    },

Finding resources (as opposed to compiling them) follows a similar pattern: we first check the parent, then self, then the children. However, unlike $resource, the $find method must traverse all of them and take their union, similarly to the difference between calling Reduce with and without the accumulate parameter. In some sense, the find operation is a tree traversal pattern with the action of “find the resources matching these arguments” and a final reduce step.

                  find = function(..., parent. = FALSE, children. = FALSE, exclude. = NULL,
                    check_duplicates. = FALSE, tag_engine. = FALSE) {

      resources <- private$traverse_tree(parent = parent., children = children.,
        exclude = exclude., accumulate = TRUE,
        on_parent = self$.parent$find(..., parent. = TRUE, children. = TRUE,
          exclude. = c(list(self$root()), exclude.), tag_engine. = tag_engine.),
        on_self = {

This is a bit of a subtle point. We tag the resources with the name of the engine so that we can later avoid some problems related to multiple inheritance

                        self_resources <- super$find(...)
          if (is.character(tag_engine.)) {
            `names<-`(self_resources, rep(tag_engine., length(self_resources)))
          } else {
            self_resources
          }
        },
        on_child = function(engine, name) {
          tag <- if (!identical(tag_engine., FALSE)) name else FALSE
          engine$find(..., parent. = FALSE,
            children. = TRUE, exclude. = exclude., tag_engine. = tag)
        }
      )

      if (isTRUE(children.)) {
        if (isTRUE(check_duplicates.)) {

If we are checking child engines, we take this opportunity to ensure none of the mounted engines have shared resources. For example, if two engines both have a foo/bar.R file, this would cause terrible conflicts related to multiple inheritance: if the engine1 asked for foo/bar.R, then since the tree
traversal asks the parent firsts, which asks its non-engine1 children, then engine1 would receive foo/bar.R from engine2. Conversely, engine2 would receive engine1's foo/bar.R, probably leading to all sorts of perverse bugs. In the future it may be possible to specify canonical preferences in configuration but for now we disable shared resources all together:

All engines must be disjoint.

(Except insofar as they share resources from a common child engine, as in a diamond graph.)

                        resources[[length(resources)]] <- Filter(Negate(is.null), resources[[length(resources)]])
          detect_duplicate_resources(resources[[length(resources)]])
        }
        resources[[length(resources)]] <- unname(resources[[length(resources)]])
      }

We revert the list to ensure children resources mask self and parent resources (or, for example, we would get a conflict in a simple diamond with two engines depending on a base engine, which has config/application.R and config/engines.R).

                    resources <- c(recursive = TRUE, rev(resources))

Using unique here would drop names!

                    resources[!duplicated(resources)]
    },

    exists = function(resource, ..., parent. = TRUE, children. = TRUE, exclude. = NULL) {

Check if a resource exists is a straightforward tree traversal pattern: check the parent, then self, then the children, and return TRUE as soon as any find them. Otherwise, return FALSE.

                    private$traverse_tree(parent = parent., children = children.,
        exclude = exclude., exists_args = list(resource, ...),
        on_parent = TRUE, on_self = TRUE,
        on_child = function(engine) TRUE, otherwise = FALSE)
    }
  ),

This forms a collection of private helper methods.

                private = list(
    has_parent = (has_parent <- function() { !is.null(self$.parent) }),
    is_root    = Negate(has_parent),
    mounted_engines = function() {
      Filter(function(e) isTRUE(e$mount), self$.engines)
    },

We convert an engine from string representation to a syberia_engine object.

                  sanitize_engine = function(engine, utility = TRUE) {
      if (!is.simple_string(engine)) {
        stop(m("sanitize_engine_class"), call. = FALSE)
      }
      if (!is.element(engine, names(self$.engines))) {
        stop(m("sanitize_engine_no_engine", engine = engine), call. = FALSE)
      }
      engine_obj <- self$.engines[[engine]]$engine
      if (isTRUE(utility) && isTRUE(engine_obj$mount)) {
        stop(m("sanitize_engine_mounting_conflict", engine = engine), call. = FALSE)
      }
      engine_obj
    },

The master helper that defines $find, $resource, and $exists. A little many arguments for my usual taste, but they make sense here: we specify whether we would like to operate on the parent or children first as scalar bools, followed by expressions for on_parent and on_self and a function for on_child. We use a function for the latter since it may operate on multiple children whereas the parent and self are singletons.

If the on_child function has arity 1 it receives the syberia_engine object as its argument. On arity 2, it also receives the name.

Finally, we can specify which arguments to pass to director$exists when checking for resource existence as well as which engines to exclude. If this is missing, resources will not be checked for existence prior to running parent, self, or child actions.

Finally, if accumulate = TRUE, we return a list of the three results (parent, self, children) with a list of multiple results per child as the third element.

                  traverse_tree = function(parent, children, on_parent, on_self, on_child, otherwise,
                             exists_args, exclude, accumulate = FALSE) {

      record <- if (isTRUE(accumulate)) {
        result <- list()
        function(value, subindex) {
          if (missing(subindex)) {
            result[[length(result) + 1]] <<- value
          } else {
            result[[length(result)]][[subindex]] <<- value
          }
        }
      } else { `return` }

      check_exists <- !missing(exists_args)

      if (check_exists) {
        full_exists_args <- c(exists_args, exclude. = c(list(self$root())),
          parent. = TRUE, children. = TRUE)
      }

      if (isTRUE(parent) && private$has_parent()) {

A little tricky here: If we do not check for existence, always run the on_parent action, otherwise only check it if the parent engine has the resource.

                      if (!check_exists || do.call(self$.parent$exists, full_exists_args)) {
          record(eval.parent(substitute(on_parent)))
        }
      }

Similarly, if we do not check for existence, always run the on_self action, otherwise only check it if the current engine has the resource.

                    if (!check_exists || do.call(super$exists, exists_args)) {
        record(eval.parent(substitute(on_self)))
      }
      
      if (isTRUE(children)) {

The third element returned by the tree traversal when accumulate = TRUE will be a list of children.

                      if (isTRUE(accumulate) && length(formals(on_child)) > 1L) {
          record(list())
        }
        if (check_exists) {

No longer ask the parent during existence checks (or we would have infinite loops!).

                        full_exists_args$parent. <- FALSE
        }
        engines <- private$mounted_engines()
        for (i in seq_along(engines)) {
          name   <- names(engines)[i]
          engine <- engines[[i]]$engine

          if (!any(vapply(exclude, should_exclude, logical(1), engine))) {
            if (!check_exists || do.call(engine$exists, full_exists_args)) {
              if (length(formals(on_child)) == 1L) {
                record(on_child(engine))
              } else {
                record(on_child(engine, name), name)
              }
            }
          }
        }
      }

      if (isTRUE(accumulate)) {
        result
      } else {

Note that if accumulate = FALSE, we should never get here if on_parent, on_self, or on_child had a return action.

                      eval.parent(substitute(otherwise))
      }
    }
  )
)

A rather technical helper function to ensure that child engines of a common parent engines do not share resources.

For example, if two engines both have a foo/bar.R file, this would cause terrible conflicts related to multiple inheritance: if the engine1 asked for foo/bar.R, then since the tree
traversal asks the parent firsts, which asks its non-engine1 children, then engine1 would receive foo/bar.R from engine2. Conversely, engine2 would receive engine1's foo/bar.R, probably leading to all sorts of perverse bugs. In the future it may be possible to specify canonical preferences in configuration but for now we disable shared resources all together:

All engines must be disjoint.

(Except insofar as they share resources from a common child engine, as in a diamond graph.)

              detect_duplicate_resources <- function(resource_list) {
  if (length(unique(c(recursive = TRUE, resource_list))) !=
      sum(vapply(resource_list, length, numeric(1)))) {
    # Allow for comparison of engine-resource pairs.
    engined_resource_list <- lapply(resource_list, function(resources) {
      setNames(Map(paste, names(resources), resources, sep = "\1"), resources)
    })

    pairs <- combn(length(resource_list), 2)
    conflict_list <- Filter(Negate(is.null), apply(pairs, 2, function(pair) {
      same_subengine <- engined_resource_list[[pair[1]]][
        engined_resource_list[[pair[1]]] %in% engined_resource_list[[pair[2]]]]
      common <- setdiff(intersect(resource_list[[pair[1]]], resource_list[[pair[2]]]),
                        names(same_subengine))
      if (length(common) > 0) {
        list(
          engines   = names(resource_list)[pair],
          resources = common
        )
      }
    }))

    if (length(conflict_list) > 0L) {
      conflicts <- do.call(paste, lapply(conflict_list, function(conflict) {
        paste0("\n\nEngine ", crayon::red(conflict$engines[1]), " and ",
               crayon::red(conflict$engines[2]), " share:\n",
          paste(collapse = "\n", paste(" *",
            vapply(conflict$resources, crayon::yellow, character(1)))))
      }))

      stop("Mounted child engines have conflicting resources. Please mount ",
           "at a different root or remove the conflicts: ", conflicts, "\n\n")
    }
  }
}

engine.R

Syberia is a framework that was crafted specifically for the R programming language. The story is similar to JavaScript: a single-purpose language meant to be used in a particular context exploded in popularity and found itself in a host of other settings.

However, R is really just LISP under the hood–a general purpose symbolic computing runtime. To prove that this makes R a viable production language for almost any purpose whatsoever, Syberia takes the first step of providing a hierarchical structure for R projects. Anything–from an empty directory, to an R package, to a collection of shiny dashboards–can be a Syberia project.

This flexibility is achieved by assuming nothing about the underlying directory structure: in other words, fixing no conventions, but letting the user decide what the right course of action should be with respect to structuring their project. Other tools fare poorly at this; for example, a package does not even let you have a hierarchical directory structure in its R/ subdirectory; what an affront to developer sanity!

How can Syberia claim to be a convention-over-configuration framework if it fixes no conventions? The answer is that while Syberia itself, the package you are currently examining, does not fix any conventions, the de facto approach to development within Syberia is as follows:

Pick an engine to build your work off of.
Build your project within the conventions set by your choice of engine.

It is possible to include multiple engines, or a whole complex potentially cyclic graph of engines, but this introduces the diamond problem and should generally be avoided until you have a good understanding of computer science foundations.

In other words, the Syberia approach is to be a meta-framework. By making no assumptions about the structure of your project, it is your choice which engine to build your project on depending on what suits your needs. This is especially helpful for machine learning projects, where the task and solution can take widely varying shapes depending on whether the problem is supervised, unsupervised, NLP, deep learning, etc.

The above 2-step approach is recursive. For example, the modeling engine, the default engine for most projects, is built off the base engine, which dictates that each project should have a config/routes.R file which links the lib/controllers directory to the rest of the project and tells you how R scripts in the project are to be parsed according to which directory they reside in. This is similar to object-oriented programming, except that it is strictly more general since it does not force you to treat every single thing in the world as an object.

Thus, the core structural unit is an engine. In order to bootstrap effectively, an engine should have a config/application.R file, potentially empty, so that Syberia can detect this is an engine, as well as a config/engines.R file indicating which engine this one depends on, if any. This is the sole convention and can be pretty easily deprecated in future versions of Syberia if there is demand for more flexible configuration.

Each engine is testable by design. Syberia exports a function called test_project that by default looks in the test/ directory of the engine and requires all resources have an accompanying test. If your resource (e.g., R script) does not have a test – it fails! In general, only well-written code is easily testable so this encourages both separating your project out into similar components through like directory structures as well as ensuring all inputs and outputs are what you expect them to be. (Not to mention saving you a huge headache of managing the complexity of a growing system!)

To construct a Syberia engine object, run syberia_project("/path/to/engine"). This is a syberia_engine R6 class and holds together everything that Syberia knows about your project. It has methods like $find, $resource, and $exists to play with the files in your project. In general, Syberia takes a page out of node.js's book and encourages all files to be structured so they have a single export: the last expression in the file. If you haven't replaced the default controller using the config/routes file, a topic we'll touch on later, you should be able to “compile” your resource using

value <- project$resource("relative/path/to/resource.R")

For example, if we had

# relative/path/to/resource.R
x <- 1
y <- 2
x + y

then value above would be 3. However, as we will see, the project$resource caller is capable of much more than sourcing a file.

              #' Bootstrap a Syberia engine.
#'
#' A Syberia engine defines the core re-usable structural unit across
#' different Syberia projects. In the same way that
#' \href{http://guides.rubyonrails.org/engines.html}{Rails engines}
#' provide a modular structure for \href{Rails}{http://rubyonrails.org/}
#' projects, Syberia engines serve as the re-usable rockbed upon which
#' to construct projects that contain similar components.
#'
#' A Syberia engine is managed by a \code{\link[director]{director}} object.
#' This object ensures that the engine cannot access resources outside of
#' its domain, and allows insularity from other engines and the top-level
#' project from which the engine will be used.
#'
#' @param filepath character. The root directory of the engine.
#'    If this directory does not define a (relative) \code{"config/application.R"}
#'    file, the parent directories of \code{filepath} will be traversed
#'    until such a file is found, or the function will error.
#'    By default, the current directory.
#' @param ... Additional arguments used internally.
#' @param root. logical. Whether or not this is a root-level engine,
#'    by default \code{TRUE}.
#' @export
#' @note The syberia package will maintain an internal cache of engines.
#'    Therefore, calling \code{syberia_engine} twice will retrieve the
#'    cached object. This cache is maintained in the \code{.syberia_env}
#'    environment object in the syberia package namespace.
#' @return The \code{\link[director]{director}} object responsible for
#'    managing the engine.
syberia_engine <- function(filepath = getwd(), ..., root. = TRUE) {
  project <- syberia_engine_(filepath, ...)
  if (isTRUE(root.)) {
    .syberia_env$active_project <- project
  }
  project
}

#' @rdname syberia_engine
#' @export
syberia_project <- syberia_engine

#' The current active Syberia project.
#'
#' @return A \code{syberia_engine} object or \code{NULL}.
#' @export
active_project <- function() {
  .syberia_env$active_project
}

syberia_engine_ <- function(filepath, ...) {
  UseMethod("syberia_engine_", filepath)
}

syberia_engine_.pre_engine <- function(filepath, ...) {
  build_engine(filepath)
}

syberia_engine_.character <- function(filepath, ...) {
  syberia_engine_character(filepath, ...)
}

syberia_engine_character <- function(filepath, cache = TRUE) {

If a user gives ~/foo/bar/baz as the path and the project's root is in fact ~/foo (in other words, if they give a file or subdirectory in the project), this should be inferrable. We traverse the parent directories until we hit the root of the file system to see if we are in a syberia engine

                traverse_parent_directories(normalizePath(filepath, mustWork = FALSE), function(filepath) {

If we are caching the precomputed syberia_engine object, simply fetch it from the .syberia_env helper environment.

                  if (isTRUE(cache) && has_application_file(filepath)) {

If it is not cached, call build_engine on the directory.

                    .syberia_env[[filepath]] <- .syberia_env[[filepath]] %||% build_engine(filepath)
    } else if (has_application_file(filepath)) {
      build_engine(filepath)
    }

If we did not find a syberia engine, we just crash here.

                }, error = sprintf("No syberia engine found at %s", sQuote(crayon::red(filepath))))
}

A syberia engine with root root is, by definition, a directory of R files containing the file root/config/application.R or any variation thereof (e.g., root/config/application/application.r). This is how we recognize the engine. (Alternatively, if it has a config/engines resource.)

              has_application_file <- function(filepath) {
  extensions <- c(".R", ".r", "/_.R", "/_.r")
  files <- function(...) {
    c(recursive = TRUE, lapply(list(...), function(type) {
      paste0(file.path(filepath, "config", type),
             gsub(fixed = TRUE, "_", type, extensions))
    }))
  }
  any(file.exists(files("application", "engines")))
}

#' Build a syberia engine.
#'
#' @param buildable character. A file path from which to build the engine.
#' @export
build_engine <- function(buildable) {
  UseMethod("build_engine")
}

#' @export
build_engine.pre_engine <- function(buildable) {
  dir <- engine_dir(buildable$prefix)
  if (!file.exists(dir)) buildable$builder(dir)
  syberia_engine(dir, cache = FALSE, root. = FALSE)
}

#' @export
build_engine.character <- function(buildable) {

To build an engine, we bootstrap an otherwise bare syberia_engine R6 object. Bootstrapping an engine is explained below.

                bootstrap_engine(syberia_engine_class$new(buildable))
}

engine_dir <- function(dir) {
  file.path(engine_location(), dir)
}

The location where Syberia will keep copies of downloaded engines.

              engine_location <- function() {
  path <- engine_location_path()
  if (!file.exists(path)) {

Ensure we can actually create the engine storage location.

                  if (!dir.create(path, FALSE, TRUE)) {
      stop(m("engine_location_mismatch", path = path), call. = FALSE)
    }
  }
  path
}

By default, Syberia will store engine code in ~/.R/.syberia/engines.

              engine_location_path <- function() {
  Sys.getenv("SYBERIA_ENGINE_LOCATION") %|||%
  getOption("syberia.engine_location", "~/.R/.syberia/engines")
}

To build an engine, we bootstrap an otherwise bare syberia_engine R6 object. Bootstrapping an engine consists of

Registering preprocessors for config/boot and config/engines.
Registering a parser for config/engines.
Executing config/engines followed by config/boot.
Ensure that the ensure has no conflicting resources with any other engine.
Set the cache entry bootstrapped on the engine to indicate it is built.

The notion of preprocessor and parser belongs to director. In fact, every syberia_engine inherits from director, so every syberia_engine has at least the same capabilities as a director object. (Rails experts may notice the similarity between Syberia engines and directors, and Rails engines and railties.)

Registering a preprocessor means we do additional stuff before sourcing that file. For example, in config/engines, we provide a helper function called engine for registering subengines. Registering a parser means we do additional stuff after preprocessing and sourcing that file. For example, in config/engines we need to actually build and mount the engines referred to in the fil with the engine helper function.

In order to avoid the diamond problem, Syberia ensures that engines do not share resources unless they come from a common base engine. This is a technical issue that will eventually be explained in a more thorough whitepaper. (For the confused CS students asking why I couldn't have simply made engines form a DAG instead of a possibly cyclic ## graph–the short answer is that engines need to trivially modularizable while ## retaining the power to refer to parent engine resources dynamically and just-in-time. I may very well regret this decision.)

Each director, and by proxy syberia_engine, has an internal cache for memoising run-time state while the R process is running (this is different from a director's registry, which captures persistent cross-session state on disk). We simply set the bootstrapped key to TRUE so we can check for it later.

              bootstrap_engine <- function(engine) {
  if (isTRUE(engine$cache_get("bootstrapped"))) return(engine)
  engine$register_preprocessor("config/boot",    boot_preprocessor)
  engine$register_preprocessor("config/engines", engine_preprocessor)
  engine$register_parser      ("config/engines", engine_parser)

  exists <- function(...) engine$exists(..., parent. = FALSE, children. = FALSE)
  if (exists("config/engines")) engine$resource("config/engines")
  if (exists("config/boot"))    engine$resource("config/boot")

  # Check for duplicate resources in mounted child engines.
  engine$find(check_duplicates. = TRUE, children. = TRUE, tag_engine. = TRUE)
  engine$cache_set("bootstrapped", TRUE)
  engine
}

The boot_preprocessor is referred to above in the boostrap_engine function. We are saying that the config/boot file should have access to the director object, and nothing more.

              boot_preprocessor <- function(source, source_env, director) {
  source_env$director <- director
  source()
}

The config/engines file is an epsilon harder. Basically, we use the engine helper function to record the user's requested engine mounting until we get to the parser below.

              engine_preprocessor <- function(source, source_env, preprocessor_output, director) {
  if (isTRUE(director$cache_get("bootstrapped"))) return()

  preprocessor_output$engines <- new.env(parent = emptyenv())
  source_env$engine <- function(name, ...) {
    preprocessor_output$engines[[name]] <- list(...)
  }
  source()
}

Now that we have collected the engines to be mounted into the preprocessor_output helper (which also came from director),

              #
engine_parser <- function(director, preprocessor_output) {
  if (isTRUE(director$cache_get("bootstrapped"))) return()

For each engine mentioned using the engine helper in config/engines, we register the engine. This means that we establish member variables on the respective syberia_engine objects that allow them to understand the topology: the parent engine knows about its children, and each child knows about its parent. Syberia is all about family.

                for (engine in ls(preprocessor_output$engines, all = TRUE)) {
    register_engine(director, engine, parse_engine(preprocessor_output$engines[[engine]]),
                    mount = isTRUE(preprocessor_output$engines[[engine]]$mount))
  }

When we use devtools::load_all on director, it loads a symbol called exists; we use explicit base namespacing to prevent conflicts during development.

                if (base::exists(".onAttach", envir = input, inherits = FALSE)) {

A little additional trick is that the user could have specified an .onAttach function in the config/engines file. If this exists, we store it in the syberia_engine cache and later invoke it.

                  onAttach <- input$.onAttach

    if (!is.function(onAttach)) {
      stop(m("onattach_failure", root = director$root(), klass = class(onAttach)[1L]),
           call. = FALSE)
    }

We use list2env to “inject” the director local variable into the scope of the onAttach hook.

                  environment(onAttach) <- list2env(
      list(director = director),
      parent = environment(onAttach) %||% baseenv()
    )
    director$cache_set(".onAttach", onAttach)
  }
  NULL
}

Registering an engine means making the parent aware of its child and the child aware of its parent. Mounting the engine means we will be treating a collection of engines, each in potentially very different directories on the file system, as one giant project. This allows us to pull out a subset of Syberia resources and “enginify” them with ease.

Although Syberia does face a few theoretical challenges, these have solutions, and the end result is that given a Syberia project with N resources, there are 2^N possible engines extractable from that project: one for each subset of resources. One of Syberia's goals is to make it as easy as possible to pull out your work so others can re-use it.

              register_engine <- function(director, name, engine, mount = FALSE) {
  message(crayon::green(paste("...Mounting", name, "engine.")))

  # TODO: (RK) Replace with $engines private member after R6ing.
  if (!director$cache_exists("engines")) {
    director$cache_set("engines", new.env(parent = emptyenv()))
  }
  env <- director$cache_get("engines")
  env[[name]] <- engine

  director$register_engine(name, engine, mount = mount)

  if (engine$cache_exists(".onAttach")) {

This is where we invoke the .onAttach hook registered in the engine_parser.

                  engine$cache_get(".onAttach")(director)
  }
}

parse_engine <- function(engine_parameters) {

By default, download engines from GitHub.

                engine_parameters$type <- engine_parameters$type %||% "github"

  if (!is.simple_string(engine_parameters$type)) {
    stop(sprintf("When defining an engine, please provide a string for the %s",
                 sQuote("type")), call. = FALSE)
  }

  parser <- paste0("parse_engine.", engine_parameters$type)

When we use devtools::load_all on director, it loads a symbol called exists; we use explicit base namespacing to prevent conflicts during development.

                if (!base::exists(parser, envir = getNamespace("syberia"), inherits = FALSE)) {
    stop(sprintf("Cannot load an engine of type %s",
                 sQuote(crayon::red(engine_parameters$type))))
  }
  syberia_engine(get(parser, envir = getNamespace("syberia"))(engine_parameters),
                 cache = FALSE, root. = FALSE)
}

parse_engine.github <- function(engine_parameters) {

GitHub-derived engines need to provide a repo and version (by default “master”). For example, the following indicates we wish to load the “bobbleheads” engine from GitHub “jimbob/bobbleheads.sy” off the master branch.

# config/engines.R
engine("bobbleheads", type = "github", repo = "jimbob/bobbleheads.sy")

Syberia will download and cache the engine.

                repo    <- engine_parameters$repo %||% engine_parameters$repository
  # TODO: (RK) Checking for updates?
  version <- engine_parameters$version %||% "master"
  PAT     <- github_pat()
  if (!is.null(PAT)) {
    base_url <- sprintf("https://%s@github.com/%s.git", PAT, repo)
  } else {
    base_url <- sprintf("https://github.com/%s.git", repo)
  }
  stopifnot(is.simple_string(repo))

  pre_engine(prefix = file.path("github", repo, version),
    builder = function(filepath) {
      status <- system2("git",
        c("clone", base_url, filepath,
        "--branch", version, "--depth", "1", "--quiet"))
      stopifnot(status == 0)
    })
}

Putting the following in config/engines.R indicates we wish to load the “bobbleheads” engine from GitHub “jimbob/bobbleheads.sy” off the master branch.

# config/engines.R
engine("bobbleheads", type = "local", path = "~/dev/bobbleheads")

              parse_engine.local <- function(engine_parameters) {
  path <- engine_parameters$path %||% stop("Please provide an engine path")
  if (!file.exists(path)) stop("The path ", sQuote(path), " does not exist.")
  path
}

pre_engine <- function(prefix, builder) {

Build a pre_engine S3 object.

                structure(list(prefix = prefix, builder = builder), class = "pre_engine")
}

#' Whether to exclude a syberia engine from being used for resourcing.
#'
#' @param condition logical. Some condition.
#' @param engine syberia_engine. Engine object.
#' @export
should_exclude <- function(condition, engine) {
  UseMethod("should_exclude")
}

#' @export
should_exclude.syberia_engine <- function(...) { identical(...) }

#' @export
should_exclude.character <- function(condition, engine) {
  identical(condition, engine$root())
}

messages.R

A dictionary of messages used by the package. We separate these into its own file to avoid cluttering the R code with a multitude of strings.

              messages <- list(
  test_engine_type_error = c(
    "Please pass a ", sQuote(crayon::yellow("syberia_engine")), " object ",
    "to the ", sQuote(crayon::yellow("test_engine")), " function."
  ),
  
  test_hook_no_engine = c(
    "To fetch the {{{type}}} hook for a project, please pass in a syberia_engine ",
    "object (the syberia_engine for the syberia project). Instead I got ",
    "an object of class {{{klass}}}."
  ),

  test_hook_invalid_format = c(
    "Test {{{type}}} hooks must be a function or a list of functions.\n\nIn ",
    "{{{filename}}}, ensure that ",
    "you have ", sQuote(crayon::yellow("{{{type}}} <- some_function")),
    " as right now it's an object of class ",
    sQuote(crayon::red("{{{klass}}}"))
  ),

  test_hook_arity_error = c(
    "Test {{{type}}} hooks must all be functions that take at least one ",
    "argument.\n\nThe first argument will be an environment that has one ",
    "key, ", sQuote("project"), ". In {{{filename}}}",
    " ensure your ", sQuote(crayon::yellow("{{{type}}}")),
    " local variable meets this constraint."
  ),

  engine_location_mismatch = c(
    "Syberia needs a directory in which to place the code for ",
    "dependencies. Please ensure ", sQuote(crayon::red("{{{path}}}")),
    " is writable, or set a different path in the ",
    sQuote(crayon::yellow("SYBERIA_ENGINE_LOCATION")), " environment variable or ",
    "the ", sQuote(crayon::yellow("syberia.engine_location")), " global option (",
    "using ", sQuote(crayon::magenta("options(syberia.engine_location = \"some/dir\")")), ")."
  ),

  onattach_failure = c(
    "The ", sQuote(crayon::yellow(".onAttach")), " hook in the ",
    sQuote(crayon::yellow("{{{root}}}")), " engine must be a *function*. ",
    "Instead, I got a {{{klass}}}."
  ),

  sanitize_engine_class = c(
    "When sourcing a resource, please pass a string to the ",
    sQuote("engine"), " parameter."
  ),

  sanitize_engine_no_engine = c("There is no engine called ", sQuote("{{{engine}}}"), "."),

  sanitize_engine_mounting_conflict = c(
    "Explicit engine specification during resource sourcing is ",
    "only allowed on unmounted engines. The ", sQuote("{{{engine}}}"), 
    " engine is a mounted engine."
  )
)

Cleanse the message a little after fetching it from the messages list.

              msg <- function(name) {
  stopifnot(name %in% names(messages))

The gsub will squish multiple spaces into a single space, while the paste(collapse = "", ...) usage will ensure we can take vectors of characters in the above messages list.

                paste(collapse = "", gsub("[ ]+", " ", messages[[name]]))
}

We use the whisker templating engine to inject any additional values into the message string. For example,

m("parse_mungepiece_dual_error", error = "Bloop")

would return the appropriate error with the string “Bloop” injected in the appropriate place.

              m <- function(name, ...) {

Note the use of do.call, a very handy R metaprogramming tool when we do not know exactly which arguments we will pass.

                do.call(whisker::whisker.render, list(msg(name), list(...)))
}

package.syberia.R

              #' Syberia is a development meta-framework for R.
#'
#' In its original formulation, in conjunction with
#' \href{http://github.com/syberia/modeling.sy}{the modeling engine},
#' Syberia serves as a machine learning classifier development framework.
#'
#' The modeling engine provides an opinionated unified framework for
#' fast iteration on classifier development and deployment. It is
#' founded on convention over configuration and aims to solve the
#' problems of classifier-specific data preparation and
#' classifier-specific modeling parameters.
#'
#' The more general vision for Syberia is still in progress, but aims
#' to unify the currently disparate realms of R packages, script codebases,
#' Shiny dashboards, R web apps, and reproducible analysis. In the
#' viewpoint of the author, R is syntactic sugar around LISP, which
#' enables arbitrary computation; Syberia is an attempt to support this
#' conjecture by allowing the construction of arbitrary software 
#' projects within the R programming language, thereby finally outgrowing
#' its long-overdue misconception as a statistical tool.
#'
#' @name syberia;
#' @docType package
#' @author Robert Krzyzanowski
#' @import testthat stagerunner director R6 whisker
NULL

An environment used for caching some Syberia-managed objects, like directors.

              .syberia_env <- new.env(parent = emptyenv())

test.R

Like any good development framework, Syberia offers built-in support for testing. The primary unit of development in Syberia is the engine. All Syberia projects are composed of a collection of engines (each potentially depending on vanilla R packages).

This function allows one to run all the tests associated with the engine. By default, the files in the test directory of the engine are considered tests, and all other files are non-tests.

The convention is straightforward: if you have a Syberia resource a/b/c off the root of the project, you should have an accompanying test in test/a/b/c. Note we leave out the .R extension as the resource may be an idempotent or non-idempotent resource.

Idempotent resources are .R files in a directory with the same name as the file without extension. For example, test/a/b/c/c.R would be an idempotent resource, and helper files like test/a/b/c/helper.R would be invisible to the Syberia engine: this encourages clean design and separation into helper files as your resource becomes more complex.

Here is an example test. You can look at the accompanying helper project

# calculations/pi.R
# Compute pi using a series: http://functions.wolfram.com/Constants/Pi/06/01/01/
Reduce(`+`, 4 * vapply(seq_len(1000) - 1, function(k) {
  (-1)^k / (2 * k + 1)
}, double(1)))

# test/calculations/pi.R
test_that("the calculation is close to pi", {
  # Note we have access to the testthat package.
  expect_less_than(abs(resource() - 3.1415926), 1e-3)
})

We can execute the test using test_engine(root) where root is the string representing the directory the above files are contained in.

Note that Syberia provides the resource helper to fetch the current resource being tested. You could pass a first argument to fetch another resource, but if you leave it empty, the default will always be the resource corresponding to the tested resources. Thus, if you are in test/calculations/pi, calling resource() will build you calculations/pi.

It is possible to add test setup and teardown hooks. This means that before the test suite runs, you can add additional conditions to ensure your project is working as intended. For example, the author has found it useful to add checks for README.md files in all directories to encourage the team to always add documentation (or else your test suite breaks!).

To add a test setup hook, simply place a function or a list of functions into the setup local variable in the file config/environments/test.R (note this is controlled by the config parameter to test_engine. Additional hooks are teardown (for when the test suite finishes), single_setup (every time a test runs), and single_teardown (every time a test finishes). Any functions or lists of functions passed to these locals should have only one argument: the received value will be an environment containing the key resource, a string representing the current resource being tested, and the key project, pointing to the syberia_engine object for the project (or engine).

For example, one could include the following configuration file in a syberia project.

# config/environments/test.R
setup    <- function(env) { cat("Beginning test suite!\n") }
teardown <- function(env) { cat("Ending test suite!\n") }
single_setup    <- function(env) {
  cat(paste("Testing resource", env$resource, "\n")) }
single_teardown <- function(env) {
  cat(paste("Tested resource", env$resource,  "\n")) }

Running test_engine(...) on this project will cause the first message to be printed, then the last two to alternate between each test, before finally wrapping up with the second message.

              #' Run all tests in a syberia project or engine.
#'
#' The tests that will be run are all those in the \code{test} subdirectory
#' of the root of the syberia engine, unless otherwise specified.
#'
#' It is possible to introduce additional behavior prior to and after tests.
#' This can be used to perform additional testing not covered by sourcing
#' all files in the "test/" directory of the syberia engine.
#'
#' To provide a setup or teardown hook, simply place a function or list of
#' functions in a local variable \code{setup} or \code{teardown}, respectively,
#' in \code{config/environments/test} relative to the root of the syberia engine,
#' or pass the relevant \code{setup} or \code{teardown} parameters to this function.
#'
#' For example, creating a file \code{config/environments/test.R} with the code
#' \code{setup <- function(env) cat("Running all tests.")} will print a message
#' before all the tests are run. The one parameter the function must take is an
#' environment which will contain a single key, \code{director}, pointing to the 
#' object returned by calling \code{\link{syberia_engine}}.
#'
#' @param engine syberia_engine. The syberia engine to test.
#'    If a \code{character}, it will be passed to \code{\link{syberia_engine}} first.
#' @param base character. Any subdirectory to test specifically. By default,
#'    \code{"test"}.
#' @param config character. The relative path to the configuration resource,
#'    by default \code{"config/environments/test"}.
#' @param ignored_tests character. The list of tests to ignore, by default
#'    the local variable \code{ignored_tests} extracted from the configuration
#'    resource specific by the \code{config} parameter.
#' @param optional_tests character. The list of tests to ignore, by default
#'    the local variable \code{optional_tests} extracted from the configuration
#'    resource specific by the \code{config} parameter.
#' @param required logical. Whether or not all tests are required to have resources,
#'    by default \code{TRUE}. If \code{TRUE}, the \code{ignored_tests}
#'    resources will not be required to have an accompanying test. It is highly
#'    recommended that all your projects have full test coverage.
#' @param reporter character. The testthat package test reporter to use. The
#'    options are \code{c("check", "list", "summary", "minimal", "multi", "rstudio",
#'    "silent", "stop", "tap", "teamcity")}, with the default being \code{"summary"}.
#' @param error_on_failure logical. Whether or not to raise an error
#'    if there are any failures. By default, \code{TRUE}.
#' @seealso \code{\link{syberia_engine}}
#' @export
#' @return A list of \code{testthat_results} objects giving the details for
#'    the tests executed on each tested resource. If \code{error_on_failure}
#'    is \code{TRUE}, error instead if there are any failures.
test_engine <- function(engine = syberia_engine(), base = "test",
                        config = file.path("config", "environments", "test"),
                        ignored_tests = ignored_tests_from_config(engine, base, config),
                        optional_tests = optional_tests_from_config(engine, base, config),
                        required = TRUE, reporter = c("summary", "check", "list",
                          "minimal", "multi", "rstudio", "silent", "stop", "tap", "teamcity")[1L],
                        error_on_failure = TRUE) {
  if (is.character(engine)) {
    engine <- syberia_engine(engine)
  }

  if (!is(engine, "syberia_engine")) {
    stop(m("test_engine_type_error"))
  }

  force(ignored_tests)

We will try to identify the resources to test by grabbing all non-test (i.e., not in the test directory) resources from the engine and also excluding those given by ignored_tests.

                tests <- find_tests(engine, base, ignored_tests)

If testing is required for the engine (i.e., every resource needs an accompanying test) fail unless the condition is satisfied.

                if (isTRUE(required)) {
    ensure_resources_are_tested(engine, tests, optional_tests, base)
  }

  results <- test_resources(engine, tests$active, config, reporter = reporter)

  if (isTRUE(error_on_failure)) {
    if (!all(vapply(results, getFromNamespace("all_passed", "testthat"), logical(1)))) {
      stop("Test failures", call. = FALSE)
    }
  }

  invisible(results)
}

#' Run the tests on a given set of resources.
#'
#' @param engine syberia_engine. The engine to run the tests on.
#' @param tests character. The character vector of resources to test.
#' @param ... Additional arguments to pass to \code{find_test_hook}.
#' @param reporter character. The test reporter to use.
#' @return The testthat result summary for this one test run.
test_resources <- function(engine, tests, ..., reporter) {

We ensure testthat and testthatsomemore are installed.

                ensure_test_packages()

The testthat package does not export find_reporter, so we grab it with a trick.

                reporter <- getNamespace("testthat")$find_reporter(reporter)

We are mimicking testthat's test_files.

                reporter$start_reporter()

  results <- NULL

We would like to error if any global options have been modified, or global variables introduced.

Syberia resources should be stateless and not modify global options!

                setup_hook <- find_test_hook(engine, type = "setup", ...)
  if (!is.null(setup_hook)) setup_hook$run()

  single_setup    <- find_test_hook(engine, type = "single_setup", ...)
  single_teardown <- find_test_hook(engine, type = "single_teardown", ...)
  ensure_no_global_variable_pollution(check_options = TRUE, {
    results <- lapply(tests, test_resource, engine = engine, reporter = reporter,
                      setup = single_setup, teardown = single_teardown)
  })

  teardown_hook <- find_test_hook(engine, type = "teardown", ...)
  if (!is.null(teardown_hook)) teardown_hook$run()

  reporter$end_reporter()

Again mimicking testthat.

                invisible(getNamespace("testthat")$testthat_results(results))
}

ensure_test_packages <- function() {
  ensure_installed("testthat")
  ensure_testthatsomemore()
}

#' Run the tests for a single resource.
#'
#' @param engine syberia_engine. The engine to run the test on.
#' @param resource character. The resource to test.
#' @param setup stageRunner. A \code{\link[stagerunner]{stageRunner}} to
#'    execute setup hooks for this test.
#' @param teardown stageRunner. A \code{\link[stagerunner]{stageRunner}} to
#'    execute teardown hooks for this test.
#' @param reporter reporter. A testthat reporter object. 
#' @return The testthat result summary for this one test run.
test_resource <- function(engine, resource, setup, teardown, reporter) {
  result <- NULL

Each resource itself should not add global variables or pollute options.

                ensure_no_global_variable_pollution(check_options = TRUE, {
    if (!missing(setup) && !is.null(setup)) {
      setup$.context$resource <- resource
      setup$run()
    }

    call_args <- list(resource, recompile = TRUE, recompile. = TRUE)
    if (!missing(reporter)) {
      call_args$reporter <- reporter
    }
    result <- suppressMessages(do.call(engine$resource, call_args))

    if (!missing(teardown) && !is.null(teardown)) {
      teardown$.context$resource <- resource
      teardown$run()
    }

The desc parameters allows us to be specific when we fail: it will inform the user the name of the resource that created a global variable or modified a global option.

                }, desc = paste("running", resource))

  result
}

#' Fetch the test setup or teardown hook, if any exists.
#'
#' The resource \code{config/environments/test} should contain a local variable
#' \code{setup} or \code{teardown} that has a function or list of functions to
#' be incorporated into a stageRunner that will run the actual test setup
#' or teardown.
#'
#' The seed environment for the stageRunner will contain the director object
#' of the relevant project in the key \code{director}.
#'
#' @param engine syberia_engine. The director for the syberia project.
#' @param type character. Must be \code{'setup'} or \code{'teardown'}, the former
#'   being the default.
#' @param config character. The resource used to fetch configuration.
#' @seealso \code{\link{test_engine}}
#' @return a stageRunner that will run the relevant setup or teardown hook(s).
find_test_hook <- function(engine, type = "setup", config) {
  if (!is(engine, "syberia_engine")) {
    stop(m("test_hook_not_engine", type = type, klass = class(engine)[1L]), call. = FALSE)
  }

  hooks <- value_from_config(engine, config, type)
  if (is.null(hooks)) return(NULL)

  # TODO: (RK) Maybe replace this with a new stageRunner method to check 
  # argument validity? In the future, stageRunner could maybe do more!
  colored_filename <- sQuote(crayon::blue(config))
  if (!is.list(hooks) && !is.function(hooks) && !stagerunner::is.stagerunner(hooks)) {
    stop(m("test_hook_invalid_format", type = type, filename = colored_filename,
           klass = class(hooks)[1L]), call. = FALSE)
  }

  if (!is.list(hooks)) { hooks <- list(hooks) }

  all_have_correct_arity <- stagerunner::is.stagerunner(hooks) ||
    all(rapply(hooks, how = "unlist", function(hook) {
      is.function(hook) && length(formals(hook)) > 0
    }))

  if (!all_have_correct_arity) {
    stop(m("test_hook_arity_error", type = type, filename = colored_filename,
           type = type), call. = FALSE)
  }

  # Do not give access to global environment to ensure modularity.
  hook_env <- list2env(list(project = engine), parent = parent.env(globalenv()))
  stagerunner::stageRunner(hook_env, hooks)
}

 
#' Check that all mandatory tested resources have tests.
#'
#' @param engine syberia_engine. The engine to check.
#' @param tests character. The tests to check. Must be a list with keys
#'     \code{"active"} and \code{"ignored"}.
#' @param optional character. A character vector of optional tests.
#' @param base character. The directory containing tests in the project, by
#'     default \code{"test"}.
#' @return Nothing, but error if not all resources have tests.
ensure_resources_are_tested <- function(engine, tests, optional, base = "test") {
  without_builtin_resources <- function(resources) {

We exclude the config and test directories.

                  resources[substring(resources, 1, 7) != "config/" &
              substring(resources, 1, 5) != paste0(base, "/")]
  }

  without_optional_resources <- function(resources) {

optional resources are actually substring matches!

                  Filter(function(resource) !any_is_substring_of(resource, optional), resources)
  }

  all_resources <- without_optional_resources(without_builtin_resources(engine$find()))

  # Error if any resources don't have tests.
  necessary_tests <- file.path(base, all_resources)
  missing_tests   <- setdiff(necessary_tests, c(tests$active, tests$ignored))
  if (length(missing_tests) > 0L) {
    stop(call. = FALSE, "Tests are missing for the following resources:\n\n",
         crayon::red(paste(gsub(paste0("^", base, "/"), "", missing_tests), collapse = "\n")))
  }
}
  

find_tests <- function(engine, base, ignored_tests) {
  all_tests <- engine$find(children. = FALSE, base = gsub("\\/$", "", base)) 
  tests     <- Filter(function(x) !any_is_substring_of(x, ignored_tests), all_tests)
  ignored_test_paths <- setdiff(all_tests, tests)

  list(
    active  = tests,
    ignored = setdiff(all_tests, tests)
  )
}

By default, we will fetch the test configuration from config/environments/test, but the location of the configuration file itself is configurable (see the config parameter to test_engine).

              test_environment_configuration <- 
  function(engine, path = file.path("config", "environments", "test")) {
    if (!engine$exists(path, children. = FALSE)) {
      list()
    } else {
      engine$resource(path, children. = FALSE)
    }
  }

If the configuration file, usually config/environments/test.R, has a local variable ignored_tests, the tests of the character vector of resource names will not be executed. This is useful when certain tests are broken or under development.

              ignored_tests_from_config <- function(engine, base, config) {
  file.path(base, 
    value_from_config(engine, config, "ignored_tests") %||% character(0)
  )
}

If the configuration file, usually config/environments/test.R, has a local variable optional_tests, the character vector of resource names will not be executed. This is useful when certain resources were never expected to have tests, such as those representing network connections, constants, or other things that don't really make sense to test.

Use with caution!

              optional_tests_from_config <- function(engine, base, config) {
  value_from_config(engine, config, "optional_tests") %||% character(0)
}

value_from_config <- function(engine, config, value) {
  test_environment_configuration(engine, config)[[value]]
}

utils.R

              #' Non-null selection operator.
#'
#' @name infix_or
#' @param x ANY. An R object. If \code{NULL}, return \code{y}.
#' @param y ANY. An R object. If x is \code{NULL}, return \code{y}.
#' @export
`%||%`   <- function(x, y) if (is.null(x)) y else x

`%|||%`  <- function(x, y) if (is.falsy(x)) y else x

is.falsy <- function(x) {
  identical(x, NULL) || identical(x, FALSE) || identical(x, "") ||
  length(x) == 0 || identical(x, 0)
}

#' Retrieve Github personal access token.
#'
#' Borrowed from \url{https://github.com/r-pkgs/remotes/blob/master/R/github.R#L23}
#' A github personal access token
#' Looks in env var \code{GITHUB_PAT}
#'
#' @keywords internal
#' @noRd
github_pat <- function() {
  pat <- Sys.getenv('GITHUB_PAT')
  if (!nzchar(pat)) { return(NULL) }

  message("Using github PAT from envvar GITHUB_PAT")
  pat
}

#' Fetch the current Syberia version.
#' @export
#' @return The version of the syberia package as \code{\link{package_version}}
#'   object.
syberia_version <- function() {
  utils::packageVersion("syberia")
}

package_exists <- function(name) {
  is.element(name, utils::installed.packages()[, 1])
}

ensure_installed <- function(package_name) {

Using requireNamespace is the de facto accepted approach here.

                if (!package_exists(package_name)) {
    stop("Please install ", crayon::yellow(package_name), ":\n\n",
         crayon::green(paste0("install.packages('", package_name, "')")), "\n", call. = FALSE)
  }
}

Testthatsomemore is an auxiliary package used for some testing utilities.

              ensure_testthatsomemore <- function() {
  if (package_exists("testthatsomemore")) return()
  ensure_installed("devtools")
  message("The package ", crayon::yellow("testthatsomemore"),
          " is not installed; installing from http://github.com/robertzk/testthatsomemore")
  withCallingHandlers({

We install it from GitHub if the user does not have it installed.

                  devtools::install_github("robertzk/testthatsomemore")
    requireNamespace("testthatsomemore", quietly = TRUE)
  }, error = function(e) {
    stop("The ", crayon::red("testthatsomemore"), " package failed to install. ",
         "Try manually: \n\n",
         crayon::green('devtools::install_github("robertzk/testthatsomemore")'), "\n\n",
         "The error was: ", paste(as.character(e), collapse = "\n"), call. = FALSE)
  })
}

as.list.environment <- function(env) {
  out <- base::as.list.environment(env)
  lapply(out, function(x) if (is.environment(x) && !is(x, "R6")) as.list(x) else x)
}

#' Ensure no global variables are polluted during an expression.
#'
#' If any global variables are removed or created, it will
#' give a descriptive error.
#'
#' @param expr expression. The R expression to evaluate
#' @param desc character. A string to add to "you modified global
#' @param check_options logical. Whether to check if any global options were changed.
#'   variables while [\code{desc} goes here]".
#' @return the output of the \code{expr}.
ensure_no_global_variable_pollution <- function(expr, desc, check_options = FALSE) {
  if (isTRUE(check_options)) old_options <- options()
  before <- ls(globalenv())

  out <- eval.parent(substitute(expr))

  after <- ls(globalenv())
  missing_desc <- missing(desc)
  shorten <- function(vars) if (length(vars) > 5) c(vars[1:5], "...") else vars
  message <- function(vars, type = "variables", action = "removed") {
    msg <- paste("Some global", type, "were", action)
    if (!eval.parent(quote(missing_desc))) msg <- paste(msg, "while", desc)
    msg <- paste0(msg, ": ", crayon::red(paste(vars, collapse = ", ")))
  }

  check_before_after <- function(before, after, type) {
    if (length(bads <- setdiff(before, after)) > 0) stop(message(bads, type = type))
    else if (length(bads <- setdiff(after, before)) > 0)
      stop(message(bads, type = type, action = "added"))
  }

  check_before_after(before, after, "variables")

  if (isTRUE(check_options) && !identical(new_options <- options(), old_options)) {
    before <- ls(old_options); after <- ls(new_options)
    check_before_after(before, after, "options")
    diffs <- vapply(before,
      function(name)! identical(old_options[[name]], new_options[[name]]), logical(1))
    stop("Some global options were modified: ",
         crayon::red(paste(names(which(diffs)), collapse = ", ")))
  }

  out
}

#' Perform an action repeatedly on parent directories until success or error.
#'
#' Given a \code{fn}, we may wish to run it on a \code{filepath}, determine
#' its success, and try again with the parent directory of \code{filepath},
#' until we obtain result that is not \code{NULL}. If this does not occur for
#' any parent directory, we halt with the string \code{error}.
#'
#' @param filepath character. The filepath to traverse along. The \code{fn}
#'    function will be called with \code{filepath} and its parent directories
#'    until it returns a result other than \code{NULL}.
#' @param fn function. A one-argument function called on \code{filepath} or
#'    its successive parent directories until a result other than \code{NULL}
#'    is returned, which will be the final return value.
#' @param error character or function. A string to error if \code{fn} returns
#'    \code{NULL} on all parent directories, or a one-argument function to
#'    execute (the argument received will be the initial \code{filepath})
#' @return The result of \code{fn} on the first parent directory of
#'   \code{filepath} on which it is not \code{NULL}.
traverse_parent_directories <- function(filepath, fn, error) {
  stopifnot(is.character(filepath), length(filepath) == 1, !is.na(filepath))
  stopifnot(is.function(fn), length(formals(fn)) >= 1)
  stopifnot(is.character(error) || is.function(error))

  path <- normalizePath(filepath, mustWork = FALSE)

As long as we have not hit the root directory, keep trying.

                while (!identical(dirname(path), path)) {
    result <- fn(path)
    if (!is.null(result)) return(result)
    path <- dirname(path)
  }

  if (is.character(error)) stop(error)
  else error(filepath)
}

order_by_key <- function(list) {
  list[order(names(list))]
}

is.simple_string <- function(obj) {
  is.character(obj) && length(obj) == 1 && !is.na(obj) && nzchar(obj)
}

#' Whether or not any substring of a string is any of a set of strings.
#'
#' @param string character.
#' @param set_of_strings character.
#' @return logical
#' @examples
#' stopifnot(syberia:::any_is_substring_of('test', c('blah', 'te', 'woo'))) # TRUE
#' stopifnot(!syberia:::any_is_substring_of('test', c('blah', 'woo'))) # FALSE
any_is_substring_of <- function(string, set_of_strings) {
  any(vapply(set_of_strings,
             function(x) substring(string, 1, nchar(x)) == x, logical(1)))
}

zzz.R

              .onAttach <- function(...) {
  if (!isTRUE(getOption("syberia.silent"))) {
    packageStartupMessage(paste0("Loading ", crayon::red("Syberia"), "...\n"))
  }

  # Load better trace.
  if (isTRUE(getOption("syberia.autoload_bettertrace", TRUE)) &&
      !identical(Sys.getenv("CI"), "TRUE")) {
    if (!is.element("devtools", utils::installed.packages()[, 1])) {
      packageStartupMessage(crayon::yellow("   ...Installing devtools\n"))
      utils::install.packages("devtools")
    }

    if (!requireNamespace("bettertrace", quietly = TRUE)) {
      packageStartupMessage(crayon::yellow("   ...Installing github.com/robertzk/bettertrace\n"))
      devtools::install_github("robertzk/bettertrace")
    }
    library(bettertrace)
  }

  makeActiveBinding("project", env = globalenv(),
    function() getFromNamespace("active_project", "syberia")())
  makeActiveBinding("resource", env = globalenv(),
    function() getFromNamespace("active_project", "syberia")()$resource)
  
  # We want to initialize a Syberia project in the current working directory
  # because 9 times out of 10 this is what the user wants.
  #
  # However, this hook doesn't work on install, because install is done from
  # the working directory of the package, not the user.
  #
  # But when the user calls library(syberia), this will work.
  try({
    syberia_engine()
    packageStartupMessage(crayon::yellow(
      "Loaded syberia project ", sQuote(active_project()$root()), "...\n"
    ))
  }, silent = TRUE)
}

.onDetach <- function(...) {
  try(silent = TRUE, detach("syberia:shims"))
  # TODO: (RK) Detach any environments attached to the search path
  # by the active project.
}

.onUnload <- function(...) {
  try(silent = TRUE, detach("syberia:shims"))
}