The goal of raadfiles is to manage information about the large collection of files used by raadtools and related systems.

Motivation

You have a huge set of files you need to access regularly, and the information about those files is the first thing you need.

This project aims to speed up and help you control the following:

  • raw file listing
  • searching file names for patterns
  • caching metadata extracted from the files

The overall goal is to help you write code to access and manipulate the data in those files. This is a natural complement to schemes that automatically obtain files and build file collections, such as raadsync, but it can be used for other collections as well.
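
As a rough illustration of the first two items above, the sketch below uses only base R. It is not the raadfiles implementation; the directory layout and file names are invented for the example. The point is that the slow step (walking the file system) happens once, and pattern searches then run against the in-memory listing.

```r
# Hypothetical collection: a few empty files in a temporary directory
root <- file.path(tempdir(), "collection_demo")
dir.create(file.path(root, "sst", "2020"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "ice"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root, "sst", "2020", c("sst_20200101.nc", "sst_20200102.nc")))
file.create(file.path(root, "ice", "ice_20200101.tif"))

# Raw file listing: done once, since this is the slow step on a large collection
all_files <- list.files(root, recursive = TRUE)

# Searching file names for patterns is then an in-memory operation
nc_files <- grep("\\.nc$", all_files, value = TRUE)
sst_files <- all_files[grepl("^sst/", all_files)]
```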

Set up

  1. Get a huge file collection. (You probably have lots, but if not, see https://github.com/rOpensci/bowerbird for a possible way forward.)
  2. Install raadfiles.
  3. Set up the automated file listing and caching mechanism.

Install raadfiles

## install.packages("remotes")
remotes::install_github("AustralianAntarcticDivision/raadfiles")

Set up the automated file listing and caching mechanism

  1. Write an R script that lists all the files and saves the listing to a cache: raw text, an R object written with saveRDS, feather, or an actual database.
  2. Create a cron job to run that script every day/hour/minute.
  3. Configure raadfiles::custom_setup (TBD, see R/zzz.R for the in-built mechanism)
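
A minimal sketch of the script in step 1, assuming the saveRDS option and base R only. The names data_root and cache_file are placeholders, and the stand-in files exist only so the example runs anywhere; this is not the mechanism in R/zzz.R.

```r
# list_files.R: a hypothetical script to be run on a schedule.
# data_root and cache_file are placeholders; point them at your own paths.
data_root <- file.path(tempdir(), "raad_demo_data")
cache_file <- file.path(tempdir(), "raad_demo_file_cache.rds")

# Stand-in data so the sketch runs anywhere
dir.create(data_root, recursive = TRUE, showWarnings = FALSE)
file.create(file.path(data_root, c("a.nc", "b.nc")))

# List every file once, keeping the root separate from the relative path
listing <- data.frame(
  root = data_root,
  file = list.files(data_root, recursive = TRUE),
  stringsAsFactors = FALSE
)

# Write to a temporary file first, then rename it, so that a reader is
# unlikely to pick up a partially written cache
tmp <- paste0(cache_file, ".tmp")
saveRDS(listing, tmp)
file.rename(tmp, cache_file)

# Any later session reads the cache instead of walking the file system
cached <- readRDS(cache_file)
```

For step 2, a crontab entry along the lines of "0 * * * * Rscript /path/to/list_files.R" would run the script hourly; the path is a placeholder.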

Why raadfiles for raadtools?

  • a consistent convention for “file” and “root” in the file cache, giving a clear separation between the configured path and the data
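  • a mechanism to load the file cache into memory when the package is loaded, for all functions to share

# Time the file listing via raadtools
library(raadtools)
system.time(rt_files <- sstfiles())

# Time the listing of the same collection via raadfiles
library(raadfiles)
system.time(rf_files <- oisst_daily_files())

# Compare the date range covered by each listing
range(rt_files$date)
range(rf_files$date)

# Compare the number of files found by each
length(rt_files$date)
length(rf_files$date)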
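
The “root” and “file” convention mentioned in the list above can be pictured with a small base R sketch. The column names follow that convention, but the paths and the data frame itself are invented for illustration and may not match the actual cache structure.

```r
# Hypothetical cache: "root" is the configured path, "file" is relative to it
cache <- data.frame(
  root = "/mnt/raad_data",
  file = c("sst/2020/sst_20200101.nc", "sst/2020/sst_20200102.nc"),
  stringsAsFactors = FALSE
)

# Full paths are assembled only when they are needed
fullname <- file.path(cache$root, cache$file)

# Moving the collection changes only the root; the data part is untouched
cache$root <- "/data/mirror"
moved <- file.path(cache$root, cache$file)
```

Keeping the two parts separate means the same listing can serve a collection mounted at different locations on different machines.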