raadfiles: File Database Management for 'raadtools'

The goal of raadfiles is to manage information about the large collection of files used by raadtools, and related systems.

Motivation

You have a huge set of files you need to access regularly, and the information about those files is the first thing you need.

This project aims to speed up, and give you control over, the following tasks:

raw file listing
searching file names for patterns
caching metadata extracted from the files
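As a rough illustration of the second task, here is a base-R sketch that filters a cached listing by a file-name pattern and derives a date from each name. The listing, the `/data/raad` root and the OISST-style naming are hypothetical stand-ins chosen for the example, not the package's actual internals.

```r
# Hypothetical cached listing with root and file stored separately;
# real raadfiles roots, folders and naming patterns may differ.
files <- data.frame(
  root = "/data/raad",
  file = c("oisst/avhrr-only-v2.20170101.nc",
           "oisst/avhrr-only-v2.20170102.nc",
           "other/readme.txt"),
  stringsAsFactors = FALSE
)

# Keep only files matching the (assumed) daily naming pattern.
pattern <- "avhrr-only-v2\\.[0-9]{8}\\.nc$"
oisst <- files[grepl(pattern, files$file), ]

# Extract the 8-digit date token from each name and parse it as a Date.
token <- sub(".*\\.([0-9]{8})\\.nc$", "\\1", oisst$file)
oisst$date <- as.Date(token, format = "%Y%m%d")

# Full paths are rebuilt from root + file only when they are needed.
oisst$fullname <- file.path(oisst$root, oisst$file)
oisst
```

Because only the relative `file` column is searched, the same pattern works no matter where the collection is mounted.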

The overall goal is to help you write code to access and manipulate the data in those files. This is a natural complement to schemes that automatically obtain files and build file collections, such as raadsync, but it can be used for other collections as well.

Set up

Get a huge file collection (you probably have one already; if not, see https://github.com/rOpensci/bowerbird for a possible way forward).
Install raadfiles.
Set up the automated file listing and caching mechanism.

Install raadfiles

## install.packages("remotes")
remotes::install_github("AustralianAntarcticDivision/raadfiles")

Set up the automated file listing and caching mechanism

Write an R script that lists all the files and saves the listing to a cache, using plain text, an R object written with saveRDS, a feather file, or an actual database.
Create a cron job to run that script at the interval you need (daily, hourly, or even every minute).
Configure raadfiles::custom_setup (still to be done; see R/zzz.R for the built-in mechanism).
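A minimal sketch of such a listing script, using only base R and saveRDS as the cache format. The demo root under `tempdir()`, the cache file name `raad_filecache.rds` and the chosen metadata columns are all assumptions for illustration; in practice the root would be your configured data path.

```r
# Hypothetical data root, populated with two empty demo files.
root <- file.path(tempdir(), "raad-demo")
dir.create(file.path(root, "oisst"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(root, "oisst",
  c("avhrr-only-v2.20170101.nc", "avhrr-only-v2.20170102.nc"))))

# 1. Raw listing: paths relative to root, so the cache does not depend
#    on where the collection is mounted.
rel <- list.files(root, recursive = TRUE)

# 2. Keep root and file in separate columns, plus cheap file-system metadata.
info <- file.info(file.path(root, rel))
cache <- data.frame(root = root, file = rel,
                    size = info$size, mtime = info$mtime,
                    stringsAsFactors = FALSE)

# 3. Write the cache outside the data root so later listings don't pick it up.
cache_file <- file.path(tempdir(), "raad_filecache.rds")
saveRDS(cache, cache_file)
```

Saved as a script, this can be scheduled with a cron entry that runs it through `Rscript`; swapping saveRDS for a text file, feather or a database only changes step 3.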

Why raadfiles for raadtools?

a consistent convention of separate “root” and “file” columns in the file cache, giving a clear separation between the configured path and the data
a mechanism to load the file cache into memory when the package loads, so that all functions share it
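The shared in-memory cache can be sketched with a package-level environment. `.cache_env`, `load_file_cache()` and `find_files()` are hypothetical names for illustration, not raadfiles functions; a package would typically call the loader from `.onLoad()`.

```r
# Package-level environment: every function reads the same single copy.
.cache_env <- new.env(parent = emptyenv())

# Hypothetical loader: read the cached listing once, unless forced to reload.
load_file_cache <- function(path, force = FALSE) {
  if (force || is.null(.cache_env$files)) {
    .cache_env$files <- readRDS(path)
  }
  invisible(.cache_env$files)
}

# Hypothetical accessor: match on the relative 'file' column, then join
# 'root' back on to build full paths.
find_files <- function(pattern) {
  files <- .cache_env$files
  if (is.null(files)) stop("file cache not loaded")
  hit <- files[grepl(pattern, files$file), , drop = FALSE]
  hit$fullname <- file.path(hit$root, hit$file)
  hit
}

# Usage with a small throwaway cache.
demo <- data.frame(root = "/data/raad",
                   file = c("oisst/avhrr-only-v2.20170101.nc", "other/readme.txt"),
                   stringsAsFactors = FALSE)
cache_path <- tempfile(fileext = ".rds")
saveRDS(demo, cache_path)
load_file_cache(cache_path)
find_files("avhrr")$fullname
```

Reading the cache once and searching it in memory is what makes repeated file queries cheap compared with re-listing the file system each time.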

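## time the raadtools SST file listing
library(raadtools)
system.time(rt_files <- sstfiles())

## time the raadfiles daily OISST file listing
library(raadfiles)
system.time(rf_files <- oisst_daily_files())

## compare the date coverage of the two listings
range(rt_files$date)
range(rf_files$date)

## compare the number of files in each listing
length(rt_files$date)
length(rf_files$date)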
