Create a file cache object

Details

A file cache object is a key-file store that saves the values as files in a directory on disk. Objects are stored and retrieved using the get_file(), get_content(), set_file(), and set_content() methods, and are automatically pruned from the cache according to the parameters max_size, max_age, max_n, and evict.

Cache pruning

Cache pruning occurs when set_file() or set_content() is called, or it can be invoked manually by calling prune().

The disk cache will throttle the pruning so that it does not happen on every call to set_file() or set_content(), because the filesystem operations for checking the status of files can be slow. Instead, it will prune once in every 20 calls to set_file() or set_content(), or if at least 5 seconds have elapsed since the last prune occurred, whichever is first. These parameters are currently not customizable, but may be in the future.
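The throttling rule described above can be illustrated with a small stand-alone sketch. This is not the class's actual implementation: make_prune_throttle() and its arguments are hypothetical names, and only the values 20 and 5 come from the description.

```r
# Hypothetical helper: returns a function that reports whether a prune
# should run on this call, following the "every 20 calls or 5 seconds" rule.
# `now` is a clock function so the rule can be tested with a fake clock.
make_prune_throttle <- function(every_n = 20, every_secs = 5, now = Sys.time) {
  calls_since_prune <- 0
  last_prune <- now()
  function() {
    calls_since_prune <<- calls_since_prune + 1
    elapsed <- as.numeric(difftime(now(), last_prune, units = "secs"))
    due <- calls_since_prune >= every_n || elapsed >= every_secs
    if (due) {
      # Reset both counters whenever a prune is triggered.
      calls_since_prune <<- 0
      last_prune <<- now()
    }
    due
  }
}

should_prune <- make_prune_throttle()
# Simulate 40 quick calls to set_file() or set_content().
decisions <- vapply(seq_len(40), function(i) should_prune(), logical(1))
which(decisions)
```

When the 40 calls complete in well under 5 seconds, only the call-count condition fires, so a prune is due on every 20th call.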

When a pruning occurs, if there are any objects that are older than max_age, they will be removed.

The max_size and max_n parameters are applied to the cache as a whole, in contrast to max_age, which is applied to each object individually.

If the number of objects in the cache exceeds max_n, then objects will be removed from the cache according to the eviction policy, which is set with the evict parameter. Objects will be removed so that the number of items is max_n.

If the total size of the objects in the cache exceeds max_size, then objects will be removed from the cache so that the total size remains under max_size. Note that the size is calculated using the size of the files, not the amount of disk space used by the files. These two values can differ because files are stored in blocks on disk. For example, if the block size is 4096 bytes, then a file that is one byte in size will take 4096 bytes on disk.
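The difference between file size and disk usage can be checked with base R. The sketch below assumes a 4096-byte block size, as in the example above; the real block size depends on the filesystem.

```r
# Write a one-byte file and compare its logical size with the space it
# would occupy on a filesystem with 4096-byte blocks (assumed value).
path <- tempfile()
writeBin(as.raw(1), path)

logical_size <- file.size(path)   # this is the size counted toward max_size
block_size <- 4096                # assumption; varies by filesystem
on_disk <- ceiling(logical_size / block_size) * block_size

c(logical = logical_size, on_disk = on_disk)
unlink(path)
```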

Another time that objects can be removed from the cache is when get_file() or get_content() is called. If the target object is older than max_age, it will be removed and the cache will report it as a missing value.

Eviction policies

If max_n or max_size is used, then objects will be removed from the cache according to an eviction policy. The available eviction policies are:

"lru"

Least Recently Used. The least recently used objects will be removed. This uses the filesystem's mtime property. When "lru" is used, each time get_file() or get_content() is called, it will update the file's mtime.

"fifo"

First-in-first-out. The oldest objects will be removed.

Both of these policies use the files' mtime. Note that some filesystems (notably FAT) have poor mtime resolution. (atime is not used because support for atime is worse than support for mtime.)
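How mtime drives eviction can be sketched with base R alone. The following is an illustration of the idea, not the class's internal code: it sets modification times explicitly, simulates the mtime refresh that a read performs under "lru", and then selects the files that would have to go to satisfy a hypothetical max_n of 2.

```r
# Create three files with distinct, explicitly set modification times.
demo_dir <- tempfile("evict-demo-")
dir.create(demo_dir)
files <- file.path(demo_dir, c("a", "b", "c"))
base_time <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC")
for (i in seq_along(files)) {
  writeLines("x", files[i])
  Sys.setFileTime(files[i], base_time + i * 60)
}

# Under "lru", reading an object refreshes its mtime. Simulate a read of
# "a", which makes it the most recently used file.
Sys.setFileTime(files[1], base_time + 600)

# Keep at most max_n files: the ones with the oldest mtime are dropped first.
max_n <- 2
by_age <- files[order(file.mtime(files))]
to_remove <- head(by_age, max(0, length(files) - max_n))
basename(to_remove)

unlink(demo_dir, recursive = TRUE)
```

Here "b" is selected for removal: "a" was written first but was touched most recently. Under "fifo" there is no refresh on read, so "a" would have been the oldest file and would be selected instead.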

Sharing among multiple processes

The directory for a FileCache can be shared among multiple R processes. To do this, each R process should have a FileCache object that uses the same directory. Each FileCache will do pruning independently of the others, so if they have different pruning parameters, then one FileCache may remove cached objects before another FileCache would do so.

Even though multiple processes can share a FileCache directory, this should not be done on networked file systems, because their slow performance can cause problems. If you need a high-performance shared cache, you can use one built on a database like Redis, SQLite, MySQL, or similar.

When multiple processes share a cache directory, there are some potential race conditions. For example, if your code calls exists(key) to check if an object is in the cache, and then calls get_file(key), the object may be removed from the cache in between those two calls, and get_file(key) will throw an error. Instead of calling the two functions, it is better to simply call get_file(key), and use tryCatch() to handle the error that is thrown if the object is not in the cache. This effectively tests for existence and gets the object in one operation.
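The single-call pattern might look like the sketch below. It assumes that FileCache is the class exported by the sass package, and it uses two objects in one R session to stand in for two processes. fetch_file() is a hypothetical helper; because the Returns section of get_file() describes a FALSE return value for a missing object, the helper treats a thrown error and a FALSE result the same way.

```r
library(sass)  # assumption: FileCache is exported by the sass package

shared_dir <- tempfile("shared-cache-")
writer <- FileCache$new(shared_dir)
reader <- FileCache$new(shared_dir)   # stands in for a second R process

writer$set_content("abc123", "cached text")

# Hypothetical helper: fetch in one step, with no prior exists() check.
# An error and a FALSE return value are both reported as a miss.
fetch_file <- function(cache, key, outfile) {
  tryCatch(
    isTRUE(cache$get_file(key, outfile)),
    error = function(e) FALSE
  )
}

out <- tempfile()
hit <- fetch_file(reader, "abc123", out)    # object is present
writer$remove("abc123")                     # e.g. removed by the other process
miss <- fetch_file(reader, "abc123", out)   # object is gone

writer$destroy()
```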

It is also possible for one process to prune objects at the same time that another process is trying to prune objects. If this happens, you may see a warning from file.remove() failing to remove a file that has already been deleted.

Methods


Method new()

Create a FileCache object.

Usage

FileCache$new(
  dir = NULL,
  max_size = 40 * 1024^2,
  max_age = Inf,
  max_n = Inf,
  evict = c("lru", "fifo"),
  destroy_on_finalize = FALSE,
  logfile = NULL
)

Arguments

dir

Directory to store files for the cache. If NULL (the default) it will create and use a temporary directory.

max_size

Maximum size of the cache, in bytes. If the cache exceeds this size, cached objects will be removed according to the value of evict. Use Inf for no size limit.

max_age

Maximum age of files in cache before they are evicted, in seconds. Use Inf for no age limit.

max_n

Maximum number of objects in the cache. If the number of objects exceeds this value, then cached objects will be removed according to the value of evict. Use Inf for no limit of number of items.

evict

The eviction policy to use to decide which objects are removed when a cache pruning occurs. Currently, "lru" and "fifo" are supported.

destroy_on_finalize

If TRUE, then when the FileCache object is garbage collected, the cache directory and all objects inside of it will be deleted from disk. If FALSE (the default), it will do nothing when finalized.

logfile

An optional filename or connection object to which logging information will be written. To log to the console, use stdout().
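As an illustration of how these arguments combine, the following sketch creates a cache with explicit limits. It assumes FileCache is the class exported by the sass package; the directory name and the limit values are arbitrary choices for the example.

```r
library(sass)  # assumption: FileCache is exported by the sass package

cache <- FileCache$new(
  dir = file.path(tempdir(), "demo-file-cache"),  # arbitrary example path
  max_size = 10 * 1024^2,   # 10 MB in total
  max_age = 60 * 60,        # one hour, in seconds
  max_n = 100,              # at most 100 objects
  evict = "fifo",
  logfile = stdout()        # write log messages to the console
)
cache_dir <- cache$dir()

# Clean up the example directory.
cache$destroy()
```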


Method get_file()

Get the content associated with key, and save in a file named outfile.

Usage

FileCache$get_file(key, outfile, overwrite = TRUE)

Arguments

key

Key. Must be lowercase numbers and letters.

outfile

Name of output file. If NULL, return the content as

overwrite

If the output file already exists, should it be overwritten?

Returns

TRUE if the object is found in the cache and copying succeeds, FALSE otherwise.


Method get_content()

Get the content associated with key, and return it as either a string or a raw vector.

Usage

FileCache$get_content(key, mode = c("text", "raw"))

Arguments

key

Key. Must be lowercase numbers and letters.

mode

If "text", return the content as a UTF-8-encoded text string (a one-element character vector). If "raw", return the content as a raw vector.

Returns

A character or raw vector if the object is found in the cache, NULL otherwise.


Method set_file()

Sets content associated with key, from a file named infile.

Usage

FileCache$set_file(key, infile)

Arguments

key

Key. Must be lowercase numbers and letters.

infile

Name of input file.

Returns

TRUE if copying the file into the cache succeeds, FALSE otherwise.
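A file round trip through set_file() and get_file() could look like this sketch, which assumes FileCache comes from the sass package. The key and file contents are made up for the example.

```r
library(sass)  # assumption: FileCache is exported by the sass package

cache <- FileCache$new()   # uses a temporary directory

# Store a file under a key made of lowercase letters and numbers.
infile <- tempfile(fileext = ".css")
writeLines("body { color: red; }", infile)
stored <- cache$set_file("style1", infile)

# Copy the cached object back out to a new location.
outfile <- tempfile(fileext = ".css")
found <- cache$get_file("style1", outfile)
restored <- readLines(outfile)

cache$destroy()
```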


Method set_content()

Sets content associated with key, from a single-element vector.

Usage

FileCache$set_content(key, content)

Arguments

key

Key. Must be lowercase numbers and letters.

content

A character or raw vector. If it is a character vector, it will be written with UTF-8 encoding, with elements collapsed with \n (consistent across platforms).

Returns

TRUE if setting the content in the cache succeeds, FALSE otherwise.
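The in-memory counterparts, set_content() and get_content(), can be exercised as in the sketch below. It assumes FileCache comes from the sass package; the keys and values are arbitrary.

```r
library(sass)  # assumption: FileCache is exported by the sass package

cache <- FileCache$new()

# Text content is read back as a single string.
cache$set_content("greeting", "hello world")
txt <- cache$get_content("greeting", mode = "text")

# Raw content is stored and returned as bytes.
cache$set_content("bytes", as.raw(c(0x01, 0x02, 0xff)))
raw_back <- cache$get_content("bytes", mode = "raw")

# Per the Returns section of get_content(), a missing key gives NULL.
missing_value <- cache$get_content("nosuchkey")

cache$destroy()
```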


Method exists()

Check if content associated with key exists in the cache.

Usage

FileCache$exists(key)

Arguments

key

Key. Must be lowercase numbers and letters.

Returns

TRUE if the object is in the cache, FALSE otherwise.


Method keys()

Get all keys.

Usage

FileCache$keys()

Returns

A character vector of all keys currently in the cache.


Method remove()

Remove an object from the cache.

Usage

FileCache$remove(key)

Arguments

key

Key. Must be lowercase numbers and letters.

Returns

TRUE if the object was found and successfully removed, FALSE otherwise.


Method reset()

Clear all objects from the cache.

Usage

FileCache$reset()


Method dir()

Returns the directory used for the cache.

Usage

FileCache$dir()


Method prune()

Prune the cache, using the parameters specified by max_size, max_age, max_n, and evict.

Usage

FileCache$prune()


Method size()

Return the number of items currently in the cache.

Usage

FileCache$size()
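The inspection and removal methods documented above (exists(), keys(), remove(), reset(), and size()) can be combined as in this sketch, which assumes FileCache comes from the sass package. The keys are arbitrary.

```r
library(sass)  # assumption: FileCache is exported by the sass package

cache <- FileCache$new()
cache$set_content("key1", "one")
cache$set_content("key2", "two")

has_key1 <- cache$exists("key1")
all_keys <- sort(cache$keys())
n_before <- cache$size()          # number of items, not bytes

removed <- cache$remove("key1")
still_there <- cache$exists("key1")

cache$reset()                     # clear everything that is left
n_after <- cache$size()

cache$destroy()
```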


Method destroy()

Clears all objects in the cache, and removes the cache directory from disk.

Usage

FileCache$destroy()


Method is_destroyed()

Reports whether the cache has been destroyed.

Usage

FileCache$is_destroyed(throw = FALSE)

Arguments

throw

Should this function throw an error if the cache has been destroyed?
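The interaction between destroy() and is_destroyed() might be checked as follows. This sketch assumes FileCache comes from the sass package; the wording of the error raised with throw = TRUE is not documented here, so the example only captures the message without relying on its text.

```r
library(sass)  # assumption: FileCache is exported by the sass package

cache <- FileCache$new()
cache_dir <- cache$dir()

before <- cache$is_destroyed()
cache$destroy()
after <- cache$is_destroyed()
dir_left <- dir.exists(cache_dir)   # the directory is removed from disk

# With throw = TRUE, a destroyed cache signals an error instead of
# returning a value; capture its message for inspection.
msg <- tryCatch(
  cache$is_destroyed(throw = TRUE),
  error = function(e) conditionMessage(e)
)
```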


Method finalize()

A finalizer for the cache.

Usage

FileCache$finalize()


Method clone()

The objects of this class are cloneable with this method.

Usage

FileCache$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.