Utilities for working with the local dataset cache.

allennlp.common.file_utils.cached_path(url_or_filename:Union[str, pathlib.Path], cache_dir:str=None) → str[source]

Given something that might be a URL (or might be a local path), determine which. If it’s a URL, download the file and cache it, and return the path to the cached file. If it’s already a local path, make sure the file exists and then return the path.

allennlp.common.file_utils.filename_to_url(filename:str, cache_dir:str=None) → Tuple[str, str][source]

Return the url and etag (which may be None) stored for filename. Raise FileNotFoundError if filename or its stored metadata do not exist.

allennlp.common.file_utils.get_file_extension(path:str, dot=True, lower:bool=True)[source]
allennlp.common.file_utils.get_from_cache(url:str, cache_dir:str=None) → str[source]

Given a URL, look for the corresponding dataset in the local cache. If it’s not there, download it. Then return the path to the cached file.

allennlp.common.file_utils.http_get(url:str, temp_file:IO) → None[source]
allennlp.common.file_utils.is_url_or_existing_file(url_or_filename:Union[str, pathlib.Path, NoneType]) → bool[source]

Given something that might be a URL (or might be a local path), determine check if it’s url or an existing file path.

allennlp.common.file_utils.read_set_from_file(filename:str) → Set[str][source]

Extract a de-duped collection (set) of text from a file. Expected file format is one item per line.

allennlp.common.file_utils.s3_etag(url:str) → Union[str, NoneType][source]

Check ETag on S3 object.

allennlp.common.file_utils.s3_get(url:str, temp_file:IO) → None[source]

Pull a file directly from S3.


Wrapper function for s3 requests in order to create more helpful error messages.

allennlp.common.file_utils.split_s3_path(url:str) → Tuple[str, str][source]

Split a full s3 path into the bucket name and path.

allennlp.common.file_utils.url_to_filename(url:str, etag:str=None) → str[source]

Convert url into a hashed filename in a repeatable way. If etag is specified, append its hash to the url’s, delimited by a period.