pywb.manager package

Submodules

pywb.manager.aclmanager module

class pywb.manager.aclmanager.ACLManager(r)[source]

Bases: pywb.manager.manager.CollectionsManager

DEFAULT_FILE = 'access-rules.aclj'
SURT_RX = re.compile('([^:.]+[,)])+')
VALID_ACCESS = ('allow', 'block', 'exclude', 'allow_ignore_embargo')
add_excludes(r)[source]

Imports old-style excludes in URL-per-line format

Parameters:r (argparse.Namespace) – Parsed result from ArgumentParser
add_rule(r)[source]

Adds a rule to the ACL manager

Parameters:r (argparse.Namespace) – The argparse namespace representing the rule to be added
Return type:None
find_match(r)[source]

Finds a matching acl rule

Parameters:r (argparse.Namespace) – Parsed result from ArgumentParser
Return type:None
classmethod init_parser(parser)[source]

Initializes an argument parser for acl commands

Parameters:parser (argparse.ArgumentParser) – The parser to be initialized
Return type:None
is_valid_auto_coll(coll_name)[source]

Returns True or False indicating whether the supplied collection name is a valid collection

Parameters:coll_name – The collection name to check
Returns:True if the collection is valid, False otherwise
Return type:bool
list_rules(r)[source]

Prints the ACL rules to stdout

Parameters:r (argparse.Namespace|None) – Not used
Return type:None
load_acl(must_exist=True)[source]

Loads the access control list

Parameters:must_exist (bool) – Whether the ACL file must already exist
Returns:True if the ACL was loaded successfully, False otherwise
Return type:bool
print_rule(rule)[source]

Prints the supplied rule to stdout

Parameters:rule (CDXObject) – The rule to be printed
Return type:None
process(r)[source]

Processes an ACL command

Parameters:r (argparse.Namespace) – Parsed result from ArgumentParser
Return type:None
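
The init_parser(parser) and process(r) entries above follow a common argparse pattern: a classmethod registers sub-commands on a parser, and an instance method dispatches on the parsed argparse.Namespace. The following is a minimal sketch of that pattern only. DemoManager, its sub-commands, and the func default are hypothetical names chosen for illustration and do not describe pywb's actual implementation.

```python
import argparse


class DemoManager:
    """Hypothetical stand-in used to illustrate the pattern; not part of pywb."""

    def __init__(self):
        self.rules = []

    @classmethod
    def init_parser(cls, parser):
        # Register one sub-parser per operation and remember which
        # method should handle it (assumed convention: a 'func' default).
        subparsers = parser.add_subparsers(dest='op')
        subparsers.required = True

        add = subparsers.add_parser('add')
        add.add_argument('url')
        add.add_argument('access')
        add.set_defaults(func=cls.add_rule)

        lst = subparsers.add_parser('list')
        lst.set_defaults(func=cls.list_rules)

    def add_rule(self, r):
        # r is the argparse.Namespace produced by parse_args()
        self.rules.append((r.url, r.access))

    def list_rules(self, r):
        # r is accepted for a uniform signature but not used
        for url, access in self.rules:
            print(url, access)

    def process(self, r):
        # Dispatch to whichever handler the chosen sub-command registered
        r.func(self, r)


parser = argparse.ArgumentParser()
DemoManager.init_parser(parser)
manager = DemoManager()
manager.process(parser.parse_args(['add', 'http://example.com/', 'block']))
manager.process(parser.parse_args(['list']))
```

Keeping every handler on the same (self, r) signature is what lets process stay a one-line dispatcher, which may be why several methods above document r as "Not used".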
remove_rule(r)[source]

Removes a rule from the acl file

Parameters:r (argparse.Namespace) – Parsed result from ArgumentParser
Return type:None
save_acl(r=None)[source]

Save the contents of the rules as cdxj entries to the access control list file

Parameters:r (argparse.Namespace|None) – Not used
Return type:None
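
As a rough illustration of what "cdxj entries" could look like for ACL rules, the sketch below serializes a rule as a key, a timestamp field, and a JSON block on one line. The exact field layout, the '-' timestamp placeholder, and the helper names rule_to_cdxj_line and parse_cdxj_line are assumptions made for this example; consult the pywb source for the authoritative format.

```python
import json


def rule_to_cdxj_line(urlkey, access, timestamp='-'):
    """Serialize one rule as '<key> <timestamp> <json>' (assumed layout)."""
    return '{0} {1} {2}'.format(urlkey, timestamp, json.dumps({'access': access}))


def parse_cdxj_line(line):
    """Split a line produced above back into its three parts."""
    urlkey, timestamp, payload = line.split(' ', 2)
    return urlkey, timestamp, json.loads(payload)


line = rule_to_cdxj_line('com,example)/', 'block')
print(line)
print(parse_cdxj_line(line))
```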
to_key(url_or_surt, exact_match=False)[source]

Converts a URL or SURT to an ACL key. If ‘url_or_surt’ is already a SURT, it is used as is. If exact_match is set, the exact match suffix is appended to the key.

Parameters:
  • url_or_surt (str) – The url or surt to be converted to an acl key
  • exact_match (bool) – Should the exact match suffix be added to key
Return type:str
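
The behaviour described for to_key can be sketched as follows. SURT_RX is copied from the class attribute listed above. Everything else is an assumption for illustration: naive_surt is a simplified stand-in for a real URL canonicalizer, and EXACT_SUFFIX is a placeholder value, not necessarily the suffix pywb uses.

```python
import re
from urllib.parse import urlsplit

SURT_RX = re.compile('([^:.]+[,)])+')  # pattern from ACLManager.SURT_RX
EXACT_SUFFIX = '###'  # placeholder value, assumed for this sketch


def naive_surt(url):
    """Hypothetical, simplified canonicalizer: reverse host labels, keep the path."""
    parts = urlsplit(url if '://' in url else 'http://' + url)
    host = ','.join(reversed(parts.hostname.split('.')))
    return host + ')' + (parts.path or '/')


def to_key_sketch(url_or_surt, exact_match=False):
    # Anything that already looks like a SURT is used unchanged
    if SURT_RX.search(url_or_surt):
        key = url_or_surt
    else:
        key = naive_surt(url_or_surt)
    # Optionally mark the key as matching this exact URL only
    if exact_match:
        key += EXACT_SUFFIX
    return key


print(to_key_sketch('com,example)/'))
print(to_key_sketch('http://example.com/path', exact_match=True))
```

Note that a plain URL containing a comma or closing parenthesis would also satisfy SURT_RX, so this detection is heuristic.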

validate(log=False, correct=False)[source]

Validates the ACL rules, returning True or False to indicate whether the list should be saved

Parameters:
  • log (bool) – Should the results of validating be logged to stdout
  • correct (bool) – Should invalid results be corrected and saved
Return type:bool

validate_access(access)[source]

Returns True if the supplied access value is valid; otherwise terminates the process

Parameters:access (str) – The access value to be validated
Returns:True if valid
Return type:bool
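
A minimal sketch of the check described above, using the VALID_ACCESS tuple listed among the class attributes. The function name, the error message, and the exit status of 1 are assumptions for illustration.

```python
import sys

VALID_ACCESS = ('allow', 'block', 'exclude', 'allow_ignore_embargo')


def validate_access_sketch(access):
    """Return True for a known access value; otherwise exit the process."""
    if access not in VALID_ACCESS:
        # Message wording and exit status are assumed, not taken from pywb
        print('Valid access values are: ' + ', '.join(VALID_ACCESS),
              file=sys.stderr)
        sys.exit(1)
    return True


print(validate_access_sketch('block'))
```

Because an invalid value ends the process rather than returning False, callers of such a function never need to branch on the result.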
validate_save(r=None, log=False)[source]

Validates the acl rules and saves the file

Parameters:
  • r (argparse.Namespace|None) – Not used
  • log (bool) – Should a report be printed to stdout
Return type:None

pywb.manager.autoindex module

class pywb.manager.autoindex.AutoIndexer(colls_dir=None, interval=30, keep_running=True)[source]

Bases: object

AUTO_INDEX_FILE = 'autoindex.cdxj'
EXT_RX = re.compile('.*\\.w?arc(\\.gz)?$')
check_path()[source]
do_index(files)[source]
is_newer_than(path1, path2, track=False)[source]
run()[source]
start()[source]
stop()[source]
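
To show which file names the EXT_RX attribute above accepts, here is a small sketch that filters a list of names with the same pattern. The helper name and the sample file names are made up for the example; how AutoIndexer itself applies the pattern is not documented here.

```python
import re

EXT_RX = re.compile('.*\\.w?arc(\\.gz)?$')  # pattern from AutoIndexer.EXT_RX


def select_archives(names):
    """Keep only names ending in .arc, .warc, .arc.gz or .warc.gz."""
    return [name for name in names if EXT_RX.match(name)]


names = ['a.warc', 'b.warc.gz', 'c.arc', 'd.arc.gz',
         'e.wacz', 'f.cdxj', 'g.warc.txt']
print(select_archives(names))
# → ['a.warc', 'b.warc.gz', 'c.arc', 'd.arc.gz']
```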

pywb.manager.locmanager module

class pywb.manager.locmanager.LocManager[source]

Bases: object

compile_catalog()[source]
extract_loc(locale, no_csv)[source]
extract_text()[source]
init_catalog(loc)[source]
classmethod init_parser(parser)[source]

Initializes an argument parser for localization commands

Parameters:parser (argparse.ArgumentParser) – The parser to be initialized
Return type:None
list_loc()[source]
process(r)[source]
remove_loc(locale)[source]
update_catalog(loc)[source]
update_loc(locale, no_csv)[source]

pywb.manager.manager module

class pywb.manager.manager.CollectionsManager(coll_name, colls_dir=None, must_exist=True)[source]

Bases: object

This utility is designed to simplify the creation and management of web archive collections.

It may be used from the command line to set up and maintain the directory structure expected by pywb.

COLLS_DIR = 'collections'
COLL_RX = re.compile('^[\\w][-\\w]*$')
DEF_INDEX_FILE = 'index.cdxj'
WACZ_RX = re.compile('.*\\.wacz$')
WARC_RX = re.compile('.*\\.w?arc(\\.gz)?$')
add_archives(archives, uncompress_wacz=False)[source]
add_collection()[source]
add_template(template_name, force=False, ignore=False)[source]
change_collection(coll_name)[source]
index_merge(filelist, index_file)[source]
list_colls()[source]
list_templates()[source]
migrate_cdxj(path, force=False)[source]
reindex()[source]
remove_template(template_name, force=False)[source]
set_metadata(namevalue_pairs)[source]
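
The COLL_RX and WACZ_RX attributes of CollectionsManager can be exercised directly to see which names they accept. The sketch below copies both patterns from the listing above; the helper names are hypothetical, and where exactly the class applies each pattern is an assumption.

```python
import re

COLL_RX = re.compile('^[\\w][-\\w]*$')  # pattern from CollectionsManager.COLL_RX
WACZ_RX = re.compile('.*\\.wacz$')      # pattern from CollectionsManager.WACZ_RX


def is_valid_coll_name(name):
    """A name must start with a word character, followed by word characters or hyphens."""
    return bool(COLL_RX.match(name))


def is_wacz(filename):
    """True for file names ending in .wacz."""
    return bool(WACZ_RX.match(filename))


for name in ['my-coll', 'coll_1', '-bad', 'has space', 'a/b']:
    print(name, is_valid_coll_name(name))
print(is_wacz('crawl.wacz'), is_wacz('crawl.warc.gz'))
```

Rejecting slashes and spaces keeps a collection name usable as a single directory name under COLLS_DIR.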
pywb.manager.manager.get_input(msg)[source]
pywb.manager.manager.get_version()[source]

Gets the version of pywb

pywb.manager.manager.main(args=None)[source]
pywb.manager.manager.main_wrap_exc()[source]

pywb.manager.migrate module

class pywb.manager.migrate.MigrateCDX(dir_)[source]

Bases: object

convert_to_cdxj()[source]
count_cdx()[source]
iter_cdx_files()[source]

Module contents