pywb.warcserver package

Submodules

pywb.warcserver.access_checker module

class pywb.warcserver.access_checker.AccessChecker(access_source, default_access='allow', embargo=None)[source]

Bases: object

An access checker class

EXACT_SUFFIX = '###'
EXACT_SUFFIX_B = b'###'
EXACT_SUFFIX_SEARCH_B = b'####'
check_embargo(url, ts)[source]
create_access_aggregator(source_files)[source]

Creates a new AccessRulesAggregator using the supplied list of access control file names

Parameters:source_files (list[str]) – The list of access control file names
Returns:The created AccessRulesAggregator
Return type:AccessRulesAggregator
create_access_source(filename)[source]

Creates a new access source for the supplied filename.

If the filename is a directory, a CacheDirectoryAccessSource instance is returned; otherwise, a FileAccessIndexSource instance is returned.

Parameters:filename (str) – The name of a file or directory
Returns:An instance of CacheDirectoryAccessSource or FileAccessIndexSource, depending on whether the supplied filename is a directory or a file
Return type:CacheDirectoryAccessSource|FileAccessIndexSource
Raises:Exception – Indicates an invalid access source was supplied
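
The directory-versus-file dispatch described for create_access_source can be sketched as follows. This is a minimal, self-contained illustration and not the pywb implementation: classify_access_source is a hypothetical helper that returns the name of the class that would be chosen instead of building a real source object, and the error message text is an assumption.

```python
import os


def classify_access_source(filename):
    """Name the source class that would be chosen for ``filename``.

    Hypothetical stand-in: the documented method returns source
    instances; this sketch returns class names as strings so that it
    runs without pywb installed.
    """
    if os.path.isdir(filename):
        # Directories map to the caching directory source
        return 'CacheDirectoryAccessSource'
    if os.path.isfile(filename):
        # Regular files map to the single-file index source
        return 'FileAccessIndexSource'
    # Anything else is treated as an invalid access source
    # (message wording is illustrative only)
    raise Exception('Invalid access source: ' + filename)
```

Calling the helper with the path of an existing directory yields 'CacheDirectoryAccessSource', with the path of an existing file yields 'FileAccessIndexSource', and with a path that does not exist raises an Exception.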

find_access_rule(url, ts=None, urlkey=None, collection=None, acl_user=None)[source]

Attempts to find the access control rule for the supplied URL; if no rule is found, the default rule is returned

Parameters:
  • url (str) – The URL for the rule to be found
  • ts (str|None) – A timestamp (not used)
  • urlkey (str|None) – The access control url key
  • collection (str|None) – The collection, if any
  • acl_user (str|None) – The access control user, if any
Returns:

The access control rule for the supplied URL if one exists, otherwise the default rule

Return type:

CDXObject
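
The "matching rule, else default rule" behavior can be illustrated with a small stand-alone sketch. This reference does not describe how rules are matched, so the longest-prefix lookup below is an assumption chosen for illustration. The function find_rule, the dict-based rule table, and the 'urlkey' and 'access' field names are all hypothetical and are not the pywb API.

```python
def find_rule(urlkey, rules, default_access='allow'):
    """Return the most specific rule whose key prefixes ``urlkey``.

    ``rules`` is a hypothetical mapping of url key prefixes to access
    values. If nothing matches, a default rule is returned instead.
    """
    best = None
    for prefix in rules:
        # Keep the longest prefix that matches the requested key
        if urlkey.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    if best is None:
        # No rule applies, so fall back to the default
        return {'urlkey': '', 'access': default_access}
    return {'urlkey': best, 'access': rules[best]}


rules = {
    'com,example)/': 'allow',
    'com,example)/private': 'block',
}
blocked = find_rule('com,example)/private/page', rules)
fallback = find_rule('org,other)/', rules)
```

Here blocked carries the more specific 'block' rule, while fallback carries the default 'allow' access because no prefix matches.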

parse_embargo(embargo)[source]
wrap_iter(cdx_iter, acl_user)[source]

Wraps the supplied cdx iterator and yields each cdx object together with the access control result that applies to it

Parameters:
  • cdx_iter – The cdx object iterator to be wrapped
  • acl_user (str) – The user associated with this request (optional)
Returns:

The wrapped cdx object iterator
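
The wrapping pattern described for wrap_iter can be sketched as a generator. This is an illustration under stated assumptions, not the pywb code: cdx objects are modeled as plain dicts, the 'access' field name is assumed, and lookup_access is a hypothetical callable standing in for the rule lookup.

```python
def wrap_with_access(cdx_iter, lookup_access):
    """Lazily yield a copy of each cdx entry with an access result added.

    ``lookup_access`` is a hypothetical callable that maps a URL to an
    access value such as 'allow' or 'block'.
    """
    for cdx in cdx_iter:
        annotated = dict(cdx)  # copy so the caller's entry is not mutated
        annotated['access'] = lookup_access(cdx['url'])
        yield annotated


entries = [
    {'url': 'http://example.com/'},
    {'url': 'http://example.com/private'},
]


def lookup(url):
    # Toy policy used only for this example
    return 'block' if url.endswith('/private') else 'allow'


results = list(wrap_with_access(iter(entries), lookup))
```

Because the wrapper is a generator, entries are annotated one at a time as the consumer iterates, so a large index is never loaded into memory at once.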

class pywb.warcserver.access_checker.AccessRulesAggregator(*args, **kwargs)[source]

Bases: pywb.warcserver.access_checker.ReverseMergeMixin, pywb.warcserver.index.aggregator.SimpleAggregator

An Aggregator specific to access control

class pywb.warcserver.access_checker.CacheDirectoryAccessSource(*args, **kwargs)[source]

Bases: pywb.warcserver.index.aggregator.CacheDirectoryMixin, pywb.warcserver.access_checker.DirectoryAccessSource

A cache directory index source specific to access control

class pywb.warcserver.access_checker.DirectoryAccessSource(*args, **kwargs)[source]

Bases: pywb.warcserver.access_checker.ReverseMergeMixin, pywb.warcserver.index.aggregator.DirectoryIndexSource

A directory index source specific to access control

INDEX_SOURCES = [('.aclj', <class 'pywb.warcserver.access_checker.FileAccessIndexSource'>)]
class pywb.warcserver.access_checker.FileAccessIndexSource(filename, config=None)[source]

Bases: pywb.warcserver.index.indexsource.FileIndexSource

An Index Source class specific to access control lists

static rev_cmp(a, b)[source]

Performs a comparison between two items using the algorithm of the Python 2 builtin cmp, which was removed in Python 3

Parameters:
  • a – A value to be compared
  • b – A value to be compared
Returns:

The result of the comparison

Return type:

int
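
For readers unfamiliar with the removed builtin, the sketch below shows the usual Python 3 replacement for cmp and a reversed variant. That rev_cmp reverses the ordering is an assumption drawn from its name, not something this reference states; cmp and rev_cmp_sketch are local illustrative helpers.

```python
from functools import cmp_to_key


def cmp(a, b):
    """Python 3 equivalent of the removed builtin: returns -1, 0 or 1."""
    return (a > b) - (a < b)


def rev_cmp_sketch(a, b):
    """Reversed comparison (assumed behavior, named after rev_cmp)."""
    return cmp(b, a)


# A comparison function can drive a sort through functools.cmp_to_key
descending = sorted(['a', 'c', 'b'], key=cmp_to_key(rev_cmp_sketch))
```

With the reversed comparison, descending holds the items from largest to smallest.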

class pywb.warcserver.access_checker.ReverseMergeMixin[source]

Bases: object

A mixin that provides reversed merge functionality

pywb.warcserver.amf module

class pywb.warcserver.amf.Amf[source]

Bases: object

static get_representation(request_object, max_calls=500)[source]

pywb.warcserver.basewarcserver module

class pywb.warcserver.basewarcserver.BaseWarcServer(*args, **kwargs)[source]

Bases: object

add_route(path, handler, path_param_name='', default_value='')[source]
get_query_dict(environ)[source]
json_encode(res, out_headers)[source]
send_error(errs, start_response, message='No Resource Found', status=404)[source]

pywb.warcserver.handlers module

class pywb.warcserver.handlers.DefaultResourceHandler(index_source, warc_paths='', forward_proxy_prefix='', **kwargs)[source]

Bases: pywb.warcserver.handlers.ResourceHandler

class pywb.warcserver.handlers.HandlerSeq(handlers)[source]

Bases: object

get_supported_modes()[source]
class pywb.warcserver.handlers.IndexHandler(index_source, opts=None, *args, **kwargs)[source]

Bases: object

DEF_OUTPUT = 'cdxj'
OUTPUTS = {'cdxj': <function to_cdxj>, 'json': <function to_json>, 'link': <function to_link>, 'text': <function to_text>}
get_supported_modes()[source]
class pywb.warcserver.handlers.ResourceHandler(index_source, resource_loaders, **kwargs)[source]

Bases: pywb.warcserver.handlers.IndexHandler

get_supported_modes()[source]
pywb.warcserver.handlers.to_cdxj(cdx_iter, fields, params)[source]
pywb.warcserver.handlers.to_json(cdx_iter, fields, params)[source]
pywb.warcserver.handlers.to_text(cdx_iter, fields, params)[source]

pywb.warcserver.http module

class pywb.warcserver.http.DefaultAdapters[source]

Bases: object

live_adapter = <pywb.warcserver.http.PywbHttpAdapter object>
remote_adapter = <pywb.warcserver.http.PywbHttpAdapter object>
class pywb.warcserver.http.PywbHttpAdapter(cert_reqs='CERT_NONE', ca_cert_dir=None, **init_kwargs)[source]

Bases: requests.adapters.HTTPAdapter

This adapter exists to restore the default behavior of urllib3 < 1.25.x, which was to not verify SSL certificates, until a better solution is found

init_poolmanager(connections, maxsize, block=False, **pool_kwargs)[source]

Initializes a urllib3 PoolManager.

This method should not be called from user code, and is only exposed for use when subclassing the HTTPAdapter.

Parameters:
  • connections – The number of urllib3 connection pools to cache.
  • maxsize – The maximum number of connections to save in the pool.
  • block – Block when no free connections are available.
  • pool_kwargs – Extra keyword arguments used to initialize the Pool Manager.
proxy_manager_for(proxy, **proxy_kwargs)[source]

Return urllib3 ProxyManager for the given proxy.

This method should not be called from user code, and is only exposed for use when subclassing the HTTPAdapter.

Parameters:
  • proxy – The proxy to return a urllib3 ProxyManager for.
  • proxy_kwargs – Extra keyword arguments used to configure the Proxy Manager.
Returns:

ProxyManager

Return type:

urllib3.ProxyManager

pywb.warcserver.inputrequest module

class pywb.warcserver.inputrequest.DirectWSGIInputRequest(env)[source]

Bases: object

get_full_request_uri()[source]
get_referrer()[source]
get_req_body()[source]
get_req_headers()[source]
get_req_method()[source]
get_req_protocol()[source]
include_method_query(url)[source]
reconstruct_request(url=None)[source]
class pywb.warcserver.inputrequest.MethodQueryCanonicalizer(method, mime, length, stream, buffered_stream=None, environ=None)[source]

Bases: object

MAX_QUERY_LENGTH = 4096
amf_parse(string, warn_on_error)[source]
append_query(url)[source]
json_parse(string)[source]
class pywb.warcserver.inputrequest.POSTInputRequest(env)[source]

Bases: pywb.warcserver.inputrequest.DirectWSGIInputRequest

get_full_request_uri()[source]
get_req_headers()[source]
get_req_method()[source]
get_req_protocol()[source]

pywb.warcserver.upstreamindexsource module

class pywb.warcserver.upstreamindexsource.UpstreamAggIndexSource(base_url)[source]

Bases: pywb.warcserver.index.indexsource.RemoteIndexSource

class pywb.warcserver.upstreamindexsource.UpstreamMementoIndexSource(proxy_url='{url}')[source]

Bases: pywb.warcserver.index.indexsource.BaseIndexSource

load_index(params)[source]
static upstream_resource(base_url)[source]

pywb.warcserver.warcserver module

class pywb.warcserver.warcserver.WarcServer(config_file='./config.yaml', custom_config=None)[source]

Bases: pywb.warcserver.basewarcserver.BaseWarcServer

AUTO_COLL_TEMPL = '{coll}'
DEFAULT_DEDUP_URL = 'redis://localhost:6379/0/pywb:{coll}:cdxj'
get_coll_config(name)[source]
init_paths(name, abs_path=None)[source]
init_sequence(coll_name, seq_config)[source]
list_dynamic_routes()[source]
list_fixed_routes()[source]
load_auto_colls()[source]
load_coll(name, coll_config)[source]
load_colls()[source]
pywb.warcserver.warcserver.init_index_agg(source_configs, use_gevent=False, timeout=0, source_list=None)[source]
pywb.warcserver.warcserver.init_index_source(value, source_list=None)[source]
pywb.warcserver.warcserver.register_source(source_cls, end=False)[source]

Module contents