pywb.warcserver package¶
Submodules¶
pywb.warcserver.access_checker module¶
-
class
pywb.warcserver.access_checker.
AccessChecker
(access_source, default_access='allow', embargo=None)[source]¶ Bases:
object
An access checker class
-
EXACT_SUFFIX
= '###'¶
-
EXACT_SUFFIX_B
= b'###'¶
-
EXACT_SUFFIX_SEARCH_B
= b'####'¶
-
create_access_aggregator
(source_files)[source]¶ Creates a new AccessRulesAggregator using the supplied list of access control file names
Parameters: source_files (list[str]) – The list of access control file names Returns: The created AccessRulesAggregator Return type: AccessRulesAggregator
-
create_access_source
(filename)[source]¶ Creates a new access source for the supplied filename.
If the filename refers to a directory, a CacheDirectoryAccessSource instance is returned; otherwise a FileAccessIndexSource instance is returned.
Parameters: filename (str) – The name of a file or directory Returns: An instance of CacheDirectoryAccessSource or FileAccessIndexSource, depending on whether the supplied filename is a directory or a file Return type: CacheDirectoryAccessSource|FileAccessIndexSource Raises: Exception – Indicates an invalid access source was supplied
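The directory-versus-file dispatch described above can be sketched as follows. This is a minimal illustration, not pywb's implementation: `DirSourceStub`, `FileSourceStub`, and `create_access_source_sketch` are hypothetical stand-ins, and the exact error message is an assumption.

```python
import os


class FileSourceStub:
    """Hypothetical stand-in for FileAccessIndexSource."""

    def __init__(self, filename):
        self.filename = filename


class DirSourceStub:
    """Hypothetical stand-in for CacheDirectoryAccessSource."""

    def __init__(self, dirname):
        self.dirname = dirname


def create_access_source_sketch(filename):
    # A directory maps to the directory-based source.
    if os.path.isdir(filename):
        return DirSourceStub(filename)
    # A regular file maps to the file-based source.
    if os.path.isfile(filename):
        return FileSourceStub(filename)
    # Anything else is rejected as an invalid access source
    # (the message text here is an assumption).
    raise Exception('Invalid Access Source: ' + filename)
```

A path that is neither an existing directory nor an existing file raises, which matches the documented `Exception` for an invalid access source.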
-
find_access_rule
(url, ts=None, urlkey=None, collection=None, acl_user=None)[source]¶ Attempts to find the access control rule for the supplied URL otherwise returns the default rule
Parameters: - url (str) – The URL for the rule to be found
- ts (str|None) – A timestamp (not used)
- urlkey (str|None) – The access control url key
- collection (str|None) – The collection, if any
- acl_user (str|None) – The access control user, if any
Returns: The access control rule for the supplied URL if one exists, otherwise the default rule
Return type: CDXObject
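The fall-back-to-default behavior can be illustrated with a small in-memory sketch. The matching strategy shown here (longest urlkey prefix wins) is an assumption made for illustration and may not match how pywb searches its access index; `find_rule_sketch`, `DEFAULT_RULE`, and the rule dictionaries are hypothetical.

```python
# Hypothetical default rule, mirroring default_access='allow'.
DEFAULT_RULE = {'urlkey': '', 'access': 'allow'}


def find_rule_sketch(urlkey, rules, default_rule=DEFAULT_RULE):
    # Assumption: a rule applies when its urlkey is a prefix of the
    # requested urlkey, and the longest matching prefix wins.
    best = None
    for rule in rules:
        if urlkey.startswith(rule['urlkey']):
            if best is None or len(rule['urlkey']) > len(best['urlkey']):
                best = rule
    # No matching rule: return the default rule instead.
    return best if best is not None else default_rule
```

With rules for `com,example)/` (block) and `com,example)/public` (allow), a lookup for `com,example)/public/page` picks the more specific rule, while an unrelated key such as `org,other)/` falls through to the default.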
-
wrap_iter
(cdx_iter, acl_user)[source]¶ Wraps the supplied cdx iter and yields cdx objects that contain the access control results for the cdx object being yielded
Parameters: - cdx_iter – The cdx object iterator to be wrapped
- acl_user (str) – The user associated with this request (optional)
Returns: The wrapped cdx object iterator
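Wrapping an iterator so that each yielded record carries an access result is naturally expressed as a generator. The sketch below assumes cdx records behave like dictionaries with a `urlkey` field; `wrap_iter_sketch` and the `lookup_access` callable are hypothetical names, and the real method may filter or annotate records differently.

```python
def wrap_iter_sketch(cdx_iter, lookup_access):
    # lookup_access is a hypothetical callable: urlkey -> access string.
    for cdx in cdx_iter:
        # Yield a copy annotated with the access result, leaving the
        # caller's original record untouched.
        yield dict(cdx, access=lookup_access(cdx['urlkey']))
```

Because this is a generator, records are annotated lazily, one at a time, as the consumer iterates.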
-
-
class
pywb.warcserver.access_checker.
AccessRulesAggregator
(*args, **kwargs)[source]¶ Bases:
pywb.warcserver.access_checker.ReverseMergeMixin
,pywb.warcserver.index.aggregator.SimpleAggregator
An Aggregator specific to access control
-
class
pywb.warcserver.access_checker.
CacheDirectoryAccessSource
(*args, **kwargs)[source]¶ Bases:
pywb.warcserver.index.aggregator.CacheDirectoryMixin
,pywb.warcserver.access_checker.DirectoryAccessSource
A cache directory index source specific to access control
-
class
pywb.warcserver.access_checker.
DirectoryAccessSource
(*args, **kwargs)[source]¶ Bases:
pywb.warcserver.access_checker.ReverseMergeMixin
,pywb.warcserver.index.aggregator.DirectoryIndexSource
A directory index source specific to access control
-
INDEX_SOURCES
= [('.aclj', <class 'pywb.warcserver.access_checker.FileAccessIndexSource'>)]¶
-
-
class
pywb.warcserver.access_checker.
FileAccessIndexSource
(filename, config=None)[source]¶ Bases:
pywb.warcserver.index.indexsource.FileIndexSource
An Index Source class specific to access control lists
pywb.warcserver.amf module¶
pywb.warcserver.basewarcserver module¶
pywb.warcserver.handlers module¶
-
class
pywb.warcserver.handlers.
DefaultResourceHandler
(index_source, warc_paths='', forward_proxy_prefix='', **kwargs)[source]¶
-
class
pywb.warcserver.handlers.
IndexHandler
(index_source, opts=None, *args, **kwargs)[source]¶ Bases:
object
-
DEF_OUTPUT
= 'cdxj'¶
-
OUTPUTS
= {'cdxj': <function to_cdxj>, 'json': <function to_json>, 'link': <function to_link>, 'text': <function to_text>}¶
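The `OUTPUTS` mapping together with `DEF_OUTPUT` suggests a dispatch-by-name pattern for choosing a formatter. The sketch below illustrates that pattern only: the formatter functions, their single-record signature, and the fallback logic in `render` are assumptions, not pywb's actual `to_cdxj` or `to_json`.

```python
import json


def to_json_sketch(record):
    # Hypothetical formatter: one JSON object per record.
    return json.dumps(record, sort_keys=True)


def to_cdxj_sketch(record):
    # Hypothetical formatter producing a CDXJ-style line:
    # "<urlkey> <timestamp> <json of the remaining fields>"
    rest = {k: v for k, v in record.items() if k not in ('urlkey', 'timestamp')}
    return '{0} {1} {2}'.format(
        record['urlkey'], record['timestamp'], json.dumps(rest, sort_keys=True))


DEF_OUTPUT = 'cdxj'
OUTPUTS = {'cdxj': to_cdxj_sketch, 'json': to_json_sketch}


def render(record, output=None):
    # Assumption: a missing output name falls back to DEF_OUTPUT.
    formatter = OUTPUTS[output or DEF_OUTPUT]
    return formatter(record)
```

Keeping formatters in a dictionary means a new output format can be added by registering one more entry rather than extending a conditional chain.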
-
pywb.warcserver.http module¶
-
class
pywb.warcserver.http.
DefaultAdapters
[source]¶ Bases:
object
-
live_adapter
= <pywb.warcserver.http.PywbHttpAdapter object>¶
-
remote_adapter
= <pywb.warcserver.http.PywbHttpAdapter object>¶
-
-
class
pywb.warcserver.http.
PywbHttpAdapter
(cert_reqs='CERT_NONE', ca_cert_dir=None, **init_kwargs)[source]¶ Bases:
requests.adapters.HTTPAdapter
This adapter exists to restore the default behavior of urllib3 < 1.25.x, which was to not verify SSL certificates, until a better solution is found
-
init_poolmanager
(connections, maxsize, block=False, **pool_kwargs)[source]¶ Initializes a urllib3 PoolManager.
This method should not be called from user code, and is only exposed for use when subclassing the HTTPAdapter.
Parameters: - connections – The number of urllib3 connection pools to cache.
- maxsize – The maximum number of connections to save in the pool.
- block – Block when no free connections are available.
- pool_kwargs – Extra keyword arguments used to initialize the Pool Manager.
-
proxy_manager_for
(proxy, **proxy_kwargs)[source]¶ Return urllib3 ProxyManager for the given proxy.
This method should not be called from user code, and is only exposed for use when subclassing the HTTPAdapter.
Parameters: - proxy – The proxy to return a urllib3 ProxyManager for.
- proxy_kwargs – Extra keyword arguments used to configure the Proxy Manager.
Returns: ProxyManager
Return type: urllib3.ProxyManager
-
pywb.warcserver.inputrequest module¶
pywb.warcserver.upstreamindexsource module¶
pywb.warcserver.warcserver module¶
-
class
pywb.warcserver.warcserver.
WarcServer
(config_file='./config.yaml', custom_config=None)[source]¶ Bases:
pywb.warcserver.basewarcserver.BaseWarcServer
-
AUTO_COLL_TEMPL
= '{coll}'¶
-
DEFAULT_DEDUP_URL
= 'redis://localhost:6379/0/pywb:{coll}:cdxj'¶
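Both `AUTO_COLL_TEMPL` and `DEFAULT_DEDUP_URL` contain a `{coll}` placeholder. Assuming the placeholder is filled with the collection name via Python's `str.format` (an assumption, since the page does not show how the templates are expanded), the substitution would look like this; `expand` is a hypothetical helper.

```python
AUTO_COLL_TEMPL = '{coll}'
DEFAULT_DEDUP_URL = 'redis://localhost:6379/0/pywb:{coll}:cdxj'


def expand(template, coll):
    # Assumption: '{coll}' is replaced with the collection name.
    return template.format(coll=coll)


print(expand(DEFAULT_DEDUP_URL, 'my-coll'))
# redis://localhost:6379/0/pywb:my-coll:cdxj
```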
-