pywb.warcserver.resource package

Submodules

pywb.warcserver.resource.blockrecordloader module

class pywb.warcserver.resource.blockrecordloader.BlockArcWarcRecordLoader(loader=None, cookie_maker=None, block_size=16384, *args, **kwargs)[source]

Bases: warcio.recordloader.ArcWarcRecordLoader

load(url, offset, length, no_record_parse=False)[source]

Load a single record from the given url, starting at offset and spanning length bytes, and parse it as either a WARC or an ARC record.
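The offset/length addressing described above can be illustrated with a minimal standard-library sketch. It does not call pywb or warcio: `read_block` and the sample bytes are hypothetical, and the sketch only shows how a byte range picks one record out of a larger file.

```python
import io


def read_block(stream, offset, length):
    """Return exactly the bytes in [offset, offset + length) of a seekable stream."""
    stream.seek(offset)
    data = stream.read(length)
    if len(data) != length:
        # The requested range runs past the end of the stream.
        raise EOFError("requested range extends past end of stream")
    return data


# Two fake "records" stored back to back in one archive-like buffer.
first = b"RECORD-ONE\r\n\r\n"
second = b"RECORD-TWO\r\n\r\n"
archive = io.BytesIO(first + second)

# An index entry would normally supply the offset and length of the record.
block = read_block(archive, offset=len(first), length=len(second))
```

In a real deployment the offset and length would come from the index entry for the capture, and the returned bytes would then be handed to a record parser.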

pywb.warcserver.resource.pathresolvers module

class pywb.warcserver.resource.pathresolvers.DefaultResolverMixin[source]

Bases: object

classmethod make_best_resolver(path)[source]
classmethod make_resolvers(paths)[source]

class pywb.warcserver.resource.pathresolvers.PathIndexResolver(pathindex_file)[source]

Bases: object

class pywb.warcserver.resource.pathresolvers.PrefixResolver(template)[source]

Bases: object

resolve_coll(path, source)[source]

class pywb.warcserver.resource.pathresolvers.RedisResolver(redis_url=None, redis=None, key_template=None, **kwargs)[source]

Bases: pywb.warcserver.index.indexsource.RedisIndexSource

pywb.warcserver.resource.resolvingloader module

class pywb.warcserver.resource.resolvingloader.ResolvingLoader(path_resolvers, record_loader=None, no_record_parse=False)[source]

Bases: object

EMPTY_DIGEST = '3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ'
MISSING_REVISIT_MSG = 'Original for revisit record could not be loaded'
load_cdx_for_dupe(url, timestamp, digest, cdx_loader)[source]

If a cdx_server is available, return the response from the server; otherwise, return an empty list.

load_headers_and_payload(cdx, failed_files, cdx_loader)[source]

Resolve headers and payload for a given capture. In the simple case, headers and payload are in the same record. In the case of revisit records, the payload and headers may be in different records.

If the original has already been found, look up the original using the orig. fields in the cdx dict. Otherwise, call _load_different_url_payload() to get the cdx index from a different url and find the original record.
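The decision described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the ResolvingLoader implementation: the function name `pick_payload_location`, the `lookup_original` callable, and the cdx field names (`mime`, `filename`, `offset`, `length`, `digest`, and the `orig.`-prefixed variants) are all assumed for the example.

```python
def pick_payload_location(cdx, lookup_original):
    """Return (filename, offset, length) of the record holding the payload.

    cdx: dict describing one capture (field names are assumptions).
    lookup_original: hypothetical callable(url, digest) -> cdx dict or None,
    standing in for a query against the index.
    """
    # Simple case: headers and payload live in the same record.
    if cdx.get("mime") != "warc/revisit":
        return cdx["filename"], cdx["offset"], cdx["length"]

    # Revisit whose original was already found: use the orig. fields.
    if cdx.get("orig.filename"):
        return cdx["orig.filename"], cdx["orig.offset"], cdx["orig.length"]

    # Otherwise search the index for a capture with the same digest.
    original = lookup_original(cdx["url"], cdx["digest"])
    if original is None:
        raise LookupError("Original for revisit record could not be loaded")
    return original["filename"], original["offset"], original["length"]
```

The headers would still be read from the revisit record itself; only the payload location is redirected to the original capture.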

pywb.warcserver.resource.responseloader module

class pywb.warcserver.resource.responseloader.BaseLoader[source]

Bases: object

raise_on_self_redirect(params, cdx, status_code, location_url)[source]

Check if the response is a 3xx redirect to the same url. If so, reject this capture to avoid causing a redirect loop.
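A minimal sketch of such a check, using only the standard library, might look like the following. The helper `is_self_redirect` is hypothetical and is not the pywb method; treating http and https variants of a url as the same target is an assumption made for this example, and relative Location values are not resolved.

```python
from urllib.parse import urlsplit


def is_self_redirect(status_code, request_url, location_url):
    """Return True if a 3xx response redirects back to the requested url."""
    if not 300 <= status_code <= 399:
        return False
    if not location_url:
        return False

    def normalize(url):
        # Assumption: ignore the scheme and host case when comparing urls.
        parts = urlsplit(url)
        return (parts.netloc.lower(), parts.path or "/", parts.query)

    return normalize(request_url) == normalize(location_url)
```

A caller could raise an error or skip to the next capture whenever this returns True, so that replay never serves a redirect that points back at itself.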

class pywb.warcserver.resource.responseloader.LiveWebLoader(forward_proxy_prefix=None, adapter=None)[source]

Bases: pywb.warcserver.resource.responseloader.BaseLoader

SKIP_HEADERS = ('link', 'memento-datetime', 'content-location', 'x-archive')
UNREWRITE_HEADERS = ('location', 'content-location')
VIDEO_MIMES = ('application/x-mpegURL', 'application/vnd.apple.mpegurl', 'application/dash+xml')
get_custom_metadata(content_type, dt)[source]
load_resource(cdx, params)[source]
unrewrite_header(cdx, value)[source]

class pywb.warcserver.resource.responseloader.VideoLoader[source]

Bases: pywb.warcserver.resource.responseloader.BaseLoader

CONTENT_TYPE = 'application/vnd.youtube-dl_formats+json'
load_resource(cdx, params)[source]

class pywb.warcserver.resource.responseloader.WARCPathLoader(paths, cdx_source)[source]

Bases: pywb.warcserver.resource.pathresolvers.DefaultResolverMixin, pywb.warcserver.resource.responseloader.BaseLoader

load_resource(cdx, params)[source]

Module contents