pywb.apps package¶
Submodules¶
pywb.apps.cli module¶
-
class
pywb.apps.cli.
BaseCli
(args=None, default_port=8080, desc='')[source]¶ Bases:
object
Base CLI class that provides the initial arg parser setup, calls load to receive the application to be started and starts the application.
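The flow described above (set up the parser, call load, start the application) resembles a template-method pattern. The following is a minimal sketch using only argparse; `SketchCli`, `EchoCli`, the `_extend_parser` hook, and the `-p/--port` flag are illustrative assumptions rather than pywb's actual code or options.

```python
import argparse


class SketchCli:
    """Hypothetical stand-in for a BaseCli-style class (not pywb's code)."""

    def __init__(self, args=None, default_port=8080, desc=''):
        parser = argparse.ArgumentParser(description=desc)
        # Assumed flag name; consult pywb's own --help for the real options.
        parser.add_argument('-p', '--port', type=int, default=default_port)
        self._extend_parser(parser)
        self.r = parser.parse_args(args)
        # Subclasses decide which application gets built.
        self.application = self.load()

    def _extend_parser(self, parser):
        """Hook for subclasses to add their own arguments."""

    def load(self):
        raise NotImplementedError

    def run(self):
        # A real CLI would start a WSGI server here; the sketch only reports.
        return 'serving {0} on port {1}'.format(self.application, self.r.port)


class EchoCli(SketchCli):
    def load(self):
        return 'echo-app'


cli = EchoCli(args=['-p', '9000'], desc='demo')
message = cli.run()
```

In this sketch a subclass only overrides `load`, which mirrors how the subclasses listed below each start a different application.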
-
class
pywb.apps.cli.
LiveCli
(args=None, default_port=8080, desc='')[source]¶ Bases:
pywb.apps.cli.BaseCli
CLI class for starting pywb’s replay server in live mode
-
class
pywb.apps.cli.
ReplayCli
(args=None, default_port=8080, desc='')[source]¶ Bases:
pywb.apps.cli.BaseCli
CLI class that adds the CLI functionality specific to starting pywb’s Wayback Machine implementation
-
class
pywb.apps.cli.
WarcServerCli
(args=None, default_port=8080, desc='')[source]¶ Bases:
pywb.apps.cli.BaseCli
CLI class for starting a WarcServer
-
class
pywb.apps.cli.
WaybackCli
(args=None, default_port=8080, desc='')[source]¶ Bases:
pywb.apps.cli.ReplayCli
CLI class for starting pywb’s implementation of the Wayback Machine
pywb.apps.frontendapp module¶
-
class
pywb.apps.frontendapp.
FrontEndApp
(config_file=None, custom_config=None)[source]¶ Bases:
object
Orchestrates pywb’s core Wayback Machine functionality. It comprises 2 core sub-apps and 3 optional apps.
- Sub-apps:
- WarcServer: Serves archive content (WARC/ARC and index) and, in record/proxy mode, content from the live web
- RewriterApp: Rewrites the content served by pywb (if it is to be rewritten)
- WSGIProxMiddleware (Optional): If proxy mode is enabled, performs pywb’s HTTP(s) proxy functionality
- AutoIndexer (Optional): If auto-indexing is enabled for the collections it is started here
- RecorderApp (Optional): Recording functionality, available when recording mode is enabled
The rewriter class is configurable via the class variable REWRITER_APP_CLS, which defaults to RewriterApp.
-
ALL_DIGITS
= re.compile('^\\d+$')¶
-
CDX_API
= 'http://localhost:%s/{coll}/index'¶
-
PROXY_CA_NAME
= 'pywb HTTPS Proxy CA'¶
-
PROXY_CA_PATH
= 'proxy-certs/pywb-ca.pem'¶
-
RECORD_API
= 'http://localhost:%s/%s/resource/postreq?param.recorder.coll={coll}'¶
-
RECORD_ROUTE
= '/record'¶
-
RECORD_SERVER
= 'http://localhost:%s'¶
-
REPLAY_API
= 'http://localhost:%s/{coll}/resource/postreq'¶
-
REWRITER_APP_CLS
¶ alias of
pywb.apps.rewriterapp.RewriterApp
-
classmethod
create_app
(port)[source]¶ Create a new instance of FrontEndApp that listens on port with a hostname of 0.0.0.0
Parameters: port (int) – The port FrontEndApp is to listen on Returns: A new instance of FrontEndApp wrapped in GeventServer Return type: GeventServer
-
get_coll_config
(coll)[source]¶ Retrieve the collection config, including metadata, associated with a collection
Parameters: coll (str) – The name of the collection to retrieve config info for Returns: The collection’s config Return type: dict
-
get_upstream_paths
(port)[source]¶ Retrieve a dictionary containing the full URLs of the upstream apps
Parameters: port (int) – The port used by the replay and cdx servers Returns: A dictionary containing the upstream paths (replay, cdx-server, record [if enabled]) Return type: dict[str, str]
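As an illustration of how the URL templates listed above might be combined into the returned dictionary, here is a minimal sketch. The template strings are copied from the class attributes; the `upstream_paths` helper and its `recorder_path` argument are hypothetical and are not pywb's implementation.

```python
# Templates copied from the class attributes listed above.
CDX_API = 'http://localhost:%s/{coll}/index'
REPLAY_API = 'http://localhost:%s/{coll}/resource/postreq'


def upstream_paths(port, recorder_path=None):
    """Hypothetical helper: fill in the port now, leave {coll} for later."""
    paths = {
        'replay': REPLAY_API % port,
        'cdx-server': CDX_API % port,
    }
    if recorder_path is not None:
        # Assumed to be present only when recording is enabled.
        paths['record'] = recorder_path
    return paths


paths = upstream_paths(8080)
# The {coll} placeholder is resolved per request.
cdx_url = paths['cdx-server'].format(coll='my-coll')
```

The two-stage substitution (`%s` for the port, `{coll}` for the collection) lets the port be fixed once at startup while the collection varies per request.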
-
handle_request
(environ, start_response)[source]¶ Retrieves the route handler and calls it, returning its response
Parameters: - environ (dict) – The WSGI environment dictionary for the request
- start_response –
Returns: The WbResponse for the request
Return type:
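To make the (environ, start_response) calling convention concrete, here is a minimal sketch of a WSGI callable that looks up a handler by path and returns its response. The dictionary-based routing, `make_app`, and `home` are assumptions made for illustration; pywb's real routing is more involved.

```python
def make_app(routes):
    """Build a WSGI callable from a {path: handler} mapping (hypothetical)."""

    def app(environ, start_response):
        handler = routes.get(environ.get('PATH_INFO', '/'))
        if handler is None:
            status, body = '404 Not Found', b'not found'
        else:
            status, body = handler(environ)
        headers = [('Content-Type', 'text/plain; charset=utf-8'),
                   ('Content-Length', str(len(body)))]
        start_response(status, headers)
        return [body]

    return app


def home(environ):
    return '200 OK', b'home'


app = make_app({'/': home})

# A tiny stand-in for the server side of the WSGI contract.
captured = {}


def start_response(status, headers):
    captured['status'] = status
    captured['headers'] = headers


body = b''.join(app({'PATH_INFO': '/'}, start_response))
```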
-
init_autoindex
(auto_interval)[source]¶ Initialize and start the auto-indexing of the collections. If auto_interval is None, this is a no-op.
Parameters: auto_interval (str|int) – The auto-indexing interval from the configuration file or CLI argument
-
init_proxy
(config)[source]¶ Initialize and start proxy mode. If the config contains no proxy configuration entry, this is a no-op. Causes handler to become an instance of WSGIProxMiddleware.
Parameters: config (dict) – The configuration object used to configure this instance of FrontEndApp
-
init_recorder
(recorder_config)[source]¶ Initialize the recording functionality of pywb. If recorder_config is None, this function is a no-op.
Parameters: recorder_config (str|dict|None) – The configuration for the recorder app Return type: None
-
is_proxy_enabled
(environ)[source]¶ Returns True or False indicating whether proxy mode is enabled
Parameters: environ (dict) – The WSGI environment dictionary for the request Returns: True if proxy mode is enabled, False otherwise Return type: bool
-
is_valid_coll
(coll)[source]¶ Determines if the collection name for a request is valid (exists)
Parameters: coll (str) – The name of the collection to check Returns: True if the collection is valid, False otherwise Return type: bool
-
proxy_fetch
(env, url)[source]¶ Proxy-mode-only endpoint that handles OPTIONS requests and CORS fetches for the Preservation Worker.
Due to normal cross-origin browser restrictions in proxy mode, the auto fetch worker cannot access the CSS rules of cross-origin style sheets and must re-fetch them in a CORS-safe manner. This endpoint facilitates that by fetching the stylesheets for the auto fetch worker and responding with their contents.
Parameters: Returns: WbResponse that is either the response to an OPTIONS request or the result of fetching url
Return type:
-
proxy_route_request
(url, environ)[source]¶ Return the full URL that this proxy request will be routed to. The ‘environ’ PATH_INFO and REQUEST_URI will be modified based on the returned URL.
Default is to use the ‘proxy_prefix’ to point to the proxy collection
-
put_custom_record
(environ, coll='$root')[source]¶ When recording, PUT a custom WARC record to the specified collection (Available only when recording)
Parameters:
-
raise_not_found
(environ, err_type, url)[source]¶ Utility function for raising a werkzeug.exceptions.NotFound exception with the supplied WSGI environment and message.
Parameters:
-
serve_cdx
(environ, coll='$root')[source]¶ Make the upstream CDX query for a collection and respond with the results of the query
Parameters: Returns: The WbResponse containing the results of the CDX query
Return type:
-
serve_coll_page
(environ, coll='$root')[source]¶ Render and serve a collection’s search page (search.html).
Parameters: Returns: The WbResponse containing the collection’s search page
Return type:
-
serve_content
(environ, coll='$root', url='', timemap_output='', record=False)[source]¶ Serve the contents of a URL/Record rewriting the contents of the response when applicable.
Parameters: - environ (dict) – The WSGI environment dictionary for the request
- coll (str) – The name of the collection the record is to be served from
- url (str) – The URL for the corresponding record to be served if it exists
- timemap_output (str) – The contents of the timemap included in the link header of the response
- record (bool) – Whether the content being served should be recorded (saved to a WARC). Only valid in record mode
Returns: WbResponse containing the contents of the record/URL
Return type:
-
serve_home
(environ)[source]¶ Serves the home (/) view of pywb (not a collection)
Parameters: environ (dict) – The WSGI environment dictionary for the request Returns: The WbResponse for serving the home (/) path Return type: WbResponse
-
serve_listing
(environ)[source]¶ Serves the response for the WarcServer fixed and dynamic listing (paths)
Parameters: environ (dict) – The WSGI environment dictionary for the request Returns: WbResponse containing the frontend app’s WarcServer URL paths Return type: WbResponse
-
serve_record
(environ, coll='$root', url='')[source]¶ Serve a URL’s content from a WARC/ARC record in replay mode or from the live web in live, proxy, and record mode.
Parameters: Returns: WbResponse containing the contents of the record/URL
Return type:
-
serve_static
(environ, coll='', filepath='')[source]¶ Serve a static file associated with a specific collection or one of pywb’s own static assets
Parameters: Returns: The WbResponse for the static asset
Return type:
-
class
pywb.apps.frontendapp.
MetadataCache
(template_str)[source]¶ Bases:
object
This class holds the collection metadata template string and caches a collection’s metadata once it has been rendered. Cached metadata is updated if its corresponding file has been modified since it was last cached (based on the file’s mtime).
-
get_all
(routes)[source]¶ Load the metadata for all routes (collections) and populate the cache
Parameters: routes (list[str]) – List of collection names Returns: A dictionary containing each collection’s metadata Return type: dict
-
load
(coll)[source]¶ Load and receive the metadata associated with a collection.
If the metadata for the collection is not cached yet, its metadata file is read in and stored. If the cache has seen the collection before, the mtime of the metadata file is checked: if it is more recent than the cached time, the cache is updated and the new metadata is returned; otherwise the cached version is returned.
Parameters: coll (str) – Name of a collection Returns: The cached metadata for a collection Return type: dict
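The mtime-based invalidation described above can be sketched with the standard library alone. `MtimeCache` is a hypothetical class, not pywb's MetadataCache, and JSON is used here only to avoid third-party dependencies; the on-disk format pywb uses may differ.

```python
import json
import os
import tempfile


class MtimeCache:
    """Hypothetical mtime-based cache; not pywb's MetadataCache."""

    def __init__(self):
        self._cache = {}  # path -> (mtime at cache time, parsed metadata)

    def load(self, path):
        mtime = os.path.getmtime(path)
        cached = self._cache.get(path)
        if cached is not None and cached[0] >= mtime:
            return cached[1]  # file unchanged since it was cached
        with open(path, encoding='utf-8') as fh:
            data = json.load(fh)
        self._cache[path] = (mtime, data)
        return data


with tempfile.TemporaryDirectory() as tmp:
    meta_path = os.path.join(tmp, 'metadata.json')
    with open(meta_path, 'w', encoding='utf-8') as fh:
        json.dump({'title': 'First'}, fh)

    cache = MtimeCache()
    first = cache.load(meta_path)
    second = cache.load(meta_path)  # served from the cache

    with open(meta_path, 'w', encoding='utf-8') as fh:
        json.dump({'title': 'Second'}, fh)
    # Force a newer mtime so the change is detected on any filesystem.
    newer = os.path.getmtime(meta_path) + 10
    os.utime(meta_path, (newer, newer))
    third = cache.load(meta_path)  # re-read because the file is newer
```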
-
pywb.apps.live module¶
pywb.apps.rewriterapp module¶
-
class
pywb.apps.rewriterapp.
RewriterApp
(framed_replay=False, jinja_env=None, config=None, paths=None)[source]¶ Bases:
object
Primary application for rewriting the content served by pywb (if it is to be rewritten).
This class is also responsible for rendering the archive’s templates.
-
DEFAULT_CSP
= "default-src 'unsafe-eval' 'unsafe-inline' 'self' data: blob: mediastream: ws: wss: ; form-action 'self'"¶
-
VIDEO_INFO_CONTENT_TYPE
= 'application/vnd.youtube-dl_formats+json'¶
-
add_csp_header
(wb_url, status_headers)[source]¶ Adds Content-Security-Policy headers to the supplied StatusAndHeaders instance if the wb_url’s mod is equal to the replay mod
Parameters: - wb_url (WbUrl) – The WbUrl for the URL being operated on
- status_headers (warcio.StatusAndHeaders) – The status and headers instance for the reply to the URL
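The conditional described above can be sketched as follows. The CSP value is copied from DEFAULT_CSP; the standalone function, the plain list of header tuples, and the `'mp_'` default for the replay modifier are assumptions made for illustration and are not taken from this page.

```python
DEFAULT_CSP = ("default-src 'unsafe-eval' 'unsafe-inline' 'self' data: blob: "
               "mediastream: ws: wss: ; form-action 'self'")


def add_csp_header(url_mod, headers, replay_mod='mp_'):
    """Append a CSP header only for the replay modifier (hypothetical helper).

    ``headers`` is a plain list of (name, value) tuples standing in for a
    StatusAndHeaders instance; ``replay_mod='mp_'`` is an assumed default.
    """
    if url_mod == replay_mod:
        headers.append(('Content-Security-Policy', DEFAULT_CSP))
    return headers


replay_headers = add_csp_header('mp_', [])
other_headers = add_csp_header('id_', [])
```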
-
do_query
(wb_url, kwargs)[source]¶ Performs the timemap query request for the supplied WbUrl returning the response
Parameters: Returns: The query’s response
Return type: requests.Response
-
format_response
(response, wb_url, full_prefix, is_timegate, is_proxy, timegate_closest_ts=None)[source]¶
-
is_framed_replay
(wb_url)[source]¶ Returns True or False indicating whether the rewriter app is configured to operate in framed replay mode and the supplied WbUrl is also operating in framed replay mode
Parameters: wb_url (WbUrl) – The WbUrl instance to check Returns: True if in framed replay mode, False otherwise Return type: bool
-
pywb.apps.static_handler module¶
pywb.apps.warcserverapp module¶
pywb.apps.wayback module¶
pywb.apps.wbrequestresponse module¶
-
class
pywb.apps.wbrequestresponse.
WbResponse
(status_headers, value=None, **kwargs)[source]¶ Bases:
object
Represents a pywb WSGI response object.
Holds a status_headers object and a response iter, to be returned to the WSGI container.
-
add_access_control_headers
(env=None)[source]¶ Adds Access-Control* HTTP headers to this WbResponse’s HTTP headers.
Parameters: env (dict) – The WSGI environment dictionary Returns: The same WbResponse but with the values for the Access-Control* HTTP header added Return type: WbResponse
-
add_range
(*args)[source]¶ Add HTTP range header values to this response
Parameters: args (int) – The values for the range HTTP header Returns: The same WbResponse but with the values for the range HTTP header added Return type: WbResponse
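For orientation, the headers of a partial-content response typically look like the output of the sketch below. The helper name and its (start, part_len, total_len) argument order are assumptions; this page does not specify what values `add_range` expects.

```python
def range_headers(start, part_len, total_len):
    """Hypothetical helper returning headers for a partial (206) response."""
    end = start + part_len - 1  # byte ranges are inclusive
    return [
        ('Content-Range', 'bytes {0}-{1}/{2}'.format(start, end, total_len)),
        ('Content-Length', str(part_len)),
        ('Accept-Ranges', 'bytes'),
    ]


headers = dict(range_headers(0, 500, 1234))
```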
-
static
bin_stream
(stream, content_type, status='200 OK', headers=None)[source]¶ Utility method for constructing a binary response.
Parameters: Returns: WbResponse that is a binary stream
Return type:
-
static
encode_stream
(stream)[source]¶ Utility method to encode a stream using utf-8.
Parameters: stream (Any) – The stream to be encoded using utf-8 Returns: A generator that yields the contents of the stream encoded as utf-8
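A generator of this kind might look like the sketch below, which assumes the stream yields text (str) chunks; it is an illustration of the described behavior, not necessarily pywb's exact code.

```python
def encode_stream(stream):
    """Yield each text chunk of ``stream`` encoded as UTF-8 bytes."""
    for chunk in stream:
        yield chunk.encode('utf-8')


encoded = list(encode_stream(['héllo', ' wörld']))
```

Because it is a generator, nothing is encoded until the WSGI container iterates over the response.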
-
static
json_response
(obj, status='200 OK', content_type='application/json; charset=utf-8')[source]¶ Utility method for constructing a JSON response.
Parameters: Returns: WbResponse JSON response
Return type:
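What such a utility has to assemble can be sketched with the standard library. The function below reuses the signature shown above but returns a plain (status, headers, body) tuple instead of a WbResponse, so it is a hypothetical stand-in rather than pywb's implementation.

```python
import json


def json_response(obj, status='200 OK',
                  content_type='application/json; charset=utf-8'):
    """Return (status, headers, body_iter) for a JSON payload (hypothetical)."""
    body = json.dumps(obj).encode('utf-8')
    headers = [('Content-Type', content_type),
               ('Content-Length', str(len(body)))]
    return status, headers, [body]


status, headers, body_iter = json_response({'ok': True})
```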
-
static
options_response
(env)[source]¶ Construct WbResponse for OPTIONS based on the WSGI env dictionary
Parameters: env (dict) – The WSGI environment dictionary Returns: The WbResponse for the OPTIONS request Return type: WbResponse
-
static
redir_response
(location, status='302 Redirect', headers=None)[source]¶ Utility method for constructing a redirection response.
Parameters: Returns: WbResponse redirection response
Return type:
-
static
text_response
(text, status='200 OK', content_type='text/plain; charset=utf-8')[source]¶ Utility method for constructing a text response.
Parameters: Returns: WbResponse text response
Return type:
-