pywb.rewrite package¶

Submodules¶

pywb.rewrite.content_rewriter module¶

class pywb.rewrite.content_rewriter.BaseContentRewriter(rules_file, replay_mod='')[source]¶

Bases: object

CHARSET_REGEX = re.compile(b'<meta[^>]*?[\\s;"\']charset\\s*=[\\s"\']*([^\\s"\'/>]*)')¶

TITLE = re.compile('<\\s*title\\s*>(.*)<\\s*\\/\\s*title\\s*>', re.IGNORECASE|re.MULTILINE|re.DOTALL)¶
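As a rough illustration of what these two patterns match, the sketch below copies them verbatim from the attributes above and applies them to a small, made-up HTML snippet. How BaseContentRewriter itself calls them is not shown on this page, so the surrounding logic is only an assumption.

```python
import re

# Patterns copied from the CHARSET_REGEX and TITLE attributes listed above.
CHARSET_REGEX = re.compile(b'<meta[^>]*?[\\s;"\']charset\\s*=[\\s"\']*([^\\s"\'/>]*)')
TITLE = re.compile('<\\s*title\\s*>(.*)<\\s*\\/\\s*title\\s*>',
                   re.IGNORECASE | re.MULTILINE | re.DOTALL)

# Made-up sample document, not taken from pywb.
html_bytes = b'<html><head><meta charset="utf-8"><title>Example\nPage</title></head></html>'

# CHARSET_REGEX works on bytes; group 1 is the declared charset.
charset_match = CHARSET_REGEX.search(html_bytes)
charset = charset_match.group(1).decode('ascii') if charset_match else None

# TITLE works on text; DOTALL lets the title span several lines.
title_match = TITLE.search(html_bytes.decode('utf-8'))
title = title_match.group(1) if title_match else None

print(charset, repr(title))
```

Note that CHARSET_REGEX is compiled without re.IGNORECASE, so as written it only matches a lowercase `<meta ... charset=`.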

add_prefer_mod(pref, mod)[source]¶

add_rewriter(rw)[source]¶

create_rewriter(text_type, rule, rwinfo, cdx, head_insert_func=None)[source]¶

extract_html_charset(buff)[source]¶

get_head_insert(rwinfo, rule, head_insert_func, cdx)[source]¶

get_rewrite_types()[source]¶

get_rewriter(rw_type, rwinfo=None)[source]¶

get_rule(cdx)[source]¶

get_rw_class(rule, text_type, rwinfo)[source]¶

has_custom_rules(rule, cdx)[source]¶

html_unescape()¶: Convert all named and numeric character references (e.g. &gt;, &#62;, &#x3e;) in the string s to the corresponding Unicode characters. This function uses the rules defined by the HTML 5 standard for both valid and invalid character references, and the list of HTML 5 named character references defined in html.entities.html5.
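The description of html_unescape() reads like that of the standard library's html.unescape(). Assuming that is the function involved, a minimal example of the conversion looks like this:

```python
import html

# Named, decimal and hexadecimal references for the same character.
escaped = 'a &gt; b, a &#62; b, a &#x3e; b'
unescaped = html.unescape(escaped)

print(unescaped)
```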

init_js_regexs(regexs)[source]¶

load_rules(filename)[source]¶

mod_to_prefer(mod)[source]¶

parse_rewrite_rule(config)[source]¶

prefer_to_mod(pref)[source]¶

rewrite_headers(rwinfo)[source]¶

classmethod set_unescape(unescape)[source]¶

class pywb.rewrite.content_rewriter.BufferedRewriter(url_rewriter=None)[source]¶

Bases: object

rewrite_stream(stream, rwinfo)[source]¶

class pywb.rewrite.content_rewriter.RewriteInfo(record, content_rewriter, url_rewriter, cookie_rewriter=None)[source]¶

Bases: object

JSONP_CONTAINS = ['callback=jQuery', 'callback=jsonp', '.json?']¶

JSON_REGEX = re.compile(b'^\\s*[{[][{"]')¶

TAG_REGEX = re.compile(b'^(\xef\xbb\xbf)?\\s*\\<')¶

TAG_REGEX2 = re.compile(b'^.*<\\w+[\\s>]')¶

content_stream¶

is_url_rw()[source]¶

read_and_keep(size)[source]¶

should_rw_content()[source]¶

class pywb.rewrite.content_rewriter.StreamingRewriter(url_rewriter, align_to_line=True, first_buff='')[source]¶

Bases: object

final_read()[source]¶

rewrite(string)[source]¶

rewrite_complete(string, **kwargs)[source]¶

rewrite_text_stream_to_gen(stream, rwinfo)[source]¶: Convert a stream to a generator by applying the rewriting function to each portion of the stream. Align to line boundaries if needed.
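The idea of turning a stream into a generator of rewritten, line-aligned chunks can be sketched as follows. This is not pywb's implementation: the function name, the `rewrite_func` and `chunk_size` parameters, and the use of `readline()` to reach the next line boundary are all assumptions made for illustration.

```python
import io

def rewrite_stream_to_gen(stream, rewrite_func, chunk_size=16, align_to_line=True):
    """Hypothetical sketch: yield rewritten chunks that end on a line boundary."""
    while True:
        buff = stream.read(chunk_size)
        if not buff:
            break
        if align_to_line:
            # readline() returns the rest of the current line (possibly empty),
            # so the rewriting function never sees a line cut in half.
            buff += stream.readline()
        yield rewrite_func(buff)

source = io.BytesIO(b'first line of text\nsecond line\nthird')

# bytes.upper stands in for a real rewriting function.
chunks = list(rewrite_stream_to_gen(source, bytes.upper))
print(chunks)
```

Aligning to line boundaries matters for regex-based rewriters, since a pattern split across two chunks would otherwise be missed.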

pywb.rewrite.cookie_rewriter module¶

class pywb.rewrite.cookie_rewriter.ExactPathCookieRewriter(url_rewriter)[source]¶

Bases: pywb.rewrite.cookie_rewriter.WbUrlBaseCookieRewriter

Rewrite cookies using only the exact path. Useful for live rewriting without a timestamp and for minimizing cookie pollution.

If a path or domain is present, it is simply removed.

rewrite_cookie(name, morsel)[source]¶

class pywb.rewrite.cookie_rewriter.HostScopeCookieRewriter(url_rewriter)[source]¶

Bases: pywb.rewrite.cookie_rewriter.WbUrlBaseCookieRewriter

Attempt to rewrite cookies to the current host url.

If path present, rewrite path to current host. Only makes sense in live proxy or no redirect mode, as otherwise timestamp may change.

If domain present, remove domain and set to path prefix

rewrite_cookie(name, morsel)[source]¶

class pywb.rewrite.cookie_rewriter.MinimalScopeCookieRewriter(url_rewriter)[source]¶

Bases: pywb.rewrite.cookie_rewriter.WbUrlBaseCookieRewriter

Attempt to rewrite cookies to minimal scope possible

If path present, rewrite path to current rewritten url only. If domain present, remove domain and set to path prefix.

rewrite_cookie(name, morsel)[source]¶

class pywb.rewrite.cookie_rewriter.RemoveAllCookiesRewriter(url_rewriter)[source]¶

Bases: pywb.rewrite.cookie_rewriter.WbUrlBaseCookieRewriter

rewrite(cookie_str, header='Set-Cookie')[source]¶

class pywb.rewrite.cookie_rewriter.RootScopeCookieRewriter(url_rewriter)[source]¶

Bases: pywb.rewrite.cookie_rewriter.WbUrlBaseCookieRewriter

Sometimes it is necessary to rewrite cookies to root scope in order to work across time boundaries and modifiers

This rewriter simply sets all cookies to be in the root

rewrite_cookie(name, morsel)[source]¶

class pywb.rewrite.cookie_rewriter.WbUrlBaseCookieRewriter(url_rewriter)[source]¶

Bases: object

Base Cookie rewriter for wburl-based requests.

REMOVE_EXPIRES = re.compile('[;]\\s*?expires=.{4}[^,;]+', re.IGNORECASE)¶
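To see what REMOVE_EXPIRES matches, the sketch below copies the pattern from the attribute above and applies it to a made-up Set-Cookie value. Using `sub()` to strip the attribute is an assumption suggested by the name, not something this page confirms.

```python
import re

# Pattern copied from the REMOVE_EXPIRES attribute listed above.
REMOVE_EXPIRES = re.compile('[;]\\s*?expires=.{4}[^,;]+', re.IGNORECASE)

# Made-up cookie header value.
cookie_str = 'sid=abc123; Expires=Wed, 21 Oct 2015 07:28:00 GMT; Path=/'

# The `.{4}` skips over the weekday and its comma ("Wed,"), so the
# `[^,;]+` that follows can consume the rest of the date up to the next ';'.
cleaned = REMOVE_EXPIRES.sub('', cookie_str)

print(cleaned)
```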

add_prefix_cookie_for_all_mods(morsel, results, header)[source]¶: If an HttpOnly cookie is set to a path ending in /, and the current mod is mp_ or if_, then assume it's meant to be a prefix and is likely needed for other content. Set the cookie with the same prefix but for all common modifiers: (mp_, js_, cs_, oe_, if_, sw_, wkrf_)

rewrite(cookie_str, header='Set-Cookie')[source]¶

pywb.rewrite.cookie_rewriter.get_cookie_rewriter(cookie_scope)[source]¶

pywb.rewrite.cookies module¶

class pywb.rewrite.cookies.CookieTracker(redis, expire_time=120)[source]¶

Bases: object

add_cookie(cookie_key, domain, name, value)[source]¶

get_cookie_headers(url, url_rewriter, cookie_key, existing_cookie)[source]¶

get_rewriter(url_rewriter, cookie_key)[source]¶

static get_subdomains(url)[source]¶

class pywb.rewrite.cookies.DomainCacheCookieRewriter(url_rewriter, cookie_tracker, cookie_key)[source]¶

Bases: pywb.rewrite.cookie_rewriter.WbUrlBaseCookieRewriter

get_expire_sec(morsel)[source]¶

rewrite_cookie(name, morsel)[source]¶

class pywb.rewrite.cookies.HostScopeNoFilterCookieRewriter(url_rewriter)[source]¶: Bases: pywb.rewrite.cookie_rewriter.HostScopeCookieRewriter

pywb.rewrite.default_rewriter module¶

class pywb.rewrite.default_rewriter.DefaultRewriter(replay_mod='', config=None)[source]¶

Bases: pywb.rewrite.content_rewriter.BaseContentRewriter

DEFAULT_REWRITERS = {'amf': <class 'pywb.rewrite.rewrite_amf.RewriteAMF'>, 'cookie': <class 'pywb.rewrite.cookie_rewriter.HostScopeCookieRewriter'>, 'css': <class 'pywb.rewrite.regex_rewriters.CSSRewriter'>, 'dash': <class 'pywb.rewrite.rewrite_dash.RewriteDASH'>, 'header': <class 'pywb.rewrite.header_rewriter.DefaultHeaderRewriter'>, 'hls': <class 'pywb.rewrite.rewrite_hls.RewriteHLS'>, 'html': <class 'pywb.rewrite.html_rewriter.HTMLRewriter'>, 'html-banner-only': <class 'pywb.rewrite.html_insert_rewriter.HTMLInsertOnlyRewriter'>, 'js': <class 'pywb.rewrite.regex_rewriters.JSLocationOnlyRewriter'>, 'js-proxy': <class 'pywb.rewrite.regex_rewriters.JSNoneRewriter'>, 'js-worker': <class 'pywb.rewrite.rewrite_js_workers.JSWorkerRewriter'>, 'json': <class 'pywb.rewrite.jsonp_rewriter.JSONPRewriter'>, 'xml': <class 'pywb.rewrite.regex_rewriters.XMLRewriter'>}¶

default_content_types = {'css': 'text/css', 'html': 'text/html', 'js': 'text/javascript'}¶

get_rewrite_types()[source]¶

init_js_regex(regexs)[source]¶

rewrite_types = {'': 'guess-text', 'application/dash+xml': 'dash', 'application/javascript': 'js', 'application/json': 'json', 'application/octet-stream': 'guess-bin', 'application/vnd.apple.mpegurl': 'hls', 'application/x-amf': 'amf', 'application/x-javascript': 'js', 'application/x-mpegURL': 'hls', 'application/xhtml': 'html', 'application/xhtml+xml': 'html', 'text/css': 'css', 'text/html': 'guess-html', 'text/javascript': 'js', 'text/plain': 'guess-text'}¶

class pywb.rewrite.default_rewriter.RewriterWithJSProxy(*args, **kwargs)[source]¶

Bases: pywb.rewrite.default_rewriter.DefaultRewriter

get_rewriter(rw_type, rwinfo=None)[source]¶

ua_allows_obj_proxy(opts)[source]¶

pywb.rewrite.header_rewriter module¶

class pywb.rewrite.header_rewriter.DefaultHeaderRewriter(rwinfo, header_prefix='X-Archive-Orig-')[source]¶

Bases: object

header_rules = {'accept-patch': 'keep', 'accept-ranges': 'keep', 'access-control-allow-credentials': 'prefix-if-url-rewrite', 'access-control-allow-headers': 'prefix-if-url-rewrite', 'access-control-allow-methods': 'prefix-if-url-rewrite', 'access-control-allow-origin': 'prefix-if-url-rewrite', 'access-control-expose-headers': 'prefix-if-url-rewrite', 'access-control-max-age': 'prefix-if-url-rewrite', 'age': 'prefix', 'allow': 'keep', 'alt-svc': 'prefix', 'cache-control': 'prefix', 'connection': 'prefix', 'content-base': 'url-rewrite', 'content-disposition': 'keep', 'content-encoding': 'prefix-if-content-rewrite', 'content-language': 'keep', 'content-length': 'content-length', 'content-location': 'url-rewrite', 'content-md5': 'prefix', 'content-range': 'keep', 'content-security-policy': 'prefix', 'content-security-policy-report-only': 'prefix', 'content-type': 'keep', 'date': 'prefix', 'etag': 'prefix', 'expires': 'prefix', 'last-modified': 'prefix', 'link': 'keep', 'location': 'url-rewrite', 'p3p': 'prefix', 'pragma': 'prefix', 'proxy-authenticate': 'keep', 'public-key-pins': 'prefix', 'retry-after': 'prefix', 'server': 'prefix', 'set-cookie': 'cookie', 'status': 'prefix', 'strict-transport-security': 'prefix', 'tk': 'prefix', 'trailer': 'prefix', 'transfer-encoding': 'transfer-encoding', 'upgrade': 'prefix', 'upgrade-insecure-requests': 'prefix', 'vary': 'prefix', 'via': 'prefix', 'warning': 'prefix', 'www-authenticate': 'keep', 'x-frame-options': 'prefix', 'x-xss-protection': 'prefix'}¶
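The table maps lowercase header names to a rule name. The sketch below illustrates only the two simplest rules, assuming that 'keep' passes the header through unchanged and 'prefix' prepends the default header_prefix from the constructor signature. The helper name and the sample subset are hypothetical, and the other rules (such as 'url-rewrite' or 'cookie') are deliberately left out.

```python
# Default header_prefix taken from the DefaultHeaderRewriter signature above.
HEADER_PREFIX = 'X-Archive-Orig-'

# A small subset of the header_rules table listed above.
SAMPLE_RULES = {
    'content-type': 'keep',
    'etag': 'prefix',
    'server': 'prefix',
}

def apply_simple_rule(name, value, rules=SAMPLE_RULES, prefix=HEADER_PREFIX):
    """Hypothetical helper covering only the 'keep' and 'prefix' rules."""
    rule = rules.get(name.lower())
    if rule == 'keep':
        return (name, value)
    if rule == 'prefix':
        return (prefix + name, value)
    # Any other rule is outside the scope of this sketch.
    raise ValueError('rule not covered by this sketch: %r' % (rule,))

original = [('Content-Type', 'text/html'), ('ETag', '"abc"'), ('Server', 'nginx')]
rewritten = [apply_simple_rule(n, v) for n, v in original]
print(rewritten)
```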

rewrite_header(name, value, rule)[source]¶

pywb.rewrite.html_insert_rewriter module¶

class pywb.rewrite.html_insert_rewriter.HTMLInsertOnlyRewriter(url_rewriter, **kwargs)[source]¶

Bases: pywb.rewrite.content_rewriter.StreamingRewriter

Insert a custom string into the head of the HTML, before any tag that is not <head> or <html>. No other rewriting is performed.

NOT_HEAD_REGEX = re.compile('(<\\s*\\b)(?!(html|head))', re.IGNORECASE)¶

XML_HEADER = re.compile('<\\?xml.*\\?>')¶
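A minimal sketch of how NOT_HEAD_REGEX can locate the insertion point: the pattern is copied from the attribute above, while the helper function and the sample strings are hypothetical and only approximate what the class description says.

```python
import re

# Pattern copied from the NOT_HEAD_REGEX attribute listed above.
NOT_HEAD_REGEX = re.compile('(<\\s*\\b)(?!(html|head))', re.IGNORECASE)

def insert_before_first_content_tag(html_text, insert_str):
    """Hypothetical helper: insert before the first tag that is not <html>/<head>."""
    m = NOT_HEAD_REGEX.search(html_text)
    if not m:
        # No suitable tag found; leave the text untouched in this sketch.
        return html_text
    return html_text[:m.start()] + insert_str + html_text[m.start():]

page = '<html><head><title>x</title></head></html>'
result = insert_before_first_content_tag(page, '<script>banner()</script>')
print(result)
```

The `\b` after `<\s*` means closing tags such as `</title>` are never matched, since there is no word boundary between `<` and `/`.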

final_read()[source]¶

rewrite(string)[source]¶

pywb.rewrite.html_rewriter module¶

class pywb.rewrite.html_rewriter.HTMLRewriter(*args, **kwargs)[source]¶

Bases: pywb.rewrite.html_rewriter.HTMLRewriterMixin, html.parser.HTMLParser

PARSETAG = re.compile('[<]')¶

clear_cdata_mode()[source]¶

feed(string)[source]¶

Feed data to the parser.

Call this as often as you want, with as little or as much text as you want (may include '\n').

handle_comment(data)[source]¶

handle_data(data)[source]¶

handle_decl(data)[source]¶

handle_endtag(tag)[source]¶

handle_pi(data)[source]¶

handle_startendtag(tag, attrs)[source]¶

handle_starttag(tag, attrs)[source]¶

reset()[source]¶: Reset this instance. Loses all unprocessed data.

unescape(s)[source]¶

unknown_decl(data)[source]¶

class pywb.rewrite.html_rewriter.HTMLRewriterMixin(url_rewriter, head_insert=None, js_rewriter_class=None, js_rewriter=None, css_rewriter=None, css_rewriter_class=None, url='', defmod='', parse_comments=False, charset='utf-8')[source]¶

Bases: pywb.rewrite.content_rewriter.StreamingRewriter

HTML-Parsing Rewriter for custom rewriting, also delegates to rewriters for script and css

ADD_WINDOW = re.compile('(?<![.])(WB_wombat_)')¶

class AccumBuff[source]¶

Bases: object

getvalue()[source]¶

write(string)[source]¶

BEFORE_HEAD_TAGS = ['html', 'head']¶

DATA_RW_PROTOCOLS = ('http://', 'https://', '//')¶

META_REFRESH_REGEX = re.compile('^[\\d.]+\\s*;\\s*url\\s*=\\s*(.+?)\\s*$', re.IGNORECASE|re.MULTILINE)¶
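META_REFRESH_REGEX targets the content attribute of a `<meta http-equiv="refresh">` tag. The sketch below copies the pattern and extracts the target URL from a made-up value; what the rewriter then does with the URL is not shown here.

```python
import re

# Pattern copied from the META_REFRESH_REGEX attribute listed above.
META_REFRESH_REGEX = re.compile('^[\\d.]+\\s*;\\s*url\\s*=\\s*(.+?)\\s*$',
                                re.IGNORECASE | re.MULTILINE)

# Made-up content attribute value.
content = '5; URL=http://example.com/next'

m = META_REFRESH_REGEX.match(content)
refresh_url = m.group(1) if m else None
print(refresh_url)
```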

PRELOAD_TYPES = {'audio': 'oe_', 'document': 'if_', 'embed': 'oe_', 'fetch': 'mp_', 'font': 'oe_', 'image': 'im_', 'object': 'oe_', 'script': 'js_', 'style': 'cs_', 'track': 'oe_', 'video': 'oe_', 'worker': 'js_'}¶

SRCSET_REGEX = re.compile('\\s*(\\S*\\s+[\\d\\.]+[wx]),|(?:\\s*,(?:\\s+|(?=https?:)))')¶

close()[source]¶

final_read()[source]¶

get_attr(tag_attrs, match_name)[source]¶

has_attr(tag_attrs, attr)[source]¶

parse_data(data)[source]¶

rewrite(string)[source]¶

try_unescape(value)[source]¶

pywb.rewrite.jsonp_rewriter module¶

class pywb.rewrite.jsonp_rewriter.JSONPRewriter(url_rewriter, align_to_line=True, first_buff='')[source]¶

Bases: pywb.rewrite.content_rewriter.StreamingRewriter

CALLBACK = re.compile('[?].*callback=([^&]+)')¶

JSONP = re.compile('(?:^[ \\t]*(?:(?:\\/\\*[^\\*]*\\*\\/)|(?:\\/\\/[^\\n]+[\\n])))*[ \\t]*(\\w+)\\(\\{', re.MULTILINE)¶
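The two patterns suggest a simple scheme: find the callback name in the request URL, find the function name at the start of the JSONP body, and swap one for the other. The sketch below follows that reading; the helper name and sample data are hypothetical, and the actual rewrite() method may differ.

```python
import re

# Patterns copied from the CALLBACK and JSONP attributes listed above.
CALLBACK = re.compile('[?].*callback=([^&]+)')
JSONP = re.compile('(?:^[ \\t]*(?:(?:\\/\\*[^\\*]*\\*\\/)|(?:\\/\\/[^\\n]+[\\n])))*[ \\t]*(\\w+)\\(\\{',
                   re.MULTILINE)

def swap_jsonp_callback(body, request_url):
    """Hypothetical helper: rename the JSONP function to the requested callback."""
    m_json = JSONP.match(body)
    if not m_json:
        return body
    m_cb = CALLBACK.search(request_url)
    if not m_cb:
        return body
    # Keep everything after the original function name.
    return m_cb.group(1) + body[m_json.end(1):]

archived_body = 'oldCb({"a": 1});'
url = 'http://example.com/api?x=1&callback=jQuery123'
swapped = swap_jsonp_callback(archived_body, url)
print(swapped)
```

This matters on replay because the archived response carries the callback name from capture time, while the replaying page may request a freshly generated one.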

rewrite(string)[source]¶

pywb.rewrite.regex_rewriters module¶

class pywb.rewrite.regex_rewriters.CSSRewriter(rewriter, extra_rules=None, first_buff='')[source]¶

Bases: pywb.rewrite.regex_rewriters.RegexRewriter

rules_factory = <pywb.rewrite.regex_rewriters.CSSRules object>¶

class pywb.rewrite.regex_rewriters.CSSRules[source]¶

Bases: pywb.rewrite.regex_rewriters.RxRules

CSS_IMPORT_REGEX = '@import\\s+(?:url\\s*)?\\(?\\s*[\'"]?([\\w.:/\\\\-]+)'¶

CSS_URL_REGEX = 'url\\s*\\(\\s*(?:[\\\\"\']|(?:&.{1,4};))*\\s*([^)\'"]+)\\s*(?:[\\\\"\']|(?:&.{1,4};))*\\s*\\)'¶

class pywb.rewrite.regex_rewriters.JSLinkAndLocationRewriter(rewriter, extra_rules=None, first_buff='')[source]¶

Bases: pywb.rewrite.regex_rewriters.RegexRewriter

rules_factory = <pywb.rewrite.regex_rewriters.JSLinkAndLocationRewriterRules object>¶

class pywb.rewrite.regex_rewriters.JSLinkAndLocationRewriterRules(prefix='WB_wombat_')[source]¶

Bases: pywb.rewrite.regex_rewriters.JSLocationRewriterRules

JS Rewriter rules which also rewrite absolute http://, https:// and // urls at the beginning of a string

JS_HTTPX = '(?:(?<=["\\\';])https?:|(?<=["\\\']))\\\\{0,4}/\\\\{0,4}/[A-Za-z0-9:_@%.\\\\-]+/'¶

get_rules(prefix)[source]¶

class pywb.rewrite.regex_rewriters.JSLocationOnlyRewriter(rewriter, extra_rules=None, first_buff='')[source]¶

Bases: pywb.rewrite.regex_rewriters.RegexRewriter

rules_factory = <pywb.rewrite.regex_rewriters.JSLocationRewriterRules object>¶

class pywb.rewrite.regex_rewriters.JSLocationRewriterRules(prefix='WB_wombat_')[source]¶

Bases: pywb.rewrite.regex_rewriters.RxRules

JS Rewriter mixin which rewrites location and domain to the specified prefix (default: WB_wombat_)

get_rules(prefix)[source]¶

class pywb.rewrite.regex_rewriters.JSNoneRewriter(rewriter, extra_rules=None, first_buff='')[source]¶: Bases: pywb.rewrite.regex_rewriters.RegexRewriter

class pywb.rewrite.regex_rewriters.JSReplaceFuzzy(*args, **kwargs)[source]¶

Bases: object

rewrite(string)[source]¶

rx_obj = None¶

pywb.rewrite.regex_rewriters.JSRewriter¶: alias of pywb.rewrite.regex_rewriters.JSLinkAndLocationRewriter

class pywb.rewrite.regex_rewriters.JSWombatProxyRewriter(rewriter, extra_rules=None)[source]¶

Bases: pywb.rewrite.regex_rewriters.RegexRewriter

JS Rewriter mixin which wraps the contents of the script in an anonymous block scope and inserts Wombat js-proxy setup

final_read()[source]¶

rewrite_complete(string, **kwargs)[source]¶

rules_factory = <pywb.rewrite.regex_rewriters.JSWombatProxyRules object>¶

class pywb.rewrite.regex_rewriters.JSWombatProxyRules[source]¶: Bases: pywb.rewrite.regex_rewriters.RxRules

class pywb.rewrite.regex_rewriters.RegexRewriter(rewriter, extra_rules=None, first_buff='')[source]¶

Bases: pywb.rewrite.content_rewriter.StreamingRewriter

filter(m)[source]¶

static parse_rules_from_config(config)[source]¶

replace(m)[source]¶

rewrite(string)[source]¶

rules_factory = <pywb.rewrite.regex_rewriters.RxRules object>¶

class pywb.rewrite.regex_rewriters.RxRules(rules=None)[source]¶

Bases: object

HTTPX_MATCH_STR = 'https?:\\\\?/\\\\?/[A-Za-z0-9:_@.-]+'¶

static add_prefix(prefix)[source]¶

static add_suffix(suffix)[source]¶

static archival_rewrite(mod=None)[source]¶

static compile_rules(rules)[source]¶

static fixed(string)[source]¶

static format(template)[source]¶

static remove_https(string, _)[source]¶

static replace_str(replacer, match='this')[source]¶

class pywb.rewrite.regex_rewriters.XMLRewriter(rewriter, extra_rules=None, first_buff='')[source]¶

Bases: pywb.rewrite.regex_rewriters.RegexRewriter

filter(m)[source]¶

rules_factory = <pywb.rewrite.regex_rewriters.XMLRules object>¶

class pywb.rewrite.regex_rewriters.XMLRules[source]¶: Bases: pywb.rewrite.regex_rewriters.RxRules

pywb.rewrite.rewrite_amf module¶

class pywb.rewrite.rewrite_amf.RewriteAMF(url_rewriter=None)[source]¶

Bases: pywb.rewrite.content_rewriter.BufferedRewriter

rewrite_stream(stream, rwinfo)[source]¶

pywb.rewrite.rewrite_dash module¶

class pywb.rewrite.rewrite_dash.RewriteDASH(url_rewriter=None)[source]¶

Bases: pywb.rewrite.content_rewriter.BufferedRewriter

rewrite_dash(stream, rwinfo)[source]¶

rewrite_stream(stream, rwinfo)[source]¶

pywb.rewrite.rewrite_dash.rewrite_fb_dash(string, *args)[source]¶

pywb.rewrite.rewrite_dash.rewrite_tw_dash(string, *args)[source]¶

pywb.rewrite.rewrite_hls module¶

class pywb.rewrite.rewrite_hls.RewriteHLS(url_rewriter=None)[source]¶

Bases: pywb.rewrite.content_rewriter.BufferedRewriter

EXT_INF = re.compile('#EXT-X-STREAM-INF:(?:.*[,])?BANDWIDTH=([\\d]+)')¶

EXT_RESOLUTION = re.compile('RESOLUTION=([\\d]+)x([\\d]+)')¶
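The sketch below shows what the two patterns extract from an HLS master playlist. The patterns are copied from the attributes above; the playlist is made up, and collecting the variants into a list is just one way to inspect the matches, not necessarily what rewrite_stream() does.

```python
import re

# Patterns copied from the EXT_INF and EXT_RESOLUTION attributes listed above.
EXT_INF = re.compile('#EXT-X-STREAM-INF:(?:.*[,])?BANDWIDTH=([\\d]+)')
EXT_RESOLUTION = re.compile('RESOLUTION=([\\d]+)x([\\d]+)')

# Made-up master playlist with two variant streams.
playlist = '\n'.join([
    '#EXTM3U',
    '#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1280000,RESOLUTION=640x360',
    'low/index.m3u8',
    '#EXT-X-STREAM-INF:BANDWIDTH=2560000,RESOLUTION=1280x720',
    'high/index.m3u8',
])

variants = []
for line in playlist.splitlines():
    m = EXT_INF.match(line)
    if not m:
        continue
    res = EXT_RESOLUTION.search(line)
    size = (int(res.group(1)), int(res.group(2))) if res else None
    # Each entry is (bandwidth in bits per second, (width, height) or None).
    variants.append((int(m.group(1)), size))

print(variants)
```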

rewrite_stream(stream, rwinfo)[source]¶

pywb.rewrite.rewrite_js_workers module¶

class pywb.rewrite.rewrite_js_workers.JSWorkerRewriter(url_rewriter, align_to_line=True, first_buff='')[source]¶

Bases: pywb.rewrite.content_rewriter.StreamingRewriter

A simple rewriter for rewriting web or service workers. The only rewriting that occurs is the injection of the init code for wombatWorkers.js. This allows all of them to operate as expected on the live web.

pywb.rewrite.rewriteinputreq module¶

class pywb.rewrite.rewriteinputreq.RewriteInputRequest(env, urlkey, url, rewriter)[source]¶

Bases: pywb.warcserver.inputrequest.DirectWSGIInputRequest

RANGE_ARG_RX = re.compile('.*.googlevideo.com/videoplayback.*([&?]range=(\\d+)-(\\d+))')¶

RANGE_HEADER = re.compile('bytes=(\\d+)-(\\d+)?')¶
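RANGE_HEADER captures the start and the optional end of a byte range. A minimal sketch, with a hypothetical helper name and made-up header values:

```python
import re

# Pattern copied from the RANGE_HEADER attribute listed above.
RANGE_HEADER = re.compile('bytes=(\\d+)-(\\d+)?')

def parse_range(header_value):
    """Hypothetical helper: return (start, end) with end=None for open ranges."""
    m = RANGE_HEADER.match(header_value)
    if not m:
        return None
    start = int(m.group(1))
    end = int(m.group(2)) if m.group(2) else None
    return (start, end)

closed_range = parse_range('bytes=0-499')
open_range = parse_range('bytes=500-')
print(closed_range, open_range)
```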

extract_range()[source]¶

get_full_request_uri()[source]¶

get_req_headers()[source]¶

pywb.rewrite.templateview module¶

class pywb.rewrite.templateview.BaseInsertView(jenv, insert_file, banner_view=None)[source]¶

Bases: object

Base class of all template views used by Pywb

render_to_string(env, **kwargs)[source]¶

Render this template.

Parameters:
	env (dict) – The WSGI environment associated with the request causing this template to be rendered
	kwargs (any) – The keyword arguments to be supplied to the Jinja template render method
Returns:	The rendered template
Return type:	str

class pywb.rewrite.templateview.HeadInsertView(jenv, insert_file, banner_view=None)[source]¶

Bases: pywb.rewrite.templateview.BaseInsertView

The template view class associated with rendering the HTML inserted into the head of the pages replayed (WB Insert).

create_insert_func(wb_url, wb_prefix, host_prefix, top_url, env, is_framed, coll='', include_ts=True, **kwargs)[source]¶

Create the function used to render the header insert template for the current request.

Parameters:
	wb_url (rewrite.wburl.WbUrl) – The WbUrl for the request this template is being rendered for
	wb_prefix (str) – The URL prefix under which pywb is serving the content (e.g. http://localhost:8080/live/)
	host_prefix (str) – The host URL prefix pywb is running on (e.g. http://localhost:8080)
	top_url (str) – The full URL for this request (e.g. http://localhost:8080/live/http://example.com)
	env (dict) – The WSGI environment dictionary for this request
	is_framed (bool) – Is pywb or a specific collection running in framed mode
	coll (str) – The name of the collection this request is associated with
	include_ts (bool) – Should a timestamp be included in the rendered template
	kwargs – Additional keyword arguments to be supplied to the Jinja template render method
Returns:	A function to be used to render the header insert for the request this template is being rendered for
Return type:	callable

class pywb.rewrite.templateview.JinjaEnv(paths=None, packages=None, assets_path=None, globals=None, overlay=None, extensions=None, env_template_params_key='pywb.template_params', env_template_dir_key='pywb.templates_dir')[source]¶

Bases: object

Pywb JinjaEnv class that provides utility functions used by the templates, configured template loaders and template paths, and contains the actual Jinja env used by each template.

init_loc(locales_root_dir, locales, loc_map, default_locale)[source]¶

template_filter(param=None)[source]¶

Returns a decorator that adds the wrapped function to the dictionary of template filters.

The wrapped function is keyed by either the supplied param (if supplied) or by the wrapped function's name.

Parameters:	param – Optional name to use instead of the name of the function to be wrapped
Returns:	A decorator to wrap a template filter function
Return type:	callable
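The decorator pattern described for template_filter() can be sketched independently of Jinja. The `FilterRegistry` class and its `filters` dictionary below are hypothetical stand-ins for JinjaEnv and its filter table.

```python
class FilterRegistry:
    """Hypothetical stand-in for the object that owns the filter dictionary."""

    def __init__(self):
        self.filters = {}

    def template_filter(self, param=None):
        def deco(func):
            # Key by the supplied name if given, else by the function's name.
            name = param or func.__name__
            self.filters[name] = func
            return func
        return deco

registry = FilterRegistry()

@registry.template_filter()
def lower_case(value):
    return value.lower()

@registry.template_filter('shout')
def make_loud(value):
    return value.upper() + '!'

print(sorted(registry.filters))
```

Returning `func` unchanged from the decorator keeps the original function usable under its own name as well as through the registry.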

class pywb.rewrite.templateview.PkgResResolver[source]¶

Bases: webassets.env.Resolver

Class for resolving pywb package resources when installed via PyPI or setup.py

get_pkg_path(item)[source]¶

Get the package path for the supplied item.

Parameters:	item (str) – A resource's full package path
Returns:	The netloc and path from the item's package path
Return type:	tuple[str, str]

resolve_source(ctx, item)[source]¶

Given item from a Bundle’s contents, this has to return the final value to use, usually an absolute filesystem path.

Note

It is also allowed to return urls and bundle instances (or generally anything else the calling Bundle instance may be able to handle). Indeed this is the reason why the name of this method does not imply a return type.

The incoming item is usually a relative path, but may also be an absolute path, or a url. These you will commonly want to return unmodified.

This method is also allowed to resolve item to multiple values, in which case a list should be returned. This is commonly used if item includes glob instructions (wildcards).

Note

Subclasses should consider implementing search_for_source() instead of this method.

class pywb.rewrite.templateview.RelEnvironment(block_start_string='{%', block_end_string='%}', variable_start_string='{{', variable_end_string='}}', comment_start_string='{#', comment_end_string='#}', line_statement_prefix=None, line_comment_prefix=None, trim_blocks=False, lstrip_blocks=False, newline_sequence='\n', keep_trailing_newline=False, extensions=(), optimized=True, undefined=<class 'jinja2.runtime.Undefined'>, finalize=None, autoescape=False, loader=None, cache_size=400, auto_reload=True, bytecode_cache=None, enable_async=False)[source]¶

Bases: jinja2.environment.Environment

Override join_path() to enable relative template paths.

join_path(template, parent)[source]¶

Join a template with the parent. By default all the lookups are relative to the loader root so this method returns the template parameter unchanged, but if the paths should be relative to the parent template, this function can be used to calculate the real template name.

Subclasses may override this method and implement template path joining here.

class pywb.rewrite.templateview.TopFrameView(jenv, insert_file, banner_view=None)[source]¶

Bases: pywb.rewrite.templateview.BaseInsertView

The template view class associated with rendering the replay iframe

get_top_frame(wb_url, wb_prefix, host_prefix, env, frame_mod, replay_mod, coll='', extra_params=None)[source]¶

Parameters:
	wb_url (rewrite.wburl.WbUrl) – The WbUrl for the request this template is being rendered for
	wb_prefix (str) – The URL prefix under which pywb is serving the content (e.g. http://localhost:8080/live/)
	host_prefix (str) – The host URL prefix pywb is running on (e.g. http://localhost:8080)
	env (dict) – The WSGI environment dictionary for the request this template is being rendered for
	frame_mod (str) – The modifier to be used for framing (e.g. if_)
	replay_mod (str) – The modifier to be used in the URL of the page being replayed (e.g. mp_)
	coll (str) – The name of the collection this template is being rendered for
	extra_params (dict) – Additional parameters to be supplied to the Jinja template render method
Returns:	The frame insert string
Return type:	str

pywb.rewrite.url_rewriter module¶

class pywb.rewrite.url_rewriter.IdentityUrlRewriter(wburl, prefix='', full_prefix=None, rel_prefix=None, root_path=None, cookie_scope=None, rewrite_opts=None, pywb_static_prefix=None)[source]¶

Bases: pywb.rewrite.url_rewriter.UrlRewriter

No rewriting performed, return original url

deprefix_url()[source]¶

get_cookie_rewriter(scope=None)[source]¶

get_new_url(**kwargs)[source]¶

rebase_rewriter(new_url)[source]¶

rewrite(url, mod=None, force_abs=False)[source]¶

class pywb.rewrite.url_rewriter.SchemeOnlyUrlRewriter(*args, **kwargs)[source]¶

Bases: pywb.rewrite.url_rewriter.IdentityUrlRewriter

A url rewriter which ensures that any urls have the same scheme (http or https) as the base url. Other urls and input are left unchanged.

rewrite(url, mod=None, force_abs=False)[source]¶

class pywb.rewrite.url_rewriter.UrlRewriter(wburl, prefix='', full_prefix=None, rel_prefix=None, root_path=None, cookie_scope=None, rewrite_opts=None, pywb_static_prefix=None)[source]¶

Bases: object

Main pywb UrlRewriter which rewrites absolute and relative urls to be relative to the current page, as specified via a WbUrl instance and an optional full path prefix

NO_REWRITE_URI_PREFIX = ('#', 'javascript:', 'data:', 'mailto:', 'about:', 'file:', '{')¶
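The name NO_REWRITE_URI_PREFIX suggests that URLs starting with any of these prefixes are left alone. A minimal sketch of such a check, using a hypothetical helper that is not part of pywb and relying on `str.startswith()` accepting a tuple:

```python
# Tuple copied from the NO_REWRITE_URI_PREFIX attribute listed above.
NO_REWRITE_URI_PREFIX = ('#', 'javascript:', 'data:', 'mailto:', 'about:', 'file:', '{')

def is_rewrite_candidate(url):
    """Hypothetical helper: True if the url does not start with an excluded prefix."""
    return bool(url) and not url.startswith(NO_REWRITE_URI_PREFIX)

samples = ['#top', 'mailto:a@example.com', 'http://example.com/', '/images/a.png']
candidates = [u for u in samples if is_rewrite_candidate(u)]
print(candidates)
```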

PARENT_PATH = '../'¶

PROTOCOLS = ('http:', 'https:', 'ftp:', 'mms:', 'rtsp:', 'wais:')¶

REL_PATH = '/'¶

REL_SCHEME = ('//', '\\/\\/', '\\\\/\\\\/')¶

deprefix_url()[source]¶

get_cookie_rewriter(scope=None)[source]¶

get_new_url(**kwargs)[source]¶

pywb_static_prefix¶: Returns the static path URL.
Return type:	str

rebase_rewriter(base_url)[source]¶

rewrite(url, mod=None, force_abs=False)[source]¶

static urljoin(orig_url, url)[source]¶

pywb.rewrite.wburl module¶

WbUrl represents the standard wayback archival url format. A regular url is a subset of the WbUrl (latest replay).

The WbUrl expresses the common interface for interacting with the wayback machine.

The WbUrl may represent one of the following forms:

query form: [/modifier]/[timestamp][-end_timestamp]*/<url>

modifier, timestamp and end_timestamp are optional:

*/example.com
20101112030201*/http://example.com
2009-2015*/http://example.com
/cdx/*/http://example.com

url query form: used to indicate a query across urls; same as the query form but with a final *:

*/example.com*
20101112030201*/http://example.com*

replay form:

20101112030201/http://example.com
20101112030201im_/http://example.com

latest_replay: (no timestamp):

http://example.com

Additionally, the BaseWbUrl provides the base components (url, timestamp, end_timestamp, modifier, type) which can be used to provide a custom representation of the wayback url format.

class pywb.rewrite.wburl.BaseWbUrl(url='', mod='', timestamp='', end_timestamp='', type=None)[source]¶

Bases: object

LATEST_REPLAY = 'latest_replay'¶

QUERY = 'query'¶

REPLAY = 'replay'¶

URL_QUERY = 'url_query'¶

is_latest_replay()[source]¶

is_query()[source]¶

static is_query_type(type_)[source]¶

is_replay()[source]¶

static is_replay_type(type_)[source]¶

is_url_query()[source]¶

class pywb.rewrite.wburl.WbUrl(orig_url)[source]¶

Bases: pywb.rewrite.wburl.BaseWbUrl

DEFAULT_SCHEME = 'http://'¶

FIRST_PATH = re.compile('(?<![:/])[/?](?![/])')¶

QUERY_REGEX = re.compile('^(?:([\\w\\-:]+)/)?(\\d*)[*-](\\d*)/?(.+)$')¶

REPLAY_REGEX = re.compile('^(\\d*)([a-z]+_|[$][a-z0-9:.-]+)?/{1,3}(.+)$')¶
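To relate QUERY_REGEX and REPLAY_REGEX to the forms listed in the module description, the sketch below copies both patterns and classifies a few of the example strings. The `classify` helper is hypothetical and much simpler than WbUrl itself; in particular, treating a trailing * as the url query form and falling back to latest_replay when neither pattern matches are assumptions based on the description above.

```python
import re

# Patterns copied from the QUERY_REGEX and REPLAY_REGEX attributes listed above.
QUERY_REGEX = re.compile('^(?:([\\w\\-:]+)/)?(\\d*)[*-](\\d*)/?(.+)$')
REPLAY_REGEX = re.compile('^(\\d*)([a-z]+_|[$][a-z0-9:.-]+)?/{1,3}(.+)$')

def classify(wburl_str):
    """Hypothetical helper: return (type, timestamp, url) for a wayback url string."""
    m = QUERY_REGEX.match(wburl_str)
    if m:
        mod, timestamp, end_timestamp, url = m.groups()
        kind = 'url_query' if url.endswith('*') else 'query'
        return (kind, timestamp, url)
    m = REPLAY_REGEX.match(wburl_str)
    if m:
        timestamp, mod, url = m.groups()
        return ('replay' if timestamp else 'latest_replay', timestamp, url)
    # Neither pattern matched: treat the whole string as a plain url.
    return ('latest_replay', '', wburl_str)

for example in ['20101112030201*/http://example.com',
                '*/example.com*',
                '20101112030201im_/http://example.com',
                'http://example.com']:
    print(example, '->', classify(example))
```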

SCHEME_RX = re.compile('[a-zA-Z0-9+-.]+(:/)')¶

deprefix_url(prefix)[source]¶

get_url(url=None)[source]¶

is_banner_only¶

is_embed¶

is_identity¶

is_url_rewrite_only¶

static percent_encode_host(url)[source]¶: Convert the host of a uri formatted with to_uri() to have a %-encoded host instead of a punycode host. The rest of the url should be unchanged.

set_replay_timestamp(timestamp)[source]¶

to_str(**overrides)[source]¶

static to_uri(url)[source]¶: Converts a url to an ASCII %-encoded form where:

- scheme is ASCII,
- host is punycode,
- and remainder is %-encoded.

urlsplit is not used, so that partially encoded scheme urls can also be decoded.

static to_wburl_str(url, type='latest_replay', mod='', timestamp='', end_timestamp='')[source]¶

Module contents¶