Command-Line Apps

After installing the pywb tool suite, the following command-line apps are available (in the Python binary directory or the current environment).

Each server tool has a different default port, which can be overridden via the -p <port> command-line option.
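As a rough illustration of the port override, the sketch below collects the default ports stated on this page and builds the command line for a server tool. The helper name server_command and the DEFAULT_PORTS table are hypothetical conveniences for this example and are not part of pywb; only the -p option and the port numbers come from this page.

```python
# Default ports as stated on this page (not read from pywb itself).
DEFAULT_PORTS = {
    "warcserver": 8070,
    "wayback": 8080,
    "live-rewrite-server": 8090,
}


def server_command(app, port=None):
    """Return the argv list for starting `app`.

    The -p option is added only when a port different from the
    documented default is requested. Hypothetical helper.
    """
    if app not in DEFAULT_PORTS:
        raise ValueError(f"unknown server tool: {app}")
    cmd = [app]
    if port is not None and port != DEFAULT_PORTS[app]:
        cmd += ["-p", str(port)]
    return cmd


# Print the command lines rather than executing them.
print(" ".join(server_command("wayback")))
print(" ".join(server_command("wayback", 9000)))
```

The resulting list can be passed to subprocess.run if the command should actually be started from Python.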

cdx-indexer

The CDX Indexer provides a way to create a CDX(J) file from a WARC or ARC file. The tool supports both the classic CDX format and the newer CDXJ format.

The indexer also provides options for including all WARC records and for merging data from POST requests (and other HTTP records).

See cdx-indexer -h for a list of options.

Note: In a future pywb release, this tool will be removed in favor of the standalone cdxj-indexer app, which will have additional indexing options.

wb-manager

The wb-manager command-line tool is used to configure the collections directory structure and its contents, which pywb uses to automatically read collections.

The tool can be used while wayback is running, and pywb will detect many changes automatically.

It can be used to:

  • Create a new collection – wb-manager init <coll>
  • Add WARCs or WACZs to a collection – wb-manager add <coll> <warc/wacz>
  • Add override templates
  • Add or remove metadata in a collection's metadata.yaml
  • List all collections
  • Reindex a collection
  • Migrate old CDX indexes to CDXJ-style indexes

For more details, run wb-manager -h.
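The first two operations in the list above can be scripted. The following is a minimal sketch, assuming wb-manager is on the PATH; the helper names, the collection name my-coll, and the file name example.warc.gz are placeholders for illustration, not pywb names. It only uses the init and add subcommands shown above.

```python
import subprocess


def wb_manager_commands(coll, archives):
    """Return argv lists that create `coll` and add each archive to it.

    Hypothetical helper: it only assembles the `wb-manager init` and
    `wb-manager add` invocations documented above.
    """
    cmds = [["wb-manager", "init", coll]]
    for path in archives:
        cmds.append(["wb-manager", "add", coll, path])
    return cmds


def run_all(cmds):
    """Run each command in order, stopping at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)


# Placeholder collection and archive names.
commands = wb_manager_commands("my-coll", ["example.warc.gz"])
for cmd in commands:
    print(" ".join(cmd))

# To actually execute the commands (requires pywb to be installed):
# run_all(commands)
```

Building the argument lists separately from running them makes it easy to review the commands before anything is written to the collections directory.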

warcserver

The Warcserver is a standalone server component that adheres to the Warcserver API.

The server runs on port 8070 by default, serving both index and content.

The CDX Server is a subset of the Warcserver, so queries using the CDXJ Server API are also supported, for example:

http://localhost:8070/<coll>/index?url=http://example.com/
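A query URL of this form can be assembled programmatically. The sketch below is one way to do it, assuming a Warcserver on localhost at the default port; the function name cdx_query_url and the collection name my-coll are hypothetical. Unlike the literal example above, it percent-encodes the url parameter, which is the safer choice when the target URL contains its own query string.

```python
from urllib.parse import urlencode


def cdx_query_url(coll, url, host="localhost", port=8070):
    """Build an index query URL for collection `coll`.

    Hypothetical helper; the /<coll>/index?url=... shape is taken from
    the example above, and the url value is percent-encoded.
    """
    query = urlencode({"url": url})
    return f"http://{host}:{port}/{coll}/index?{query}"


print(cdx_query_url("my-coll", "http://example.com/"))
# http://localhost:8070/my-coll/index?url=http%3A%2F%2Fexample.com%2F
```

The resulting string can be fetched with any HTTP client once the Warcserver is running.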

No rewriting or recording is performed by the Warcserver, but all collections from config.yaml are loaded.

wayback (pywb)

The main pywb application is installed as the wayback application. (The pywb name refers to the same application and may become the primary name in future versions.)

The app will start on port 8080 by default, and configuration is read from config.yaml.

See Configuring the Web Archive for a detailed overview of configuration options and customizations.

live-rewrite-server

This CLI is a shortcut for wayback, configured to run with only the Live Web Collection.

The live rewrite server runs on port 8090 by default and rewrites content from the live web, which is useful for testing.

This app is almost equivalent to wayback --live, except that no other collections from config.yaml are used.