OpenWayback vs pywb Terms
pywb and OpenWayback use slightly different terms to describe the configuration options, as explained below.
Some differences are:
- The wayback.xml config file in OpenWayback is replaced with config.yaml in pywb.
- The terms Access Point and Wayback Collection are replaced with Collection in pywb. The collection configuration represents a unique path (access point) and the data that is accessed at that path.
- The Resource Store in OpenWayback is known in pywb as the archive paths, configured under archive_paths.
- The Resource Index in OpenWayback is known in pywb as the index paths, configured under index_paths.
- Exclusions in OpenWayback are replaced with a general Access Control System.
Pywb Collection Basics
A pywb collection consists of at least three parts: the collection name, the index_paths (where to read the index), and the archive_paths (where to read the WARC files).
The collection is accessed by name, so there is no distinct access point.
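Because the collection name doubles as the access point, the name appears directly in the replay URL. The following is a minimal sketch of that idea, not part of pywb's API: the helper name replay_url, the collection name, and the local host address are all assumptions chosen for illustration.

```python
def replay_url(collection, url, timestamp="", host="http://localhost:8080"):
    """Build a replay URL of the form <host>/<collection>/<timestamp>/<url>.

    Hypothetical helper for illustration; `host` is an assumed local address.
    """
    parts = [host.rstrip("/"), collection]
    if timestamp:
        # The timestamp segment is optional; it is skipped when empty.
        parts.append(timestamp)
    # The archived URL is appended as-is after the collection (and timestamp).
    return "/".join(parts) + "/" + url


print(replay_url("wayback", "http://example.com/", "20140101000000"))
# http://localhost:8080/wayback/20140101000000/http://example.com/
```

Note that no separate access point setting appears anywhere: changing the collection name changes the URL path.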
The collections are configured in config.yaml under the collections key.
For example, a basic collection definition can be specified via:
    collections:
      wayback:
        index_paths: /archive/cdx/
        archive_paths: /archive/storage/warcs/
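When moving several collections over, it can help to check that each definition carries the required parts before starting pywb. The sketch below works on the configuration as an already-parsed dictionary and reports collections that lack index_paths or archive_paths. It is an illustrative check under that assumption, not how pywb itself validates its configuration, and the function name is hypothetical.

```python
REQUIRED_KEYS = ("index_paths", "archive_paths")


def missing_collection_keys(config):
    """Return {collection_name: [missing keys]} for incomplete collections."""
    problems = {}
    for name, settings in config.get("collections", {}).items():
        # Treat anything that is not a mapping as having no keys set.
        settings = settings if isinstance(settings, dict) else {}
        missing = [key for key in REQUIRED_KEYS if not settings.get(key)]
        if missing:
            problems[name] = missing
    return problems


# This dict mirrors the YAML example above after parsing.
config = {
    "collections": {
        "wayback": {
            "index_paths": "/archive/cdx/",
            "archive_paths": "/archive/storage/warcs/",
        }
    }
}

print(missing_collection_keys(config))  # {}
```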
Pywb also supports a convention-based directory structure. Collections created in this structure are detected automatically and need not be specified in config.yaml. This structure is designed for smaller collections that are all stored locally in a subdirectory.
See the Directory Structure for the default pywb directory structure.
However, for importing existing collections from OpenWayback, it is probably easier to specify the existing paths as shown above.
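As a rough illustration of that import path, the sketch below takes the two OpenWayback locations (the Resource Index and the Resource Store) and renders the equivalent pywb collection entry as text. The helper name and the sample paths are hypothetical, and the paths are written unquoted, so the sketch assumes they contain no characters that are special in YAML.

```python
def pywb_collection_yaml(name, resource_index, resource_store):
    """Render a minimal pywb collection entry from OpenWayback-style paths.

    resource_index maps to index_paths, resource_store to archive_paths.
    Hypothetical helper; assumes the paths need no YAML quoting.
    """
    return (
        "collections:\n"
        f"  {name}:\n"
        f"    index_paths: {resource_index}\n"
        f"    archive_paths: {resource_store}\n"
    )


print(pywb_collection_yaml("wayback", "/archive/cdx/", "/archive/storage/warcs/"))
```

The resulting text can be pasted into config.yaml, leaving the existing index and WARC files where they are.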