OpenWayback vs pywb Terms
pywb and OpenWayback use slightly different terms to describe the configuration options, as explained below.
Some differences are:
- The wayback.xml config file in OpenWayback is replaced with the config.yaml YAML file in pywb.
- The terms Access Point and Wayback Collection are replaced with Collection in pywb. The collection configuration represents a unique path (access point) and the data that is accessed at that path.
- The Resource Store in OpenWayback is known in pywb as the archive paths, configured under archive_paths.
- The Resource Index in OpenWayback is known in pywb as the index paths, configured under index_paths.
- The Exclusions in OpenWayback are replaced with the general Access Control System.
Pywb Collection Basics
A pywb collection consists of at least three parts: the collection name, the index_paths (where to read the index), and the archive_paths (where to read the WARC files).
The collection is accessed by name, so there is no distinct access point.
Collections are configured in config.yaml under the collections key.
For example, a basic collection definition can be specified via:
collections:
  wayback:
    index_paths: /archive/cdx/
    archive_paths: /archive/storage/warcs/
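Because each collection is addressed by its name, several collections can sit side by side under the same collections key. The sketch below is illustrative only: the second collection name (news) and its paths are assumptions made up for this example, and the comments map each key back to the OpenWayback term it replaces.

```yaml
collections:
  # Collection name; takes the place of an OpenWayback Access Point
  wayback:
    # Replaces the OpenWayback Resource Index
    index_paths: /archive/cdx/
    # Replaces the OpenWayback Resource Store
    archive_paths: /archive/storage/warcs/

  # Hypothetical second collection with its own index and WARC locations
  news:
    index_paths: /archive/news/cdx/
    archive_paths: /archive/news/warcs/
```

With a definition like this, each collection would typically be reached under a path named after it (for example /wayback/ and /news/), though the exact URL depends on how pywb is deployed.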
Pywb also supports a convention-based directory structure. Collections created in this structure can be detected automatically
and need not be specified in the config.yaml. This structure is designed for smaller collections that are all stored locally in a subdirectory.
See Directory Structure for the default pywb directory layout.
However, for importing existing collections from OpenWayback, it is probably easier to specify the existing paths as shown above.