BS ISO 28500:2017

$167.15

Information and documentation. WARC file format

Published by: BSI
Publication date: 2017
Number of pages: 36

This document specifies the WARC file format:

  • to store both the payload content and control information from mainstream Internet application layer protocols, such as HTTP, DNS, and FTP;

  • to store arbitrary metadata linked to other stored data (e.g. subject classifier, discovered language, encoding);

  • to support data compression and maintain data record integrity;

  • to store all control information from the harvesting protocol (e.g. request headers), not just response information;

  • to store the results of data transformations linked to other stored data;

  • to store a duplicate detection event linked to other stored data (to reduce storage in the presence of identical or substantially similar resources);

  • to be extended without disruption to existing functionality;

  • to support handling of overly long records by truncation or segmentation, where desired.
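
As an illustration of the record layout these goals imply, the following is a minimal sketch of serializing one WARC record and reading its named fields back. It assumes the WARC/1.1 layout (a version line, CRLF-terminated named fields, a blank line, the content block, then two CRLFs) and uses the four fields the standard marks as mandatory. The helper names `build_warc_record` and `parse_warc_header`, the sample payload, and the example URI are hypothetical and chosen only for illustration; the standard itself remains the authoritative reference.

```python
import uuid
from datetime import datetime, timezone

CRLF = b"\r\n"


def build_warc_record(record_type, block, extra_fields=None):
    """Serialize one record (hypothetical helper, not part of any library)."""
    fields = [
        # The four named fields listed as mandatory in clause 5.
        ("WARC-Type", record_type),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("Content-Length", str(len(block))),  # length of the block in octets
    ]
    fields.extend(extra_fields or [])

    header = b"WARC/1.1" + CRLF
    for name, value in fields:
        header += f"{name}: {value}".encode("utf-8") + CRLF

    # A blank line separates header from block; two CRLFs end the record.
    return header + CRLF + block + CRLF + CRLF


def parse_warc_header(record):
    """Return the version line and named fields of a single record as a dict."""
    head, _, _ = record.partition(CRLF + CRLF)
    lines = head.split(CRLF)
    fields = {"version": lines[0].decode("utf-8")}
    for line in lines[1:]:
        # Split on the first colon only, since values such as URIs contain colons.
        name, _, value = line.decode("utf-8").partition(":")
        fields[name] = value.strip()
    return fields


record = build_warc_record(
    "resource",
    b"hello, archive",
    [("WARC-Target-URI", "http://example.com/"), ("Content-Type", "text/plain")],
)
fields = parse_warc_header(record)
print(fields["WARC-Type"], fields["Content-Length"])
```

This sketch omits optional features named in the list above, such as digests, compression, truncation and segmentation, and it does not handle header line folding or multiple records in one file.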

PDF Catalog

National foreword
Foreword
Introduction
1 Scope
2 Normative references
3 Terms, definitions and abbreviated terms
4 File and record model
5 Named fields
5.1 General
5.2 WARC-Record-ID (mandatory)
5.3 Content-Length (mandatory)
5.4 WARC-Date (mandatory)
5.5 WARC-Type (mandatory)
5.6 Content-Type
5.7 WARC-Concurrent-To
5.8 WARC-Block-Digest
5.9 WARC-Payload-Digest
5.10 WARC-IP-Address
5.11 WARC-Refers-To
5.12 WARC-Refers-To-Target-URI
5.13 WARC-Refers-To-Date
5.14 WARC-Target-URI
5.15 WARC-Truncated
5.16 WARC-Warcinfo-ID
5.17 WARC-Filename
5.18 WARC-Profile
5.19 WARC-Identified-Payload-Type
5.20 WARC-Segment-Number
5.21 WARC-Segment-Origin-ID
5.22 WARC-Segment-Total-Length
6 WARC record types
6.1 General
6.2 ‘warcinfo’
6.3 ‘response’
6.3.1 General
6.3.2 ‘http’ and ‘https’ schemes
6.3.3 Other URI schemes
6.4 ‘resource’
6.4.1 General
6.4.2 ‘http’ and ‘https’ schemes
6.4.3 ‘ftp’ scheme
6.4.4 ‘dns’ scheme
6.4.5 Other URI schemes
6.5 ‘request’
6.5.1 General
6.5.2 ‘http’ and ‘https’ schemes
6.5.3 Other URI schemes
6.6 ‘metadata’
6.7 ‘revisit’
6.7.1 General
6.7.2 Profile: Identical Payload Digest
6.7.3 Profile: Server Not Modified
6.7.4 Other profiles
6.8 ‘conversion’
6.9 ‘continuation’
7 Record segmentation
8 WARC file name, size and compression
Annex A (informative) Use cases for writing WARC records
Annex B (informative) Examples of WARC records
Annex C (informative) WARC file size and name recommendations
Annex D (informative) Compression recommendations
Bibliography