      1 # S2 Compression
      2 
      3 S2 is an extension of [Snappy](https://github.com/google/snappy).
      4 
      5 S2 is aimed for high throughput, which is why it features concurrent compression for bigger payloads.
      6 
      7 Decoding is compatible with Snappy compressed content, but content compressed with S2 cannot be decompressed by Snappy.
      8 This means that S2 can seamlessly replace Snappy without converting compressed content.
      9 
S2 can produce Snappy compatible output, faster and with better compression than Snappy itself.
If you want the full benefit of the changes, you should use S2 without Snappy compatibility.
     12 
     13 S2 is designed to have high throughput on content that cannot be compressed.
     14 This is important, so you don't have to worry about spending CPU cycles on already compressed data. 
     15 
     16 ## Benefits over Snappy
     17 
     18 * Better compression
     19 * Adjustable compression (3 levels) 
     20 * Concurrent stream compression
     21 * Faster decompression, even for Snappy compatible content
     22 * Concurrent Snappy/S2 stream decompression
     23 * Skip forward in compressed stream
     24 * Random seeking with indexes
     25 * Compatible with reading Snappy compressed content
     26 * Smaller block size overhead on incompressible blocks
     27 * Block concatenation
     28 * Block Dictionary support
     29 * Uncompressed stream mode
     30 * Automatic stream size padding
     31 * Snappy compatible block compression
     32 
     33 ## Drawbacks over Snappy
     34 
     35 * Not optimized for 32 bit systems
     36 * Streams use slightly more memory due to larger blocks and concurrency (configurable)
     37 
     38 # Usage
     39 
     40 Installation: `go get -u github.com/klauspost/compress/s2`
     41 
     42 Full package documentation:
     43  
     44 [![godoc][1]][2]
     45 
     46 [1]: https://godoc.org/github.com/klauspost/compress?status.svg
     47 [2]: https://godoc.org/github.com/klauspost/compress/s2
     48 
     49 ## Compression
     50 
     51 ```Go
     52 func EncodeStream(src io.Reader, dst io.Writer) error {
     53     enc := s2.NewWriter(dst)
     54     _, err := io.Copy(enc, src)
     55     if err != nil {
     56         enc.Close()
     57         return err
     58     }
     59     // Blocks until compression is done.
     60     return enc.Close() 
     61 }
     62 ```
     63 
     64 You should always call `enc.Close()`, otherwise you will leak resources and your encode will be incomplete.
     65 
     66 For the best throughput, you should attempt to reuse the `Writer` using the `Reset()` method.
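
A rough sketch of that reuse pattern; the `streams` slice and its `src`/`dst` fields are hypothetical placeholders:

```Go
// Reuse a single Writer for several independent streams.
enc := s2.NewWriter(nil)
for _, s := range streams { // hypothetical slice with src/dst pairs
    enc.Reset(s.dst) // Point the writer at the next destination.
    if _, err := io.Copy(enc, s.src); err != nil {
        enc.Close()
        return err
    }
    // Close finishes this stream; the writer can then be Reset and reused.
    if err := enc.Close(); err != nil {
        return err
    }
}
```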
     67 
     68 The Writer in S2 is always buffered, therefore `NewBufferedWriter` in Snappy can be replaced with `NewWriter` in S2.
     69 It is possible to flush any buffered data using the `Flush()` method. 
     70 This will block until all data sent to the encoder has been written to the output.
     71 
     72 S2 also supports the `io.ReaderFrom` interface, which will consume all input from a reader.
     73 
     74 As a final method to compress data, if you have a single block of data you would like to have encoded as a stream,
     75 a slightly more efficient method is to use the `EncodeBuffer` method.
     76 This will take ownership of the buffer until the stream is closed.
     77 
```Go
func EncodeStream(src []byte, dst io.Writer) error {
    enc := s2.NewWriter(dst)
    // The encoder owns the buffer until Flush or Close is called.
    err := enc.EncodeBuffer(src)
    if err != nil {
        enc.Close()
        return err
    }
    // Blocks until compression is done.
    return enc.Close()
}
```
     91 
     92 Each call to `EncodeBuffer` will result in discrete blocks being created without buffering, 
     93 so it should only be used a single time per stream.
     94 If you need to write several blocks, you should use the regular io.Writer interface.
     95 
     96 
     97 ## Decompression
     98 
     99 ```Go
    100 func DecodeStream(src io.Reader, dst io.Writer) error {
    101     dec := s2.NewReader(src)
    102     _, err := io.Copy(dst, dec)
    103     return err
    104 }
    105 ```
    106 
    107 Similar to the Writer, a Reader can be reused using the `Reset` method.
    108 
For the best possible throughput, there is an `EncodeBuffer(buf []byte)` function available.
However, it requires that the provided buffer isn't used after it is handed over to S2 until the stream is flushed or closed.

For smaller data blocks, there is also a non-streaming interface: `Encode()`, `EncodeBetter()` and `Decode()`.
Do however note that these functions (similar to Snappy) do not provide validation of data,
so data corruption may be undetected. Stream encoding provides CRC checks of data.
    115 
    116 It is possible to efficiently skip forward in a compressed stream using the `Skip()` method. 
    117 For big skips the decompressor is able to skip blocks without decompressing them.
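
A small sketch of skipping ahead before reading; the 1 MiB skip distance is arbitrary:

```Go
dec := s2.NewReader(src)
// Skip the first 1 MiB of decompressed output.
// Blocks that are skipped entirely are not decompressed.
if err := dec.Skip(1 << 20); err != nil {
    return err
}
// Reads continue from the new position.
_, err := io.Copy(dst, dec)
return err
```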
    118 
    119 ## Single Blocks
    120 
Similar to Snappy, S2 offers single block compression.
Blocks do not offer the same flexibility and safety as streams,
but may be preferable for very small payloads, less than 100K.

Using a simple `dst := s2.Encode(nil, src)` will compress `src` and return the compressed result.
It is possible to provide a destination buffer.
If the buffer has a capacity of at least `s2.MaxEncodedLen(len(src))` it will be used.
If not, a new buffer will be allocated.
    129 
    130 Alternatively `EncodeBetter`/`EncodeBest` can also be used for better, but slightly slower compression.
    131 
Similarly, to decompress a block you can use `dst, err := s2.Decode(nil, src)`.
Again an optional destination buffer can be supplied.
`s2.DecodedLen(src)` can be used to get the minimum capacity needed.
If that is not satisfied, a new buffer will be allocated.

Block functions always operate on a single goroutine since they should only be used for small payloads.
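
A minimal round trip using the block interface described above:

```Go
// Compress a single block. EncodeBetter/EncodeBest trade speed for size.
compressed := s2.Encode(nil, src)

// Decompress it again.
decompressed, err := s2.Decode(nil, compressed)
if err != nil {
    return err
}
_ = decompressed
```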
    138 
    139 # Commandline tools
    140 
Some very simple command line tools are provided: `s2c` for compression and `s2d` for decompression.
    142 
    143 Binaries can be downloaded on the [Releases Page](https://github.com/klauspost/compress/releases).
    144 
Installing them requires Go to be installed. To install them, use:
    146 
    147 `go install github.com/klauspost/compress/s2/cmd/s2c@latest && go install github.com/klauspost/compress/s2/cmd/s2d@latest`
    148 
    149 To build binaries to the current folder use:
    150 
    151 `go build github.com/klauspost/compress/s2/cmd/s2c && go build github.com/klauspost/compress/s2/cmd/s2d`
    152 
    153 
    154 ## s2c
    155 
    156 ```
    157 Usage: s2c [options] file1 file2
    158 
    159 Compresses all files supplied as input separately.
    160 Output files are written as 'filename.ext.s2' or 'filename.ext.snappy'.
    161 By default output files will be overwritten.
    162 Use - as the only file name to read from stdin and write to stdout.
    163 
    164 Wildcards are accepted: testdir/*.txt will compress all files in testdir ending with .txt
    165 Directories can be wildcards as well. testdir/*/*.txt will match testdir/subdir/b.txt
    166 
    167 File names beginning with 'http://' and 'https://' will be downloaded and compressed.
    168 Only http response code 200 is accepted.
    169 
    170 Options:
    171   -bench int
    172     	Run benchmark n times. No output will be written
    173   -blocksize string
    174     	Max  block size. Examples: 64K, 256K, 1M, 4M. Must be power of two and <= 4MB (default "4M")
    175   -c	Write all output to stdout. Multiple input files will be concatenated
    176   -cpu int
    177     	Compress using this amount of threads (default 32)
    178   -faster
    179     	Compress faster, but with a minor compression loss
    180   -help
    181     	Display help
    182   -index
    183         Add seek index (default true)    	
    184   -o string
    185         Write output to another file. Single input file only
    186   -pad string
    187     	Pad size to a multiple of this value, Examples: 500, 64K, 256K, 1M, 4M, etc (default "1")
    188   -q	Don't write any output to terminal, except errors
    189   -rm
    190     	Delete source file(s) after successful compression
    191   -safe
    192     	Do not overwrite output files
    193   -slower
    194     	Compress more, but a lot slower
    195   -snappy
    196         Generate Snappy compatible output stream
    197   -verify
    198     	Verify written files  
    199 
    200 ```
    201 
    202 ## s2d
    203 
    204 ```
    205 Usage: s2d [options] file1 file2
    206 
    207 Decompresses all files supplied as input. Input files must end with '.s2' or '.snappy'.
    208 Output file names have the extension removed. By default output files will be overwritten.
    209 Use - as the only file name to read from stdin and write to stdout.
    210 
    211 Wildcards are accepted: testdir/*.txt will compress all files in testdir ending with .txt
    212 Directories can be wildcards as well. testdir/*/*.txt will match testdir/subdir/b.txt
    213 
    214 File names beginning with 'http://' and 'https://' will be downloaded and decompressed.
    215 Extensions on downloaded files are ignored. Only http response code 200 is accepted.
    216 
    217 Options:
    218   -bench int
    219     	Run benchmark n times. No output will be written
    220   -c	Write all output to stdout. Multiple input files will be concatenated
    221   -help
    222     	Display help
    223   -o string
    224         Write output to another file. Single input file only
    225   -offset string
    226         Start at offset. Examples: 92, 64K, 256K, 1M, 4M. Requires Index
    227   -q    Don't write any output to terminal, except errors
    228   -rm
    229         Delete source file(s) after successful decompression
    230   -safe
    231         Do not overwrite output files
    232   -tail string
    233         Return last of compressed file. Examples: 92, 64K, 256K, 1M, 4M. Requires Index
    234   -verify
    235     	Verify files, but do not write output                                      
    236 ```
    237 
    238 ## s2sx: self-extracting archives
    239 
    240 s2sx allows creating self-extracting archives with no dependencies.
    241 
By default, executables are created for the same platform as the host OS,
but this can be overridden with the `-os` and `-arch` parameters.

Extracted files have 0666 permissions, except when the untar option is used.
    246 
    247 ```
    248 Usage: s2sx [options] file1 file2
    249 
    250 Compresses all files supplied as input separately.
    251 If files have '.s2' extension they are assumed to be compressed already.
    252 Output files are written as 'filename.s2sx' and with '.exe' for windows targets.
    253 If output is big, an additional file with ".more" is written. This must be included as well.
    254 By default output files will be overwritten.
    255 
    256 Wildcards are accepted: testdir/*.txt will compress all files in testdir ending with .txt
    257 Directories can be wildcards as well. testdir/*/*.txt will match testdir/subdir/b.txt
    258 
    259 Options:
    260   -arch string
    261         Destination architecture (default "amd64")
    262   -c    Write all output to stdout. Multiple input files will be concatenated
    263   -cpu int
    264         Compress using this amount of threads (default 32)
    265   -help
    266         Display help
    267   -max string
    268         Maximum executable size. Rest will be written to another file. (default "1G")
    269   -os string
    270         Destination operating system (default "windows")
    271   -q    Don't write any output to terminal, except errors
    272   -rm
    273         Delete source file(s) after successful compression
    274   -safe
    275         Do not overwrite output files
    276   -untar
    277         Untar on destination
    278 ```
    279 
    280 Available platforms are:
    281 
    282  * darwin-amd64
    283  * darwin-arm64
    284  * linux-amd64
    285  * linux-arm
    286  * linux-arm64
    287  * linux-mips64
    288  * linux-ppc64le
    289  * windows-386
    290  * windows-amd64                                                                             
    291 
    292 By default, there is a size limit of 1GB for the output executable.
    293 
When this is exceeded, the remaining file content is written to a file called
output+`.more`. This file must be included and placed alongside the executable
for a successful extraction.
    297 
    298 This file *must* have the same name as the executable, so if the executable is renamed, 
    299 so must the `.more` file. 
    300 
    301 This functionality is disabled with stdin/stdout. 
    302 
    303 ### Self-extracting TAR files
    304 
    305 If you wrap a TAR file you can specify `-untar` to make it untar on the destination host.
    306 
    307 Files are extracted to the current folder with the path specified in the tar file.
    308 
    309 Note that tar files are not validated before they are wrapped.
    310 
For security reasons, files that would be placed outside the root folder are not allowed.
    312 
    313 # Performance
    314 
    315 This section will focus on comparisons to Snappy. 
    316 This package is solely aimed at replacing Snappy as a high speed compression package.
If you are mainly looking for better compression, [zstandard](https://github.com/klauspost/compress/tree/master/zstd#zstd)
gives better compression, but typically at speeds slightly below the "better" mode in this package.

Compression is increased compared to Snappy, mostly around 5-20%, and the throughput is typically 25-40% higher (single threaded) compared to the Snappy Go implementation.
    321 
    322 Streams are concurrently compressed. The stream will be distributed among all available CPU cores for the best possible throughput.
    323 
A "better" compression mode is also available. This allows trading a bit of speed for a minor compression gain.
    325 The content compressed in this mode is fully compatible with the standard decoder.
    326 
    327 Snappy vs S2 **compression** speed on 16 core (32 thread) computer, using all threads and a single thread (1 CPU):
    328 
    329 | File                                                                                                    | S2 Speed | S2 Throughput | S2 % smaller | S2 "better" | "better" throughput | "better" % smaller |
    330 |---------------------------------------------------------------------------------------------------------|----------|---------------|--------------|-------------|---------------------|--------------------|
    331 | [rawstudio-mint14.tar](https://files.klauspost.com/compress/rawstudio-mint14.7z)                        | 16.33x   | 10556 MB/s    | 8.0%         | 6.04x       | 5252 MB/s           | 14.7%              |
    332 | (1 CPU)                                                                                                 | 1.08x    | 940 MB/s      | -            | 0.46x       | 400 MB/s            | -                  |
    333 | [github-june-2days-2019.json](https://files.klauspost.com/compress/github-june-2days-2019.json.zst)     | 16.51x   | 15224 MB/s    | 31.70%       | 9.47x       | 8734 MB/s           | 37.71%             |
    334 | (1 CPU)                                                                                                 | 1.26x    | 1157 MB/s     | -            | 0.60x       | 556 MB/s            | -                  |
    335 | [github-ranks-backup.bin](https://files.klauspost.com/compress/github-ranks-backup.bin.zst)             | 15.14x   | 12598 MB/s    | -5.76%       | 6.23x       | 5675 MB/s           | 3.62%              |
    336 | (1 CPU)                                                                                                 | 1.02x    | 932 MB/s      | -            | 0.47x       | 432 MB/s            | -                  |
    337 | [consensus.db.10gb](https://files.klauspost.com/compress/consensus.db.10gb.zst)                         | 11.21x   | 12116 MB/s    | 15.95%       | 3.24x       | 3500 MB/s           | 18.00%             |
    338 | (1 CPU)                                                                                                 | 1.05x    | 1135 MB/s     | -            | 0.27x       | 292 MB/s            | -                  |
    339 | [apache.log](https://files.klauspost.com/compress/apache.log.zst)                                       | 8.55x    | 16673 MB/s    | 20.54%       | 5.85x       | 11420 MB/s          | 24.97%             |
    340 | (1 CPU)                                                                                                 | 1.91x    | 1771 MB/s     | -            | 0.53x       | 1041 MB/s           | -                  |
    341 | [gob-stream](https://files.klauspost.com/compress/gob-stream.7z)                                        | 15.76x   | 14357 MB/s    | 24.01%       | 8.67x       | 7891 MB/s           | 33.68%             |
    342 | (1 CPU)                                                                                                 | 1.17x    | 1064 MB/s     | -            | 0.65x       | 595 MB/s            | -                  |
    343 | [10gb.tar](http://mattmahoney.net/dc/10gb.html)                                                         | 13.33x   | 9835 MB/s     | 2.34%        | 6.85x       | 4863 MB/s           | 9.96%              |
    344 | (1 CPU)                                                                                                 | 0.97x    | 689 MB/s      | -            | 0.55x       | 387 MB/s            | -                  |
    345 | sharnd.out.2gb                                                                                          | 9.11x    | 13213 MB/s    | 0.01%        | 1.49x       | 9184 MB/s           | 0.01%              |
    346 | (1 CPU)                                                                                                 | 0.88x    | 5418 MB/s     | -            | 0.77x       | 5417 MB/s           | -                  |
    347 | [sofia-air-quality-dataset csv](https://files.klauspost.com/compress/sofia-air-quality-dataset.tar.zst) | 22.00x   | 11477 MB/s    | 18.73%       | 11.15x      | 5817 MB/s           | 27.88%             |
    348 | (1 CPU)                                                                                                 | 1.23x    | 642 MB/s      | -            | 0.71x       | 642 MB/s            | -                  |
    349 | [silesia.tar](http://sun.aei.polsl.pl/~sdeor/corpus/silesia.zip)                                        | 11.23x   | 6520 MB/s     | 5.9%         | 5.35x       | 3109 MB/s           | 15.88%             |
    350 | (1 CPU)                                                                                                 | 1.05x    | 607 MB/s      | -            | 0.52x       | 304 MB/s            | -                  |
    351 | [enwik9](https://files.klauspost.com/compress/enwik9.zst)                                               | 19.28x   | 8440 MB/s     | 4.04%        | 9.31x       | 4076 MB/s           | 18.04%             |
    352 | (1 CPU)                                                                                                 | 1.12x    | 488 MB/s      | -            | 0.57x       | 250 MB/s            | -                  |
    353 
    354 ### Legend
    355 
    356 * `S2 Speed`: Speed of S2 compared to Snappy, using 16 cores and 1 core.
    357 * `S2 Throughput`: Throughput of S2 in MB/s. 
* `S2 % smaller`: How much smaller the S2 output is than the Snappy output, in percent.
* `S2 "better"`: Speed of S2 "better" compression mode compared to Snappy.
* `"better" throughput`: Throughput of S2 "better" mode in MB/s.
* `"better" % smaller`: How much smaller the "better" mode output is than the Snappy output, in percent.
    362 
    363 There is a good speedup across the board when using a single thread and a significant speedup when using multiple threads.
    364 
Machine generated data gets by far the biggest compression boost, with size being reduced by up to 35% compared to the Snappy output.
    366 
    367 The "better" compression mode sees a good improvement in all cases, but usually at a performance cost.
    368 
    369 Incompressible content (`sharnd.out.2gb`, 2GB random data) sees the smallest speedup. 
    370 This is likely dominated by synchronization overhead, which is confirmed by the fact that single threaded performance is higher (see above). 
    371 
    372 ## Decompression
    373 
    374 S2 attempts to create content that is also fast to decompress, except in "better" mode where the smallest representation is used.
    375 
    376 S2 vs Snappy **decompression** speed. Both operating on single core:
    377 
    378 | File                                                                                                | S2 Throughput | vs. Snappy | Better Throughput | vs. Snappy |
    379 |-----------------------------------------------------------------------------------------------------|---------------|------------|-------------------|------------|
    380 | [rawstudio-mint14.tar](https://files.klauspost.com/compress/rawstudio-mint14.7z)                    | 2117 MB/s     | 1.14x      | 1738 MB/s         | 0.94x      |
    381 | [github-june-2days-2019.json](https://files.klauspost.com/compress/github-june-2days-2019.json.zst) | 2401 MB/s     | 1.25x      | 2307 MB/s         | 1.20x      |
    382 | [github-ranks-backup.bin](https://files.klauspost.com/compress/github-ranks-backup.bin.zst)         | 2075 MB/s     | 0.98x      | 1764 MB/s         | 0.83x      |
    383 | [consensus.db.10gb](https://files.klauspost.com/compress/consensus.db.10gb.zst)                     | 2967 MB/s     | 1.05x      | 2885 MB/s         | 1.02x      |
    384 | [adresser.json](https://files.klauspost.com/compress/adresser.json.zst)                             | 4141 MB/s     | 1.07x      | 4184 MB/s         | 1.08x      |
    385 | [gob-stream](https://files.klauspost.com/compress/gob-stream.7z)                                    | 2264 MB/s     | 1.12x      | 2185 MB/s         | 1.08x      |
    386 | [10gb.tar](http://mattmahoney.net/dc/10gb.html)                                                     | 1525 MB/s     | 1.03x      | 1347 MB/s         | 0.91x      |
    387 | sharnd.out.2gb                                                                                      | 3813 MB/s     | 0.79x      | 3900 MB/s         | 0.81x      |
    388 | [enwik9](http://mattmahoney.net/dc/textdata.html)                                                   | 1246 MB/s     | 1.29x      | 967 MB/s          | 1.00x      |
    389 | [silesia.tar](http://sun.aei.polsl.pl/~sdeor/corpus/silesia.zip)                                    | 1433 MB/s     | 1.12x      | 1203 MB/s         | 0.94x      |
    390 | [enwik10](https://encode.su/threads/3315-enwik10-benchmark-results)                                 | 1284 MB/s     | 1.32x      | 1010 MB/s         | 1.04x      |
    391 
    392 ### Legend
    393 
    394 * `S2 Throughput`: Decompression speed of S2 encoded content.
    395 * `Better Throughput`: Decompression speed of S2 "better" encoded content.
* `vs. Snappy`: Decompression speed relative to Snappy decompressing the same content.
    397 
    398 
    399 While the decompression code hasn't changed, there is a significant speedup in decompression speed. 
    400 S2 prefers longer matches and will typically only find matches that are 6 bytes or longer. 
    401 While this reduces compression a bit, it improves decompression speed.
    402 
    403 The "better" compression mode will actively look for shorter matches, which is why it has a decompression speed quite similar to Snappy.   
    404 
    405 Without assembly decompression is also very fast; single goroutine decompression speed. No assembly:
    406 
| File                           | Speedup vs. Snappy | S2 throughput |
    408 |--------------------------------|---------------|---------------|
    409 | consensus.db.10gb.s2           | 1.84x         | 2289.8 MB/s   |
    410 | 10gb.tar.s2                    | 1.30x         | 867.07 MB/s   |
    411 | rawstudio-mint14.tar.s2        | 1.66x         | 1329.65 MB/s  |
    412 | github-june-2days-2019.json.s2 | 2.36x         | 1831.59 MB/s  |
    413 | github-ranks-backup.bin.s2     | 1.73x         | 1390.7 MB/s   |
    414 | enwik9.s2                      | 1.67x         | 681.53 MB/s   |
    415 | adresser.json.s2               | 3.41x         | 4230.53 MB/s  |
| silesia.tar.s2                 | 1.52x         | 811.58 MB/s   |
    417 
    418 Even though S2 typically compresses better than Snappy, decompression speed is always better. 
    419 
    420 ### Concurrent Stream Decompression
    421 
For full stream decompression, S2 offers [DecodeConcurrent](https://pkg.go.dev/github.com/klauspost/compress/s2#Reader.DecodeConcurrent),
which will decode a full stream using multiple goroutines.
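
A rough sketch, assuming the `DecodeConcurrent(w io.Writer, concurrent int)` form from the linked documentation; `compressedStream` and `dst` are placeholders:

```Go
dec := s2.NewReader(compressedStream)
// Decode the entire stream to dst using up to 8 goroutines.
written, err := dec.DecodeConcurrent(dst, 8)
if err != nil {
    return err
}
_ = written // total number of decompressed bytes
```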
    424 
    425 Example scaling, AMD Ryzen 3950X, 16 cores, decompression using `s2d -bench=3 <input>`, best of 3: 
    426 
    427 | Input                                     | `-cpu=1`   | `-cpu=2`   | `-cpu=4`   | `-cpu=8`   | `-cpu=16`   |
    428 |-------------------------------------------|------------|------------|------------|------------|-------------|
    429 | enwik10.snappy                            | 1098.6MB/s | 1819.8MB/s | 3625.6MB/s | 6910.6MB/s | 10818.2MB/s |
    430 | enwik10.s2                                | 1303.5MB/s | 2606.1MB/s | 4847.9MB/s | 8878.4MB/s | 9592.1MB/s  |
    431 | sofia-air-quality-dataset.tar.snappy      | 1302.0MB/s | 2165.0MB/s | 4244.5MB/s | 8241.0MB/s | 12920.5MB/s |
    432 | sofia-air-quality-dataset.tar.s2          | 1399.2MB/s | 2463.2MB/s | 5196.5MB/s | 9639.8MB/s | 11439.5MB/s |
    433 | sofia-air-quality-dataset.tar.s2 (no asm) | 837.5MB/s  | 1652.6MB/s | 3183.6MB/s | 5945.0MB/s | 9620.7MB/s  |
    434 
    435 Scaling can be expected to be pretty linear until memory bandwidth is saturated. 
    436 
For now, `DecodeConcurrent` can only be used for full streams, without seeking or combining with regular reads.
    438 
    439 ## Block compression
    440 
    441 
When compressing blocks, no concurrent compression is performed, just as with Snappy.
This is because blocks are for smaller payloads and generally will not benefit from concurrent compression.

An important change is that incompressible blocks will be at most 10 bytes bigger than the input.
In rare, worst-case scenarios Snappy blocks could be significantly bigger than the input.
    447 
    448 ### Mixed content blocks
    449 
The most reliable benchmark is a wide dataset.
    451 For this we use [`webdevdata.org-2015-01-07-subset`](https://files.klauspost.com/compress/webdevdata.org-2015-01-07-4GB-subset.7z),
    452 53927 files, total input size: 4,014,735,833 bytes. Single goroutine used.
    453 
    454 | *                 | Input      | Output     | Reduction  | MB/s       |
    455 |-------------------|------------|------------|------------|------------|
    456 | S2                | 4014735833 | 1059723369 | 73.60%     | **936.73** |
    457 | S2 Better         | 4014735833 | 961580539  | 76.05%     | 451.10     |
    458 | S2 Best           | 4014735833 | 899182886  | **77.60%** | 46.84      |
    459 | Snappy            | 4014735833 | 1128706759 | 71.89%     | 790.15     |
    460 | S2, Snappy Output | 4014735833 | 1093823291 | 72.75%     | 936.60     |
    461 | LZ4               | 4014735833 | 1063768713 | 73.50%     | 452.02     |
    462 
    463 S2 delivers both the best single threaded throughput with regular mode and the best compression rate with "best".
"Better" mode provides the same compression speed as LZ4 with a better compression ratio.

When producing Snappy compatible output, it still delivers better throughput (150MB/s more) and better compression.
    467 
    468 As can be seen from the other benchmarks decompression should also be easier on the S2 generated output.
    469 
Though they cannot be directly compared due to different decompression speeds, here are the speed/size comparisons for
other Go compressors:
    472 
    473 | *                 | Input      | Output     | Reduction | MB/s   |
    474 |-------------------|------------|------------|-----------|--------|
    475 | Zstd Fastest (Go) | 4014735833 | 794608518  | 80.21%    | 236.04 |
    476 | Zstd Best (Go)    | 4014735833 | 704603356  | 82.45%    | 35.63  |
    477 | Deflate (Go) l1   | 4014735833 | 871294239  | 78.30%    | 214.04 |
    478 | Deflate (Go) l9   | 4014735833 | 730389060  | 81.81%    | 41.17  |
    479 
    480 ### Standard block compression
    481 
    482 Benchmarking single block performance is subject to a lot more variation since it only tests a limited number of file patterns.
    483 So individual benchmarks should only be seen as a guideline and the overall picture is more important.
    484 
    485 These micro-benchmarks are with data in cache and trained branch predictors. For a more realistic benchmark see the mixed content above. 
    486 
    487 Block compression. Parallel benchmark running on 16 cores, 16 goroutines.
    488 
AMD64 assembly is used for both S2 and Snappy.
    490 
    491 | Absolute Perf         | Snappy size | S2 Size | Snappy Speed | S2 Speed    | Snappy dec  | S2 dec      |
    492 |-----------------------|-------------|---------|--------------|-------------|-------------|-------------|
    493 | html                  | 22843       | 20868   | 16246 MB/s   | 18617 MB/s  | 40972 MB/s  | 49263 MB/s  |
    494 | urls.10K              | 335492      | 286541  | 7943 MB/s    | 10201 MB/s  | 22523 MB/s  | 26484 MB/s  |
    495 | fireworks.jpeg        | 123034      | 123100  | 349544 MB/s  | 303228 MB/s | 718321 MB/s | 827552 MB/s |
    496 | fireworks.jpeg (200B) | 146         | 155     | 8869 MB/s    | 20180 MB/s  | 33691 MB/s  | 52421 MB/s  |
    497 | paper-100k.pdf        | 85304       | 84202   | 167546 MB/s  | 112988 MB/s | 326905 MB/s | 291944 MB/s |
    498 | html_x_4              | 92234       | 20870   | 15194 MB/s   | 54457 MB/s  | 30843 MB/s  | 32217 MB/s  |
    499 | alice29.txt           | 88034       | 85934   | 5936 MB/s    | 6540 MB/s   | 12882 MB/s  | 20044 MB/s  |
    500 | asyoulik.txt          | 77503       | 79575   | 5517 MB/s    | 6657 MB/s   | 12735 MB/s  | 22806 MB/s  |
    501 | lcet10.txt            | 234661      | 220383  | 6235 MB/s    | 6303 MB/s   | 14519 MB/s  | 18697 MB/s  |
    502 | plrabn12.txt          | 319267      | 318196  | 5159 MB/s    | 6074 MB/s   | 11923 MB/s  | 19901 MB/s  |
    503 | geo.protodata         | 23335       | 18606   | 21220 MB/s   | 25432 MB/s  | 56271 MB/s  | 62540 MB/s  |
    504 | kppkn.gtb             | 69526       | 65019   | 9732 MB/s    | 8905 MB/s   | 18491 MB/s  | 18969 MB/s  |
    505 | alice29.txt (128B)    | 80          | 82      | 6691 MB/s    | 17179 MB/s  | 31883 MB/s  | 38874 MB/s  |
    506 | alice29.txt (1000B)   | 774         | 774     | 12204 MB/s   | 13273 MB/s  | 48056 MB/s  | 52341 MB/s  |
    507 | alice29.txt (10000B)  | 6648        | 6933    | 10044 MB/s   | 12824 MB/s  | 32378 MB/s  | 46322 MB/s  |
    508 | alice29.txt (20000B)  | 12686       | 13516   | 7733 MB/s    | 12160 MB/s  | 30566 MB/s  | 58969 MB/s  |
    509 
    510 
Speed is generally at or above Snappy. Small blocks get a significant speedup, although at the expense of size.
    512 
    513 Decompression speed is better than Snappy, except in one case. 
    514 
    515 Since payloads are very small the variance in terms of size is rather big, so they should only be seen as a general guideline.
    516 
    517 Size is on average around Snappy, but varies on content type. 
    518 In cases where compression is worse, it usually is compensated by a speed boost. 
    519 
    520 
    521 ### Better compression
    522 
    523 Benchmarking single block performance is subject to a lot more variation since it only tests a limited number of file patterns.
    524 So individual benchmarks should only be seen as a guideline and the overall picture is more important.
    525 
    526 | Absolute Perf         | Snappy size | Better Size | Snappy Speed | Better Speed | Snappy dec  | Better dec  |
    527 |-----------------------|-------------|-------------|--------------|--------------|-------------|-------------|
    528 | html                  | 22843       | 18972       | 16246 MB/s   | 8621 MB/s    | 40972 MB/s  | 40292 MB/s  |
    529 | urls.10K              | 335492      | 248079      | 7943 MB/s    | 5104 MB/s    | 22523 MB/s  | 20981 MB/s  |
    530 | fireworks.jpeg        | 123034      | 123100      | 349544 MB/s  | 84429 MB/s   | 718321 MB/s | 823698 MB/s |
    531 | fireworks.jpeg (200B) | 146         | 149         | 8869 MB/s    | 7125 MB/s    | 33691 MB/s  | 30101 MB/s  |
    532 | paper-100k.pdf        | 85304       | 82887       | 167546 MB/s  | 11087 MB/s   | 326905 MB/s | 198869 MB/s |
    533 | html_x_4              | 92234       | 18982       | 15194 MB/s   | 29316 MB/s   | 30843 MB/s  | 30937 MB/s  |
    534 | alice29.txt           | 88034       | 71611       | 5936 MB/s    | 3709 MB/s    | 12882 MB/s  | 16611 MB/s  |
    535 | asyoulik.txt          | 77503       | 65941       | 5517 MB/s    | 3380 MB/s    | 12735 MB/s  | 14975 MB/s  |
    536 | lcet10.txt            | 234661      | 184939      | 6235 MB/s    | 3537 MB/s    | 14519 MB/s  | 16634 MB/s  |
    537 | plrabn12.txt          | 319267      | 264990      | 5159 MB/s    | 2960 MB/s    | 11923 MB/s  | 13382 MB/s  |
    538 | geo.protodata         | 23335       | 17689       | 21220 MB/s   | 10859 MB/s   | 56271 MB/s  | 57961 MB/s  |
    539 | kppkn.gtb             | 69526       | 55398       | 9732 MB/s    | 5206 MB/s    | 18491 MB/s  | 16524 MB/s  |
    540 | alice29.txt (128B)    | 80          | 78          | 6691 MB/s    | 7422 MB/s    | 31883 MB/s  | 34225 MB/s  |
    541 | alice29.txt (1000B)   | 774         | 746         | 12204 MB/s   | 5734 MB/s    | 48056 MB/s  | 42068 MB/s  |
    542 | alice29.txt (10000B)  | 6648        | 6218        | 10044 MB/s   | 6055 MB/s    | 32378 MB/s  | 28813 MB/s  |
    543 | alice29.txt (20000B)  | 12686       | 11492       | 7733 MB/s    | 3143 MB/s    | 30566 MB/s  | 27315 MB/s  |
    544 
    545 
Except for the mostly incompressible JPEG image, compression is better and usually in the
double digits in terms of percentage reduction over Snappy.
    548 
    549 The PDF sample shows a significant slowdown compared to Snappy, as this mode tries harder 
    550 to compress the data. Very small blocks are also not favorable for better compression, so throughput is way down.
    551 
    552 This mode aims to provide better compression at the expense of performance and achieves that 
    553 without a huge performance penalty, except on very small blocks. 
    554 
    555 Decompression speed suffers a little compared to the regular S2 mode, 
    556 but still manages to be close to Snappy in spite of increased compression.  
    557  
    558 # Best compression mode
    559 
    560 S2 offers a "best" compression mode. 
    561 
    562 This will compress as much as possible with little regard to CPU usage.
    563 
It is mainly intended for offline compression, where decompression speed should still
be high and the output remains compatible with other S2 compressed data.
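
A rough sketch of selecting this mode on the block and stream interfaces:

```Go
// Block interface: strongest, slowest compression.
compressed := s2.EncodeBest(nil, src)
_ = compressed

// Stream interface: enable "best" mode with a writer option.
enc := s2.NewWriter(w, s2.WriterBestCompression())
if _, err := io.Copy(enc, r); err != nil {
    enc.Close()
    return err
}
return enc.Close()
```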
    566 
    567 Some examples compared on 16 core CPU, amd64 assembly used:
    568 
    569 ```
    570 * enwik10
    571 Default... 10000000000 -> 4759950115 [47.60%]; 1.03s, 9263.0MB/s
    572 Better...  10000000000 -> 4084706676 [40.85%]; 2.16s, 4415.4MB/s
    573 Best...    10000000000 -> 3615520079 [36.16%]; 42.259s, 225.7MB/s
    574 
    575 * github-june-2days-2019.json
    576 Default... 6273951764 -> 1041700255 [16.60%]; 431ms, 13882.3MB/s
    577 Better...  6273951764 -> 945841238 [15.08%]; 547ms, 10938.4MB/s
    578 Best...    6273951764 -> 826392576 [13.17%]; 9.455s, 632.8MB/s
    579 
    580 * nyc-taxi-data-10M.csv
    581 Default... 3325605752 -> 1093516949 [32.88%]; 324ms, 9788.7MB/s
    582 Better...  3325605752 -> 885394158 [26.62%]; 491ms, 6459.4MB/s
    583 Best...    3325605752 -> 773681257 [23.26%]; 8.29s, 412.0MB/s
    584 
    585 * 10gb.tar
    586 Default... 10065157632 -> 5915541066 [58.77%]; 1.028s, 9337.4MB/s
    587 Better...  10065157632 -> 5453844650 [54.19%]; 1.597s, 4862.7MB/s
Best...    10065157632 -> 5192495021 [51.59%]; 32.78s, 308.2MB/s
    589 
    590 * consensus.db.10gb
    591 Default... 10737418240 -> 4549762344 [42.37%]; 882ms, 12118.4MB/s
    592 Better...  10737418240 -> 4438535064 [41.34%]; 1.533s, 3500.9MB/s
    593 Best...    10737418240 -> 4210602774 [39.21%]; 42.96s, 254.4MB/s
    594 ```
    595 
    596 Decompression speed should be around the same as using the 'better' compression mode. 
    597 
    598 ## Dictionaries
    599 
*Note: S2 dictionary compression is currently at an early implementation stage, with no assembly for
either encoding or decoding. Performance improvements can be expected in the future.*

Adding dictionaries allows providing a custom dictionary that will serve as a lookup at the beginning of blocks.
    604 
    605 The same dictionary *must* be used for both encoding and decoding. 
    606 S2 does not keep track of whether the same dictionary is used,
    607 and using the wrong dictionary will most often not result in an error when decompressing.
    608 
    609 Blocks encoded *without* dictionaries can be decompressed seamlessly *with* a dictionary.
    610 This means it is possible to switch from an encoding without dictionaries to an encoding with dictionaries
    611 and treat the blocks similarly.
    612 
    613 Similar to [zStandard dictionaries](https://github.com/facebook/zstd#the-case-for-small-data-compression), 
    614 the same usage scenario applies to S2 dictionaries.  
    615 
    616 > Training works if there is some correlation in a family of small data samples. The more data-specific a dictionary is, the more efficient it is (there is no universal dictionary). Hence, deploying one dictionary per type of data will provide the greatest benefits. Dictionary gains are mostly effective in the first few KB. Then, the compression algorithm will gradually use previously decoded content to better compress the rest of the file.
    617 
    618 S2 further limits the dictionary to only be enabled on the first 64KB of a block.
    619 This will remove any negative (speed) impacts of the dictionaries on bigger blocks. 
    620 
    621 ### Compression
    622 
    623 Using the [github_users_sample_set](https://github.com/facebook/zstd/releases/download/v1.1.3/github_users_sample_set.tar.zst) 
    624 and a 64KB dictionary trained with zStandard the following sizes can be achieved. 
    625 
    626 |                    | Default          | Better           | Best                  |
    627 |--------------------|------------------|------------------|-----------------------|
    628 | Without Dictionary | 3362023 (44.92%) | 3083163 (41.19%) | 3057944 (40.86%)      |
    629 | With Dictionary    | 921524 (12.31%)  | 873154 (11.67%)  | 785503 bytes (10.49%) |
    630 
So for highly repetitive content, this case provides more than a 3x reduction in size.
    632 
    633 For less uniform data we will use the Go source code tree.
    634 Compressing First 64KB of all `.go` files in `go/src`, Go 1.19.5, 8912 files, 51253563 bytes input:
    635 
    636 |                    | Default           | Better            | Best              |
    637 |--------------------|-------------------|-------------------|-------------------|
| Without Dictionary | 22955767 (44.79%) | 20189613 (39.39%) | 19482828 (38.01%) |
    639 | With Dictionary    | 19654568 (38.35%) | 16289357 (31.78%) | 15184589 (29.63%) |
    640 | Saving/file        | 362 bytes         | 428 bytes         | 472 bytes         |
    641 
    642 
    643 ### Creating Dictionaries
    644 
    645 There are no tools to create dictionaries in S2. 
    646 However, there are multiple ways to create a useful dictionary:
    647 
    648 #### Using a Sample File
    649 
    650 If your input is very uniform, you can just use a sample file as the dictionary.
    651 
    652 For example in the `github_users_sample_set` above, the average compression only goes up from 
    653 10.49% to 11.48% by using the first file as dictionary compared to using a dedicated dictionary.
    654 
```Go
    // Read a sample
    sample, err := os.ReadFile("sample.json")

    // Create a dictionary.
    dict := s2.MakeDict(sample, nil)

    // b := dict.Bytes() will provide a dictionary that can be saved
    // and reloaded with s2.NewDict(b).

    // To encode:
    encoded := dict.Encode(nil, file)

    // To decode:
    decoded, err := dict.Decode(nil, encoded)
```
    671 
    672 #### Using Zstandard
    673 
    674 Zstandard dictionaries can easily be converted to S2 dictionaries.
    675 
    676 This can be helpful to generate dictionaries for files that don't have a fixed structure.
    677 
    678 
    679 Example, with training set files  placed in `./training-set`: 
    680 
    681 `λ zstd -r --train-fastcover training-set/* --maxdict=65536 -o name.dict`
    682 
    683 This will create a dictionary of 64KB, that can be converted to a dictionary like this:
    684 
    685 ```Go
    686     // Decode the Zstandard dictionary.
    687     insp, err := zstd.InspectDictionary(zdict)
    688     if err != nil {
    689         panic(err)
    690     }
    691 	
    692     // We are only interested in the contents.
    693     // Assume that files start with "// Copyright (c) 2023".
    694     // Search for the longest match for that.
    695     // This may save a few bytes.
    696     dict := s2.MakeDict(insp.Content(), []byte("// Copyright (c) 2023"))
    697 
    698     // b := dict.Bytes() will provide a dictionary that can be saved
    699     // and reloaded with s2.NewDict(b).
    700 
    701     // We can now encode using this dictionary
    702     encodedWithDict := dict.Encode(nil, payload)
    703 
    704     // To decode content:
    705     decoded, err := dict.Decode(nil, encodedWithDict)
    706 ```
    707 
It is recommended to save the dictionary returned by `b := dict.Bytes()`, since that will contain only the S2 dictionary.
    709 
    710 This dictionary can later be loaded using `s2.NewDict(b)`. The dictionary then no longer requires `zstd` to be initialized.
    711 
    712 Also note how `s2.MakeDict` allows you to search for a common starting sequence of your files.
    713 This can be omitted, at the expense of a few bytes.
    714 
    715 # Snappy Compatibility
    716 
    717 S2 now offers full compatibility with Snappy.
    718 
    719 This means that the efficient encoders of S2 can be used to generate fully Snappy compatible output.
    720 
    721 There is a [snappy](https://github.com/klauspost/compress/tree/master/snappy) package that can be used by
    722 simply changing imports from `github.com/golang/snappy` to `github.com/klauspost/compress/snappy`.
    723 This uses "better" mode for all operations.
    724 If you would like more control, you can use the s2 package as described below: 
    725 
    726 ## Blocks
    727 
    728 Snappy compatible blocks can be generated with the S2 encoder. 
Compression and speed are typically a bit better, and `MaxEncodedLen` is also smaller, reducing memory usage. Replace:
    730 
    731 | Snappy                    | S2 replacement        |
    732 |---------------------------|-----------------------|
    733 | snappy.Encode(...)        | s2.EncodeSnappy(...)  |
    734 | snappy.MaxEncodedLen(...) | s2.MaxEncodedLen(...) |
    735 
    736 `s2.EncodeSnappy` can be replaced with `s2.EncodeSnappyBetter` or `s2.EncodeSnappyBest` to get more efficiently compressed snappy compatible output. 
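
A minimal sketch of producing a Snappy compatible block with the S2 encoder:

```Go
// Compress src into a block that any Snappy decoder can read.
compressed := s2.EncodeSnappy(nil, src)

// S2 can decode it as well (as can github.com/golang/snappy).
decoded, err := s2.Decode(nil, compressed)
if err != nil {
    return err
}
_ = decoded
```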
    737 
    738 `s2.ConcatBlocks` is compatible with snappy blocks.
    739 
    740 Comparison of [`webdevdata.org-2015-01-07-subset`](https://files.klauspost.com/compress/webdevdata.org-2015-01-07-4GB-subset.7z),
    741 53927 files, total input size: 4,014,735,833 bytes. amd64, single goroutine used:
    742 
    743 | Encoder               | Size       | MB/s       | Reduction  |
    744 |-----------------------|------------|------------|------------|
    745 | snappy.Encode         | 1128706759 | 725.59     | 71.89%     |
    746 | s2.EncodeSnappy       | 1093823291 | **899.16** | 72.75%     |
    747 | s2.EncodeSnappyBetter | 1001158548 | 578.49     | 75.06%     |
    748 | s2.EncodeSnappyBest   | 944507998  | 66.00      | **76.47%** |
    749 
    750 ## Streams
    751 
    752 For streams, replace `enc = snappy.NewBufferedWriter(w)` with `enc = s2.NewWriter(w, s2.WriterSnappyCompat())`.
All other options are available, but note that the block size limit is different for Snappy.
    754 
    755 Comparison of different streams, AMD Ryzen 3950x, 16 cores. Size and throughput: 
    756 
    757 | File                        | snappy.NewWriter         | S2 Snappy                 | S2 Snappy, Better        | S2 Snappy, Best         |
    758 |-----------------------------|--------------------------|---------------------------|--------------------------|-------------------------|
    759 | nyc-taxi-data-10M.csv       | 1316042016 - 539.47MB/s  | 1307003093 - 10132.73MB/s | 1174534014 - 5002.44MB/s | 1115904679 - 177.97MB/s |
    760 | enwik10 (xml)               | 5088294643 - 451.13MB/s  | 5175840939 -  9440.69MB/s | 4560784526 - 4487.21MB/s | 4340299103 - 158.92MB/s |
    761 | 10gb.tar (mixed)            | 6056946612 - 729.73MB/s  | 6208571995 -  9978.05MB/s | 5741646126 - 4919.98MB/s | 5548973895 - 180.44MB/s |
    762 | github-june-2days-2019.json | 1525176492 - 933.00MB/s  | 1476519054 - 13150.12MB/s | 1400547532 - 5803.40MB/s | 1321887137 - 204.29MB/s |
    763 | consensus.db.10gb (db)      | 5412897703 - 1102.14MB/s | 5354073487 - 13562.91MB/s | 5335069899 - 5294.73MB/s | 5201000954 - 175.72MB/s |
    764 
## Decompression
    766 
    767 All decompression functions map directly to equivalent s2 functions.
    768 
    769 | Snappy                 | S2 replacement     |
    770 |------------------------|--------------------|
    771 | snappy.Decode(...)     | s2.Decode(...)     |
    772 | snappy.DecodedLen(...) | s2.DecodedLen(...) |
    773 | snappy.NewReader(...)  | s2.NewReader(...)  |
    774 
    775 Features like [quick forward skipping without decompression](https://pkg.go.dev/github.com/klauspost/compress/s2#Reader.Skip)
    776 are also available for Snappy streams.
    777 
    778 If you know you are only decompressing snappy streams, setting [`ReaderMaxBlockSize(64<<10)`](https://pkg.go.dev/github.com/klauspost/compress/s2#ReaderMaxBlockSize)
    779 on your Reader will reduce memory consumption.
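
For example, a reader limited to Snappy's 64K maximum block size (a sketch):

```Go
// Snappy streams never contain blocks larger than 64K,
// so the reader can allocate smaller buffers.
dec := s2.NewReader(r, s2.ReaderMaxBlockSize(64<<10))
_, err := io.Copy(dst, dec)
return err
```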
    780 
    781 # Concatenating blocks and streams.
    782 
    783 Concatenating streams will concatenate the output of both without recompressing them. 
    784 While this is inefficient in terms of compression it might be usable in certain scenarios. 
    785 The 10 byte 'stream identifier' of the second stream can optionally be stripped, but it is not a requirement.
    786 
    787 Blocks can be concatenated using the `ConcatBlocks` function.
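
A rough sketch of joining two already-compressed blocks, assuming the variadic `ConcatBlocks(dst, blocks...)` form:

```Go
// a and b are independently compressed blocks (S2 or Snappy).
joined, err := s2.ConcatBlocks(nil, a, b)
if err != nil {
    return err
}
// Decoding joined yields the concatenation of the original inputs.
both, err := s2.Decode(nil, joined)
if err != nil {
    return err
}
_ = both
```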
    788 
    789 Snappy blocks/streams can safely be concatenated with S2 blocks and streams.
Stream indexes (see below) will currently not work on concatenated streams.
    791 
    792 # Stream Seek Index
    793 
    794 S2 and Snappy streams can have indexes. These indexes will allow random seeking within the compressed data.
    795 
    796 The index can either be appended to the stream as a skippable block or returned for separate storage.
    797 
    798 When the index is appended to a stream it will be skipped by regular decoders, 
    799 so the output remains compatible with other decoders. 
    800 
    801 ## Creating an Index
    802 
To automatically add an index to a stream, add the `WriterAddIndex()` option to your writer.
    804 Then the index will be added to the stream when `Close()` is called.
    805 
    806 ```
    807 	// Add Index to stream...
    808 	enc := s2.NewWriter(w, s2.WriterAddIndex())
    809 	io.Copy(enc, r)
    810 	enc.Close()
    811 ```
    812 
    813 If you want to store the index separately, you can use `CloseIndex()` instead of the regular `Close()`.
    814 This will return the index. Note that `CloseIndex()` should only be called once, and you shouldn't call `Close()`.
    815 
    816 ```
    817 	// Get index for separate storage... 
    818 	enc := s2.NewWriter(w)
    819 	io.Copy(enc, r)
    820 	index, err := enc.CloseIndex()
    821 ```
    822 
The `index` can then be used without needing to read it from the stream.
This means the index can be used without needing to seek to the end of the stream,
or for manually forwarding streams. See below.
    826 
    827 Finally, an existing S2/Snappy stream can be indexed using the `s2.IndexStream(r io.Reader)` function.
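
A small sketch of indexing an existing file after the fact, assuming `IndexStream` returns the serialized index and an error:

```
	f, err := os.Open("data.s2")
	if err != nil {
		return err
	}
	defer f.Close()

	// Build an index for the existing stream and store it separately.
	index, err := s2.IndexStream(f)
	if err != nil {
		return err
	}
	_ = index // e.g. save to a side file or pass to ReadSeeker
```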
    828 
    829 ## Using Indexes
    830 
    831 To use indexes there is a `ReadSeeker(random bool, index []byte) (*ReadSeeker, error)` function available.
    832 
    833 Calling ReadSeeker will return an [io.ReadSeeker](https://pkg.go.dev/io#ReadSeeker) compatible version of the reader.
    834 
    835 If 'random' is specified the returned io.Seeker can be used for random seeking, otherwise only forward seeking is supported.
    836 Enabling random seeking requires the original input to support the [io.Seeker](https://pkg.go.dev/io#Seeker) interface.
    837 
    838 ```
    839 	dec := s2.NewReader(r)
    840 	rs, err := dec.ReadSeeker(false, nil)
    841 	rs.Seek(wantOffset, io.SeekStart)	
    842 ```
    843 
This gets a seeker that can seek forward. Since no index is provided, the index is read from the stream.
    845 This requires that an index was added and that `r` supports the [io.Seeker](https://pkg.go.dev/io#Seeker) interface.
    846 
    847 A custom index can be specified which will be used if supplied.
    848 When using a custom index, it will not be read from the input stream.
    849 
    850 ```
    851 	dec := s2.NewReader(r)
    852 	rs, err := dec.ReadSeeker(false, index)
    853 	rs.Seek(wantOffset, io.SeekStart)	
    854 ```
    855 
This will read the index from `index`. Since we specify non-random (forward only) seeking, `r` does not have to be an io.Seeker.
    857 
    858 ```
    859 	dec := s2.NewReader(r)
    860 	rs, err := dec.ReadSeeker(true, index)
    861 	rs.Seek(wantOffset, io.SeekStart)	
    862 ```
    863 
Finally, since we specify that we want to do random seeking, `r` must be an io.Seeker.
    865 
    866 The returned [ReadSeeker](https://pkg.go.dev/github.com/klauspost/compress/s2#ReadSeeker) contains a shallow reference to the existing Reader,
meaning changes performed to one are reflected in the other.
    868 
    869 To check if a stream contains an index at the end, the `(*Index).LoadStream(rs io.ReadSeeker) error` can be used.
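
For example, probing a stream for a trailing index, following the `LoadStream` form stated above; `rs` is any `io.ReadSeeker`:

```
	var index s2.Index
	if err := index.LoadStream(rs); err != nil {
		// No usable index was found at the end of the stream.
		return err
	}
	// index now describes the seek points of the stream.
```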
    870 
    871 ## Manually Forwarding Streams
    872 
    873 Indexes can also be read outside the decoder using the [Index](https://pkg.go.dev/github.com/klauspost/compress/s2#Index) type.
    874 This can be used for parsing indexes, either separate or in streams.
    875 
    876 In some cases it may not be possible to serve a seekable stream.
    877 This can for instance be an HTTP stream, where the Range request 
    878 is sent at the start of the stream. 
    879 
    880 With a little bit of extra code it is still possible to use indexes
    881 to forward to specific offset with a single forward skip. 
    882 
    883 It is possible to load the index manually like this: 
    884 ```
    885 	var index s2.Index
    886 	_, err = index.Load(idxBytes)
    887 ```
    888 
    889 This can be used to figure out how much to offset the compressed stream:
    890 
    891 ```
    892 	compressedOffset, uncompressedOffset, err := index.Find(wantOffset)
    893 ```
    894 
    895 The `compressedOffset` is the number of bytes that should be skipped 
    896 from the beginning of the compressed file.
    897 
The `uncompressedOffset` will then be the offset of the uncompressed bytes returned
    899 when decoding from that position. This will always be <= wantOffset.
    900 
    901 When creating a decoder it must be specified that it should *not* expect a stream identifier
    902 at the beginning of the stream. Assuming the io.Reader `r` has been forwarded to `compressedOffset`
    903 we create the decoder like this:
    904 
    905 ```
    906 	dec := s2.NewReader(r, s2.ReaderIgnoreStreamIdentifier())
    907 ```
    908 
We are not completely done. We still need to forward the stream past the uncompressed bytes we didn't want.
    910 This is done using the regular "Skip" function:
    911 
    912 ```
    913 	err = dec.Skip(wantOffset - uncompressedOffset)
    914 ```
    915 
    916 This will ensure that we are at exactly the offset we want, and reading from `dec` will start at the requested offset.
    917 
    918 # Compact storage
    919 
For compact storage [RemoveIndexHeaders](https://pkg.go.dev/github.com/klauspost/compress/s2#RemoveIndexHeaders) can be used to remove any redundant info from
a serialized index. If you remove the header, it must be restored before [Loading](https://pkg.go.dev/github.com/klauspost/compress/s2#Index.Load).

This is expected to save 20 bytes. The headers can be restored using [RestoreIndexHeaders](https://pkg.go.dev/github.com/klauspost/compress/s2#RestoreIndexHeaders). Removing them drops a layer of security, but gives the most compact representation. `RestoreIndexHeaders` returns nil if the headers contain errors.
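
A rough sketch of the round trip, assuming both helpers take and return the serialized index bytes:

```
	// Strip redundant header/trailer info for compact storage.
	short := s2.RemoveIndexHeaders(index)

	// Restore the headers before calling Load; nil is returned if the
	// remaining data contains errors.
	restored := s2.RestoreIndexHeaders(short)
	if restored == nil {
		return errors.New("invalid index")
	}
```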
    924 
    925 ## Index Format:
    926 
    927 Each block is structured as a snappy skippable block, with the chunk ID 0x99.
    928 
    929 The block can be read from the front, but contains information so it can be read from the back as well.
    930 
    931 Numbers are stored as fixed size little endian values or [zigzag encoded](https://developers.google.com/protocol-buffers/docs/encoding#signed_integers) [base 128 varints](https://developers.google.com/protocol-buffers/docs/encoding), 
    932 with un-encoded value length of 64 bits, unless other limits are specified. 
    933 
    934 | Content                              | Format                                                                                                                        |
    935 |--------------------------------------|-------------------------------------------------------------------------------------------------------------------------------|
    936 | ID, `[1]byte`                        | Always 0x99.                                                                                                                  |
    937 | Data Length, `[3]byte`               | 3 byte little-endian length of the chunk in bytes, following this.                                                            |
    938 | Header `[6]byte`                     | Header, must be `[115, 50, 105, 100, 120, 0]` or in text: "s2idx\x00".                                                        |
    939 | UncompressedSize, Varint             | Total Uncompressed size.                                                                                                      |
    940 | CompressedSize, Varint               | Total Compressed size if known. Should be -1 if unknown.                                                                      |
    941 | EstBlockSize, Varint                 | Block Size, used for guessing uncompressed offsets. Must be >= 0.                                                             |
    942 | Entries, Varint                      | Number of Entries in index, must be < 65536 and >=0.                                                                          |
    943 | HasUncompressedOffsets `byte`        | 0 if no uncompressed offsets are present, 1 if present. Other values are invalid.                                             |
    944 | UncompressedOffsets, [Entries]VarInt | Uncompressed offsets. See below how to decode.                                                                                |
    945 | CompressedOffsets, [Entries]VarInt   | Compressed offsets. See below how to decode.                                                                                  |
    946 | Block Size, `[4]byte`                | Little Endian total encoded size (including header and trailer). Can be used for searching backwards to start of block.       |
    947 | Trailer `[6]byte`                    | Trailer, must be `[0, 120, 100, 105, 50, 115]` or in text: "\x00xdi2s". Can be used for identifying block from end of stream. |
    948 
For regular streams the uncompressed offsets are fully predictable,
so `HasUncompressedOffsets` allows specifying that all compressed blocks contain
exactly `EstBlockSize` bytes of uncompressed content.
    952 
    953 Entries *must* be in order, starting with the lowest offset, 
    954 and there *must* be no uncompressed offset duplicates.  
    955 Entries *may* point to the start of a skippable block, 
    956 but it is then not allowed to also have an entry for the next block since 
    957 that would give an uncompressed offset duplicate.
    958 
    959 There is no requirement for all blocks to be represented in the index. 
    960 In fact there is a maximum of 65536 block entries in an index.
    961 
    962 The writer can use any method to reduce the number of entries.
    963 An implicit block start at 0,0 can be assumed.
    964 
    965 ### Decoding entries:
    966 
    967 ```
    968 // Read Uncompressed entries.
    969 // Each assumes EstBlockSize delta from previous.
    970 for each entry {
    971     uOff = 0
    972     if HasUncompressedOffsets == 1 {
    973         uOff = ReadVarInt // Read value from stream
    974     }
    975    
    976     // Except for the first entry, use previous values.
    977     if entryNum == 0 {
    978         entry[entryNum].UncompressedOffset = uOff
    979         continue
    980     }
    981     
    982     // Uncompressed uses previous offset and adds EstBlockSize
    983     entry[entryNum].UncompressedOffset = entry[entryNum-1].UncompressedOffset + EstBlockSize + uOff
    984 }
    985 
    986 
    987 // Guess that the first block will be 50% of uncompressed size.
    988 // Integer truncating division must be used.
    989 CompressGuess := EstBlockSize / 2
    990 
    991 // Read Compressed entries.
    992 // Each assumes CompressGuess delta from previous.
    993 // CompressGuess is adjusted for each value.
    994 for each entry {
    995     cOff = ReadVarInt // Read value from stream
    996     
    997     // Except for the first entry, use previous values.
    998     if entryNum == 0 {
    999         entry[entryNum].CompressedOffset = cOff
   1000         continue
   1001     }
   1002     
   1003     // Compressed uses previous and our estimate.
   1004     entry[entryNum].CompressedOffset = entry[entryNum-1].CompressedOffset + CompressGuess + cOff
   1005         
   1006      // Adjust compressed offset for next loop, integer truncating division must be used. 
   1007      CompressGuess += cOff/2               
   1008 }
   1009 ```
   1010 
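For reference, a rough Go rendering of the pseudocode above could look like the sketch below. It assumes the header fields have already been parsed, that `br` is an `io.ByteReader` positioned at the entry data, and that the deltas are read with `binary.ReadVarint` (zigzag varints, from `encoding/binary`); all names are illustrative.

```Go
func decodeEntries(br io.ByteReader, estBlockSize int64, entries int, hasUncompressedOffsets bool) (uOffsets, cOffsets []int64, err error) {
	uOffsets = make([]int64, entries)
	cOffsets = make([]int64, entries)

	// Uncompressed offsets: each entry assumes an EstBlockSize delta from the previous one.
	for i := 0; i < entries; i++ {
		var uOff int64
		if hasUncompressedOffsets {
			if uOff, err = binary.ReadVarint(br); err != nil {
				return nil, nil, err
			}
		}
		if i == 0 {
			uOffsets[i] = uOff
			continue
		}
		uOffsets[i] = uOffsets[i-1] + estBlockSize + uOff
	}

	// Compressed offsets: guess that the first block compresses to 50% of EstBlockSize.
	compressGuess := estBlockSize / 2
	for i := 0; i < entries; i++ {
		var cOff int64
		if cOff, err = binary.ReadVarint(br); err != nil {
			return nil, nil, err
		}
		if i == 0 {
			cOffsets[i] = cOff
			continue
		}
		cOffsets[i] = cOffsets[i-1] + compressGuess + cOff
		// Adjust the guess for the next entry; integer truncating division.
		compressGuess += cOff / 2
	}
	return uOffsets, cOffsets, nil
}
```
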
   1011 To decode from any given uncompressed offset `(wantOffset)`:
   1012 
   1013 * Iterate entries until `entry[n].UncompressedOffset > wantOffset`.
   1014 * Start decoding from `entry[n-1].CompressedOffset`.
* Discard `wantOffset - entry[n-1].UncompressedOffset` bytes from the decoded stream.
   1016 
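A minimal sketch of that lookup, assuming `uOffsets`/`cOffsets` were decoded as in the sketch above:

```Go
func findBlock(uOffsets, cOffsets []int64, wantOffset int64) (compressedOff, uncompressedOff int64) {
	// Start from the implicit entry at 0,0.
	for i := range uOffsets {
		if uOffsets[i] > wantOffset {
			break
		}
		compressedOff, uncompressedOff = cOffsets[i], uOffsets[i]
	}
	// Start decoding at compressedOff and discard
	// wantOffset-uncompressedOff bytes of uncompressed output.
	return compressedOff, uncompressedOff
}
```
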
   1017 See [using indexes](https://github.com/klauspost/compress/tree/master/s2#using-indexes) for functions that perform the operations with a simpler interface.
   1018 
   1019 
   1020 # Format Extensions
   1021 
   1022 * Frame [Stream identifier](https://github.com/google/snappy/blob/master/framing_format.txt#L68) changed from `sNaPpY` to `S2sTwO`.
   1023 * [Framed compressed blocks](https://github.com/google/snappy/blob/master/format_description.txt) can be up to 4MB (up from 64KB).
   1024 * Compressed blocks can have an offset of `0`, which indicates to repeat the last seen offset.
   1025 
   1026 Repeat offsets must be encoded as a [2.2.1. Copy with 1-byte offset (01)](https://github.com/google/snappy/blob/master/format_description.txt#L89), where the offset is 0.
   1027 
The length is found by reading the 3-bit length field in the tag and decoding it using this table:
   1029 
   1030 | Length | Actual Length        |
   1031 |--------|----------------------|
   1032 | 0      | 4                    |
   1033 | 1      | 5                    |
   1034 | 2      | 6                    |
   1035 | 3      | 7                    |
   1036 | 4      | 8                    |
   1037 | 5      | 8 + read 1 byte      |
   1038 | 6      | 260 + read 2 bytes   |
   1039 | 7      | 65540 + read 3 bytes |
   1040 
   1041 This allows any repeat offset + length to be represented by 2 to 5 bytes.
It also allows emitting matches longer than 64 bytes with one copy + one repeat instead of several 64 byte copies.
   1043 
   1044 Lengths are stored as little endian values.
   1045 
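As an illustration, a decoder could map the 3-bit value to the actual length as in the sketch below, assuming `lenBits` is the value from the tag and `extra` holds the little-endian bytes that follow; the names are made up for the example.

```Go
func repeatLength(lenBits uint8, extra []byte) int {
	switch {
	case lenBits <= 4:
		return int(lenBits) + 4
	case lenBits == 5:
		return 8 + int(extra[0])
	case lenBits == 6:
		return 260 + int(binary.LittleEndian.Uint16(extra[:2]))
	default: // lenBits == 7
		return 65540 + int(extra[0]) + int(extra[1])<<8 + int(extra[2])<<16
	}
}
```
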
The first copy in a block cannot be a repeat offset, and in streams the repeat offset is reset on every block.
   1047 
   1048 Default streaming block size is 1MB.
   1049 
   1050 # Dictionary Encoding
   1051 
Dictionaries allow providing custom data that serves as a lookup source at the beginning of blocks.
   1053 
   1054 A dictionary provides an initial repeat value that can be used to point to a common header.
   1055 
   1056 Other than that the dictionary contains values that can be used as back-references.
   1057 
Frequently used data should be placed at the *end* of the dictionary, since offsets < 2048 bytes are encoded more compactly.
   1059 
   1060 ## Format
   1061 
Dictionary *content* must be at least 16 bytes and at most 64KiB (65536 bytes).
   1063 
   1064 Encoding: `[repeat value (uvarint)][dictionary content...]`
   1065 
The dictionary content is preceded by an unsigned base-128 (uvarint) encoded value specifying the initial repeat offset.
This value is an offset into the dictionary content and not a back-reference offset,
so setting it to 0 will make the repeat value point to the first byte of the dictionary.

The value must be less than the dictionary length minus 8.
   1071 
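A small sketch of serializing a dictionary in this layout, assuming `content` is within the size limits and `repeatOffset` points into it (names are illustrative):

```Go
func serializeDict(content []byte, repeatOffset uint64) []byte {
	buf := make([]byte, binary.MaxVarintLen64+len(content))
	// The uvarint repeat offset precedes the dictionary content.
	n := binary.PutUvarint(buf, repeatOffset)
	copy(buf[n:], content)
	return buf[:n+len(content)]
}
```
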
   1072 ## Encoding
   1073 
From the decoder's point of view, the dictionary content is seen as preceding the decoded output.
   1075 
   1076 `[dictionary content][decoded output]`
   1077 
   1078 Backreferences to the dictionary are encoded as ordinary backreferences that have an offset before the start of the decoded block.
   1079 
   1080 Matches copying from the dictionary are **not** allowed to cross from the dictionary into the decoded data.
   1081 However, if a copy ends at the end of the dictionary the next repeat will point to the start of the decoded buffer, which is allowed.
   1082 
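To illustrate the combined view, a back-reference could be resolved as in the sketch below, where `offset` is the distance back from the end of the decoded output; the byte-wise copy also handles overlapping matches. This is only a sketch with made-up names, not the package's decoder.

```Go
func copyMatch(dict, out []byte, offset, length int) []byte {
	// Position of the match in the virtual [dictionary content][decoded output] buffer.
	pos := len(dict) + len(out) - offset
	for i := 0; i < length; i++ {
		if pos < len(dict) {
			out = append(out, dict[pos])
		} else {
			out = append(out, out[pos-len(dict)])
		}
		pos++
	}
	return out
}
```
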
   1083 The first match can be a repeat value, which will use the repeat offset stored in the dictionary.
   1084 
When 64KB (65536 bytes) has been en/decoded, it is no longer allowed to reference the dictionary,
neither by copy nor by repeat operations.
   1087 If the boundary is crossed while copying from the dictionary, the operation should complete, 
   1088 but the next instruction is not allowed to reference the dictionary.
   1089 
   1090 Valid blocks encoded *without* a dictionary can be decoded with any dictionary. 
There are no checks whether the supplied dictionary is the correct one for a block.
Because of this there is no overhead from using a dictionary.
   1093 
   1094 ## Example
   1095 
   1096 This is the dictionary content. Elements are separated by `[]`.
   1097 
   1098 Dictionary: `[0x0a][Yesterday 25 bananas were added to Benjamins brown bag]`.
   1099 
The initial repeat offset is set to 10, which points at the character `2`.
   1101 
Encoded: `[LIT "10"][REPEAT len=10][LIT "hich"][MATCH off=50 len=6][MATCH off=31 len=6][MATCH off=61 len=10]`
   1103 
   1104 Decoded: `[10][ bananas w][hich][ were ][brown ][were added]`
   1105 
   1106 Output: `10 bananas which were brown were added`
   1107 
   1108 
   1109 ## Streams
   1110 
   1111 For streams each block can use the dictionary.
   1112 
The dictionary cannot currently be provided on the stream.
   1114 
   1115 
   1116 # LICENSE
   1117 
   1118 This code is based on the [Snappy-Go](https://github.com/golang/snappy) implementation.
   1119 
   1120 Use of this source code is governed by a BSD-style license that can be found in the LICENSE file.