# Sonic

English | [中文](README_ZH_CN.md)

A blazingly fast JSON serializing & deserializing library, accelerated by JIT (just-in-time compiling) and SIMD (single-instruction-multiple-data).

## Requirement
- Go 1.15~1.20
- Linux/MacOS/Windows
- Amd64 ARCH

## Features
- Runtime object binding without code generation
- Complete APIs for JSON value manipulation
- Fast, fast, fast!

## Benchmarks
For **all sizes** of JSON and **all scenarios** of usage, **Sonic performs best**.
- [Medium](https://github.com/bytedance/sonic/blob/main/decoder/testdata_test.go#L19) (13KB, 300+ keys, 6 layers)
```powershell
goversion: 1.17.1
goos: darwin
goarch: amd64
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkEncoder_Generic_Sonic-16                      32393 ns/op         402.40 MB/s       11965 B/op          4 allocs/op
BenchmarkEncoder_Generic_Sonic_Fast-16                 21668 ns/op         601.57 MB/s       10940 B/op          4 allocs/op
BenchmarkEncoder_Generic_JsonIter-16                   42168 ns/op         309.12 MB/s       14345 B/op        115 allocs/op
BenchmarkEncoder_Generic_GoJson-16                     65189 ns/op         199.96 MB/s       23261 B/op         16 allocs/op
BenchmarkEncoder_Generic_StdLib-16                    106322 ns/op         122.60 MB/s       49136 B/op        789 allocs/op
BenchmarkEncoder_Binding_Sonic-16                       6269 ns/op        2079.26 MB/s       14173 B/op          4 allocs/op
BenchmarkEncoder_Binding_Sonic_Fast-16                  5281 ns/op        2468.16 MB/s       12322 B/op          4 allocs/op
BenchmarkEncoder_Binding_JsonIter-16                   20056 ns/op         649.93 MB/s        9488 B/op          2 allocs/op
BenchmarkEncoder_Binding_GoJson-16                      8311 ns/op        1568.32 MB/s        9481 B/op          1 allocs/op
BenchmarkEncoder_Binding_StdLib-16                     16448 ns/op         792.52 MB/s        9479 B/op          1 allocs/op
BenchmarkEncoder_Parallel_Generic_Sonic-16              6681 ns/op        1950.93 MB/s       12738 B/op          4 allocs/op
BenchmarkEncoder_Parallel_Generic_Sonic_Fast-16         4179 ns/op        3118.99 MB/s       10757 B/op          4 allocs/op
BenchmarkEncoder_Parallel_Generic_JsonIter-16           9861 ns/op        1321.84 MB/s       14362 B/op        115 allocs/op
BenchmarkEncoder_Parallel_Generic_GoJson-16            18850 ns/op         691.52 MB/s       23278 B/op         16 allocs/op
BenchmarkEncoder_Parallel_Generic_StdLib-16            45902 ns/op         283.97 MB/s       49174 B/op        789 allocs/op
BenchmarkEncoder_Parallel_Binding_Sonic-16              1480 ns/op        8810.09 MB/s       13049 B/op          4 allocs/op
BenchmarkEncoder_Parallel_Binding_Sonic_Fast-16         1209 ns/op        10785.23 MB/s      11546 B/op          4 allocs/op
BenchmarkEncoder_Parallel_Binding_JsonIter-16           6170 ns/op        2112.58 MB/s        9504 B/op          2 allocs/op
BenchmarkEncoder_Parallel_Binding_GoJson-16             3321 ns/op        3925.52 MB/s        9496 B/op          1 allocs/op
BenchmarkEncoder_Parallel_Binding_StdLib-16             3739 ns/op        3486.49 MB/s        9480 B/op          1 allocs/op

BenchmarkDecoder_Generic_Sonic-16                      66812 ns/op         195.10 MB/s       57602 B/op        723 allocs/op
BenchmarkDecoder_Generic_Sonic_Fast-16                 54523 ns/op         239.07 MB/s       49786 B/op        313 allocs/op
BenchmarkDecoder_Generic_StdLib-16                    124260 ns/op         104.90 MB/s       50869 B/op        772 allocs/op
BenchmarkDecoder_Generic_JsonIter-16                   91274 ns/op         142.81 MB/s       55782 B/op       1068 allocs/op
BenchmarkDecoder_Generic_GoJson-16                     88569 ns/op         147.17 MB/s       66367 B/op        973 allocs/op
BenchmarkDecoder_Binding_Sonic-16                      32557 ns/op         400.38 MB/s       28302 B/op        137 allocs/op
BenchmarkDecoder_Binding_Sonic_Fast-16                 28649 ns/op         455.00 MB/s       24999 B/op         34 allocs/op
BenchmarkDecoder_Binding_StdLib-16                    111437 ns/op         116.97 MB/s       10576 B/op        208 allocs/op
BenchmarkDecoder_Binding_JsonIter-16                   35090 ns/op         371.48 MB/s       14673 B/op        385 allocs/op
BenchmarkDecoder_Binding_GoJson-16                     28738 ns/op         453.59 MB/s       22039 B/op         49 allocs/op
BenchmarkDecoder_Parallel_Generic_Sonic-16             12321 ns/op        1057.91 MB/s       57233 B/op        723 allocs/op
BenchmarkDecoder_Parallel_Generic_Sonic_Fast-16        10644 ns/op        1224.64 MB/s       49362 B/op        313 allocs/op
BenchmarkDecoder_Parallel_Generic_StdLib-16            57587 ns/op         226.35 MB/s       50874 B/op        772 allocs/op
BenchmarkDecoder_Parallel_Generic_JsonIter-16          38666 ns/op         337.12 MB/s       55789 B/op       1068 allocs/op
BenchmarkDecoder_Parallel_Generic_GoJson-16            30259 ns/op         430.79 MB/s       66370 B/op        974 allocs/op
BenchmarkDecoder_Parallel_Binding_Sonic-16              5965 ns/op        2185.28 MB/s       27747 B/op        137 allocs/op
BenchmarkDecoder_Parallel_Binding_Sonic_Fast-16         5170 ns/op        2521.31 MB/s       24715 B/op         34 allocs/op
BenchmarkDecoder_Parallel_Binding_StdLib-16            27582 ns/op         472.58 MB/s       10576 B/op        208 allocs/op
BenchmarkDecoder_Parallel_Binding_JsonIter-16          13571 ns/op         960.51 MB/s       14685 B/op        385 allocs/op
BenchmarkDecoder_Parallel_Binding_GoJson-16            10031 ns/op        1299.51 MB/s       22111 B/op         49 allocs/op

BenchmarkGetOne_Sonic-16                                3276 ns/op        3975.78 MB/s          24 B/op          1 allocs/op
BenchmarkGetOne_Gjson-16                                9431 ns/op        1380.81 MB/s           0 B/op          0 allocs/op
BenchmarkGetOne_Jsoniter-16                            51178 ns/op         254.46 MB/s       27936 B/op        647 allocs/op
BenchmarkGetOne_Parallel_Sonic-16                      216.7 ns/op       60098.95 MB/s          24 B/op          1 allocs/op
BenchmarkGetOne_Parallel_Gjson-16                       1076 ns/op        12098.62 MB/s          0 B/op          0 allocs/op
BenchmarkGetOne_Parallel_Jsoniter-16                   17741 ns/op         734.06 MB/s       27945 B/op        647 allocs/op
BenchmarkSetOne_Sonic-16                               9571 ns/op         1360.61 MB/s        1584 B/op         17 allocs/op
BenchmarkSetOne_Sjson-16                               36456 ns/op         357.22 MB/s       52180 B/op          9 allocs/op
BenchmarkSetOne_Jsoniter-16                            79475 ns/op         163.86 MB/s       45862 B/op        964 allocs/op
BenchmarkSetOne_Parallel_Sonic-16                      850.9 ns/op       15305.31 MB/s        1584 B/op         17 allocs/op
BenchmarkSetOne_Parallel_Sjson-16                      18194 ns/op         715.77 MB/s       52247 B/op          9 allocs/op
BenchmarkSetOne_Parallel_Jsoniter-16                   33560 ns/op         388.05 MB/s       45892 B/op        964 allocs/op
```
- [Small](https://github.com/bytedance/sonic/blob/main/testdata/small.go) (400B, 11 keys, 3 layers)
![small benchmarks](./docs/imgs/bench-small.png)
- [Large](https://github.com/bytedance/sonic/blob/main/testdata/twitter.json) (635KB, 10000+ keys, 6 layers)
![large benchmarks](./docs/imgs/bench-large.png)

See [bench.sh](https://github.com/bytedance/sonic/blob/main/bench.sh) for the benchmark code.

## How it works
See [INTRODUCTION.md](./docs/INTRODUCTION.md).

## Usage

### Marshal/Unmarshal

Default behaviors are mostly consistent with `encoding/json`, except for the HTML escaping form (see [Escape HTML](https://github.com/bytedance/sonic/blob/main/README.md#escape-html)) and the `SortKeys` feature (optional support, see [Sort Keys](https://github.com/bytedance/sonic/blob/main/README.md#sort-keys)), which are **NOT** in conformity with [RFC8259](https://datatracker.ietf.org/doc/html/rfc8259).
```go
import "github.com/bytedance/sonic"

var data YourSchema
// Marshal
output, err := sonic.Marshal(&data)
// Unmarshal (err is already declared above, so assign instead of redeclaring)
err = sonic.Unmarshal(output, &data)
```

### Streaming IO
Sonic supports decoding JSON from an `io.Reader` or encoding objects into an `io.Writer`, which aims to handle multiple values as well as reduce memory consumption.
- encoder
```go
var o1 = map[string]interface{}{
    "a": "b",
}
var o2 = 1
var w = bytes.NewBuffer(nil)
var enc = sonic.ConfigDefault.NewEncoder(w)
enc.Encode(o1)
enc.Encode(o2)
fmt.Println(w.String())
// Output:
// {"a":"b"}
// 1
```
- decoder
```go
var o =  map[string]interface{}{}
var r = strings.NewReader(`{"a":"b"}{"1":"2"}`)
var dec = sonic.ConfigDefault.NewDecoder(r)
dec.Decode(&o)
dec.Decode(&o)
fmt.Printf("%+v", o)
// Output:
// map[1:2 a:b]
```

### Use Number/Use Int64
```go
import (
    "encoding/json"

    "github.com/bytedance/sonic"
    "github.com/bytedance/sonic/decoder"
)

var input = `1`
var data interface{}

// default float64
dc := decoder.NewDecoder(input)
dc.Decode(&data) // data == float64(1)
// use json.Number
dc = decoder.NewDecoder(input)
dc.UseNumber()
dc.Decode(&data) // data == json.Number("1")
// use int64
dc = decoder.NewDecoder(input)
dc.UseInt64()
dc.Decode(&data) // data == int64(1)

root, err := sonic.GetFromString(input)
// Get json.Number (the node accessors also return an error)
jn, err := root.Number()
jv, err := root.InterfaceUseNumber()
jm := jv.(json.Number) // jn == jm
// Get float64
fn, err := root.Float64()
fv, err := root.Interface()
fm := fv.(float64) // fn == fm
```

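Why the number type matters becomes visible with an integer above 2^53. The sketch below uses only the standard library's `encoding/json` and assumes that sonic's options behave like the standard `UseNumber`, as the example above suggests. The helper names `decodeDefault` and `decodeNumber` and the sample value are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// decodeDefault decodes a JSON number into interface{} the default way,
// which yields a float64.
func decodeDefault(input string) (float64, error) {
	var v interface{}
	if err := json.NewDecoder(strings.NewReader(input)).Decode(&v); err != nil {
		return 0, err
	}
	f, ok := v.(float64)
	if !ok {
		return 0, fmt.Errorf("not a number: %T", v)
	}
	return f, nil
}

// decodeNumber decodes with UseNumber, which keeps the original digits
// as a json.Number instead of converting to float64.
func decodeNumber(input string) (json.Number, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	dec.UseNumber()
	var v interface{}
	if err := dec.Decode(&v); err != nil {
		return "", err
	}
	n, ok := v.(json.Number)
	if !ok {
		return "", fmt.Errorf("not a number: %T", v)
	}
	return n, nil
}

func main() {
	const input = "9007199254740993" // 2^53 + 1, not representable as float64

	f, _ := decodeDefault(input)
	n, _ := decodeNumber(input)
	fmt.Printf("%.0f\n", f) // 9007199254740992 (last digit lost)
	fmt.Println(n)          // 9007199254740993
}
```

If your JSON carries 64-bit identifiers or counters, decoding to `json.Number` or `int64` avoids this silent rounding.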
### Sort Keys
On account of the performance loss from sorting (roughly 10%), sonic doesn't enable this feature by default. If your component depends on it to work (like [zstd](https://github.com/facebook/zstd)), use it like this:
```go
import "github.com/bytedance/sonic"
import "github.com/bytedance/sonic/encoder"

// Binding map only
m := map[string]interface{}{}
v, err := encoder.Encode(m, encoder.SortMapKeys)

// Or ast.Node.SortKeys() before marshal
root, err := sonic.Get(JSON)
err = root.SortKeys(true) // true: also sort the keys of nested objects
```
### Escape HTML
On account of the performance loss (roughly 15%), sonic doesn't enable this feature by default. You can use the `encoder.EscapeHTML` option to enable it (aligned with `encoding/json.HTMLEscape`).
```go
import "github.com/bytedance/sonic/encoder"

v := map[string]string{"&&":"<>"}
ret, err := encoder.Encode(v, encoder.EscapeHTML) // ret == `{"\u0026\u0026":"\u003c\u003e"}`
```
### Compact Format
Sonic encodes primitive objects (struct/map...) as compact-format JSON by default, except when marshaling `json.RawMessage` or `json.Marshaler`: sonic validates their output JSON but does **NOT** compact it, for performance reasons. We provide the option `encoder.CompactMarshaler` to add the compacting step.

### Print Error
If there is invalid syntax in the input JSON, sonic will return `decoder.SyntaxError`, which supports pretty-printing of the error position.
```go
import "github.com/bytedance/sonic"
import "github.com/bytedance/sonic/decoder"

var data interface{}
err := sonic.UnmarshalString("[[[}]]", &data)
if err != nil {
    /* One line by default */
    println(err.Error()) // "Syntax error at index 3: invalid char\n\n\t[[[}]]\n\t...^..\n"
    /* Pretty print */
    if e, ok := err.(decoder.SyntaxError); ok {
        /*Syntax error at index 3: invalid char

            [[[}]]
            ...^..
        */
        print(e.Description())
    } else if me, ok := err.(*decoder.MismatchTypeError); ok {
        // decoder.MismatchTypeError is new to Sonic v1.6.0
        print(me.Description())
    }
}
```

#### Mismatched Types [Sonic v1.6.0]
If there is a **mismatched-type** value for a given key, sonic will report `decoder.MismatchTypeError` (if there are many, it reports the last one), but it still skips the wrong value and keeps decoding the rest of the JSON.
```go
import (
    "fmt"

    "github.com/bytedance/sonic"
)

var data = struct{
    A int
    B int
}{}
err := sonic.UnmarshalString(`{"A":"1","B":1}`, &data)
println(err.Error())    // Mismatch type int with value string "at index 5: mismatched type with value\n\n\t{\"A\":\"1\",\"B\":1}\n\t.....^.........\n"
fmt.Printf("%+v", data) // {A:0 B:1}
```
### Ast.Node
Sonic/ast.Node is a completely self-contained AST for JSON. It implements both serialization and deserialization and provides robust APIs for reading and modifying generic data.
#### Get/Index
Search for partial JSON by the given path elements, each of which must be a non-negative integer, a string, or nil.
```go
import "github.com/bytedance/sonic"

input := []byte(`{"key1":[{},{"key2":{"key3":[1,2,3]}}]}`)

// no path, returns entire json
root, err := sonic.Get(input)
raw, err := root.Raw() // raw == string(input)

// multiple paths
root, err = sonic.Get(input, "key1", 1, "key2")
sub, err := root.Get("key3").Index(2).Int64() // sub == 3
```
**Tip**: since `Index()` uses an offset to locate data, which is much faster than scanning as `Get()` does, we suggest you use it as much as possible. Sonic also provides another API, `IndexOrGet()`, which uses the offset under the hood while also ensuring that the key matches.

#### Set/Unset
Modify the JSON content with `Set()`/`Unset()`.
```go
import "github.com/bytedance/sonic/ast"

// Set
exist, err := root.Set("key4", ast.NewBool(true)) // exist == false
alias1 := root.Get("key4")
println(alias1.Valid()) // true
alias2 := root.Index(1)
println(alias1 == alias2) // true

// Unset
exist, err = root.UnsetByIndex(1) // exist == true
println(root.Get("key4").Check()) // "value not exist"
```

#### Serialize
To encode `ast.Node` as JSON, use `MarshalJSON()` or `json.Marshal()` (you MUST pass the node's pointer).
```go
import (
    "encoding/json"
    "github.com/bytedance/sonic"
)

buf, err := root.MarshalJSON()
println(string(buf))                // {"key1":[{},{"key2":{"key3":[1,2,3]}}]}
exp, err := json.Marshal(&root)     // WARN: use pointer
println(string(buf) == string(exp)) // true
```

#### APIs
- validation: `Check()`, `Error()`, `Valid()`, `Exist()`
- searching: `Index()`, `Get()`, `IndexPair()`, `IndexOrGet()`, `GetByPath()`
- go-type casting: `Int64()`, `Float64()`, `String()`, `Number()`, `Bool()`, `Map[UseNumber|UseNode]()`, `Array[UseNumber|UseNode]()`, `Interface[UseNumber|UseNode]()`
- go-type packing: `NewRaw()`, `NewNumber()`, `NewNull()`, `NewBool()`, `NewString()`, `NewObject()`, `NewArray()`
- iteration: `Values()`, `Properties()`, `ForEach()`, `SortKeys()`
- modification: `Set()`, `SetByIndex()`, `Add()`

## Compatibility
Sonic **DOES NOT** guarantee support for all environments, due to the difficulty of developing high-performance code. For developers who use sonic to build their applications in different environments, we have the following suggestions:

- Developing on **Mac M1**: Make sure you have Rosetta 2 installed on your machine, and set `GOARCH=amd64` when building your application. Rosetta 2 can automatically translate x86 binaries to arm64 binaries and run x86 applications on Mac M1.
- Developing on **Linux arm64**: You can install qemu and use the `qemu-x86_64 -cpu max` command to convert x86 binaries to arm64 binaries for applications built with sonic. qemu achieves a translation effect similar to that of Rosetta 2 on Mac M1.

For developers who want to use sonic on Linux arm64 without qemu, or those who want to handle JSON in strict consistency with `encoding/json`, we provide some compatible APIs as `sonic.API`:
- `ConfigDefault`: sonic's default config (`EscapeHTML=false`,`SortKeys=false`...) for sonic-supporting environments. Elsewhere it falls back to `encoding/json` with the corresponding config, and some options such as `SortKeys=false` have no effect.
- `ConfigStd`: the std-compatible config (`EscapeHTML=true`,`SortKeys=true`...) for sonic-supporting environments. Elsewhere it falls back to `encoding/json`.
- `ConfigFastest`: the fastest config (`NoQuoteTextMarshaler=true`) for sonic-supporting environments. Elsewhere it falls back to `encoding/json` with the corresponding config, and some options have no effect.

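The `EscapeHTML=true` and `SortKeys=true` settings listed for `ConfigStd` correspond to what `encoding/json` does out of the box. The sketch below uses only the standard library to show both behaviors, plus the output with HTML escaping switched off. The helper names `marshalStd` and `marshalNoEscape` are assumptions, and no sonic API is called.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// marshalStd encodes v with the encoding/json defaults:
// map keys are sorted and <, >, & are escaped as \u003c, \u003e, \u0026.
func marshalStd(v interface{}) (string, error) {
	b, err := json.Marshal(v)
	return string(b), err
}

// marshalNoEscape still sorts map keys but leaves <, > and & as they are.
func marshalNoEscape(v interface{}) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	if err := enc.Encode(v); err != nil {
		return "", err
	}
	// Encode appends a newline; trim it so both helpers are comparable.
	return string(bytes.TrimRight(buf.Bytes(), "\n")), nil
}

func main() {
	m := map[string]string{"b": "<>", "a": "&"}

	std, _ := marshalStd(m)
	raw, _ := marshalNoEscape(m)
	fmt.Println(std) // {"a":"\u0026","b":"\u003c\u003e"}
	fmt.Println(raw) // {"a":"&","b":"<>"}
}
```

A component that relies on either behavior (for example, byte-for-byte comparison of encoded maps) is a reason to choose the std-compatible config over the default one.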
## Tips

### Pretouch
Since Sonic uses [golang-asm](https://github.com/twitchyliquid64/golang-asm) as a JIT assembler, which is NOT very suitable for runtime compiling, the first run on a huge schema may cause a request timeout or even a process OOM. For better stability, we advise **using `Pretouch()` for huge-schema or compact-memory applications** before `Marshal()/Unmarshal()`.
```go
import (
    "reflect"
    "github.com/bytedance/sonic"
    "github.com/bytedance/sonic/option"
)

func init() {
    var v HugeStruct

    // For most large types (nesting depth <= option.DefaultMaxInlineDepth)
    err := sonic.Pretouch(reflect.TypeOf(v))

    // with more CompileOption...
    // (loop and depth are placeholders for values you choose)
    err = sonic.Pretouch(reflect.TypeOf(v),
        // If the type is nested too deeply (nesting depth > option.DefaultMaxInlineDepth),
        // you can set compile recursive loops in Pretouch for better stability in JIT.
        option.WithCompileRecursiveDepth(loop),
        // For a large nested struct, try to set a smaller depth to reduce compiling time.
        option.WithCompileMaxInlineDepth(depth),
    )
    if err != nil {
        panic(err)
    }
}
```

### Copy string
When decoding **string values without any escaped characters**, sonic references them from the original JSON buffer instead of allocating a new buffer and copying. This helps CPU performance a lot but may keep the whole JSON buffer in memory for as long as the decoded objects are in use. In practice, we found that the extra memory introduced by referencing the JSON buffer is usually 20% ~ 80% of the decoded objects. Once an application holds these objects for a long time (for example, caching the decoded objects for reuse), its in-use memory on the server may go up. We provide the option `decoder.CopyString()` so that users can choose not to reference the JSON buffer, which may reduce CPU performance to some degree.

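The trade-off can be illustrated without sonic at all. In the sketch below, a substring of a buffer shares the buffer's memory (so the whole buffer stays reachable while the substring is alive), whereas an explicit copy does not. It assumes Go 1.20+ for `unsafe.StringData`. `sharesBuffer` and `copyString` are hypothetical helpers, not sonic APIs.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// sharesBuffer reports whether sub points into the memory backing buf.
func sharesBuffer(buf, sub string) bool {
	if len(buf) == 0 || len(sub) == 0 {
		return false
	}
	start := uintptr(unsafe.Pointer(unsafe.StringData(buf)))
	p := uintptr(unsafe.Pointer(unsafe.StringData(sub)))
	return p >= start && p < start+uintptr(len(buf))
}

// copyString returns a copy of s backed by a fresh allocation.
func copyString(s string) string {
	return strings.Clone(s)
}

func main() {
	// A JSON buffer allocated at runtime, as if read from the network.
	buf := string([]byte(`{"name":"sonic"}`))

	ref := buf[9:14]      // "sonic" as a view into buf (no copy)
	cp := copyString(ref) // "sonic" in its own allocation

	fmt.Println(ref, sharesBuffer(buf, ref)) // sonic true
	fmt.Println(cp, sharesBuffer(buf, cp))   // sonic false
}
```

Holding on to `ref` keeps all of `buf` alive, while holding on to `cp` only keeps five bytes alive. That is the memory-versus-CPU choice the option exposes.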
### Pass string or []byte?
For alignment with `encoding/json`, we provide APIs that take `[]byte` as an argument, but for safety the input is copied at the same time, which may cost performance when the original JSON is huge. Therefore, you can use `UnmarshalString()` and `GetFromString()` to pass a string, as long as your original data is a string or a **nocopy-cast** is safe for your `[]byte`. We also provide the API `MarshalString()` for a convenient **nocopy-cast** of the encoded JSON `[]byte`, which is safe since sonic's output bytes are always duplicated and unique.

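What a **nocopy-cast** is, and why it is only safe when the `[]byte` is never modified afterwards, can be sketched as follows. This is an illustration based on `unsafe.String` (Go 1.20+), not sonic's internal code. `bytesToString` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString reinterprets b as a string without copying.
// The result aliases b, so b must not be modified while the string is in use.
func bytesToString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	b := []byte(`{"a":1}`)

	safe := string(b)         // copies: independent of b
	alias := bytesToString(b) // no copy: shares memory with b

	b[5] = '2' // mutate the original buffer

	fmt.Println(safe)  // {"a":1}
	fmt.Println(alias) // {"a":2}
}
```

If the buffer is reused or pooled after decoding, the aliasing string changes underneath you, so the cast is only appropriate for buffers that stay untouched.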
### Accelerate `encoding.TextMarshaler`
To ensure data security, sonic.Encoder quotes and escapes string values from `encoding.TextMarshaler` interfaces by default, which may degrade performance considerably if most of your data takes that form. We provide `encoder.NoQuoteTextMarshaler` to skip these operations, which means you **MUST** ensure that their output strings are escaped and quoted following [RFC8259](https://datatracker.ietf.org/doc/html/rfc8259).

### Better performance for generic data
In a **fully-parsed** scenario, `Unmarshal()` performs better than `Get()`+`Node.Interface()`. But if you only have a part of the schema for specific JSON, you can combine `Get()` and `Unmarshal()`:
```go
import "github.com/bytedance/sonic"

node, err := sonic.GetFromString(_TwitterJson, "statuses", 3, "user")
var user User // your partial schema...
raw, err := node.Raw()
err = sonic.UnmarshalString(raw, &user)
```
Even if you don't have any schema, use `ast.Node` as the container of generic values instead of `map` or `interface`:
```go
import "github.com/bytedance/sonic"

root, err := sonic.GetFromString(_TwitterJson)
user := root.GetByPath("statuses", 3, "user")  // === root.Get("statuses").Index(3).Get("user")
err = user.Check()

// err = user.LoadAll() // only call this when you want to use 'user' concurrently...
go someFunc(user)
```
Why? Because `ast.Node` stores its children using `array`:
- `Array`'s performance is **much better** than `Map` when Inserting (Deserialize) and Scanning (Serialize) data;
- **Hashing** (`map[x]`) is not as efficient as **Indexing** (`array[x]`), which `ast.Node` can conduct on **both array and object**;
- Using `Interface()`/`Map()` means Sonic must parse all the underlying values, while `ast.Node` can parse them **on demand**.

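The points above can be made concrete with a small sketch. It stores object members in a slice of key/value pairs and looks a member up by a remembered index first, falling back to a linear scan only when the key at that index does not match. This is the index-then-verify idea described for `IndexOrGet()`. The `pair` and `object` types and the `indexOrGet` method are hypothetical and are not sonic's actual implementation.

```go
package main

import "fmt"

// pair is one object member; object keeps members in insertion order.
type pair struct {
	Key   string
	Value interface{}
}

type object []pair

// get scans the members linearly for key.
func (o object) get(key string) (interface{}, bool) {
	for _, p := range o {
		if p.Key == key {
			return p.Value, true
		}
	}
	return nil, false
}

// indexOrGet tries the O(1) offset first and verifies the key,
// falling back to a scan when the layout differs from what was expected.
func (o object) indexOrGet(idx int, key string) (interface{}, bool) {
	if idx >= 0 && idx < len(o) && o[idx].Key == key {
		return o[idx].Value, true
	}
	return o.get(key)
}

func main() {
	obj := object{{"id", 1}, {"name", "sonic"}, {"lang", "go"}}

	v, ok := obj.indexOrGet(1, "name") // index hit
	fmt.Println(v, ok)                 // sonic true

	v, ok = obj.indexOrGet(0, "lang") // index miss, falls back to scan
	fmt.Println(v, ok)                // go true

	_, ok = obj.indexOrGet(5, "missing") // out of range and absent
	fmt.Println(ok)                      // false
}
```

Appending to a slice and walking it in order are both cheap, which is why this layout suits decoding and encoding. The index path pays off when the same document shape is read repeatedly.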
**CAUTION:** `ast.Node` **DOESN'T** ensure concurrency safety directly, due to its **lazy-load** design. However, you can call `Node.Load()`/`Node.LoadAll()` to achieve that, which may reduce performance, although it is still faster than converting to `map` or `interface{}`.

## Community
Sonic is a subproject of [CloudWeGo](https://www.cloudwego.io/). We are committed to building a cloud native ecosystem.