Project author: bnclabs

Project description:
Algorithms on data formats - JSON, CBOR, Collation.
Language: Go
Repository: git://github.com/bnclabs/gson.git
Created: 2014-10-02T06:30:43Z
Project community: https://github.com/bnclabs/gson

License: MIT License



Object formats and notations


  • High performance algorithms for data transformation, serialization and
    manipulation.
  • Based on well-established standards.
  • ZERO allocation when transforming from one format to another, except
    for APIs creating golang values from encoded data.
  • JSON for web.
  • CBOR for machine.
  • Binary-Collation for crazy fast comparison/sorting.

This package is under continuous development, but the APIs are fairly stable.

What is what

JSON

  • JavaScript Object Notation, also called JSON,
    RFC-7159.
  • Fast becoming the internet standard for data exchange.
  • Human-readable format, but not so friendly for machine representation.

Value (aka gson)

  • Golang object parsed from JSON, CBOR or collate representation.
  • JSON arrays are represented in golang as []interface{}.
  • JSON objects, aka properties, are represented in golang as
    map[string]interface{}.
  • Following golang-types can be transformed to JSON, CBOR, or,
    Binary-collation - nil, bool,
    byte, int8, int16, uint16, int32, uint32, int, uint, int64, uint64,
    float32, float64,
    string, []interface{}, map[string]interface{},
    [][2]interface{}.
  • For type [][2]interface{}, first item is treated as key (string) and
    second item is treated as value, hence equivalent to
    map[string]interface{}.
  • Gson objects support operations like Get(), Set(), and
    Delete() on individual fields located by a json-pointer.
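As a sketch of the [][2]interface{} equivalence described above, the hypothetical helper mapToPairs below (not part of gson's API) turns a map[string]interface{} into key/value pairs. Sorting the keys is an assumption made here only so the output is deterministic; it is not a claim about how gson orders keys.

```go
package main

import (
	"fmt"
	"sort"
)

// mapToPairs is a hypothetical helper (not gson's API) that converts a
// map[string]interface{} into the equivalent [][2]interface{} form:
// item 0 of each pair is the key, item 1 is the value.
// Keys are sorted only to make the output deterministic, because Go
// randomizes map iteration order.
func mapToPairs(m map[string]interface{}) [][2]interface{} {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([][2]interface{}, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, [2]interface{}{k, m[k]})
	}
	return pairs
}

func main() {
	doc := map[string]interface{}{
		"b": true,
		"a": nil,
		"c": []interface{}{1.0, 2.0},
	}
	for _, kv := range mapToPairs(doc) {
		fmt.Printf("%v => %v\n", kv[0], kv[1])
	}
}
```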

CBOR

  • Concise Binary Object Representation, also called CBOR,
    RFC-7049.
  • Machine friendly, designed for IoT, inter-networking of lightweight
    devices, and easy to implement in many languages.
  • Can be used for more than data exchange, left to user
    imagination :) …

Binary-Collation

  • A custom encoding based on a paper and adapted to
    handle the JSON specification.
  • Binary representation preserving the sort order.
  • Transform back to original JSON from binary representation.
  • Numbers can be treated either always as floating-point, for better
    performance, or as floating-point or integer, for flexibility.
  • More details can be found here

JSON-Pointer

  • URL-like field locator within a JSON object, RFC-6901.
  • For navigating through JSON arrays and objects, to any level of nesting.
  • JSON-pointers shall be unquoted before they are used as a path into a
    JSON document.
  • Documents encoded in CBOR format using LengthPrefix are not
    supported by lookup APIs.
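A minimal sketch of RFC 6901 resolution over decoded Go values, assuming documents are built from map[string]interface{} and []interface{}. lookup is a hypothetical helper for illustration; gson's own Get() API may differ in signature and behaviour. Note the unescaping order required by the RFC: "~1" becomes "/" and "~0" becomes "~", done in a single pass so that "~01" yields "~1".

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// unescape applies RFC 6901 escaping rules in one left-to-right pass.
var unescape = strings.NewReplacer("~1", "/", "~0", "~")

// lookup resolves ptr against doc. It returns false if any segment is
// missing, out of range, or steps into a non-container value.
func lookup(doc interface{}, ptr string) (interface{}, bool) {
	if ptr == "" {
		return doc, true // the empty pointer refers to the whole document
	}
	if !strings.HasPrefix(ptr, "/") {
		return nil, false
	}
	cur := doc
	for _, seg := range strings.Split(ptr[1:], "/") {
		seg = unescape.Replace(seg)
		switch v := cur.(type) {
		case map[string]interface{}:
			next, ok := v[seg]
			if !ok {
				return nil, false
			}
			cur = next
		case []interface{}:
			// strconv.Atoi also accepts "+1" and "01", which RFC 6901
			// forbids; a strict implementation would reject them.
			idx, err := strconv.Atoi(seg)
			if err != nil || idx < 0 || idx >= len(v) {
				return nil, false
			}
			cur = v[idx]
		default:
			return nil, false
		}
	}
	return cur, true
}

func main() {
	doc := map[string]interface{}{
		"a/b":  1.0,
		"list": []interface{}{"x", map[string]interface{}{"m~n": true}},
	}
	fmt.Println(lookup(doc, "/list/1/m~0n")) // escaped "~" in a key
	fmt.Println(lookup(doc, "/a~1b"))        // escaped "/" in a key
	fmt.Println(lookup(doc, "/list/5"))      // index out of range
}
```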

Performance and memory pressure

The following benchmarks were run on map data with a shape similar to:

    {"key1": nil, "key2": true, "key3": false,
     "key4": "hello world", "key5": 10.23122312}

or,

    {"a":null,"b":true,"c":false,"d\"":10,"e":"tru\"e", "f":[1,2]}

Results:

    BenchmarkVal2JsonMap5-8   3000000    461 ns/op     0 B/op   0 allocs/op
    BenchmarkVal2CborMap5-8   5000000    262 ns/op     0 B/op   0 allocs/op
    BenchmarkVal2CollMap-8    1000000   1321 ns/op   128 B/op   2 allocs/op
    BenchmarkJson2CborMap-8   2000000    838 ns/op     0 B/op   0 allocs/op
    BenchmarkCbor2JsonMap-8   2000000   1010 ns/op     0 B/op   0 allocs/op
    BenchmarkJson2CollMap-8   1000000   1825 ns/op   202 B/op   2 allocs/op
    BenchmarkColl2JsonMap-8   1000000   2028 ns/op   434 B/op   6 allocs/op
    BenchmarkCbor2CollMap-8   1000000   1692 ns/op   131 B/op   2 allocs/op
    BenchmarkColl2CborMap-8   1000000   1769 ns/op   440 B/op   6 allocs/op

Converting to golang values, however, incurs a cost:

    BenchmarkJson2ValMap5   1000000   1621 ns/op    699 B/op   14 allocs/op
    BenchmarkCbor2ValMap5   1000000   1711 ns/op    496 B/op   18 allocs/op
    BenchmarkColl2ValMap    1000000   2235 ns/op   1440 B/op   33 allocs/op

Configuration

Configuration APIs are not re-entrant. For concurrent use of Gson, please
create a gson.Config{} per goroutine, or protect it with a mutex.

NumberKind

There are two ways to treat numbers in Gson: as integers (up to 64-bit width)
or as floating-point (float64).

  • FloatNumber configuration can be used to tell Gson to treat all numbers
    as floating point. This has the convenience of representing fractional
    values, but suffers from round-off errors and cannot exactly represent
    every integer greater than 2^53. DEFAULT choice.

  • SmartNumber will use int64, uint64 for representing integer values and
    use float64 when decimal precision is required. With this option, Gson
    might incur a slight performance penalty.

Can be configured per configuration instance via SetNumberKind().
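The 2^53 limit mentioned for FloatNumber is easy to demonstrate with the standard library alone: 2^53 + 1 parsed as float64 rounds to 2^53, while parsing it as int64 keeps it exact. asFloat and asInt are illustrative helpers, not gson APIs; they only mimic the two number kinds as described above.

```go
package main

import (
	"fmt"
	"strconv"
)

// asFloat parses a literal the FloatNumber way: always as float64.
func asFloat(lit string) float64 {
	f, err := strconv.ParseFloat(lit, 64)
	if err != nil {
		panic(err)
	}
	return f
}

// asInt parses an integer literal exactly as int64, roughly as
// SmartNumber is described to do for integer values.
func asInt(lit string) int64 {
	i, err := strconv.ParseInt(lit, 10, 64)
	if err != nil {
		panic(err)
	}
	return i
}

func main() {
	const lit = "9007199254740993" // 2^53 + 1
	fmt.Println(int64(asFloat(lit))) // 9007199254740992: rounded to 2^53
	fmt.Println(asInt(lit))          // 9007199254740993: exact
}
```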

MaxKeys

Maximum number of keys allowed in a property object. This can be configured
globally via gson.MaxKeys or per configuration via SetMaxkeys().

Memory-pools

Gson uses memory pools:

  • Pool of byte blocks for listing json-pointers.
  • Pool of byte blocks for encoding / decoding string types,
    and property keys.
  • Pool of string sets for sorting keys within a property.

The memory footprint of gson depends on the pool sizes, the maximum length of
input strings, and the number of keys in input property-maps.

Mempools can be configured globally via
gson.MaxStringLen, gson.MaxKeys, gson.MaxCollateLen, gson.MaxJsonpointerLen
package variables or per configuration instance via ResetPools().
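A sketch of the kind of byte-block pool described above, using sync.Pool from the standard library. The block size maxStringLen and the helper withScratch are hypothetical stand-ins for illustration; gson's pools may be organized differently.

```go
package main

import (
	"fmt"
	"sync"
)

// maxStringLen is a hypothetical block size, standing in for a setting
// like gson.MaxStringLen; it is not read from gson.
const maxStringLen = 1024

// blockPool hands out fixed-size scratch buffers, so encoding a string
// does not allocate a fresh buffer each time. Pointers to slices are
// stored so that Put itself does not allocate.
var blockPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, maxStringLen)
		return &b
	},
}

// withScratch borrows a buffer, lets fn use it, and returns it to the pool.
func withScratch(fn func(buf []byte) int) int {
	bp := blockPool.Get().(*[]byte)
	defer blockPool.Put(bp)
	return fn(*bp)
}

func main() {
	n := withScratch(func(buf []byte) int {
		return copy(buf, "hello world")
	})
	fmt.Println(n, "bytes written into a pooled buffer")
}
```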

CBOR ContainerEncoding

In CBOR both map and array (called container types) can be encoded as length
followed by items, or items followed by end-marker.

  • LengthPrefix to encode length of container type first, followed by
    each item in the container.
  • Stream to encode each item in the container as it appears in the input
    stream, finally ending it with an End-Stream-Marker. DEFAULT choice.

Can be configured per configuration instance via SetContainerEncoding.

JSON Strict

  • If configured as true, JSON string encode / decode operations will use
    Golang’s encoding/json package.

Can be configured per configuration instance via SetStrict.

JSON SpaceKind

How should space characters be interpreted? There are two options:

  • AnsiSpace will be faster but does not support unicode.
  • UnicodeSpace will be slower but supports unicode. DEFAULT choice.

Can be configured per configuration instance via SetSpaceKind.
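To illustrate the trade-off, here are two hypothetical whitespace skippers: one that only accepts the four whitespace bytes JSON itself allows (space, tab, LF, CR), and one that decodes runes and accepts anything unicode.IsSpace reports, such as U+00A0 or U+2003. Mapping these to AnsiSpace and UnicodeSpace is an assumption about what the two options mean, not gson's code.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// skipAnsiSpace skips only the four JSON whitespace bytes. It works on
// raw bytes and never decodes UTF-8, which keeps it cheap.
func skipAnsiSpace(s string) string {
	for len(s) > 0 {
		switch s[0] {
		case ' ', '\t', '\n', '\r':
			s = s[1:]
		default:
			return s
		}
	}
	return s
}

// skipUnicodeSpace decodes one rune at a time and skips anything that
// unicode.IsSpace accepts, at the cost of UTF-8 decoding.
func skipUnicodeSpace(s string) string {
	for len(s) > 0 {
		r, size := utf8.DecodeRuneInString(s)
		if !unicode.IsSpace(r) {
			break
		}
		s = s[size:]
	}
	return s
}

func main() {
	input := "\u00a0 {\"a\": 1}"
	fmt.Printf("%q\n", skipAnsiSpace(input))    // NBSP is not skipped
	fmt.Printf("%q\n", skipUnicodeSpace(input)) // NBSP and space skipped
}
```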

Collate ArrayLenPrefix

When sorting arrays, which are a container type, should the collation
algorithm consider the arity of the array? If ArrayLenPrefix is configured
as true, arrays with more items will sort after arrays with fewer items.

Can be configured per configuration instance via SortbyArrayLen.
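A sketch of the effect of ArrayLenPrefix on ordering, using plain Go slices of float64 instead of collated bytes. compareArrays is a hypothetical helper: with byLen set it compares lengths first, otherwise it compares element by element and a prefix sorts before the longer array.

```go
package main

import "fmt"

// compareArrays returns -1, 0 or 1. With byLen set, arity decides first,
// as ArrayLenPrefix is described to do; otherwise elements are compared
// in order and a shorter array that is a prefix of the other sorts first.
func compareArrays(a, b []float64, byLen bool) int {
	if byLen && len(a) != len(b) {
		if len(a) < len(b) {
			return -1
		}
		return 1
	}
	for i := 0; i < len(a) && i < len(b); i++ {
		if a[i] < b[i] {
			return -1
		}
		if a[i] > b[i] {
			return 1
		}
	}
	switch {
	case len(a) < len(b):
		return -1
	case len(a) > len(b):
		return 1
	}
	return 0
}

func main() {
	a := []float64{9}
	b := []float64{1, 2}
	fmt.Println(compareArrays(a, b, false)) // 1: [9] sorts after [1 2]
	fmt.Println(compareArrays(a, b, true))  // -1: fewer items sorts first
}
```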

Collate PropertyLenPrefix

When sorting property-maps, which are a container type, should the collation
algorithm consider the number of entries in the map? If PropertyLenPrefix is
configured as true, maps with more items will sort after maps with fewer
items.

Can be configured per configuration instance via SortbyPropertyLen.

JSON-Pointer JsonpointerLength

Maximum length a JSON-pointer string can take. Can be configured globally
via MaxJsonpointerLen or per configuration instance via SetJptrlen.

NOTE: JSON pointers are composed of path segments, and there is an upper
limit to the number of path-segments a JSON pointer can have. If your JSON
pointers exceed that limit, try increasing the JsonpointerLength.

Transforms


Value to CBOR

  • Golang types nil, true, false are encodable into CBOR
    format.
  • All Golang number types, including signed, unsigned, and
    floating-point variants, are encodable into CBOR format.
  • Type []byte is encoded as CBOR byte-string.
  • Type string is encoded as CBOR text.
  • Generic array is interpreted as Golang []interface{} and
    encoded as CBOR array.
    • With LengthPrefix option for ContainerEncoding, arrays and
      maps are encoded with their length.
    • With Stream option, arrays and maps are encoded using
      Indefinite and Breakstop encoding.
  • Generic property is interpreted as golang [][2]interface{}
    and encoded as CBOR array of 2-element array, where the first item
    is key represented as string and second item is any valid JSON
    value.
  • Before encoding map[string]interface{} type, use
    GolangMap2cborMap() function to transform them to
    [][2]interface{}.
  • Following golang data types are encoded using CBOR-tags,
    • Type time.Time encoded with tag-0.
    • Type Epoch type supplied by CBOR package, encoded
      with tag-1.
    • Type EpochMicro type supplied by CBOR package, encoded
      with tag-1.
    • Type math/big.Int positive numbers are encoded with tag-2, and
      negative numbers are encoded with tag-3.
    • Type DecimalFraction type supplied by CBOR package,
      encoded with tag-4.
    • Type BigFloat type supplied by CBOR package, encoded
      with tag-5.
    • Type CborTagBytes type supplied by CBOR package, encoded with
      tag-24.
    • Type regexp.Regexp encoded with tag-35.
    • Type CborTagPrefix type supplied by CBOR package, encoded
      with tag-55799.
  • All other types shall cause a panic.

Value to collate

  • Types nil, true, false, float64, int64, int,
    string, []byte, []interface{}, map[string]interface{}
    are supported for collation.
  • All JSON numbers are collated as arbitrary sized floating point numbers.
  • Array-length (if configured) and property-length (if configured) are
    collated as integer.

JSON to Value

  • Gson uses a custom parser that should be faster than encoding/json.
  • Numbers can be interpreted as integer, or float64,
    • FloatNumber to interpret JSON number as 64-bit floating point.
    • SmartNumber to interpret JSON number as int64, or uint64, or float64.
  • Whitespace is interpreted based on the configuration type SpaceKind,
    which can be one of AnsiSpace or UnicodeSpace.
    • AnsiSpace, which should be faster.
    • UnicodeSpace supports unicode white-spaces as well.
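A sketch of the SmartNumber rule as described above, assuming that a literal without '.', 'e' or 'E' is an integer: try int64 first, fall back to uint64 for large positive values, and use float64 otherwise. parseSmart is hypothetical and uses strconv rather than gson's own parser.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSmart returns an int64, uint64 or float64 depending on the literal.
// Integers too large for both int64 and uint64 fall through to float64.
func parseSmart(lit string) (interface{}, error) {
	if !strings.ContainsAny(lit, ".eE") {
		if i, err := strconv.ParseInt(lit, 10, 64); err == nil {
			return i, nil
		}
		if u, err := strconv.ParseUint(lit, 10, 64); err == nil {
			return u, nil
		}
	}
	return strconv.ParseFloat(lit, 64)
}

func main() {
	for _, lit := range []string{"42", "-7", "18446744073709551615", "3.14", "1e3"} {
		v, err := parseSmart(lit)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-22s %T\n", lit, v)
	}
}
```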

JSON to collate

  • All numbers are collated as float.
  • If config.nk is FloatNumber, all numbers are interpreted as float64
    and collated as float64.
  • If config.nk is SmartNumber, all JSON numbers are collated as arbitrary
    sized floating point numbers.
  • Array-length (if configured) and property-length (if configured) are
    collated as integer.

JSON to CBOR

  • JSON Types null, true, false are encodable into CBOR format.
  • Types number are encoded based on configuration type NumberKind,
    which can be one of the following.
    • If config.nk is FloatNumber, all numbers are encoded as CBOR-float64.
    • If config.nk is SmartNumber, all JSON float64 numbers are encoded as
      CBOR-float64, and, all JSON positive integers are encoded as
      CBOR-uint64, and, all JSON negative integers are encoded as
      CBOR-int64.
  • Type string will be parsed and translated into UTF-8, and subsequently
    encoded as CBOR-text.
  • Type arrays can be encoded in Stream mode, using CBOR’s
    indefinite-length scheme, or in LengthPrefix mode.
  • Type properties can be encoded either using CBOR’s indefinite-length
    scheme (Stream), or using CBOR’s LengthPrefix.
  • Property-keys are always interpreted as string and encoded as
    UTF-8 CBOR-text.

CBOR to Value

  • The reverse of all the value-to-CBOR encodings described above is
    supported.
  • Decoding the float16 type and integers greater than
    9223372036854775807 is not supported.
  • Indefinite-length byte-string and text chunks shall be decoded outside
    this package using IsIndefinite*() and IsBreakstop() APIs.

CBOR to JSON

  • CBOR types nil, true, false are transformed back to equivalent
    JSON types.
  • Types float32 and float64 are transformed back to 32 bit
    JSON-float and 64 bit JSON-float respectively, in
    non-exponent format.
  • Type integer is transformed back to JSON-integer representation,
    and integers exceeding 9223372036854775807 are not supported.
  • Type array either with length prefix or with indefinite encoding
    are converted back to JSON array.
  • Type map either with length prefix or with indefinite encoding
    are converted back to JSON property.
  • Byte-strings are not supported and are not transformed to JSON.
  • CBOR-text with indefinite encoding is not supported.
  • The simple type float16 is not supported.
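Non-exponent formatting of floats can be illustrated with strconv.FormatFloat and the 'f' format; precision -1 asks for the fewest digits that round-trip at the given bit size, which is how the float32 case avoids spurious digits. Whether gson picks exactly these digits is an assumption; formatFloat is an illustrative helper.

```go
package main

import (
	"fmt"
	"strconv"
)

// formatFloat renders f in plain decimal notation (never an exponent),
// using the shortest digits that round-trip at bitSize (32 or 64).
func formatFloat(f float64, bitSize int) string {
	return strconv.FormatFloat(f, 'f', -1, bitSize)
}

func main() {
	fmt.Println(formatFloat(1e21, 64))                  // 1000000000000000000000
	fmt.Println(formatFloat(float64(float32(0.1)), 32)) // 0.1
	fmt.Println(formatFloat(float64(float32(0.1)), 64)) // 0.10000000149011612
}
```

Passing the right bit size matters: formatting a float32 value as if it were a float64 exposes the binary rounding error of the narrower type.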

For transforming to and from binary-collation refer here

CBOR to Collate

  • CBOR Types null, true, false, float32, float64, integer,
    string, []byte (aka binary), array, object can be
    collated.
  • All numbers are collated as float.
  • If config.nk is FloatNumber, all numbers are interpreted as float64
    and collated as float64.
  • If config.nk is SmartNumber, all CBOR numbers are collated as arbitrary
    sized floating point numbers.
  • Array-length (if configured) and property-length (if configured) are
    collated as integer.
  • Indefinite-length encoding for text and binary is not supported.
  • LengthPrefix and Stream encoding for array and maps are supported.

Collate to CBOR

  • Missing, null, true, false, floating-point, small-decimal,
    integer, string, []byte (aka binary), array, and object types
    can be converted back from their collated form to CBOR.
  • Since all numbers are collated as float, they are converted back to a
    text representation of float, in the format [+-]x.e[+-].
  • If config.nk is FloatNumber, all numbers are encoded as CBOR-float64.
  • If config.nk is SmartNumber, all numbers whose exponent is >= 15 are
    encoded as uint64 (if the number is positive), or int64 (if the number
    is negative). Others are encoded as CBOR-float64.

Collate to JSON

  • Since all numbers are collated as float, they are converted back to a
    text representation of float, in the format [+-]x.e[+-].
  • If config.nk is FloatNumber, all numbers are encoded as JSON-float64.
  • If config.nk is SmartNumber, all numbers whose exponent is >= 15 are
    encoded as uint64 (if the number is positive), or int64 (if the number
    is negative). Others are encoded as JSON-float64.

Collate to Value

  • Since all numbers are collated as float, they are converted back to a
    text representation of float, in the format [+-]x.e[+-].
  • If config.nk is FloatNumber, all numbers are treated as float64.
  • If config.nk is SmartNumber, all numbers whose exponent is >= 15 are
    treated as uint64 (if the number is positive), or int64 (if the number
    is negative). Others are treated as float64.

Articles

How to contribute


  • Pick an issue, or create a new issue. Provide adequate documentation for
    the issue.
  • Assign the issue or get it assigned.
  • Work on the code and, once finished, raise a pull request.
  • Gson is written in golang, hence contributions are expected to follow
    the global guidelines for writing go programs.
  • If the changeset is more than a few lines, please generate a
    report card.
  • As of now, branch master is the development branch.

Task list

  • Binary collation: transparently handle int64, uint64 and float64.
  • Support for json.Number.
  • UTF-8 collation of strings.
  • JSON-pointer.
    • JSON pointer for looking up within CBOR map.
    • JSON pointer for looking up within value-map.

Notes

  • Don’t change the tag number.
  • All supplied APIs will panic in case of error; applications can
    recover from the panic, dump a stack trace along with the input passed
    to the API, and subsequently handle all such panics as a single-valued
    error.
  • For now, maximum integer range shall be within int64.
  • Config instances, and its APIs, are neither re-entrant nor thread safe.

list of changes from github.com/prataprc/collatejson

  • Codec type is renamed to Config.
  • Caller should make sure that the o/p buffer passed to encoding
    and decoding APIs are adequately sized.
  • Name and signature of NewCodec() (now, NewDefaultConfig) has changed.
  • Configuration APIs SortbyArrayLen, SortbyPropertyLen, UseMissing, and
    NumberType now all return the config object back to the caller, which
    helps in call-chaining.
  • All APIs panic instead of returning an error.
  • Output buffer should have its len() == cap(), so that encoder and decoder
    can avoid append and instead use buffer index.