author | Luke Shumaker <lukeshu@lukeshu.com> | 2023-01-25 21:05:17 -0700 |
---|---|---|
committer | Luke Shumaker <lukeshu@lukeshu.com> | 2023-01-26 00:45:27 -0700 |
commit | ffee5c8516f3f55f82ed5bb8f0a4f340d485fa92 (patch) | |
tree | 0c10526b1ea57b043230402e9378b341c6966965 | |
parent | 4148776399cb7ea5e10c74dc465e4e1e682cb399 (diff) |
Write documentation (v0.2.0)
-rw-r--r-- | README.md | 170 |
-rw-r--r-- | compat/json/README.md | 60 |
-rw-r--r-- | decode.go | 115 |
-rw-r--r-- | encode.go | 30 |
-rw-r--r-- | errors.go | 68 |
-rw-r--r-- | internal/parse.go | 13 |
-rw-r--r-- | misc.go | 47 |
-rw-r--r-- | reencode.go | 30 |
8 files changed, 486 insertions, 47 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..c8e05ab
--- /dev/null
+++ b/README.md
@@ -0,0 +1,170 @@
+<!--
+Copyright (C) 2023 Luke Shumaker <lukeshu@lukeshu.com>
+
+SPDX-License-Identifier: GPL-2.0-or-later
+-->
+
+# lowmemjson
+
+`lowmemjson` is a mostly-compatible alternative to the standard
+library's [`encoding/json`][] that has dramatically lower memory
+requirements for large data structures.
+
+`lowmemjson` is not targeting extremely resource-constrained
+environments, but rather targets being able to efficiently stream
+gigabytes of JSON without requiring gigabytes of memory overhead.
+
+## Compatibility
+
+`encoding/json`'s APIs are designed around the idea that it can buffer
+the entire JSON document as a `[]byte`, and as intermediate steps it
+may have a fragment buffered multiple times while encoding; encoding a
+gigabyte of data may consume several gigabytes of memory. In
+contrast, `lowmemjson`'s APIs are designed around streaming
+(`io.Writer` and `io.RuneScanner`), trying to have the memory overhead
+of encode and decode operations be as close to O(1) as possible.
+
+`lowmemjson` offers a high level of compatibility with the
+`encoding/json` APIs, but for best memory usage (avoiding storing
+large byte arrays inherent in `encoding/json`'s API), it is
+recommended to migrate to `lowmemjson`'s own APIs.
+
+### Callee API (objects to be encoded-to/decoded-from JSON)
+
+`lowmemjson` supports `encoding/json`'s `json:` struct field tags, as
+well as the `encoding/json.Marshaler` and `encoding/json.Unmarshaler`
+interfaces; you do not need to adjust your types to successfully
+migrate from `encoding/json` to `lowmemjson`.
+
+That is: Given types that decode as desired with `encoding/json`,
+those types should decode identically with `lowmemjson`. Given types
+that encode as desired with `encoding/json`, those types should encode
+identically with `lowmemjson` (assuming an appropriately configured
+`ReEncoder` to match the whitespace-handling and special-character
+escaping; a `ReEncoder` with `Compact=true` and all other settings
+left as zero will match the behavior of `json.Marshal`).
+
+For better memory usage:
+ - Instead of implementing [`json.Marshaler`][], consider implementing
+   [`lowmemjson.Encodable`][] (or implementing both).
+ - Instead of implementing [`json.Unmarshaler`][], consider
+   implementing [`lowmemjson.Decodable`][] (or implementing both).
+
+### Caller API
+
+`lowmemjson` offers a [`lowmemjson/compat/json`][] package that is a
+(mostly) drop-in replacement for `encoding/json` (see the package's
+documentation for the small incompatibilities).
+
+For better memory usage, avoid using `lowmemjson/compat/json` and
+instead use `lowmemjson` directly:
+ - Instead of using <code>[json.Marshal][`json.Marshal`](val)</code>,
+   consider using
+   <code>[lowmemjson.NewEncoder][`lowmemjson.NewEncoder`](w).[Encode][`lowmemjson.Encoder.Encode`](val)</code>.
+ - Instead of using
+   <code>[json.Unmarshal][`json.Unmarshal`](dat, &val)</code>, consider
+   using
+   <code>[lowmemjson.NewDecoder][`lowmemjson.NewDecoder`](r).[DecodeThenEOF][`lowmemjson.Decoder.DecodeThenEOF`](&val)</code>.
+ - Instead of using [`json.Compact`][], [`json.HTMLEscape`][], or
+   [`json.Indent`][]; consider using a [`lowmemjson.ReEncoder`][].
+ - Instead of using [`json.Valid`][], consider using a
+   [`lowmemjson.ReEncoder`][] with `io.Discard` as the output.
+
+The error types returned from `lowmemjson` are different from the
+error types returned by `encoding/json`, but `lowmemjson/compat/json`
+translates them back to the types returned by `encoding/json`.
+
+## Overview
+
+### Caller API
+
+There are 3 main types that make up the caller API for producing and
+handling streams of JSON, and each of those types has some associated
+types that go with it:
+
+ 1. `type Decoder`
+    + `type DecodeArgumentError`
+    + `type DecodeError`
+      * `type DecodeReadError`
+      * `type DecodeSyntaxError`
+      * `type DecodeTypeError`
+
+ 2. `type Encoder`
+    + `type EncodeTypeError`
+    + `type EncodeValueError`
+    + `type EncodeMethodError`
+
+ 3. `type ReEncoder`
+    + `type ReEncodeSyntaxError`
+    + `type BackslashEscaper`
+      * `type BackslashEscapeMode`
+
+A `*Decoder` handles decoding a JSON stream into Go values; the most
+common use of it will be
+`lowmemjson.NewDecoder(r).DecodeThenEOF(&val)` or
+`lowmemjson.NewDecoder(bufio.NewReader(r)).DecodeThenEOF(&val)`.
+
+A `*ReEncoder` handles transforming a JSON stream; this is useful for
+prettifying, minifying, sanitizing, and/or validating JSON. A
+`*ReEncoder` wraps an `io.Writer`, itself implementing `io.Writer`.
+The most common use of it will be something along the lines of
+
+```go
+out = &ReEncoder{
+	Out: out,
+	// settings here
+}
+```
+
+An `*Encoder` handles encoding Go values into a JSON stream.
+`*Encoder` doesn't take much care into making its output nice; so it
+is usually desirable to have the output stream of an `*Encoder` be a `*ReEncoder`; the most
+common use of it will be
+
+```go
+lowmemjson.NewEncoder(&lowmemjson.ReEncoder{
+	Out: out,
+	// settings here
+}).Encode(val)
+```
+
+### Callee API
+
+For defining Go types with custom JSON representations, `lowmemjson`
+respects all of the `json:` struct field tags of `encoding/json`, as
+well as respecting the same "marshaler" and "unmarshaler" interfaces
+as `encoding/json`. In addition to those interfaces, `lowmemjson`
+adds two of its own interfaces, and some helper functions to help with
+implementing those interfaces:
+
+ 1. `type Decodable`
+    + `func DecodeArray`
+    + `func DecodeObject`
+ 2. `type Encodable`
+
+These are streaming variants of the standard `json.Unmarshaler` and
+`json.Marshaler` interfaces.
+
+<!-- packages -->
+[`lowmemjson`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson
+[`lowmemjson/compat/json`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson/compat/json
+[`encoding/json`]: https://pkg.go.dev/encoding/json@go1.18
+
+<!-- encoding/json symbols -->
+[`json.Marshaler`]: https://pkg.go.dev/encoding/json@go1.18#Marshaler
+[`json.Unmarshaler`]: https://pkg.go.dev/encoding/json@go1.18#Unmarshaler
+[`json.Marshal`]: https://pkg.go.dev/encoding/json@go1.18#Marshal
+[`json.Unmarshal`]: https://pkg.go.dev/encoding/json@go1.18#Unmarshal
+[`json.Compact`]: https://pkg.go.dev/encoding/json@go1.18#Compact
+[`json.HTMLEscape`]: https://pkg.go.dev/encoding/json@go1.18#HTMLEscape
+[`json.Indent`]: https://pkg.go.dev/encoding/json@go1.18#Indent
+[`json.Valid`]: https://pkg.go.dev/encoding/json@go1.18#Valid
+
+<!-- lowmemjson symbols -->
+[`lowmemjson.Encodable`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#Encodable
+[`lowmemjson.Decodable`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#Decodable
+[`lowmemjson.NewEncoder`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#NewEncoder
+[`lowmemjson.Encoder.Encode`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#Encoder.Encode
+[`lowmemjson.NewDecoder`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#NewDecoder
+[`lowmemjson.Decoder.DecodeThenEOF`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#Decoder.DecodeThenEOF
+[`lowmemjson.ReEncoder`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#ReEncoder
diff --git a/compat/json/README.md b/compat/json/README.md
new file mode 100644
index 0000000..ec8dbed
--- /dev/null
+++ b/compat/json/README.md
@@ -0,0 +1,60 @@
+<!--
+Copyright (C) 2023 Luke Shumaker <lukeshu@lukeshu.com>
+
+SPDX-License-Identifier: GPL-2.0-or-later
+-->
+
+# lowmemjson/compat/json
+
+`lowmemjson/compat/json` is a wrapper around [`lowmemjson`][] that is
+a (mostly) drop-in replacement for the standard library's
+[`encoding/json`][].
+
+This package does not bother to duplicate `encoding/json`'s
+documentation; you should instead refer to [`encoding/json`'s own
+documentation][`encoding/json`].
+
+## Incompatibilities
+
+### Tokens
+
+Because the `lowmemjson` parser is fundamentally different than the
+`encoding/json` parser and does not have any notion of tokens, the
+token API is not included in `lowmemjson/compat/json`:
+
+ - There is no [`Delim`][] type.
+ - There is no [`Token`][] type.
+ - There is no [`Decoder.Token`][] method.
+
+### Types
+
+When possible, `lowmemjson/compat/json` uses type aliases for the
+`encoding/json` types, but in several cases that is not possible
+(`Encoder`, `Decoder`, `SyntaxError`, `MarshalError`). This means
+that while `lowmemjson/compat/json` is source-compatible with
+`encoding/json`, it may not interoperate with code that also uses
+`encoding/json` and relies on those type identities.
+
+The errors returned by the various functions *are* the same errors as
+returned by `encoding/json` (with the exception that `SyntaxError` and
+`MarshalError` are not type aliases).
+
+### Deprecations
+
+Types that are deprecated in `encoding/json` are not mimicked here:
+
+ - There is no [`InvalidUTF8Error`][] type, as it has been deprecated
+   since Go 1.2.
+ - There is no [`UnmarshalFieldError`][] type, as it has been
+   deprecated since Go 1.1.
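Reviewer note: the "Types" section above turns on the difference between a Go type alias and a distinct type. A minimal sketch of why that matters to callers that match errors with `errors.As` follows. `AliasedValueError` and `DistinctSyntaxError` are hypothetical names invented for this note; they are not the declarations in `lowmemjson/compat/json`, and the fields of `DistinctSyntaxError` are assumed purely for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// AliasedValueError is a type alias, so it is the very same type as
// the encoding/json one. (Hypothetical name, for illustration only.)
type AliasedValueError = json.UnsupportedValueError

// DistinctSyntaxError is a hypothetical stand-in for a compat type
// that cannot be an alias; it is a different type from
// json.SyntaxError even if it carries similar information.
type DistinctSyntaxError struct {
	Msg    string
	Offset int64
}

func (e *DistinctSyntaxError) Error() string { return e.Msg }

// isStdlibValueError reports whether err matches the stdlib type.
func isStdlibValueError(err error) bool {
	var target *json.UnsupportedValueError
	return errors.As(err, &target)
}

// isStdlibSyntaxError reports whether err matches the stdlib type.
func isStdlibSyntaxError(err error) bool {
	var target *json.SyntaxError
	return errors.As(err, &target)
}

func main() {
	// The alias is matched by code that only knows encoding/json.
	fmt.Println(isStdlibValueError(&AliasedValueError{Str: "NaN"})) // true

	// The distinct type is not, which is the interoperability caveat.
	fmt.Println(isStdlibSyntaxError(&DistinctSyntaxError{Msg: "bad"})) // false
}
```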
+
+<!-- packages -->
+[`lowmemjson`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson
+[`encoding/json`]: https://pkg.go.dev/encoding/json@go1.18
+
+<!-- symbols -->
+[`Delim`]: https://pkg.go.dev/encoding/json@go1.18#Delim
+[`Token`]: https://pkg.go.dev/encoding/json@go1.18#Token
+[`Decoder.Token`]: https://pkg.go.dev/encoding/json@go1.18#Decoder.Token
+[`InvalidUTF8Error`]: https://pkg.go.dev/encoding/json@go1.18#InvalidUTF8Error
+[`UnmarshalFieldError`]: https://pkg.go.dev/encoding/json@go1.18#UnmarshalFieldError
--- a/decode.go
+++ b/decode.go
@@ -1,6 +1,13 @@
 // Copyright (C) 2022-2023 Luke Shumaker <lukeshu@lukeshu.com>
 //
 // SPDX-License-Identifier: GPL-2.0-or-later
+//
+// Some doc comments are
+// copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+//
+// SPDX-License-Identifier: BSD-3-Clause
 
 package lowmemjson
 
@@ -19,6 +26,26 @@ import (
 	"git.lukeshu.com/go/lowmemjson/internal"
 )
 
+// Decodable is the interface implemented by types that can decode a
+// JSON representation of themselves. Decodable is a
+// low-memory-overhead replacement for the json.Unmarshaler interface.
+//
+// The io.RuneScanner passed to DecodeJSON...
+//
+//   - ...will return ErrInvalidUnreadRune from .UnreadRune if the last
+//     operation was not a successful .ReadRune() call.
+//
+//   - ...will return EOF at the end of the JSON value; it is not
+//     possible for DecodeJSON to read past the end of the value into
+//     another value.
+//
+//   - ...if invalid JSON is encountered, will return the invalid rune
+//     with err!=nil. Implementations are encouraged to simply
+//     `return err` if .ReadRune returns an error.
+//
+// DecodeJSON is expected to consume the entire scanner until io.EOF
+// or another error is encountered; if it does not, then the parent
+// Decode call will return a *DecodeTypeError.
 type Decodable interface {
 	DecodeJSON(io.RuneScanner) error
 }
@@ -28,6 +55,26 @@ type decodeStackItem struct {
 	idx any
 }
 
+// A Decoder reads and decodes values from an input stream of JSON
+// elements.
+//
+// Decoder is analogous to, and has a similar API to the standard
+// library's encoding/json.Decoder. Differences are:
+//
+//   - lowmemjson.NewDecoder takes an io.RuneScanner, while
+//     json.NewDecoder takes an io.Reader.
+//
+//   - lowmemjson.Decoder does not have a .Buffered() method, while
+//     json.Decoder does.
+//
+//   - lowmemjson.Decoder does not have a .Token() method, while
+//     json.Decoder does.
+//
+// If something more similar to a json.Decoder is desired,
+// lowmemjson/compat/json.NewDecoder takes an io.Reader (and turns it
+// into an io.RuneScanner by wrapping it in a bufio.Reader), and
+// lowmemjson/compat/json.Decoder has a .Buffered() method; though
+// lowmemjson/compat/json.Decoder also lacks the .Token() method.
 type Decoder struct {
 	io runeTypeScanner
 
@@ -42,6 +89,11 @@ type Decoder struct {
 
 const maxNestingDepth = 10000
 
+// NewDecoder returns a new Decoder that reads from r.
+//
+// NewDecoder is analogous to the standard library's
+// encoding/json.NewDecoder, but takes an io.RuneScanner rather than
+// an io.Reader.
 func NewDecoder(r io.RuneScanner) *Decoder {
 	return &Decoder{
 		io: &noWSRuneTypeScanner{
@@ -55,10 +107,35 @@ func NewDecoder(r io.RuneScanner) *Decoder {
 	}
 }
 
+// DisallowUnknownFields causes the Decoder to return an error when
+// the destination is a struct and the input contains object keys
+// which do not match any non-ignored, exported fields in the
+// destination.
+//
+// This is identical to the standard library's
+// encoding/json.Decoder.DisallowUnknownFields.
 func (dec *Decoder) DisallowUnknownFields() { dec.disallowUnknownFields = true }
-func (dec *Decoder) UseNumber()             { dec.useNumber = true }
-func (dec *Decoder) InputOffset() int64     { return dec.io.InputOffset() }
 
+// UseNumber causes the Decoder to unmarshal a number into an
+// interface{} as a Number instead of as a float64.
+//
+// This is identical to the standard library's
+// encoding/json.Decoder.UseNumber.
+func (dec *Decoder) UseNumber() { dec.useNumber = true }
+
+// InputOffset returns the input stream byte offset of the current
+// decoder position. The offset gives the location of the rune that
+// will be returned from the next call to .ReadRune().
+//
+// This is identical to the standard library's
+// encoding/json.Decoder.InputOffset.
+func (dec *Decoder) InputOffset() int64 { return dec.io.InputOffset() }
+
+// More reports whether there is more to the stream of JSON elements,
+// or if the Decoder has reached EOF or an error.
+//
+// More is identical to the standard library's
+// encoding/json.Decoder.More.
 func (dec *Decoder) More() bool {
 	dec.io.Reset()
 	_, _, t, e := dec.io.ReadRuneType()
@@ -105,8 +182,10 @@ func (dec *Decoder) stackName() string {
 	return strings.Join(fields, ".")
 }
 
-// DecodeThenEOF is like decode, but emits an error if there is extra
-// data after the JSON.
+// DecodeThenEOF is like Decode, but emits an error if there is extra
+// data after the JSON. A JSON document is specified to be a single
+// JSON element; repeated calls to Decoder.Decode will happily decode
+// a stream of multiple JSON elements.
 func (dec *Decoder) DecodeThenEOF(ptr any) (err error) {
 	if err := dec.Decode(ptr); err != nil {
 		return err
@@ -126,6 +205,16 @@ func (dec *Decoder) DecodeThenEOF(ptr any) (err error) {
 	return nil
 }
 
+// Decode reads the next JSON element from the Decoder's input stream
+// and stores it in the value pointed to by ptr.
+//
+// See the [documentation for encoding/json.Unmarshal] for details
+// about the conversion of JSON into a Go value; Decode behaves
+// identically to that, with the exception that in addition to the
+// json.Unmarshaler interface it also checks for the Decodable
+// interface.
+//
+// [documentation for encoding/json.Unmarshal]: https://pkg.go.dev/encoding/json@go1.18#Unmarshal
 func (dec *Decoder) Decode(ptr any) (err error) {
 	ptrVal := reflect.ValueOf(ptr)
 	if ptrVal.Kind() != reflect.Pointer || ptrVal.IsNil() || !ptrVal.Elem().CanSet() {
@@ -721,7 +810,14 @@ func (dec *Decoder) decodeAny() any {
 	}
 }
 
-// DecodeObject is a helper function for implementing the Decoder interface.
+// DecodeObject is a helper function to ease implementing the
+// Decodable interface; allowing the lowmemjson package to handle
+// decoding the object syntax, while the Decodable only needs to
+// handle decoding the keys and values within the object.
+//
+// Outside of implementing Decodable.DecodeJSON methods, callers
+// should instead simply use NewDecoder(r).Decode(&val) rather than
+// attempting to call DecodeObject directly.
 func DecodeObject(r io.RuneScanner, decodeKey, decodeVal func(io.RuneScanner) error) (err error) {
 	defer func() {
 		if r := recover(); r != nil {
@@ -784,7 +880,14 @@ func (dec *Decoder) decodeObject(gTyp reflect.Type, decodeKey, decodeVal func())
 	}
 }
 
-// DecodeArray is a helper function for implementing the Decoder interface.
+// DecodeArray is a helper function to ease implementing the Decodable
+// interface; allowing the lowmemjson package to handle decoding the
+// array syntax, while the Decodable only needs to handle decoding
+// members within the array.
+//
+// Outside of implementing Decodable.DecodeJSON methods, callers
+// should instead simply use NewDecoder(r).Decode(&val) rather than
+// attempting to call DecodeArray directly.
 func DecodeArray(r io.RuneScanner, decodeMember func(r io.RuneScanner) error) (err error) {
 	defer func() {
 		if r := recover(); r != nil {
--- a/encode.go
+++ b/encode.go
@@ -21,6 +21,12 @@ import (
 	"unsafe"
 )
 
+// Encodable is the interface implemented by types that can encode
+// themselves to JSON. Encodable is a low-memory-overhead replacement
+// for the json.Marshaler interface.
+//
+// The io.Writer passed to EncodeJSON returns an error if invalid JSON
+// is written to it.
 type Encodable interface {
 	EncodeJSON(w io.Writer) error
 }
@@ -41,6 +47,15 @@ func encodeWriteString(w io.Writer, str string) {
 	}
 }
 
+// An Encoder encodes and writes values to a stream of JSON elements.
+//
+// Encoder is analogous to, and has a similar API to the standard
+// library's encoding/json.Encoder. Differences are that rather than
+// having .SetEscapeHTML and .SetIndent methods, the io.Writer passed
+// to it may be a *ReEncoder that has these settings (and more). If
+// something more similar to a json.Encoder is desired,
+// lowmemjson/compat/json.Encoder offers those .SetEscapeHTML and
+// .SetIndent methods.
 type Encoder struct {
 	w                *ReEncoder
 	closeAfterEncode bool
@@ -65,6 +80,15 @@ func NewEncoder(w io.Writer) *Encoder {
 	}
 }
 
+// Encode encodes obj to JSON and writes that JSON to the Encoder's
+// output stream.
+//
+// See the [documentation for encoding/json.Marshal] for details about
+// the conversion of Go values to JSON; Encode behaves identically to
+// that, with the exception that in addition to the json.Marshaler
+// interface it also checks for the Encodable interface.
+//
+// [documentation for encoding/json.Marshal]: https://pkg.go.dev/encoding/json@go1.18#Marshal
 func (enc *Encoder) Encode(obj any) (err error) {
 	defer func() {
 		if r := recover(); r != nil {
@@ -115,8 +139,8 @@ func encode(w io.Writer, val reflect.Value, escaper BackslashEscaper, quote bool
 		if err := obj.EncodeJSON(validator); err != nil {
 			panic(encodeError{&EncodeMethodError{
 				Type:       val.Type(),
-				Err:        err,
 				SourceFunc: "EncodeJSON",
+				Err:        err,
 			}})
 		}
 		if err := validator.Close(); err != nil && !errors.Is(err, iofs.ErrClosed) {
@@ -140,8 +164,8 @@ func encode(w io.Writer, val reflect.Value, escaper BackslashEscaper, quote bool
 		if err != nil {
 			panic(encodeError{&EncodeMethodError{
 				Type:       val.Type(),
-				Err:        err,
 				SourceFunc: "MarshalJSON",
+				Err:        err,
 			}})
 		}
 		// Use a sub-ReEncoder to check that it's a full element.
@@ -170,8 +194,8 @@ func encode(w io.Writer, val reflect.Value, escaper BackslashEscaper, quote bool
 		if err != nil {
 			panic(encodeError{&EncodeMethodError{
 				Type:       val.Type(),
-				Err:        err,
 				SourceFunc: "MarshalText",
+				Err:        err,
 			}})
 		}
 		encodeStringFromBytes(w, escaper, text)
--- a/errors.go
+++ b/errors.go
@@ -1,4 +1,4 @@
-// Copyright (C) 2022 Luke Shumaker <lukeshu@lukeshu.com>
+// Copyright (C) 2022-2023 Luke Shumaker <lukeshu@lukeshu.com>
 //
 // SPDX-License-Identifier: GPL-2.0-or-later
 
@@ -14,21 +14,23 @@ import (
 	"git.lukeshu.com/go/lowmemjson/internal"
 )
 
-var (
-	ErrInvalidUnreadRune = errors.New("lowmemjson: invalid use of UnreadRune")
-)
+// ErrInvalidUnreadRune is returned to Decodable.DecodeJSON(scanner)
+// implementations from scanner.UnreadRune() if the last operation was
+// not a successful .ReadRune() call.
+var ErrInvalidUnreadRune = errors.New("lowmemjson: invalid use of UnreadRune")
 
 // parser errors ///////////////////////////////////////////////////////////////////////////////////
 
-var (
-	ErrParserExceededMaxDepth = internal.ErrParserExceededMaxDepth
-)
+// ErrParserExceededMaxDepth is the base error that a
+// *DecodeSyntaxError wraps when the depth of the JSON document
+// exceeds 10000.
+var ErrParserExceededMaxDepth = internal.ErrParserExceededMaxDepth
 
 // low-level decode errors /////////////////////////////////////////////////////////////////////////
 // These will be wrapped in a *DecodeError.
 
-// A *DecodeReadError is returned from Decode if there is an I/O error
-// reading the input.
+// A *DecodeReadError is returned from Decode (wrapped in a
+// *DecodeError) if there is an I/O error reading the input.
 type DecodeReadError struct {
 	Err    error
 	Offset int64
@@ -39,8 +41,8 @@ func (e *DecodeReadError) Error() string {
 }
 func (e *DecodeReadError) Unwrap() error { return e.Err }
 
-// A *DecodeSyntaxError is returned from Decode if there is a syntax
-// error in the input.
+// A *DecodeSyntaxError is returned from Decode (wrapped in a
+// *DecodeError) if there is a syntax error in the input.
 type DecodeSyntaxError struct {
 	Err    error
 	Offset int64
@@ -51,8 +53,9 @@ func (e *DecodeSyntaxError) Error() string {
 }
 func (e *DecodeSyntaxError) Unwrap() error { return e.Err }
 
-// A *DecodeTypeError is returned from Decode if the JSON input is not
-// appropriate for the given Go type.
+// A *DecodeTypeError is returned from Decode (wrapped in a
+// *DecodeError) if the JSON input is not appropriate for the given Go
+// type.
 //
 // If a .DecodeJSON, .UnmarshalJSON, or .UnmarshalText method returns
 // an error, it is wrapped in a *DecodeTypeError.
@@ -69,7 +72,7 @@ func (e *DecodeTypeError) Error() string {
 	if e.JSONType != "" {
 		fmt.Fprintf(&buf, "JSON %s ", e.JSONType)
 	}
-	fmt.Fprintf(&buf, "at input byte %v in to Go %v", e.Offset, e.GoType)
+	fmt.Fprintf(&buf, "at input byte %v into Go %v", e.Offset, e.GoType)
 	if e.Err != nil {
 		fmt.Fprintf(&buf, ": %v", strings.TrimPrefix(e.Err.Error(), "json: "))
 	}
@@ -78,9 +81,10 @@ func (e *DecodeTypeError) Error() string {
 
 func (e *DecodeTypeError) Unwrap() error { return e.Err }
 
-var (
-	ErrDecodeNonEmptyInterface = errors.New("cannot decode in to non-empty interface")
-)
+// ErrDecodeNonEmptyInterface is the base error that a
+// *DecodeTypeError wraps when Decode is asked to unmarshal into an
+// `interface` type that has one or more methods.
+var ErrDecodeNonEmptyInterface = errors.New("cannot decode into non-empty interface")
 
 // high-level decode errors ////////////////////////////////////////////////////////////////////////
 
@@ -88,21 +92,26 @@ var (
 // not a non-nil pointer or is not settable.
 //
 // Alternatively, a *DecodeArgumentError may be found inside of a
-// *DecodeTypeError if the type being decoded in to is not a type that
-// can be decoded in to (such as map with non-stringable type as
-// keys).
+// *DecodeTypeError if the type being decoded into is not a type that
+// can be decoded into (such as map with non-stringable type as keys).
 //
 // type DecodeArgumentError struct {
 //	Type reflect.Type
 // }
 type DecodeArgumentError = json.InvalidUnmarshalError
 
+// A *DecodeError is returned from Decode for all errors except for
+// *DecodeArgumentError.
+//
+// A *DecodeError wraps *DecodeSyntaxError for malformed or illegal
+// input, *DecodeTypeError for Go type issues, or *DecodeReadError for
+// I/O errors.
 type DecodeError struct {
-	Field string
-	Err   error
+	Field string // Where in the JSON the error was, in the form "v[idx][idx][idx]".
+	Err   error  // What the error was.
 
-	FieldParent string // for compat
-	FieldName   string // for compat
+	FieldParent string // for compat; the same as encoding/json.UnmarshalTypeError.Struct
+	FieldName   string // for compat; the same as encoding/json.UnmarshalTypeError.Field
 }
 
 func (e *DecodeError) Error() string {
@@ -129,19 +138,18 @@ type EncodeTypeError = json.UnsupportedTypeError
 // }
 type EncodeValueError = json.UnsupportedValueError
 
-// An *EncodeTypeError is returned by Encode when attempting to encode
-// an unsupported value type.
+// An *EncodeMethodError wraps an error that is returned from an
+// object's method when encoding that object to JSON.
 type EncodeMethodError struct {
-	Type       reflect.Type
-	Err        error
-	SourceFunc string
+	Type       reflect.Type // The Go type that the method is on
+	SourceFunc string       // The method: "EncodeJSON", "MarshalJSON", or "MarshalText"
+	Err        error        // The error that the method returned
 }
 
 func (e *EncodeMethodError) Error() string {
 	return fmt.Sprintf("json: error calling %v for type %v: %v", e.SourceFunc, e.Type, strings.TrimPrefix(e.Err.Error(), "json: "))
 }
-
 func (e *EncodeMethodError) Unwrap() error { return e.Err }
 
 // reencode errors /////////////////////////////////////////////////////////////////////////////////
 
diff --git a/internal/parse.go b/internal/parse.go
index 12d7600..895c930 100644
--- a/internal/parse.go
+++ b/internal/parse.go
@@ -14,10 +14,13 @@ import (
 
 var ErrParserExceededMaxDepth = errors.New("exceeded max depth")
 
+// RuneType is the classification of a rune when parsing JSON input.
+// A Parser, rather than grouping runes into tokens and classifying
+// tokens, classifies runes directly.
 type RuneType uint8
 
 const (
-	RuneTypeError = RuneType(iota)
+	RuneTypeError RuneType = iota
 
 	RuneTypeSpace // whitespace
 
@@ -42,7 +45,7 @@ const (
 	RuneTypeStringEnd // closing '"'
 
 	RuneTypeNumberIntNeg
-	RuneTypeNumberIntZero
+	RuneTypeNumberIntZero // leading zero only; non-leading zeros are IntDig, not IntZero
 	RuneTypeNumberIntDig
 	RuneTypeNumberFracDot
 	RuneTypeNumberFracDig
@@ -69,6 +72,7 @@ const (
 	RuneTypeEOF
 )
 
+// GoString implements fmt.GoStringer.
 func (t RuneType) GoString() string {
 	str, ok := map[RuneType]string{
 		RuneTypeError: "RuneTypeError",
@@ -128,6 +132,7 @@ func (t RuneType) GoString() string {
 	return fmt.Sprintf("RuneType(%d)", t)
 }
 
+// String implements fmt.Stringer.
 func (t RuneType) String() string {
 	str, ok := map[RuneType]string{
 		RuneTypeError: "x",
@@ -202,10 +207,14 @@ func (t RuneType) JSONType() string {
 	}[t]
 }
 
+// IsNumber returns whether the RuneType is one of the
+// RuneTypeNumberXXX values.
 func (t RuneType) IsNumber() bool {
 	return RuneTypeNumberIntNeg <= t && t <= RuneTypeNumberExpDig
 }
 
+// Parser is the low-level JSON parser that powers both *Decoder and
+// *ReEncoder.
 type Parser struct {
 	// Setting MaxError to a value greater than 0 causes
 	// HandleRune to return ErrParserExceededMaxDepth if
--- a/misc.go
+++ b/misc.go
@@ -44,25 +44,43 @@ func writeRune(w io.Writer, c rune) (int, error) {
 
 // JSON string encoding ////////////////////////////////////////////////////////
 
+// BackslashEscapeMode identifies one of the three ways that a
+// character may be represented in a JSON string:
+//
+//   - literally (no backslash escaping)
+//
+//   - as a short "well-known" `\X` backslash sequence (where `X` is a
+//     single-character)
+//
+//   - as a long Unicode `\uXXXX` backslash sequence
 type BackslashEscapeMode uint8
 
 const (
-	BackslashEscapeNone = BackslashEscapeMode(iota)
+	BackslashEscapeNone BackslashEscapeMode = iota
 	BackslashEscapeShort
 	BackslashEscapeUnicode
 )
 
+// A BackslashEscaper controls how a ReEncoder emits a character in a
+// JSON string. The `rune` argument is the character being
+// considered, and the `BackslashEscapeMode` argument is how it was
+// originally encoded in the input.
 type BackslashEscaper = func(rune, BackslashEscapeMode) BackslashEscapeMode
 
+// EscapePreserve is a BackslashEscaper that preserves the original
+// input escaping.
 func EscapePreserve(_ rune, wasEscaped BackslashEscapeMode) BackslashEscapeMode {
 	return wasEscaped
 }
 
+// EscapeJSSafe is a BackslashEscaper that escapes strings such that
+// the JSON is safe to embed in JS; it otherwise preserves the
+// original input escaping.
+//
+// JSON is notionally a JS subset, but that's not actually true; so
+// more conservative backslash-escaping is necessary to safely embed
+// it in JS. http://timelessrepo.com/json-isnt-a-javascript-subset
 func EscapeJSSafe(c rune, wasEscaped BackslashEscapeMode) BackslashEscapeMode {
-	// JSON is notionally a JS subset, but that's not actually
-	// true.
-	//
-	// http://timelessrepo.com/json-isnt-a-javascript-subset
 	switch c {
 	case '\u2028', '\u2029':
 		return BackslashEscapeUnicode
@@ -71,6 +89,9 @@ func EscapeJSSafe(c rune, wasEscaped BackslashEscapeMode) BackslashEscapeMode {
 	}
 }
 
+// EscapeHTMLSafe is a BackslashEscaper that escapes strings such that
+// the JSON is safe to embed in HTML; it otherwise preserves the
+// original input escaping.
 func EscapeHTMLSafe(c rune, wasEscaped BackslashEscapeMode) BackslashEscapeMode {
 	switch c {
 	case '&', '<', '>':
@@ -80,6 +101,15 @@ func EscapeHTMLSafe(c rune, wasEscaped BackslashEscapeMode) BackslashEscapeMode
 	}
 }
 
+// EscapeDefault is a BackslashEscaper that mimics the default
+// behavior of encoding/json.
+//
+// It is like EscapeHTMLSafe, but also uses long Unicode `\uXXXX`
+// sequences for `\b`, `\f`, and the `\uFFFD` Unicode replacement
+// character.
+//
+// A ReEncoder uses EscapeDefault if a BackslashEscaper is not
+// specified.
 func EscapeDefault(c rune, wasEscaped BackslashEscapeMode) BackslashEscapeMode {
 	switch c {
 	case '\b', '\f', utf8.RuneError:
@@ -89,6 +119,13 @@ func EscapeDefault(c rune, wasEscaped BackslashEscapeMode) BackslashEscapeMode {
 	}
 }
 
+// EscapeDefaultNonHTMLSafe is a BackslashEscaper that mimics the
+// default behavior of an encoding/json.Encoder that has had
+// SetEscapeHTML(false) called on it.
+//
+// It is like EscapeJSSafe, but also uses long Unicode `\uXXXX`
+// sequences for `\b`, `\f`, and the `\uFFFD` Unicode replacement
+// character.
 func EscapeDefaultNonHTMLSafe(c rune, wasEscaped BackslashEscapeMode) BackslashEscapeMode {
 	switch c {
 	case '\b', '\f', utf8.RuneError:
diff --git a/reencode.go b/reencode.go
index 34c3851..b20a503 100644
--- a/reencode.go
+++ b/reencode.go
@@ -20,10 +20,21 @@ type speculation struct {
 	indentBuf bytes.Buffer
 }
 
+// A ReEncoder takes a stream of JSON elements (by way of implementing
+// io.Writer and WriteRune), and re-encodes the JSON, writing it to
+// the .Out member.
+//
+// This is useful for prettifying, minifying, sanitizing, and/or
+// validating JSON.
+//
 // The memory use of a ReEncoder is O( (CompactIfUnder+1)^2 + depth).
 type ReEncoder struct {
+	// The output stream to write the re-encoded JSON to.
 	Out io.Writer
 
+	// A JSON document is specified to be a single JSON element;
+	// but it is often desirable to handle streams of multiple
+	// JSON elements.
 	AllowMultipleValues bool
 
 	// Whether to minify the JSON.
@@ -88,6 +99,14 @@ type ReEncoder struct {
 
 // public API //////////////////////////////////////////////////////////////////
 
+// Write implements io.Writer; it does what you'd expect, mostly.
+//
+// Rather than returning the number of bytes written to the output
+// stream, it returns the number of bytes from p that it successfully
+// handled. This distinction is because *ReEncoder transforms the
+// data written to it, and the number of bytes written may be wildly
+// different than the number of bytes handled; and that would break
+// virtually all users of io.Writer.
 func (enc *ReEncoder) Write(p []byte) (int, error) {
 	if len(p) == 0 {
 		return 0, nil
@@ -113,7 +132,7 @@ func (enc *ReEncoder) Write(p []byte) (int, error) {
 	return len(p), nil
 }
 
-// Close does what you'd expect, mostly.
+// Close implements io.Closer; it does what you'd expect, mostly.
 //
 // The *ReEncoder may continue to be written to with new JSON values
 // if enc.AllowMultipleValues is set.
@@ -144,6 +163,15 @@ func (enc *ReEncoder) Close() error {
 	return nil
 }
 
+// WriteRune writes a single Unicode code point, returning the number
+// of bytes written to the output stream and any error.
+//
+// Even when there is no error, the number of bytes written may be
+// zero (for example, when the rune is whitespace and the ReEncoder is
+// minifying the JSON), or it may be substantially longer than one
+// code point's worth (for example, when `\uXXXX` escaping a character
+// in a string, or when outputting extra whitespace when the ReEncoder
+// is prettifying the JSON).
 func (enc *ReEncoder) WriteRune(c rune) (n int, err error) {
 	if enc.err != nil {
 		return 0, enc.err