Write documentation (v0.2.0)

author: Luke Shumaker <lukeshu@lukeshu.com> 2023-01-25 21:05:17 -0700
committer: Luke Shumaker <lukeshu@lukeshu.com> 2023-01-26 00:45:27 -0700
commit: ffee5c8516f3f55f82ed5bb8f0a4f340d485fa92
tree: 0c10526b1ea57b043230402e9378b341c6966965
parent: 4148776399cb7ea5e10c74dc465e4e1e682cb399
8 files changed, 486 insertions, 47 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..c8e05ab
--- /dev/null
+++ b/README.md
@@ -0,0 +1,170 @@
+<!--
+Copyright (C) 2023  Luke Shumaker <lukeshu@lukeshu.com>
+
+SPDX-License-Identifier: GPL-2.0-or-later
+-->
+
+# lowmemjson
+
+`lowmemjson` is a mostly-compatible alternative to the standard
+library's [`encoding/json`][] that has dramatically lower memory
+requirements for large data structures.
+
+`lowmemjson` is not targeting extremely resource-constrained
+environments, but rather targets being able to efficiently stream
+gigabytes of JSON without requiring gigabytes of memory overhead.
+
+## Compatibility
+
+`encoding/json`'s APIs are designed around the idea that it can buffer
+the entire JSON document as a `[]byte`, and as intermediate steps it
+may have a fragment buffered multiple times while encoding; encoding a
+gigabyte of data may consume several gigabytes of memory.  In
+contrast, `lowmemjson`'s APIs are designed around streaming
+(`io.Writer` and `io.RuneScanner`), trying to have the memory overhead
+of encode and decode operations be as close to O(1) as possible.
+
+`lowmemjson` offers a high level of compatibility with the
+`encoding/json` APIs, but for best memory usage (avoiding storing
+large byte arrays inherent in `encoding/json`'s API), it is
+recommended to migrate to `lowmemjson`'s own APIs.
+
+### Callee API (objects to be encoded-to/decoded-from JSON)
+
+`lowmemjson` supports `encoding/json`'s `json:` struct field tags, as
+well as the `encoding/json.Marshaler` and `encoding/json.Unmarshaler`
+interfaces; you do not need to adjust your types to successfully
+migrate from `encoding/json` to `lowmemjson`.
+
+That is: Given types that decode as desired with `encoding/json`,
+those types should decode identically with `lowmemjson`.  Given types
+that encode as desired with `encoding/json`, those types should encode
+identically with `lowmemjson` (assuming an appropriately configured
+`ReEncoder` to match the whitespace-handling and special-character
+escaping; a `ReEncoder` with `Compact=true` and all other settings
+left as zero will match the behavior of `json.Marshal`).
+
+For better memory usage:
+ - Instead of implementing [`json.Marshaler`][], consider implementing
+   [`lowmemjson.Encodable`][] (or implementing both).
+ - Instead of implementing [`json.Unmarshaler`][], consider
+   implementing [`lowmemjson.Decodable`][] (or implementing both).
+
+### Caller API
+
+`lowmemjson` offers a [`lowmemjson/compat/json`][] package that is a
+(mostly) drop-in replacement for `encoding/json` (see the package's
+documentation for the small incompatibilities).
+
+For better memory usage, avoid using `lowmemjson/compat/json` and
+instead use `lowmemjson` directly:
+ - Instead of using <code>[json.Marshal][`json.Marshal`](val)</code>,
+   consider using
+   <code>[lowmemjson.NewEncoder][`lowmemjson.NewEncoder`](w).[Encode][`lowmemjson.Encoder.Encode`](val)</code>.
+ - Instead of using
+   <code>[json.Unmarshal][`json.Unmarshal`](dat, &val)</code>, consider
+   using
+   <code>[lowmemjson.NewDecoder][`lowmemjson.NewDecoder`](r).[DecodeThenEOF][`lowmemjson.Decoder.DecodeThenEOF`](&val)</code>.
+ - Instead of using [`json.Compact`][], [`json.HTMLEscape`][], or
+   [`json.Indent`][], consider using a [`lowmemjson.ReEncoder`][].
+ - Instead of using [`json.Valid`][], consider using a
+   [`lowmemjson.ReEncoder`][] with `io.Discard` as the output.
+
+The error types returned from `lowmemjson` are different from the
+error types returned by `encoding/json`, but `lowmemjson/compat/json`
+translates them back to the types returned by `encoding/json`.
+
+## Overview
+
+### Caller API
+
+There are 3 main types that make up the caller API for producing and
+handling streams of JSON, and each of those types has some associated
+types that go with it:
+
+ 1. `type Decoder`
+    + `type DecodeArgumentError`
+    + `type DecodeError`
+      * `type DecodeReadError`
+      * `type DecodeSyntaxError`
+      * `type DecodeTypeError`
+
+ 2. `type Encoder`
+    + `type EncodeTypeError`
+    + `type EncodeValueError`
+    + `type EncodeMethodError`
+
+ 3. `type ReEncoder`
+    + `type ReEncodeSyntaxError`
+    + `type BackslashEscaper`
+      * `type BackslashEscapeMode`
+
+A `*Decoder` handles decoding a JSON stream into Go values; the most
+common use of it will be
+`lowmemjson.NewDecoder(r).DecodeThenEOF(&val)` or
+`lowmemjson.NewDecoder(bufio.NewReader(r)).DecodeThenEOF(&val)`.
+
+A `*ReEncoder` handles transforming a JSON stream; this is useful for
+prettifying, minifying, sanitizing, and/or validating JSON.  A
+`*ReEncoder` wraps an `io.Writer`, itself implementing `io.Writer`.
+The most common use of it will be something along the lines of
+
+```go
+out = &ReEncoder{
+	Out: out,
+	// settings here
+}
+```
+
+An `*Encoder` handles encoding Go values into a JSON stream.
+`*Encoder` doesn't take much care to make its output nice, so it is
+usually desirable to have the output stream of an `*Encoder` be a
+`*ReEncoder`; the most common use of it will be
+
+```go
+lowmemjson.NewEncoder(&lowmemjson.ReEncoder{
+	Out: out,
+	// settings here
+}).Encode(val)
+```
+
+### Callee API
+
+For defining Go types with custom JSON representations, `lowmemjson`
+respects all of the `json:` struct field tags of `encoding/json`, as
+well as respecting the same "marshaler" and "unmarshaler" interfaces
+as `encoding/json`.  In addition to those interfaces, `lowmemjson`
+adds two of its own interfaces, and some helper functions to help with
+implementing those interfaces:
+
+ 1. `type Decodable`
+    + `func DecodeArray`
+    + `func DecodeObject`
+ 2. `type Encodable`
+
+These are streaming variants of the standard `json.Unmarshaler` and
+`json.Marshaler` interfaces.
+
+<!-- packages -->
+[`lowmemjson`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson
+[`lowmemjson/compat/json`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson/compat/json
+[`encoding/json`]: https://pkg.go.dev/encoding/json@go1.18
+
+<!-- encoding/json symbols -->
+[`json.Marshaler`]: https://pkg.go.dev/encoding/json@go1.18#Marshaler
+[`json.Unmarshaler`]: https://pkg.go.dev/encoding/json@go1.18#Unmarshaler
+[`json.Marshal`]: https://pkg.go.dev/encoding/json@go1.18#Marshal
+[`json.Unmarshal`]: https://pkg.go.dev/encoding/json@go1.18#Unmarshal
+[`json.Compact`]: https://pkg.go.dev/encoding/json@go1.18#Compact
+[`json.HTMLEscape`]: https://pkg.go.dev/encoding/json@go1.18#HTMLEscape
+[`json.Indent`]: https://pkg.go.dev/encoding/json@go1.18#Indent
+[`json.Valid`]: https://pkg.go.dev/encoding/json@go1.18#Valid
+
+<!-- lowmemjson symbols -->
+[`lowmemjson.Encodable`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#Encodable
+[`lowmemjson.Decodable`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#Decodable
+[`lowmemjson.NewEncoder`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#NewEncoder
+[`lowmemjson.Encoder.Encode`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#Encoder.Encode
+[`lowmemjson.NewDecoder`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#NewDecoder
+[`lowmemjson.Decoder.DecodeThenEOF`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#Decoder.DecodeThenEOF
+[`lowmemjson.ReEncoder`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#ReEncoder
diff --git a/compat/json/README.md b/compat/json/README.md
new file mode 100644
index 0000000..ec8dbed
--- /dev/null
+++ b/compat/json/README.md
@@ -0,0 +1,60 @@
+<!--
+Copyright (C) 2023  Luke Shumaker <lukeshu@lukeshu.com>
+
+SPDX-License-Identifier: GPL-2.0-or-later
+-->
+
+# lowmemjson/compat/json
+
+`lowmemjson/compat/json` is a wrapper around [`lowmemjson`][] that is
+a (mostly) drop-in replacement for the standard library's
+[`encoding/json`][].
+
+This package does not bother to duplicate `encoding/json`'s
+documentation; you should instead refer to [`encoding/json`'s own
+documentation][`encoding/json`].
+
+## Incompatibilities
+
+### Tokens
+
+Because the `lowmemjson` parser is fundamentally different than the
+`encoding/json` parser and does not have any notion of tokens, the
+token API is not included in `lowmemjson/compat/json`:
+
+ - There is no [`Delim`][] type.
+ - There is no [`Token`][] type.
+ - There is no [`Decoder.Token`][] method.
+
+### Types
+
+When possible, `lowmemjson/compat/json` uses type aliases for the
+`encoding/json` types, but in several cases that is not possible
+(`Encoder`, `Decoder`, `SyntaxError`, `MarshalerError`).  This means
+that while `lowmemjson/compat/json` is source-compatible with
+`encoding/json`, it may not interoperate with code that also uses
+`encoding/json` and relies on those type identities.
+
+The errors returned by the various functions *are* the same errors as
+returned by `encoding/json` (with the exception that `SyntaxError` and
+`MarshalerError` are not type aliases).
+
+### Deprecations
+
+Types that are deprecated in `encoding/json` are not mimicked here:
+
+ - There is no [`InvalidUTF8Error`][] type, as it has been deprecated
+   since Go 1.2.
+ - There is no [`UnmarshalFieldError`][] type, as it has been
+   deprecated since Go 1.1.
+
+<!-- packages -->
+[`lowmemjson`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson
+[`encoding/json`]: https://pkg.go.dev/encoding/json@go1.18
+
+<!-- symbols -->
+[`Delim`]: https://pkg.go.dev/encoding/json@go1.18#Delim
+[`Token`]: https://pkg.go.dev/encoding/json@go1.18#Token
+[`Decoder.Token`]: https://pkg.go.dev/encoding/json@go1.18#Decoder.Token
+[`InvalidUTF8Error`]: https://pkg.go.dev/encoding/json@go1.18#InvalidUTF8Error
+[`UnmarshalFieldError`]: https://pkg.go.dev/encoding/json@go1.18#UnmarshalFieldError
diff --git a/decode.go b/decode.go
index 51c1ed5..f911ac3 100644
--- a/decode.go
+++ b/decode.go
@@ -1,6 +1,13 @@
 // Copyright (C) 2022-2023  Luke Shumaker <lukeshu@lukeshu.com>
 //
 // SPDX-License-Identifier: GPL-2.0-or-later
+//
+// Some doc comments are
+// copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+//
+// SPDX-License-Identifier: BSD-3-Clause
 
 package lowmemjson
 
@@ -19,6 +26,26 @@ import (
 	"git.lukeshu.com/go/lowmemjson/internal"
 )
 
+// Decodable is the interface implemented by types that can decode a
+// JSON representation of themselves.  Decodable is a
+// low-memory-overhead replacement for the json.Unmarshaler interface.
+//
+// The io.RuneScanner passed to DecodeJSON...
+//
+//   - ...will return ErrInvalidUnreadRune from .UnreadRune() if the
+//     last operation was not a successful .ReadRune() call.
+//
+//   - ...will return EOF at the end of the JSON value; it is not
+//     possible for DecodeJSON to read past the end of the value into
+//     another value.
+//
+//   - ...if invalid JSON is encountered, will return the invalid rune
+//     with err!=nil.  Implementations are encouraged to simply
+//     `return err` if .ReadRune returns an error.
+//
+// DecodeJSON is expected to consume the entire scanner until io.EOF
+// or another error is encountered; if it does not, then the parent
+// Decode call will return a *DecodeTypeError.
 type Decodable interface {
 	DecodeJSON(io.RuneScanner) error
 }
@@ -28,6 +55,26 @@ type decodeStackItem struct {
 	idx any
 }
 
+// A Decoder reads and decodes values from an input stream of JSON
+// elements.
+//
+// Decoder is analogous to, and has a similar API to, the standard
+// library's encoding/json.Decoder.  Differences are:
+//
+//   - lowmemjson.NewDecoder takes an io.RuneScanner, while
+//     json.NewDecoder takes an io.Reader.
+//
+//   - lowmemjson.Decoder does not have a .Buffered() method, while
+//     json.Decoder does.
+//
+//   - lowmemjson.Decoder does not have a .Token() method, while
+//     json.Decoder does.
+//
+// If something more similar to a json.Decoder is desired,
+// lowmemjson/compat/json.NewDecoder takes an io.Reader (and turns it
+// into an io.RuneScanner by wrapping it in a bufio.Reader), and
+// lowmemjson/compat/json.Decoder has a .Buffered() method; though
+// lowmemjson/compat/json.Decoder also lacks the .Token() method.
 type Decoder struct {
 	io runeTypeScanner
 
@@ -42,6 +89,11 @@ type Decoder struct {
 
 const maxNestingDepth = 10000
 
+// NewDecoder returns a new Decoder that reads from r.
+//
+// NewDecoder is analogous to the standard library's
+// encoding/json.NewDecoder, but takes an io.RuneScanner rather than
+// an io.Reader.
 func NewDecoder(r io.RuneScanner) *Decoder {
 	return &Decoder{
 		io: &noWSRuneTypeScanner{
@@ -55,10 +107,35 @@ func NewDecoder(r io.RuneScanner) *Decoder {
 	}
 }
 
+// DisallowUnknownFields causes the Decoder to return an error when
+// the destination is a struct and the input contains object keys
+// which do not match any non-ignored, exported fields in the
+// destination.
+//
+// This is identical to the standard library's
+// encoding/json.Decoder.DisallowUnknownFields.
 func (dec *Decoder) DisallowUnknownFields() { dec.disallowUnknownFields = true }
-func (dec *Decoder) UseNumber()             { dec.useNumber = true }
-func (dec *Decoder) InputOffset() int64     { return dec.io.InputOffset() }
 
+// UseNumber causes the Decoder to unmarshal a number into an
+// interface{} as a Number instead of as a float64.
+//
+// This is identical to the standard library's
+// encoding/json.Decoder.UseNumber.
+func (dec *Decoder) UseNumber() { dec.useNumber = true }
+
+// InputOffset returns the input stream byte offset of the current
+// decoder position.  The offset gives the location of the rune that
+// will be returned from the next call to .ReadRune().
+//
+// This is identical to the standard library's
+// encoding/json.Decoder.InputOffset.
+func (dec *Decoder) InputOffset() int64 { return dec.io.InputOffset() }
+
+// More reports whether there is more to the stream of JSON elements,
+// or if the Decoder has reached EOF or an error.
+//
+// More is identical to the standard library's
+// encoding/json.Decoder.More.
 func (dec *Decoder) More() bool {
 	dec.io.Reset()
 	_, _, t, e := dec.io.ReadRuneType()
@@ -105,8 +182,10 @@ func (dec *Decoder) stackName() string {
 	return strings.Join(fields, ".")
 }
 
-// DecodeThenEOF is like decode, but emits an error if there is extra
-// data after the JSON.
+// DecodeThenEOF is like Decode, but emits an error if there is extra
+// data after the JSON.  A JSON document is specified to be a single
+// JSON element; repeated calls to Decoder.Decode will happily decode
+// a stream of multiple JSON elements.
 func (dec *Decoder) DecodeThenEOF(ptr any) (err error) {
 	if err := dec.Decode(ptr); err != nil {
 		return err
@@ -126,6 +205,16 @@ func (dec *Decoder) DecodeThenEOF(ptr any) (err error) {
 	return nil
 }
 
+// Decode reads the next JSON element from the Decoder's input stream
+// and stores it in the value pointed to by ptr.
+//
+// See the [documentation for encoding/json.Unmarshal] for details
+// about the conversion of JSON into a Go value; Decode behaves
+// identically to that, with the exception that in addition to the
+// json.Unmarshaler interface it also checks for the Decodable
+// interface.
+//
+// [documentation for encoding/json.Unmarshal]: https://pkg.go.dev/encoding/json@go1.18#Unmarshal
 func (dec *Decoder) Decode(ptr any) (err error) {
 	ptrVal := reflect.ValueOf(ptr)
 	if ptrVal.Kind() != reflect.Pointer || ptrVal.IsNil() || !ptrVal.Elem().CanSet() {
@@ -721,7 +810,14 @@ func (dec *Decoder) decodeAny() any {
 	}
 }
 
-// DecodeObject is a helper function for implementing the Decoder interface.
+// DecodeObject is a helper function to ease implementing the
+// Decodable interface; allowing the lowmemjson package to handle
+// decoding the object syntax, while the Decodable only needs to
+// handle decoding the keys and values within the object.
+//
+// Outside of implementing Decodable.DecodeJSON methods, callers
+// should instead simply use NewDecoder(r).Decode(&val) rather than
+// attempting to call DecodeObject directly.
 func DecodeObject(r io.RuneScanner, decodeKey, decodeVal func(io.RuneScanner) error) (err error) {
 	defer func() {
 		if r := recover(); r != nil {
@@ -784,7 +880,14 @@ func (dec *Decoder) decodeObject(gTyp reflect.Type, decodeKey, decodeVal func())
 	}
 }
 
-// DecodeArray is a helper function for implementing the Decoder interface.
+// DecodeArray is a helper function to ease implementing the
+// Decodable interface; allowing the lowmemjson package to handle
+// decoding the array syntax, while the Decodable only needs to handle
+// decoding members within the array.
+//
+// Outside of implementing Decodable.DecodeJSON methods, callers
+// should instead simply use NewDecoder(r).Decode(&val) rather than
+// attempting to call DecodeArray directly.
 func DecodeArray(r io.RuneScanner, decodeMember func(r io.RuneScanner) error) (err error) {
 	defer func() {
 		if r := recover(); r != nil {
diff --git a/encode.go b/encode.go
index 6963e3c..d31f36e 100644
--- a/encode.go
+++ b/encode.go
@@ -21,6 +21,12 @@ import (
 	"unsafe"
 )
 
+// Encodable is the interface implemented by types that can encode
+// themselves to JSON.  Encodable is a low-memory-overhead replacement
+// for the json.Marshaler interface.
+//
+// The io.Writer passed to EncodeJSON returns an error if invalid JSON
+// is written to it.
 type Encodable interface {
 	EncodeJSON(w io.Writer) error
 }
@@ -41,6 +47,15 @@ func encodeWriteString(w io.Writer, str string) {
 	}
 }
 
+// An Encoder encodes and writes values to a stream of JSON elements.
+//
+// Encoder is analogous to, and has a similar API to, the standard
+// library's encoding/json.Encoder.  Differences are that rather than
+// having .SetEscapeHTML and .SetIndent methods, the io.Writer passed
+// to it may be a *ReEncoder that has these settings (and more).  If
+// something more similar to a json.Encoder is desired,
+// lowmemjson/compat/json.Encoder offers those .SetEscapeHTML and
+// .SetIndent methods.
 type Encoder struct {
 	w                *ReEncoder
 	closeAfterEncode bool
@@ -65,6 +80,15 @@ func NewEncoder(w io.Writer) *Encoder {
 	}
 }
 
+// Encode encodes obj to JSON and writes that JSON to the Encoder's
+// output stream.
+//
+// See the [documentation for encoding/json.Marshal] for details about
+// the conversion of Go values to JSON; Encode behaves identically to
+// that, with the exception that in addition to the json.Marshaler
+// interface it also checks for the Encodable interface.
+//
+// [documentation for encoding/json.Marshal]: https://pkg.go.dev/encoding/json@go1.18#Marshal
 func (enc *Encoder) Encode(obj any) (err error) {
 	defer func() {
 		if r := recover(); r != nil {
@@ -115,8 +139,8 @@ func encode(w io.Writer, val reflect.Value, escaper BackslashEscaper, quote bool
 		if err := obj.EncodeJSON(validator); err != nil {
 			panic(encodeError{&EncodeMethodError{
 				Type:       val.Type(),
-				Err:        err,
 				SourceFunc: "EncodeJSON",
+				Err:        err,
 			}})
 		}
 		if err := validator.Close(); err != nil && !errors.Is(err, iofs.ErrClosed) {
@@ -140,8 +164,8 @@ func encode(w io.Writer, val reflect.Value, escaper BackslashEscaper, quote bool
 		if err != nil {
 			panic(encodeError{&EncodeMethodError{
 				Type:       val.Type(),
-				Err:        err,
 				SourceFunc: "MarshalJSON",
+				Err:        err,
 			}})
 		}
 		// Use a sub-ReEncoder to check that it's a full element.
@@ -170,8 +194,8 @@ func encode(w io.Writer, val reflect.Value, escaper BackslashEscaper, quote bool
 		if err != nil {
 			panic(encodeError{&EncodeMethodError{
 				Type:       val.Type(),
-				Err:        err,
 				SourceFunc: "MarshalText",
+				Err:        err,
 			}})
 		}
 		encodeStringFromBytes(w, escaper, text)
diff --git a/errors.go b/errors.go
index 67fe6c9..5669d36 100644
--- a/errors.go
+++ b/errors.go
@@ -1,4 +1,4 @@
-// Copyright (C) 2022  Luke Shumaker <lukeshu@lukeshu.com>
+// Copyright (C) 2022-2023  Luke Shumaker <lukeshu@lukeshu.com>
 //
 // SPDX-License-Identifier: GPL-2.0-or-later
 
@@ -14,21 +14,23 @@ import (
 	"git.lukeshu.com/go/lowmemjson/internal"
 )
 
-var (
-	ErrInvalidUnreadRune = errors.New("lowmemjson: invalid use of UnreadRune")
-)
+// ErrInvalidUnreadRune is returned to Decodable.DecodeJSON(scanner)
+// implementations from scanner.UnreadRune() if the last operation was
+// not a successful .ReadRune() call.
+var ErrInvalidUnreadRune = errors.New("lowmemjson: invalid use of UnreadRune")
 
 // parser errors ///////////////////////////////////////////////////////////////////////////////////
 
-var (
-	ErrParserExceededMaxDepth = internal.ErrParserExceededMaxDepth
-)
+// ErrParserExceededMaxDepth is the base error that a
+// *DecodeSyntaxError wraps when the depth of the JSON document
+// exceeds 10000.
+var ErrParserExceededMaxDepth = internal.ErrParserExceededMaxDepth
 
 // low-level decode errors /////////////////////////////////////////////////////////////////////////
 // These will be wrapped in a *DecodeError.
 
-// A *DecodeReadError is returned from Decode if there is an I/O error
-// reading the input.
+// A *DecodeReadError is returned from Decode (wrapped in a
+// *DecodeError) if there is an I/O error reading the input.
 type DecodeReadError struct {
 	Err    error
 	Offset int64
@@ -39,8 +41,8 @@ func (e *DecodeReadError) Error() string {
 }
 func (e *DecodeReadError) Unwrap() error { return e.Err }
 
-// A *DecodeSyntaxError is returned from Decode if there is a syntax
-// error in the input.
+// A *DecodeSyntaxError is returned from Decode (wrapped in a
+// *DecodeError) if there is a syntax error in the input.
 type DecodeSyntaxError struct {
 	Err    error
 	Offset int64
@@ -51,8 +53,9 @@ func (e *DecodeSyntaxError) Error() string {
 }
 func (e *DecodeSyntaxError) Unwrap() error { return e.Err }
 
-// A *DecodeTypeError is returned from Decode if the JSON input is not
-// appropriate for the given Go type.
+// A *DecodeTypeError is returned from Decode (wrapped in a
+// *DecodeError) if the JSON input is not appropriate for the given Go
+// type.
 //
 // If a .DecodeJSON, .UnmarshalJSON, or .UnmashaleText method returns
 // an error, it is wrapped in a *DecodeTypeError.
@@ -69,7 +72,7 @@ func (e *DecodeTypeError) Error() string {
 	if e.JSONType != "" {
 		fmt.Fprintf(&buf, "JSON %s ", e.JSONType)
 	}
-	fmt.Fprintf(&buf, "at input byte %v in to Go %v", e.Offset, e.GoType)
+	fmt.Fprintf(&buf, "at input byte %v into Go %v", e.Offset, e.GoType)
 	if e.Err != nil {
 		fmt.Fprintf(&buf, ": %v", strings.TrimPrefix(e.Err.Error(), "json: "))
 	}
@@ -78,9 +81,10 @@ func (e *DecodeTypeError) Error() string {
 
 func (e *DecodeTypeError) Unwrap() error { return e.Err }
 
-var (
-	ErrDecodeNonEmptyInterface = errors.New("cannot decode in to non-empty interface")
-)
+// ErrDecodeNonEmptyInterface is the base error that a
+// *DecodeTypeError wraps when Decode is asked to unmarshal into an
+// `interface` type that has one or more methods.
+var ErrDecodeNonEmptyInterface = errors.New("cannot decode into non-empty interface")
 
 // high-level decode errors ////////////////////////////////////////////////////////////////////////
 
@@ -88,21 +92,26 @@ var (
 // not a non-nil pointer or is not settable.
 //
 // Alternatively, a *DecodeArgument error may be found inside of a
-// *DecodeTypeError if the type being decoded in to is not a type that
-// can be decoded in to (such as map with non-stringable type as
-// keys).
+// *DecodeTypeError if the type being decoded into is not a type that
+// can be decoded into (such as map with non-stringable type as keys).
 //
 //	type DecodeArgumentError struct {
 //	    Type reflect.Type
 //	}
 type DecodeArgumentError = json.InvalidUnmarshalError
 
+// A *DecodeError is returned from Decode for all errors except for
+// *DecodeArgumentError.
+//
+// A *DecodeError wraps *DecodeSyntaxError for malformed or illegal
+// input, *DecodeTypeError for Go type issues, or *DecodeReadError for
+// I/O errors.
 type DecodeError struct {
-	Field string
-	Err   error
+	Field string // Where in the JSON the error was, in the form "v[idx][idx][idx]".
+	Err   error  // What the error was.
 
-	FieldParent string // for compat
-	FieldName   string // for compat
+	FieldParent string // for compat; the same as encoding/json.UnmarshalTypeError.Struct
+	FieldName   string // for compat; the same as encoding/json.UnmarshalTypeError.Field
 }
 
 func (e *DecodeError) Error() string {
@@ -129,19 +138,18 @@ type EncodeTypeError = json.UnsupportedTypeError
 //	}
 type EncodeValueError = json.UnsupportedValueError
 
-// An *EncodeTypeError is returned by Encode when attempting to encode
-// an unsupported value type.
+// An *EncodeMethodError wraps an error that is returned from an
+// object's method when encoding that object to JSON.
 type EncodeMethodError struct {
-	Type       reflect.Type
-	Err        error
-	SourceFunc string
+	Type       reflect.Type // The Go type that the method is on
+	SourceFunc string       // The method: "EncodeJSON", "MarshalJSON", or "MarshalText"
+	Err        error        // The error that the method returned
 }
 
 func (e *EncodeMethodError) Error() string {
 	return fmt.Sprintf("json: error calling %v for type %v: %v",
 		e.SourceFunc, e.Type, strings.TrimPrefix(e.Err.Error(), "json: "))
 }
-
 func (e *EncodeMethodError) Unwrap() error { return e.Err }
 
 // reencode errors /////////////////////////////////////////////////////////////////////////////////
diff --git a/internal/parse.go b/internal/parse.go
index 12d7600..895c930 100644
--- a/internal/parse.go
+++ b/internal/parse.go
@@ -14,10 +14,13 @@ import (
 
 var ErrParserExceededMaxDepth = errors.New("exceeded max depth")
 
+// RuneType is the classification of a rune when parsing JSON input.
+// A Parser, rather than grouping runes into tokens and classifying
+// tokens, classifies runes directly.
 type RuneType uint8
 
 const (
-	RuneTypeError = RuneType(iota)
+	RuneTypeError RuneType = iota
 
 	RuneTypeSpace // whitespace
 
@@ -42,7 +45,7 @@ const (
 	RuneTypeStringEnd   // closing '"'
 
 	RuneTypeNumberIntNeg
-	RuneTypeNumberIntZero
+	RuneTypeNumberIntZero // leading zero only; non-leading zeros are IntDig, not IntZero
 	RuneTypeNumberIntDig
 	RuneTypeNumberFracDot
 	RuneTypeNumberFracDig
@@ -69,6 +72,7 @@ const (
 	RuneTypeEOF
 )
 
+// GoString implements fmt.GoStringer.
 func (t RuneType) GoString() string {
 	str, ok := map[RuneType]string{
 		RuneTypeError: "RuneTypeError",
@@ -128,6 +132,7 @@ func (t RuneType) GoString() string {
 	return fmt.Sprintf("RuneType(%d)", t)
 }
 
+// String implements fmt.Stringer.
 func (t RuneType) String() string {
 	str, ok := map[RuneType]string{
 		RuneTypeError: "x",
@@ -202,10 +207,14 @@ func (t RuneType) JSONType() string {
 	}[t]
 }
 
+// IsNumber returns whether the RuneType is one of the
+// RuneTypeNumberXXX values.
 func (t RuneType) IsNumber() bool {
 	return RuneTypeNumberIntNeg <= t && t <= RuneTypeNumberExpDig
 }
 
+// Parser is the low-level JSON parser that powers both *Decoder and
+// *ReEncoder.
 type Parser struct {
 	// Setting MaxError to a value greater than 0 causes
 	// HandleRune to return ErrParserExceededMaxDepth if
diff --git a/misc.go b/misc.go
index 4f8e55e..92757f4 100644
--- a/misc.go
+++ b/misc.go
@@ -44,25 +44,43 @@ func writeRune(w io.Writer, c rune) (int, error) {
 
 // JSON string encoding ////////////////////////////////////////////////////////
 
+// BackslashEscapeMode identifies one of the three ways that a
+// character may be represented in a JSON string:
+//
+//   - literally (no backslash escaping)
+//
+//   - as a short "well-known" `\X` backslash sequence (where `X` is a
+//     single character)
+//
+//   - as a long Unicode `\uXXXX` backslash sequence
 type BackslashEscapeMode uint8
 
 const (
-	BackslashEscapeNone = BackslashEscapeMode(iota)
+	BackslashEscapeNone BackslashEscapeMode = iota
 	BackslashEscapeShort
 	BackslashEscapeUnicode
 )
 
+// A BackslashEscaper controls how a ReEncoder emits a character in a
+// JSON string.  The `rune` argument is the character being
+// considered, and the `BackslashEscapeMode` argument is how it was
+// originally encoded in the input.
 type BackslashEscaper = func(rune, BackslashEscapeMode) BackslashEscapeMode
 
+// EscapePreserve is a BackslashEscaper that preserves the original
+// input escaping.
 func EscapePreserve(_ rune, wasEscaped BackslashEscapeMode) BackslashEscapeMode {
 	return wasEscaped
 }
 
+// EscapeJSSafe is a BackslashEscaper that escapes strings such that
+// the JSON is safe to embed in JS; it otherwise preserves the
+// original input escaping.
+//
+// JSON is notionally a JS subset, but that's not actually true; so
+// more conservative backslash-escaping is necessary to safely embed
+// it in JS.  http://timelessrepo.com/json-isnt-a-javascript-subset
 func EscapeJSSafe(c rune, wasEscaped BackslashEscapeMode) BackslashEscapeMode {
-	// JSON is notionally a JS subset, but that's not actually
-	// true.
-	//
-	// http://timelessrepo.com/json-isnt-a-javascript-subset
 	switch c {
 	case '\u2028', '\u2029':
 		return BackslashEscapeUnicode
@@ -71,6 +89,9 @@ func EscapeJSSafe(c rune, wasEscaped BackslashEscapeMode) BackslashEscapeMode {
 	}
 }
 
+// EscapeHTMLSafe is a BackslashEscaper that escapes strings such that
+// the JSON is safe to embed in HTML; it otherwise preserves the
+// original input escaping.
 func EscapeHTMLSafe(c rune, wasEscaped BackslashEscapeMode) BackslashEscapeMode {
 	switch c {
 	case '&', '<', '>':
@@ -80,6 +101,15 @@ func EscapeHTMLSafe(c rune, wasEscaped BackslashEscapeMode) BackslashEscapeMode
 	}
 }
 
+// EscapeDefault is a BackslashEscaper that mimics the default
+// behavior of encoding/json.
+//
+// It is like EscapeHTMLSafe, but also uses long Unicode `\uXXXX`
+// sequences for `\b`, `\f`, and the `\uFFFD` Unicode replacement
+// character.
+//
+// A ReEncoder uses EscapeDefault if a BackslashEscaper is not
+// specified.
 func EscapeDefault(c rune, wasEscaped BackslashEscapeMode) BackslashEscapeMode {
 	switch c {
 	case '\b', '\f', utf8.RuneError:
@@ -89,6 +119,13 @@ func EscapeDefault(c rune, wasEscaped BackslashEscapeMode) BackslashEscapeMode {
 	}
 }
 
+// EscapeDefaultNonHTMLSafe is a BackslashEscaper that mimics the default
+// behavior of an encoding/json.Encoder that has had
+// SetEscapeHTML(false) called on it.
+//
+// It is like EscapeJSSafe, but also uses long Unicode `\uXXXX`
+// sequences for `\b`, `\f`, and the `\uFFFD` Unicode replacement
+// character.
 func EscapeDefaultNonHTMLSafe(c rune, wasEscaped BackslashEscapeMode) BackslashEscapeMode {
 	switch c {
 	case '\b', '\f', utf8.RuneError:
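
Because a `BackslashEscaper` is just a function from a rune and its original escape mode to a new escape mode, custom policies can be layered on the provided ones. The sketch below redeclares the types and `EscapePreserve` from the diff locally, so that it runs without importing `lowmemjson`, and adds a hypothetical `escapeNonASCII` wrapper that requests `\uXXXX` escaping for every non-ASCII rune. How a `ReEncoder` emits runes above U+FFFF in that mode is not covered by the diff and is not assumed here.

```go
package main

import "fmt"

// Local copies of the declarations shown in the diff, so that this
// sketch does not need to import lowmemjson.
type BackslashEscapeMode uint8

const (
	BackslashEscapeNone BackslashEscapeMode = iota
	BackslashEscapeShort
	BackslashEscapeUnicode
)

type BackslashEscaper = func(rune, BackslashEscapeMode) BackslashEscapeMode

// EscapePreserve keeps whatever escaping the input used.
func EscapePreserve(_ rune, wasEscaped BackslashEscapeMode) BackslashEscapeMode {
	return wasEscaped
}

// escapeNonASCII (hypothetical) wraps another escaper: non-ASCII runes
// are always requested as `\uXXXX`, and every other rune is delegated
// to next.
func escapeNonASCII(next BackslashEscaper) BackslashEscaper {
	return func(c rune, wasEscaped BackslashEscapeMode) BackslashEscapeMode {
		if c > 0x7F {
			return BackslashEscapeUnicode
		}
		return next(c, wasEscaped)
	}
}

func main() {
	esc := escapeNonASCII(EscapePreserve)
	fmt.Println(esc('é', BackslashEscapeNone) == BackslashEscapeUnicode)
	fmt.Println(esc('n', BackslashEscapeNone) == BackslashEscapeNone)
	fmt.Println(esc('\n', BackslashEscapeShort) == BackslashEscapeShort)
}
```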
diff --git a/reencode.go b/reencode.go
index 34c3851..b20a503 100644
--- a/reencode.go
+++ b/reencode.go
@@ -20,10 +20,21 @@ type speculation struct {
 	indentBuf  bytes.Buffer
 }
 
+// A ReEncoder takes a stream of JSON elements (by way of implementing
+// io.Writer and WriteRune), and re-encodes the JSON, writing it to
+// the .Out member.
+//
+// This is useful for prettifying, minifying, sanitizing, and/or
+// validating JSON.
+//
 // The memory use of a ReEncoder is O( (CompactIfUnder+1)^2 + depth).
 type ReEncoder struct {
+	// The output stream to write the re-encoded JSON to.
 	Out io.Writer
 
+	// A JSON document is specified to be a single JSON element;
+	// but it is often desirable to handle streams of multiple
+	// JSON elements.
 	AllowMultipleValues bool
 
 	// Whether to minify the JSON.
@@ -88,6 +99,14 @@ type ReEncoder struct {
 
 // public API //////////////////////////////////////////////////////////////////
 
+// Write implements io.Writer; it does what you'd expect, mostly.
+//
+// Rather than returning the number of bytes written to the output
+// stream, it returns the number of bytes from p that it successfully
+// handled.  This distinction is because *ReEncoder transforms the
+// data written to it, and the number of bytes written may be wildly
+// different than the number of bytes handled; and that would break
+// virtually all users of io.Writer.
 func (enc *ReEncoder) Write(p []byte) (int, error) {
 	if len(p) == 0 {
 		return 0, nil
@@ -113,7 +132,7 @@ func (enc *ReEncoder) Write(p []byte) (int, error) {
 	return len(p), nil
 }
 
-// Close does what you'd expect, mostly.
+// Close implements io.Closer; it does what you'd expect, mostly.
 //
 // The *ReEncoder may continue to be written to with new JSON values
 // if enc.AllowMultipleValues is set.
@@ -144,6 +163,15 @@ func (enc *ReEncoder) Close() error {
 	return nil
 }
 
+// WriteRune writes a single Unicode code point, returning the number
+// of bytes written to the output stream and any error.
+//
+// Even when there is no error, the number of bytes written may be
+// zero (for example, when the rune is whitespace and the ReEncoder is
+// minifying the JSON), or it may be substantially longer than one
+// code point's worth (for example, when `\uXXXX` escaping a character
+// in a string, or when outputting extra whitespace when the ReEncoder
+// is prettifying the JSON).
 func (enc *ReEncoder) WriteRune(c rune) (n int, err error) {
 	if enc.err != nil {
 		return 0, enc.err
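
The `Write` doc comment above explains why a transforming writer reports the number of input bytes it handled rather than the number of bytes it emitted. The sketch below illustrates that convention with a hypothetical `compactWriter` that drops whitespace outside of JSON strings. It is not how `ReEncoder` is implemented and it performs no validation; it only shows that returning `len(p)` keeps callers such as `io.Copy`, which treat a short count as `io.ErrShortWrite`, working.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// compactWriter is a hypothetical, non-validating minifier: it drops
// whitespace that appears outside of JSON strings.
type compactWriter struct {
	out      io.Writer
	inString bool // currently inside a "..." string
	escaped  bool // previous byte inside the string was a backslash
}

// Write returns the number of bytes of p that were handled, not the
// (possibly smaller) number of bytes forwarded to w.out.
func (w *compactWriter) Write(p []byte) (int, error) {
	kept := make([]byte, 0, len(p))
	for _, b := range p {
		switch {
		case w.inString:
			kept = append(kept, b)
			switch {
			case w.escaped:
				w.escaped = false
			case b == '\\':
				w.escaped = true
			case b == '"':
				w.inString = false
			}
		case b == ' ' || b == '\t' || b == '\n' || b == '\r':
			// Whitespace outside of a string: drop it.
		default:
			if b == '"' {
				w.inString = true
			}
			kept = append(kept, b)
		}
	}
	if _, err := w.out.Write(kept); err != nil {
		return 0, err
	}
	return len(p), nil
}

// compact (hypothetical helper) runs src through a compactWriter using
// io.Copy and returns the output plus the byte count io.Copy reports.
func compact(src string) (string, int64, error) {
	var buf bytes.Buffer
	n, err := io.Copy(&compactWriter{out: &buf}, strings.NewReader(src))
	return buf.String(), n, err
}

func main() {
	out, n, err := compact(`{ "a b" : [1, 2] }`)
	fmt.Println(out, n, err)
}
```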