<!--
Copyright (C) 2023  Luke Shumaker <lukeshu@lukeshu.com>

SPDX-License-Identifier: GPL-2.0-or-later
-->

# lowmemjson

`lowmemjson` is a mostly-compatible alternative to the standard
library's [`encoding/json`][] that has dramatically lower memory
requirements for large data structures.

`lowmemjson` does not target extremely resource-constrained
environments; rather, it aims to stream gigabytes of JSON efficiently
without requiring gigabytes of memory overhead.

## Compatibility

`encoding/json`'s APIs are designed around the idea that it can buffer
the entire JSON document as a `[]byte`; as an intermediate step it may
even hold a fragment in several buffers at once while encoding, so
encoding a gigabyte of data may consume several gigabytes of memory.  In
contrast, `lowmemjson`'s APIs are designed around streaming
(`io.Writer` and `io.RuneScanner`), aiming to keep the memory overhead
of encode and decode operations as close to O(1) as possible.

`lowmemjson` offers a high level of compatibility with the
`encoding/json` APIs, but for best memory usage (avoiding storing
large byte arrays inherent in `encoding/json`'s API), it is
recommended to migrate to `lowmemjson`'s own APIs.

### Callee API (objects to be encoded-to/decoded-from JSON)

`lowmemjson` supports `encoding/json`'s `json:` struct field tags, as
well as the `encoding/json.Marshaler` and `encoding/json.Unmarshaler`
interfaces; you do not need to adjust your types to successfully
migrate from `encoding/json` to `lowmemjson`.

That is: Given types that decode as desired with `encoding/json`,
those types should decode identically with `lowmemjson`.  Given types
that encode as desired with `encoding/json`, those types should encode
identically with `lowmemjson` (assuming an appropriately configured
`ReEncoder` to match the whitespace-handling and special-character
escaping; a `ReEncoderConfig` with `Compact=true` and all other
settings left as zero will match the behavior of `json.Marshal`).

For better memory usage:
 - Instead of implementing [`json.Marshaler`][], consider implementing
   [`lowmemjson.Encodable`][] (or implementing both).
 - Instead of implementing [`json.Unmarshaler`][], consider
   implementing [`lowmemjson.Decodable`][] (or implementing both).
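
As a sketch of what "no need to adjust your types" covers, the program below defines a type that uses `json:` struct field tags together with the `json.Marshaler` and `json.Unmarshaler` interfaces. The `Level` and `Event` types and the `roundTrip` helper are hypothetical names invented for this illustration, and the code exercises them with `encoding/json` only; per the text above, the same types should encode and decode identically under `lowmemjson`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Level is a hypothetical enum-like type with a custom JSON form.
type Level int

const (
	Info Level = iota
	Warn
)

// MarshalJSON implements json.Marshaler by emitting the level as a JSON string.
func (l Level) MarshalJSON() ([]byte, error) {
	switch l {
	case Info:
		return []byte(`"info"`), nil
	case Warn:
		return []byte(`"warn"`), nil
	}
	return nil, fmt.Errorf("unknown level %d", int(l))
}

// UnmarshalJSON implements json.Unmarshaler by parsing the string form.
func (l *Level) UnmarshalJSON(dat []byte) error {
	var s string
	if err := json.Unmarshal(dat, &s); err != nil {
		return err
	}
	switch strings.ToLower(s) {
	case "info":
		*l = Info
	case "warn":
		*l = Warn
	default:
		return fmt.Errorf("unknown level %q", s)
	}
	return nil
}

// Event uses ordinary `json:` struct field tags.
type Event struct {
	Name  string `json:"name"`
	Level Level  `json:"level"`
	Note  string `json:"note,omitempty"`
}

// roundTrip encodes ev and decodes the result into a fresh Event.
func roundTrip(ev Event) (string, Event, error) {
	dat, err := json.Marshal(ev)
	if err != nil {
		return "", Event{}, err
	}
	var out Event
	if err := json.Unmarshal(dat, &out); err != nil {
		return "", Event{}, err
	}
	return string(dat), out, nil
}

func main() {
	enc, dec, err := roundTrip(Event{Name: "disk", Level: Warn})
	if err != nil {
		panic(err)
	}
	fmt.Println(enc)
	fmt.Printf("%+v\n", dec)
}
```

Note that both interface methods exchange a complete `[]byte`, which is the buffering that the streaming interfaces listed above are meant to avoid.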

### Caller API

`lowmemjson` offers a [`lowmemjson/compat/json`][] package that is a
(mostly) drop-in replacement for `encoding/json` (see the package's
documentation for the small incompatibilities).

For better memory usage, avoid using `lowmemjson/compat/json` and
instead use `lowmemjson` directly:
 - Instead of using <code>[json.Marshal][`json.Marshal`](val)</code>,
   consider using
   <code>[lowmemjson.NewEncoder][`lowmemjson.NewEncoder`](w).[Encode][`lowmemjson.Encoder.Encode`](val)</code>.
 - Instead of using
   <code>[json.Unmarshal][`json.Unmarshal`](dat, &val)</code>, consider
   using
   <code>[lowmemjson.NewDecoder][`lowmemjson.NewDecoder`](r).[DecodeThenEOF][`lowmemjson.Decoder.DecodeThenEOF`](&val)</code>.
 - Instead of using [`json.Compact`][], [`json.HTMLEscape`][], or
   [`json.Indent`][], consider using a [`lowmemjson.ReEncoder`][].
 - Instead of using [`json.Valid`][], consider using a
   [`lowmemjson.ReEncoder`][] with `io.Discard` as the output.
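
For reference, here is a minimal sketch of the buffered `encoding/json` calls that the list above suggests replacing. It uses only the standard library; the `point` type and the `*Buffered` helper names are hypothetical, and the `lowmemjson` counterparts appear only in comments, copied from the list rather than executed.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// point is a hypothetical payload type used only for this illustration.
type point struct {
	X int `json:"x"`
	Y int `json:"y"`
}

// encodeBuffered materializes the whole encoded document as one []byte.
// Streaming counterpart named in the list above:
//
//	lowmemjson.NewEncoder(w).Encode(val)
func encodeBuffered(val any) (string, error) {
	dat, err := json.Marshal(val)
	return string(dat), err
}

// decodeBuffered needs the whole document in memory before decoding starts.
// Streaming counterpart named in the list above:
//
//	lowmemjson.NewDecoder(r).DecodeThenEOF(&val)
func decodeBuffered(doc string) (point, error) {
	var p point
	err := json.Unmarshal([]byte(doc), &p)
	return p, err
}

// compactBuffered minifies a document held entirely in memory.
// The list above suggests a lowmemjson.ReEncoder for this job.
func compactBuffered(doc string) (string, error) {
	var dst bytes.Buffer
	err := json.Compact(&dst, []byte(doc))
	return dst.String(), err
}

// validBuffered validates a document held entirely in memory.
// The list above suggests a lowmemjson.ReEncoder writing to io.Discard.
func validBuffered(doc string) bool {
	return json.Valid([]byte(doc))
}

func main() {
	enc, err := encodeBuffered(point{X: 1, Y: 2})
	if err != nil {
		panic(err)
	}
	fmt.Println(enc)

	p, err := decodeBuffered(enc)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", p)

	min, err := compactBuffered("{ \"x\": 1,\n  \"y\": 2 }")
	if err != nil {
		panic(err)
	}
	fmt.Println(min)
	fmt.Println(validBuffered(`{"x":`))
}
```

Every helper here takes or returns the complete document, so peak memory grows with document size; that is the cost the streaming forms are intended to remove.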

The error types returned from `lowmemjson` are different from the
error types returned by `encoding/json`, but `lowmemjson/compat/json`
translates them back to the types returned by `encoding/json`.

## Overview

### Caller API

Three main types make up the caller API for producing and handling
streams of JSON, and each has some associated types that go with it:

 1. `type Decoder`
    + `type DecodeArgumentError`
    + `type DecodeError`
      * `type DecodeReadError`
      * `type DecodeSyntaxError`
      * `type DecodeTypeError`

 2. `type Encoder`
    + `type EncodeTypeError`
    + `type EncodeValueError`
    + `type EncodeMethodError`

 3. `type ReEncoder`
    + `type ReEncoderConfig`
    + `type ReEncodeSyntaxError`
    + `type BackslashEscaper`
      * `type BackslashEscapeMode`

A `*Decoder` handles decoding a JSON stream into Go values; the most
common use of it will be
`lowmemjson.NewDecoder(r).DecodeThenEOF(&val)` or
`lowmemjson.NewDecoder(bufio.NewReader(r)).DecodeThenEOF(&val)`.
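
The two forms above differ only in whether the input is wrapped in a `bufio.Reader`. The sketch below illustrates when that wrapping is needed, on the assumption (suggested by the mention of `io.RuneScanner` under Compatibility, but not spelled out here) that the decoder wants an `io.RuneScanner`. It uses only the standard library; `asRuneScanner`, `firstRune`, `onlyReader`, and `scanFirst` are hypothetical helpers, not part of `lowmemjson`.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// Compile-time checks: these standard-library readers already implement
// io.RuneScanner and would need no wrapping.
var (
	_ io.RuneScanner = (*strings.Reader)(nil)
	_ io.RuneScanner = (*bytes.Buffer)(nil)
	_ io.RuneScanner = (*bytes.Reader)(nil)
	_ io.RuneScanner = (*bufio.Reader)(nil)
)

// onlyReader hides every method except Read, imitating a plain io.Reader
// such as a network connection.
type onlyReader struct{ io.Reader }

// asRuneScanner returns r unchanged if it can already scan runes, and
// otherwise wraps it in a bufio.Reader. The bool reports whether it wrapped.
func asRuneScanner(r io.Reader) (io.RuneScanner, bool) {
	if rs, ok := r.(io.RuneScanner); ok {
		return rs, false
	}
	return bufio.NewReader(r), true
}

// firstRune reads one rune and pushes it back, which is the kind of
// one-rune lookahead that io.RuneScanner makes possible.
func firstRune(rs io.RuneScanner) (rune, error) {
	c, _, err := rs.ReadRune()
	if err != nil {
		return 0, err
	}
	return c, rs.UnreadRune()
}

// scanFirst builds a reader over s, optionally hiding its rune methods,
// and reports the first rune plus whether wrapping was needed.
func scanFirst(s string, hide bool) (rune, bool, error) {
	var r io.Reader = strings.NewReader(s)
	if hide {
		r = onlyReader{r}
	}
	rs, wrapped := asRuneScanner(r)
	c, err := firstRune(rs)
	return c, wrapped, err
}

func main() {
	for _, hide := range []bool{false, true} {
		c, wrapped, err := scanFirst(`{"k": 1}`, hide)
		if err != nil {
			panic(err)
		}
		fmt.Printf("first rune %q, wrapped in bufio.Reader: %v\n", c, wrapped)
	}
}
```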

A `*ReEncoder` handles transforming a JSON stream; this is useful for
prettifying, minifying, sanitizing, and/or validating JSON.  A
`*ReEncoder` wraps an `io.Writer`, itself implementing `io.Writer`.
The most common use of it will be something along the lines of
`out = lowmemjson.NewReEncoder(out, lowmemjson.ReEncoderConfig{…})`.

An `*Encoder` handles encoding Go values into a JSON stream.
`*Encoder` doesn't take much care to make its output nice, so it is
usually desirable to have the output stream of an `*Encoder` be a
`*ReEncoder`; the most common use of it will be
`lowmemjson.NewEncoder(lowmemjson.NewReEncoder(out, lowmemjson.ReEncoderConfig{…})).Encode(val)`.

`*Encoder` and `*ReEncoder` both tend to make many small writes; if
writes are syscalls, you may want to wrap their output in a
`bufio.Writer`.
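
To illustrate the effect of that wrapping, here is a minimal standard-library sketch. `countingWriter` stands in for an expensive sink such as a file or socket, and `emitTokens` is a hypothetical stand-in that imitates an encoder writing one small token at a time; neither is part of `lowmemjson`.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// countingWriter counts how many Write calls (and bytes) reach the sink.
type countingWriter struct {
	calls int
	bytes int
}

func (w *countingWriter) Write(p []byte) (int, error) {
	w.calls++
	w.bytes += len(p)
	return len(p), nil
}

// emitTokens writes a JSON array of n integers one token at a time,
// imitating a writer that makes many small writes.
func emitTokens(w io.Writer, n int) error {
	if _, err := io.WriteString(w, "["); err != nil {
		return err
	}
	for i := 0; i < n; i++ {
		if i > 0 {
			if _, err := io.WriteString(w, ","); err != nil {
				return err
			}
		}
		if _, err := fmt.Fprintf(w, "%d", i); err != nil {
			return err
		}
	}
	_, err := io.WriteString(w, "]")
	return err
}

// writeCalls reports how many Write calls reach the sink when emitting
// n integers, with or without a bufio.Writer in between.
func writeCalls(n int, buffered bool) (int, error) {
	sink := &countingWriter{}
	if !buffered {
		err := emitTokens(sink, n)
		return sink.calls, err
	}
	bw := bufio.NewWriter(sink)
	if err := emitTokens(bw, n); err != nil {
		return sink.calls, err
	}
	// Flush is required, or the tail of the output stays in the buffer.
	err := bw.Flush()
	return sink.calls, err
}

func main() {
	for _, buffered := range []bool{false, true} {
		calls, err := writeCalls(1000, buffered)
		if err != nil {
			panic(err)
		}
		fmt.Printf("buffered=%v: %d Write calls reached the sink\n", buffered, calls)
	}
}
```

With `n = 1000` the unbuffered run issues 2001 writes, while the buffered run issues one, because the 3891 bytes of output fit in `bufio.Writer`'s default 4096-byte buffer. Remember to call `Flush` once encoding is finished.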

### Callee API

For defining Go types with custom JSON representations, `lowmemjson`
respects all of the `json:` struct field tags of `encoding/json`, as
well as the same "marshaler" and "unmarshaler" interfaces as
`encoding/json`.  In addition to those interfaces, `lowmemjson` adds
two interfaces of its own, plus some helper functions for implementing
them:

 1. `type Decodable`
    + `func DecodeArray`
    + `func DecodeObject`
 2. `type Encodable`

These are streaming variants of the standard `json.Unmarshaler` and
`json.Marshaler` interfaces.

<!-- packages -->
[`lowmemjson`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson
[`lowmemjson/compat/json`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson/compat/json
[`encoding/json`]: https://pkg.go.dev/encoding/json@go1.18

<!-- encoding/json symbols -->
[`json.Marshaler`]: https://pkg.go.dev/encoding/json@go1.18#Marshaler
[`json.Unmarshaler`]: https://pkg.go.dev/encoding/json@go1.18#Unmarshaler
[`json.Marshal`]: https://pkg.go.dev/encoding/json@go1.18#Marshal
[`json.Unmarshal`]: https://pkg.go.dev/encoding/json@go1.18#Unmarshal
[`json.Compact`]: https://pkg.go.dev/encoding/json@go1.18#Compact
[`json.HTMLEscape`]: https://pkg.go.dev/encoding/json@go1.18#HTMLEscape
[`json.Indent`]: https://pkg.go.dev/encoding/json@go1.18#Indent
[`json.Valid`]: https://pkg.go.dev/encoding/json@go1.18#Valid

<!-- lowmemjson symbols -->
[`lowmemjson.Encodable`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#Encodable
[`lowmemjson.Decodable`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#Decodable
[`lowmemjson.NewEncoder`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#NewEncoder
[`lowmemjson.Encoder.Encode`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#Encoder.Encode
[`lowmemjson.NewDecoder`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#NewDecoder
[`lowmemjson.Decoder.DecodeThenEOF`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#Decoder.DecodeThenEOF
[`lowmemjson.ReEncoder`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#ReEncoder