[/
    Copyright (c) 2019 Vinnie Falco (vinnie.falco@gmail.com)

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

    Official repository: https://github.com/cppalliance/json
]

[/-----------------------------------------------------------------------------]

[section Parsing]

Parsing is the process where a serialized JSON text is validated
and decomposed into elements. The library provides these
functions and types to assist with parsing:

[table Parsing Functions and Types
    [ [Name] [Description ]]
    [
        [__basic_parser__]
        [
            A SAX push parser implementation which converts a
            serialized JSON text into a series of member function calls
            to a user provided handler. This allows custom behaviors
            to be implemented for representing the document in memory.
        ]
    ][
        [__parse_options__]
        [
            A structure used to select which extensions are
            enabled during parsing.
        ]
    ][
        [__parse__]
        [
            Parse a string containing a complete serialized JSON text, and
            return a __value__.
        ]
    ][
        [__parser__]
        [
            A stateful DOM parser object which may be used
            to efficiently parse a series of JSON texts each
            contained in a single contiguous character buffer,
            returning each result as a __value__.
        ]
    ][
        [__stream_parser__]
        [
            A stateful DOM parser object which may be used to
            efficiently parse a series of JSON texts incrementally,
            returning each result as a __value__.
        ]
    ][
        [__value_stack__]
        [
            A low level building block used for efficiently building
            a __value__. The parsers use this internally, and users
            may use it to adapt foreign parsers to produce this
            library's containers.
        ]
    ]]

The __parse__ function offers a simple interface for converting
a serialized JSON text to a __value__ in a single function call.
This overload uses exceptions to indicate errors:

[doc_parsing_1]

Alternatively, an __error_code__ can be used:

[doc_parsing_2]

Even when using error codes, exceptions thrown from the underlying
__memory_resource__ are still possible:

[doc_parsing_3]

The __value__ returned in the preceding examples uses the
__default_memory_resource__. The following code uses a __monotonic_resource__,
which results in faster parsing. `jv` is marked `const` to prevent
subsequent modification, because containers using a monotonic
resource waste memory when mutated.

[doc_parsing_4]

[/-----------------------------------------------------------------------------]

[heading Non-Standard JSON]

Unless otherwise specified, the parser in this library is strict.
It recognizes only valid, standard JSON. The parser can be configured
to allow certain non-standard extensions by filling in a __parse_options__
structure and passing it by value. By default all extensions are disabled:

[doc_parsing_5]

When building with C++20 or later, the use of
[@https://en.cppreference.com/w/cpp/language/aggregate_initialization#Designated_initializers designated initializers]
with __parse_options__ is possible:

[doc_parsing_6]

[caution When enabling comment support, take extra care not to drop whitespace
when reading the input. For example, `std::getline` removes the
end-of-line character from each string it produces, so a `//` comment
would no longer end where its line ended.]

[/-----------------------------------------------------------------------------]

[heading Parser]

Instances of __parser__ and __stream_parser__ offer functionality beyond
what is available when using the __parse__ free functions:

* More control over memory
* Streaming API, parse input JSON incrementally
* Improved performance when parsing multiple JSON texts
* Ignore non-JSON content after the end of a JSON text

The parser implementation uses temporary storage space to accumulate
values during parsing. When using the __parse__ free functions, this
storage is allocated and freed in each call. However, by declaring
an instance of __parser__ or __stream_parser__, this temporary storage
can be reused when parsing more than one JSON text, reducing the total
number of dynamic memory allocations.

To use the __parser__, declare an instance. Then call
[link json.ref.boost__json__parser.write `write`]
once with the buffer containing the input JSON.
Finally, call
[link json.ref.boost__json__parser.release `release`]
to take ownership of the resulting __value__ upon success.
This example persists the parser instance in a class member
to reuse across calls:

[doc_parsing_7]

Sometimes a protocol may have a JSON text followed by data that is in
a different format or specification. The JSON portion can still be parsed
by using the function
[link json.ref.boost__json__parser.write_some `write_some`].
Upon success, the return value will indicate the number of characters
consumed from the input, which will exclude the non-JSON characters:

[doc_parsing_8]

The parser instance may be constructed with parse options which
allow some non-standard JSON extensions to be recognized:

[doc_parsing_9]

[heading Streaming Parser]

The __stream_parser__ implements a
[@https://en.wikipedia.org/wiki/Online_algorithm ['streaming algorithm]];
it allows incremental processing of large JSON inputs using one or more
contiguous character buffers. The entire input JSON does not need to be
loaded into memory at once. A network server can use the streaming
interface to process incoming JSON in fixed-size amounts, providing
these benefits:

* CPU consumption per I/O cycle is bounded
* Memory consumption per I/O cycle is bounded
* Jitter, unfairness, and latency are reduced
* Less total memory is required to process the full input

To use the __stream_parser__, declare an instance. Then call
[link json.ref.boost__json__stream_parser.write `write`]
zero or more times with successive buffers representing the input JSON.
When there are no more buffers, call
[link json.ref.boost__json__stream_parser.finish `finish`].
The function
[link json.ref.boost__json__stream_parser.done `done`]
returns `true` after a successful call to `write` or `finish`
if parsing is complete.

In the following example a JSON text is parsed from standard input a line
at a time. Error codes are used instead of exceptions. The function
[link json.ref.boost__json__stream_parser.finish `finish`]
is used to indicate the end of the input:

[caution This example will break if comments are enabled, because
`std::getline` removes the end-of-line characters (see the caution in the
[link json.input_output.parsing.non_standard_json Non-Standard JSON]
section).]

[doc_parsing_10]

The example can be extended to extract ['several] JSON values from
the sequence of lines:

[doc_parsing_14]

[/-----------------------------------------------------------------------------]

[heading Controlling Memory]

After default construction, or after
[link json.ref.boost__json__stream_parser.reset `reset`]
is called with no arguments, the __value__ produced after a successful
parse operation uses the default memory resource. To use a different
memory resource, call `reset` with the resource to use. Here we use
a __monotonic_resource__, which is optimized for parsing but not
subsequent modification:

[doc_parsing_11]

To achieve performance and memory efficiency, the parser uses a
temporary storage area to hold intermediate results. This storage
is reused when parsing more than one JSON text, reducing the total
number of calls to allocate memory and thus improving performance.
Upon construction, the memory resource used to perform allocations
for this temporary storage area may be specified. Otherwise, the
default memory resource is used. In addition to a memory resource,
the parser can make use of a caller-owned buffer for temporary
storage. This can help avoid dynamic allocations for small inputs.
The following example uses a four kilobyte temporary buffer for
the parser, and falls back to the default memory resource if needed:

[doc_parsing_12]

[section Avoiding Dynamic Allocations]

Through careful specification of buffers and memory resources,
it is possible to eliminate dynamic allocation entirely when
parsing JSON, for the case where the entire JSON text is available in
a single character buffer, as shown here:

[doc_parsing_13]

[endsect]

[/-----------------------------------------------------------------------------]

[heading Custom Parsers]

Users who wish to implement custom parsing strategies may create
their own handler to use with an instance of __basic_parser__.
The handler implements the function signatures required by the SAX
event interface. In this example we define the "null" parser,
which does nothing with the parsed results, to use in the
implementation of a function that determines if a JSON
text is valid.

[example_validate]

[endsect]