[/
Copyright (c) 2019 Vinnie Falco (vinnie.falco@gmail.com)
Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
Official repository: https://github.com/cppalliance/json
]
[/-----------------------------------------------------------------------------]
[section Parsing]
Parsing is the process where a serialized JSON text is validated
and decomposed into elements. The library provides these
functions and types to assist with parsing:
[table Parsing Functions and Types
[ [Name] [Description ]]
[
[__basic_parser__]
[
A SAX push parser implementation which converts a
serialized JSON text into a series of member function calls
to a user provided handler. This allows custom behaviors
to be implemented for representing the document in memory.
]
][
[__parse_options__]
[
A structure used to select which extensions are
enabled during parsing.
]
][
[__parse__]
[
Parse a string containing a complete serialized JSON text, and
return a __value__.
]
][
[__parser__]
[
A stateful DOM parser object which may be used
to efficiently parse a series of JSON texts each
contained in a single contiguous character buffer,
returning each result as a __value__.
]
][
[__stream_parser__]
[
A stateful DOM parser object which may be used to
efficiently parse a series of JSON texts incrementally,
returning each result as a __value__.
]
][
[__value_stack__]
[
A low level building block used for efficiently building
a __value__. The parsers use this internally, and users
may use it to adapt foreign parsers to produce this
library's containers.
]
]]
The __parse__ function offers a simple interface for converting
a serialized JSON text to a __value__ in a single function call.
This overload uses exceptions to indicate errors:
[doc_parsing_1]
Alternatively, an __error_code__ can be used:
[doc_parsing_2]
Even when using error codes, exceptions thrown from the underlying
__memory_resource__ are still possible:
[doc_parsing_3]
The __value__ returned in the preceding examples uses the
__default_memory_resource__. The following code uses a __monotonic_resource__,
which results in faster parsing. `jv` is marked `const` to prevent
subsequent modification, because containers using a monotonic
resource waste memory when mutated.
[doc_parsing_4]
[/-----------------------------------------------------------------------------]
[heading Non-Standard JSON]
Unless otherwise specified, the parser in this library is strict.
It recognizes only valid, standard JSON. The parser can be configured
to allow certain non-standard extensions by filling in a __parse_options__
structure and passing it by value. By default all extensions are disabled:
[doc_parsing_5]
When building with C++20 or later, the use of
[@https://en.cppreference.com/w/cpp/language/aggregate_initialization#Designated_initializers designated initializers]
with __parse_options__ is possible:
[doc_parsing_6]
[caution When enabling comment support, take extra care not to drop whitespace
when reading the input. For example, `std::getline` removes the
newline character from each string it produces, so a line comment would
no longer be terminated at the end of its line.]
[/-----------------------------------------------------------------------------]
[heading Parser]
Instances of __parser__ and __stream_parser__ offer functionality beyond
what is available when using the __parse__ free functions:
* More control over memory
* Streaming API, parse input JSON incrementally
* Improved performance when parsing multiple JSON texts
* Ignore non-JSON content after the end of a JSON text
The parser implementation uses temporary storage space to accumulate
values during parsing. When using the __parse__ free functions, this
storage is allocated and freed in each call. However, by declaring
an instance of __parser__ or __stream_parser__, this temporary storage
can be reused when parsing more than one JSON text, reducing the total
number of dynamic memory allocations.
To use the __parser__, declare an instance. Then call
[link json.ref.boost__json__parser.write `write`]
once with the buffer containing the input JSON.
Finally, call
[link json.ref.boost__json__parser.release `release`]
to take ownership of the resulting __value__ upon success.
This example persists the parser instance in a class member
to reuse across calls:
[doc_parsing_7]
Sometimes a protocol may have a JSON text followed by data that is in
a different format or specification. The JSON portion can still be parsed
by using the function
[link json.ref.boost__json__parser.write_some `write_some`].
Upon success, the return value will indicate the number of characters
consumed from the input, which will exclude the non-JSON characters:
[doc_parsing_8]
The parser instance may be constructed with parse options which
allow some non-standard JSON extensions to be recognized:
[doc_parsing_9]
[heading Streaming Parser]
The __stream_parser__ implements a
[@https://en.wikipedia.org/wiki/Online_algorithm ['streaming algorithm]];
it allows incremental processing of large JSON inputs using one or more
contiguous character buffers. The entire input JSON does not need to be
loaded into memory at once. A network server can use the streaming
interface to process incoming JSON in fixed-size amounts, providing
these benefits:
* CPU consumption per I/O cycle is bounded
* Memory consumption per I/O cycle is bounded
* Jitter, unfairness, and latency are reduced
* Less total memory is required to process the full input
To use the __stream_parser__, declare an instance. Then call
[link json.ref.boost__json__stream_parser.write `write`]
zero or more times with successive buffers representing the input JSON.
When there are no more buffers, call
[link json.ref.boost__json__stream_parser.finish `finish`].
The function
[link json.ref.boost__json__stream_parser.done `done`]
returns `true` after a successful call to `write` or `finish`
if parsing is complete.
In the following example a JSON text is parsed from standard input a line
at a time. Error codes are used instead of exceptions. The function
[link json.ref.boost__json__stream_parser.finish `finish`]
is used to indicate the end of the input:
[caution This example will break if comments are enabled, because
`std::getline` removes the newline characters (see the caution in the
[link json.input_output.parsing.non_standard_json Non-Standard JSON]
section).]
[doc_parsing_10]
We can extend the example further by extracting ['several] JSON values from
the sequence of lines.
[doc_parsing_14]
[/-----------------------------------------------------------------------------]
[heading Controlling Memory]
After default construction, or after
[link json.ref.boost__json__stream_parser.reset `reset`]
is called with no arguments, the __value__ produced after a successful
parse operation uses the default memory resource. To use a different
memory resource, call `reset` with the resource to use. Here we use
a __monotonic_resource__, which is optimized for parsing but not
subsequent modification:
[doc_parsing_11]
To achieve performance and memory efficiency, the parser uses a
temporary storage area to hold intermediate results. This storage
is reused when parsing more than one JSON text, reducing the total
number of calls to allocate memory and thus improving performance.
Upon construction, the memory resource used to perform allocations
for this temporary storage area may be specified. Otherwise, the
default memory resource is used. In addition to a memory resource,
the parser can make use of a caller-owned buffer for temporary
storage. This can help avoid dynamic allocations for small inputs.
The following example uses a four kilobyte temporary buffer for
the parser, and falls back to the default memory resource if needed:
[doc_parsing_12]
[section Avoiding Dynamic Allocations]
Through careful specification of buffers and memory resources,
it is possible to eliminate dynamic allocation completely when
parsing JSON, for the case where the entire JSON text is available in
a single character buffer, as shown here:
[doc_parsing_13]
[endsect]
[/-----------------------------------------------------------------------------]
[heading Custom Parsers]
Users who wish to implement custom parsing strategies may create
their own handler to use with an instance of __basic_parser__.
The handler implements the function signatures required by the SAX
event interface. In this example we define the "null" parser,
which does nothing with the parsed results, to use in the
implementation of a function that determines if a JSON
text is valid.
[example_validate]
[endsect]