summary refs log tree commit diff stats
Commit message (Collapse)AuthorAgeFilesLines
* qemu-error: introduce {error|warn}_report_oncePeter Xu2018-08-271-0/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are many error_report()s that can be used in frequently called functions, especially on IO paths. That can be unideal in that malicious guest can try to trigger the error tons of time which might use up the log space on the host (e.g., libvirt can capture the stderr of QEMU and put it persistently onto disk). In VT-d emulation code, we have trace_vtd_error() tracer. AFAIU all those places can be replaced by something like error_report() but trace points are mostly used to avoid the DDOS attack that mentioned above. However using trace points mean that errors are not dumped if trace not enabled. It's not a big deal in most modern server managements since we have things like logrotate to maintain the logs and make sure the quota is expected. However it'll still be nice that we just provide another way to restrict message generations. In most cases, this kind of error_report()s will only provide valid information on the first message sent, and all the rest of similar messages will be mostly talking about the same thing. This patch introduces *_report_once() helpers to allow a message to be dumped only once during one QEMU process's life cycle. It will make sure: (1) it's on by deffault, so we can even get something without turning the trace on and reproducing, and (2) it won't be affected by DDOS attack. To implement it, I stole the printk_once() macro from Linux. CC: Eric Blake <eblake@redhat.com> CC: Markus Armbruster <armbru@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180815095328.32414-2-peterx@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Whitespace adjusted, comments improved] Signed-off-by: Markus Armbruster <armbru@redhat.com>
* Merge remote-tracking branch 'remotes/otubo/tags/pull-seccomp-20180823' into ↵Peter Maydell2018-08-252-6/+37
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | staging pull-seccomp-20180823 # gpg: Signature made Thu 23 Aug 2018 15:46:13 BST # gpg: using RSA key DF32E7C0F0FFF9A2 # gpg: Good signature from "Eduardo Otubo (Senior Software Engineer) <otubo@redhat.com>" # Primary key fingerprint: D67E 1B50 9374 86B4 0723 DBAB DF32 E7C0 F0FF F9A2 * remotes/otubo/tags/pull-seccomp-20180823: seccomp: set the seccomp filter to all threads configure: require libseccomp 2.2.0 seccomp: prefer SCMP_ACT_KILL_PROCESS if available seccomp: use SIGSYS signal instead of killing the thread Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * seccomp: set the seccomp filter to all threadsMarc-André Lureau2018-08-231-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using "-seccomp on", the seccomp policy is only applied to the main thread, the vcpu worker thread and other worker threads created after seccomp policy is applied; the seccomp policy is not applied to e.g. the RCU thread because it is created before the seccomp policy is applied and SECCOMP_FILTER_FLAG_TSYNC isn't used. This can be verified with for task in /proc/`pidof qemu`/task/*; do cat $task/status | grep Secc ; done Seccomp: 2 Seccomp: 0 Seccomp: 0 Seccomp: 2 Seccomp: 2 Seccomp: 2 Starting with libseccomp 2.2.0 and kernel >= 3.17, we can use seccomp_attr_set(ctx, > SCMP_FLTATR_CTL_TSYNC, 1) to update the policy on all threads. libseccomp requirement was bumped to 2.2.0 in previous patch. libseccomp should fail to set the filter if it can't honour SCMP_FLTATR_CTL_TSYNC (untested), and thus -sandbox will now fail on kernel < 3.17. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Acked-by: Eduardo Otubo <otubo@redhat.com>
| * configure: require libseccomp 2.2.0Marc-André Lureau2018-08-231-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following patch is going to require TSYNC, which is only available since libseccomp 2.2.0. libseccomp 2.2.0 was released February 12, 2015. According to repology, libseccomp version in different distros: RHEL-7: 2.3.1 Debian (Stretch): 2.3.1 OpenSUSE Leap 15: 2.3.2 Ubuntu (Xenial): 2.3.1 This will drop support for -sandbox on: Debian (Jessie): 2.1.1 (but 2.2.3 in backports) Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Acked-by: Eduardo Otubo <otubo@redhat.com>
| * seccomp: prefer SCMP_ACT_KILL_PROCESS if availableMarc-André Lureau2018-08-231-1/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The upcoming libseccomp release should have SCMP_ACT_KILL_PROCESS action (https://github.com/seccomp/libseccomp/issues/96). SCMP_ACT_KILL_PROCESS is preferable to immediately terminate the offending process, rather than having the SIGSYS handler running. Use SECCOMP_GET_ACTION_AVAIL to check availability of kernel support, as libseccomp will fallback on SCMP_ACT_KILL otherwise, and we still prefer SCMP_ACT_TRAP. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Acked-by: Eduardo Otubo <otubo@redhat.com>
| * seccomp: use SIGSYS signal instead of killing the threadMarc-André Lureau2018-08-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The seccomp action SCMP_ACT_KILL results in immediate termination of the thread that made the bad system call. However, qemu being multi-threaded, it keeps running. There is no easy way for parent process / management layer (libvirt) to know about that situation. Instead, the default SIGSYS handler when invoked with SCMP_ACT_TRAP will terminate the program and core dump. This may not be the most secure solution, but probably better than just killing the offending thread. SCMP_ACT_KILL_PROCESS has been added in Linux 4.14 to improve the situation, which I propose to use by default if available in the next patch. Related to: https://bugzilla.redhat.com/show_bug.cgi?id=1594456 Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Acked-by: Eduardo Otubo <otubo@redhat.com>
* | Merge remote-tracking branch 'remotes/awilliam/tags/vfio-fixes-20180823.1' ↵Peter Maydell2018-08-253-3/+18
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | into staging VFIO fixes 2018-08-23 - Fix coverity reported issue with use of realpath (Alex Williamson) - Cleanup file descriptor in error path (Alex Williamson) - Fix postcopy use of new balloon inhibitor (Alex Williamson) # gpg: Signature made Thu 23 Aug 2018 17:46:41 BST # gpg: using RSA key 239B9B6E3BB08B22 # gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>" # gpg: aka "Alex Williamson <alex@shazbot.org>" # gpg: aka "Alex Williamson <alwillia@redhat.com>" # gpg: aka "Alex Williamson <alex.l.williamson@gmail.com>" # Primary key fingerprint: 42F6 C04E 540B D1A9 9E7B 8A90 239B 9B6E 3BB0 8B22 * remotes/awilliam/tags/vfio-fixes-20180823.1: postcopy: Synchronize usage of the balloon inhibitor vfio/pci: Fix failure to close file descriptor on error vfio/pci: Handle subsystem realpath() returning NULL Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * | postcopy: Synchronize usage of the balloon inhibitorAlex Williamson2018-08-231-2/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While the qemu_balloon_inhibit() interface appears rather general purpose, postcopy uses it in a last-caller-wins approach with no guarantee of balanced inhibits and de-inhibits. Wrap postcopy's usage of the inhibitor to give it one vote overall, using the same last-caller-wins approach as previously implemented at the balloon level. Fixes: 01ccbec7bdf6 ("balloon: Allow multiple inhibit users") Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | vfio/pci: Fix failure to close file descriptor on errorAlex Williamson2018-08-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | A new error path fails to close the device file descriptor when triggered by a ballooning incompatibility within the group. Fix it. Fixes: 238e91728503 ("vfio/ccw/pci: Allow devices to opt-in for ballooning") Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | vfio/pci: Handle subsystem realpath() returning NULLAlex Williamson2018-08-231-1/+1
| |/ | | | | | | | | | | | | | | | | | | Fix error reported by Coverity where realpath can return NULL, resulting in a segfault in strcmp(). This should never happen given that we're working through regularly structured sysfs paths, but trivial enough to easily avoid. Fixes: 238e91728503 ("vfio/ccw/pci: Allow devices to opt-in for ballooning") Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* | Merge remote-tracking branch 'remotes/armbru/tags/pull-qobject-2018-08-24' ↵Peter Maydell2018-08-2534-1329/+1472
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | into staging QObject patches for 2018-08-24 # gpg: Signature made Fri 24 Aug 2018 20:28:53 BST # gpg: using RSA key 3870B400EB918653 # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-qobject-2018-08-24: (58 commits) json: Update references to RFC 7159 to RFC 8259 json: Support %% in JSON strings when interpolating json: Improve safety of qobject_from_jsonf_nofail() & friends json: Keep interpolation state in JSONParserContext tests/drive_del-test: Fix harmless JSON interpolation bug json: Clean up headers qobject: Drop superfluous includes of qemu-common.h json: Make JSONToken opaque outside json-parser.c json: Unbox tokens queue in JSONMessageParser json: Streamline json_message_process_token() json: Enforce token count and size limits more tightly qjson: Have qobject_from_json() & friends reject empty and blank json: Assert json_parser_parse() consumes all tokens on success json: Fix streamer not to ignore trailing unterminated structures json: Fix latent parser aborts at end of input qjson: Fix qobject_from_json() & friends for multiple values json: Improve names of lexer states related to numbers json: Replace %I64d, %I64u by %PRId64, %PRIu64 json: Leave rejecting invalid interpolation to parser json: Pass lexical errors and limit violations to callback ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * | json: Update references to RFC 7159 to RFC 8259Markus Armbruster2018-08-243-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | RFC 8259 (December 2017) obsoletes RFC 7159 (March 2014). Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180823164025.12553-59-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
| * | json: Support %% in JSON strings when interpolatingMarkus Armbruster2018-08-242-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The previous commit makes JSON strings containing '%' awkward to express in templates: you'd have to mask the '%' with an Unicode escape \u0025. No template currently contains such JSON strings. Support the printf conversion specification %% in JSON strings as a convenience anyway, because it's trivially easy to do. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-58-armbru@redhat.com>
| * | json: Improve safety of qobject_from_jsonf_nofail() & friendsMarkus Armbruster2018-08-242-12/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The JSON parser optionally supports interpolation. This is used to build QObjects by parsing string templates. The templates are C literals, so parse errors (such as invalid interpolation specifications) are actually programming errors. Consequently, the functions providing parsing with interpolation (qobject_from_jsonf_nofail(), qobject_from_vjsonf_nofail(), qdict_from_jsonf_nofail(), qdict_from_vjsonf_nofail()) pass &error_abort to the parser. However, there's another, more dangerous kind of programming error: since we use va_arg() to get the value to interpolate, behavior is undefined when the variable argument isn't consistent with the interpolation specification. The same problem exists with printf()-like functions, and the solution is to have the compiler check consistency. This is what GCC_FMT_ATTR() is about. To enable this type checking for interpolation as well, we carefully chose our interpolation specifications to match printf conversion specifications, and decorate functions parsing templates with GCC_FMT_ATTR(). Note that this only protects against undefined behavior due to type errors. It can't protect against use of invalid interpolation specifications that happen to be valid printf conversion specifications. However, there's still a gaping hole in the type checking: GCC recognizes '%' as start of printf conversion specification anywhere in the template, but the parser recognizes it only outside JSON strings. For instance, if someone were to pass a "{ '%s': %d }" template, GCC would require a char * and an int argument, but the parser would va_arg() only an int argument, resulting in undefined behavior. Avoid undefined behavior by catching the programming error at run time: have the parser recognize and reject '%' in JSON strings. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-57-armbru@redhat.com>
| * | json: Keep interpolation state in JSONParserContextMarkus Armbruster2018-08-241-29/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The recursive descent parser passes along a pointer to JSONParserContext. It additionally passes a pointer to interpolation state (a va_alist *) as needed to reach its consumer parse_interpolation(). Stuffing the latter pointer into JSONParserContext saves us the trouble of passing it along, so do that. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-56-armbru@redhat.com>
| * | tests/drive_del-test: Fix harmless JSON interpolation bugMarkus Armbruster2018-08-241-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | test_after_failed_device_add() does this: response = qmp("{'execute': 'device_add'," " 'arguments': {" " 'driver': 'virtio-blk-%s'," " 'drive': 'drive0'" "}}", qvirtio_get_dev_type()); Wrong. An interpolation specification must be a JSON token, it doesn't work within JSON string tokens. The code above doesn't use the value of qvirtio_get_dev_type(), and sends arguments {"driver": "virtio-blk-%s", "drive": "drive0"}} The command fails because there is no driver named "virtio-blk-%". Harmless, since the test wants the command to fail. Screwed up in commit 2f84a92ec63. Fix the obvious way. The command now fails because the drive is empty, like it did before commit 2f84a92ec63. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-55-armbru@redhat.com>
| * | json: Clean up headersMarkus Armbruster2018-08-2410-76/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The JSON parser has three public headers, json-lexer.h, json-parser.h, json-streamer.h. They all contain stuff that is of no interest outside qobject/json-*.c. Collect the public interface in include/qapi/qmp/json-parser.h, and everything else in qobject/json-parser-int.h. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-54-armbru@redhat.com>
| * | qobject: Drop superfluous includes of qemu-common.hMarkus Armbruster2018-08-249-9/+0
| | | | | | | | | | | | | | | | | | Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-53-armbru@redhat.com>
| * | json: Make JSONToken opaque outside json-parser.cMarkus Armbruster2018-08-244-14/+24
| | | | | | | | | | | | | | | | | | Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-52-armbru@redhat.com>
| * | json: Unbox tokens queue in JSONMessageParserMarkus Armbruster2018-08-243-21/+12
| | | | | | | | | | | | | | | | | | Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-51-armbru@redhat.com>
| * | json: Streamline json_message_process_token()Markus Armbruster2018-08-241-8/+5
| | | | | | | | | | | | | | | | | | Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-50-armbru@redhat.com>
| * | json: Enforce token count and size limits more tightlyMarkus Armbruster2018-08-241-18/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Token count and size limits exist to guard against excessive heap usage. We check them only after we created the token on the heap. That's assigning a cowboy to the barn to lasso the horse after it has bolted. Close the barn door instead: check before we create the token. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-49-armbru@redhat.com>
| * | qjson: Have qobject_from_json() & friends reject empty and blankMarkus Armbruster2018-08-244-12/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The last case where qobject_from_json() & friends return null without setting an error is empty or blank input. Callers: * block.c's parse_json_protocol() reports "Could not parse the JSON options". It's marked as a work-around, because it also covered actual bugs, but they got fixed in the previous few commits. * qobject_input_visitor_new_str() reports "JSON parse error". Also marked as work-around. The recent fixes have made this unreachable, because it currently gets called only for input starting with '{'. * check-qjson.c's empty_input() and blank_input() demonstrate the behavior. * The other callers are not affected since they only pass input with exactly one JSON value or, in the case of negative tests, one error. Fail with "Expecting a JSON value" instead of returning null, and simplify callers. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-48-armbru@redhat.com>
| * | json: Assert json_parser_parse() consumes all tokens on successMarkus Armbruster2018-08-241-0/+1
| | | | | | | | | | | | | | | | | | Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-47-armbru@redhat.com>
| * | json: Fix streamer not to ignore trailing unterminated structuresMarkus Armbruster2018-08-244-4/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | json_message_process_token() accumulates tokens until it got the sequence of tokens that comprise a single JSON value (it counts curly braces and square brackets to decide). It feeds those token sequences to json_parser_parse(). If a non-empty sequence of tokens remains at the end of the parse, it's silently ignored. check-qjson.c cases unterminated_array(), unterminated_array_comma(), unterminated_dict(), unterminated_dict_comma() demonstrate this bug. Fix as follows. Introduce a JSON_END_OF_INPUT token. When the streamer receives it, it feeds the accumulated tokens to json_parser_parse(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-46-armbru@redhat.com>
| * | json: Fix latent parser aborts at end of inputMarkus Armbruster2018-08-241-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | json-parser.c carefully reports end of input like this: token = parser_context_pop_token(ctxt); if (token == NULL) { parse_error(ctxt, NULL, "premature EOI"); goto out; } Except parser_context_pop_token() can't return null, it fails its assertion instead. Same for parser_context_peek_token(). Broken in commit 65c0f1e9558, and faithfully preserved in commit 95385fe9ace. Only a latent bug, because the streamer throws away any input that could trigger it. Drop the assertions, so we can fix the streamer in the next commit. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-45-armbru@redhat.com>
| * | qjson: Fix qobject_from_json() & friends for multiple valuesMarkus Armbruster2018-08-242-8/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | qobject_from_json() & friends use the consume_json() callback to receive either a value or an error from the parser. When they are fed a string that contains more than either one JSON value or one JSON syntax error, consume_json() gets called multiple times. When the last call receives a value, qobject_from_json() returns that value. Any other values are leaked. When any call receives an error, qobject_from_json() sets the first error received. Any other errors are thrown away. When values follow errors, qobject_from_json() returns both a value and sets an error. That's bad. Impact: * block.c's parse_json_protocol() ignores and leaks the value. It's used to to parse pseudo-filenames starting with "json:". The pseudo-filenames can come from the user or from image meta-data such as a QCOW2 image's backing file name. * vl.c's parse_display_qapi() ignores and leaks the error. It's used to parse the argument of command line option -display. * vl.c's main() case QEMU_OPTION_blockdev ignores the error and leaves it in @err. main() will then pass a pointer to a non-null Error * to net_init_clients(), which is forbidden. It can lead to assertion failure or other misbehavior. * check-qjson.c's multiple_values() demonstrates the badness. * The other callers are not affected since they only pass strings with exactly one JSON value or, in the case of negative tests, one error. The impact on the _nofail() functions is relatively harmless. They abort when any call receives an error. Else they return the last value, and leak the others, if any. Fix consume_json() as follows. On the first call, save value and error as before. On subsequent calls, if any, don't save them. If the first call saved a value, the next call, if any, replaces the value by an "Expecting at most one JSON value" error. Take care not to leak values or errors that aren't saved. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-44-armbru@redhat.com>
| * | json: Improve names of lexer states related to numbersMarkus Armbruster2018-08-241-17/+17
| | | | | | | | | | | | | | | | | | Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-43-armbru@redhat.com>
| * | json: Replace %I64d, %I64u by %PRId64, %PRIu64Markus Armbruster2018-08-242-4/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support for %I64d got added in commit 2c0d4b36e7f "json: fix PRId64 on Win32". We had to hard-code I64d because we used the lexer's finite state machine to check interpolations. No more, so clean this up. Additional conversion specifications would be easy enough to implement when needed. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-42-armbru@redhat.com>
| * | json: Leave rejecting invalid interpolation to parserMarkus Armbruster2018-08-243-39/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both lexer and parser reject invalid interpolation specifications. The parser's check is useless. The lexer ends the token right after the first bad character. This tends to lead to suboptimal error reporting. For instance, input [ %04d ] produces the tokens JSON_LSQUARE [ JSON_ERROR %0 JSON_INTEGER 4 JSON_KEYWORD d JSON_RSQUARE ] The parser then yields an error, an object and two more errors: error: Invalid JSON syntax object: 4 error: JSON parse error, invalid keyword error: JSON parse error, expecting value Dumb down the lexer to accept [A-Za-z0-9]*. The parser's check is now used. Emit a proper error there. The lexer now produces JSON_LSQUARE [ JSON_INTERP %04d JSON_RSQUARE ] and the parser reports just JSON parse error, invalid interpolation '%04d' Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-41-armbru@redhat.com>
| * | json: Pass lexical errors and limit violations to callbackMarkus Armbruster2018-08-247-25/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The callback to consume JSON values takes QObject *json, Error *err. If both are null, the callback is supposed to make up an error by itself. This sucks. qjson.c's consume_json() neglects to do so, which makes qobject_from_json() null instead of failing. I consider that a bug. The culprit is json_message_process_token(): it passes two null pointers when it runs into a lexical error or a limit violation. Fix it to pass a proper Error object then. Update the callbacks: * monitor.c's handle_qmp_command(): the code to make up an error is now dead, drop it. * qga/main.c's process_event(): lumps the "both null" case together with the "not a JSON object" case. The former is now gone. The error message "Invalid JSON syntax" is misleading for the latter. Improve it to "Input must be a JSON object". * qobject/qjson.c's consume_json(): no update; check-qjson demonstrates qobject_from_json() now sets an error on lexical errors, but still doesn't on some other errors. * tests/libqtest.c's qmp_response(): the Error object is now reliable, so use it to improve the error message. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-40-armbru@redhat.com>
| * | json: Treat unwanted interpolation as lexical errorMarkus Armbruster2018-08-245-19/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The JSON parser optionally supports interpolation. The lexer recognizes interpolation tokens unconditionally. The parser rejects them when interpolation is disabled, in parse_interpolation(). However, it neglects to set an error then, which can make json_parser_parse() fail without setting an error. Move the check for unwanted interpolation from the parser's parse_interpolation() into the lexer's finite state machine. When interpolation is disabled, '%' is now handled like any other unexpected character. The next commit will improve how such lexical errors are handled. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-39-armbru@redhat.com>
| * | json: Rename token JSON_ESCAPE & friends to JSON_INTERPMarkus Armbruster2018-08-243-37/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The JSON parser optionally supports interpolation. The code calls it "escape". Awkward, because it uses the same term for escape sequences within strings. The latter usage is consistent with RFC 8259 "The JavaScript Object Notation (JSON) Data Interchange Format" and ISO C. Call the former "interpolation" instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-38-armbru@redhat.com>
| * | json: Don't create JSON_ERROR tokens that won't be usedMarkus Armbruster2018-08-241-4/+2
| | | | | | | | | | | | | | | | | | Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-37-armbru@redhat.com>
| * | json: Don't pass null @tokens to json_parser_parse()Markus Armbruster2018-08-242-17/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | json_parser_parse() normally returns the QObject on success. Except it returns null when its @tokens argument is null. Its only caller json_message_process_token() passes null @tokens when emitting a lexical error. The call is a rather opaque way to say json = NULL then. Simplify matters by lifting the assignment to json out of the emit path: initialize json to null, set it to the value of json_parser_parse() when there's no lexical error. Drop the special case from json_parser_parse(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-36-armbru@redhat.com>
| * | json: Redesign the callback to consume JSON valuesMarkus Armbruster2018-08-2410-54/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The classical way to structure parser and lexer is to have the client call the parser to get an abstract syntax tree, the parser call the lexer to get the next token, and the lexer call some function to get input characters. Another way to structure them would be to have the client feed characters to the lexer, the lexer feed tokens to the parser, and the parser feed abstract syntax trees to some callback provided by the client. This way is more easily integrated into an event loop that dispatches input characters as they arrive. Our JSON parser is kind of between the two. The lexer feeds tokens to a "streamer" instead of a real parser. The streamer accumulates tokens until it got the sequence of tokens that comprise a single JSON value (it counts curly braces and square brackets to decide). It feeds those token sequences to a callback provided by the client. The callback passes each token sequence to the parser, and gets back an abstract syntax tree. I figure it was done that way to make a straightforward recursive descent parser possible. "Get next token" becomes "pop the first token off the token sequence". Drawback: we need to store a complete token sequence. Each token eats 13 + input characters + malloc overhead bytes. Observations: 1. This is not the only way to use recursive descent. If we replaced "get next token" by a coroutine yield, we could do without a streamer. 2. The lexer reports errors by passing a JSON_ERROR token to the streamer. This communicates the offending input characters and their location, but no more. 3. The streamer reports errors by passing a null token sequence to the callback. The (already poor) lexical error information is thrown away. 4. Having the callback receive a token sequence duplicates the code to convert token sequence to abstract syntax tree in every callback. 5. Known bug: the streamer silently drops incomplete token sequences. This commit rectifies 4. by lifting the call of the parser from the callbacks into the streamer. Later commits will address 3. and 5. The lifting removes a bug from qjson.c's parse_json(): it passed a pointer to a non-null Error * in certain cases, as demonstrated by check-qjson.c. json_parser_parse() is now unused. It's a stupid wrapper around json_parser_parse_err(). Drop it, and rename json_parser_parse_err() to json_parser_parse(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-35-armbru@redhat.com>
| * | json: Have lexer call streamer directlyMarkus Armbruster2018-08-244-18/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | json_lexer_init() takes the function to process a token as an argument. It's always json_message_process_token(). Makes the code harder to understand for no actual gain. Drop the indirection. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-34-armbru@redhat.com>
| * | json-parser: simplify and avoid JSONParserContext allocationMarc-André Lureau2018-08-241-32/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | parser_context_new/free() are only used from json_parser_parse(). We can fold the code there and avoid an allocation altogether. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180719184111.5129-9-marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180823164025.12553-33-armbru@redhat.com>
| * | json: remove useless return value from lexer/parserMarc-André Lureau2018-08-244-23/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The lexer always returns 0 when char feeding. Furthermore, none of the caller care about the return value. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180326150916.9602-10-marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180823164025.12553-32-armbru@redhat.com>
| * | check-qjson: Fix and enable utf8_string()'s disabled partMarkus Armbruster2018-08-241-8/+3
| | | | | | | | | | | | | | | | | | Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-31-armbru@redhat.com>
| * | json: Fix \uXXXX for surrogate pairsMarkus Armbruster2018-08-242-23/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | The JSON parser treats each half of a surrogate pair as unpaired surrogate. Fix it to recognize surrogate pairs. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-30-armbru@redhat.com>
| * | json: Reject invalid \uXXXX, fix \u0000Markus Armbruster2018-08-242-59/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The JSON parser translates invalid \uXXXX to garbage instead of rejecting it, and swallows \u0000. Fix by using mod_utf8_encode() instead of flawed wchar_to_utf8(). Valid surrogate pairs are now differently broken: they're rejected instead of translated to garbage. The next commit will fix them. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-29-armbru@redhat.com>
| * | json: Simplify parse_string()Markus Armbruster2018-08-241-23/+19
| | | | | | | | | | | | | | | | | | Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-28-armbru@redhat.com>
| * | json: Leave rejecting invalid escape sequences to parserMarkus Armbruster2018-08-242-91/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both lexer and parser reject invalid escape sequences in strings. The parser's check is useless. The lexer ends the token right after the first non-well-formed byte. This tends to lead to suboptimal error reporting. For instance, input {"abc\@ijk": 1} produces the tokens JSON_LCURLY { JSON_ERROR "abc\@ JSON_KEYWORD ijk JSON_ERROR ": 1}\n The parser then reports three errors Invalid JSON syntax JSON parse error, invalid keyword 'ijk' Invalid JSON syntax before it recovers at the newline. Drop the lexer's escape sequence checking, and make it accept the same characters after backslash it accepts elsewhere in strings. It now produces JSON_LCURLY { JSON_STRING "abc\@ijk" JSON_COLON : JSON_INTEGER 1 JSON_RCURLY and the parser reports just JSON parse error, invalid escape sequence in string While there, fix parse_string()'s inaccurate function comment. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-27-armbru@redhat.com>
| * | json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8")Markus Armbruster2018-08-243-9/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the JSON grammer doesn't accept U+0000 anywhere, this merely exchanges one kind of parse error for another. It's purely for consistency with qobject_to_json(), which accepts \xC0\x80 (see commit e2ec3f97680). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-26-armbru@redhat.com>
| * | json: Leave rejecting invalid UTF-8 to parserMarkus Armbruster2018-08-241-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both the lexer and the parser (attempt to) validate UTF-8 in JSON strings. The lexer rejects bytes that can't occur in valid UTF-8: \xC0..\xC1, \xF5..\xFF. This rejects some, but not all invalid UTF-8. It also rejects ASCII control characters \x00..\x1F, in accordance with RFC 8259 (see recent commit "json: Reject unescaped control characters"). When the lexer rejects, it ends the token right after the first bad byte. Good when the bad byte is a newline. Not so good when it's something like an overlong sequence in the middle of a string. For instance, input {"abc\xC0\xAFijk": 1}\n produces the tokens JSON_LCURLY { JSON_ERROR "abc\xC0 JSON_ERROR \xAF JSON_KEYWORD ijk JSON_ERROR ": 1}\n The parser then reports four errors Invalid JSON syntax Invalid JSON syntax JSON parse error, invalid keyword 'ijk' Invalid JSON syntax before it recovers at the newline. The commit before previous made the parser reject invalid UTF-8 sequences. Since then, anything the lexer rejects, the parser would reject as well. Thus, the lexer's rejecting is unnecessary for correctness, and harmful for error reporting. However, we want to keep rejecting ASCII control characters in the lexer, because that produces the behavior we want for unclosed strings. We also need to keep rejecting \xFF in the lexer, because we documented that as a way to reset the JSON parser (docs/interop/qmp-spec.txt section 2.6 QGA Synchronization), which means we can't change how we recover from this error now. I wish we hadn't done that. I think we should treat \xFE the same as \xFF. Change the lexer to accept \xC0..\xC1 and \xF5..\xFD. It now rejects only \x00..\x1F and \xFE..\xFF. Error reporting for invalid UTF-8 in strings is much improved, except for \xFE and \xFF. For the example above, the lexer now produces JSON_LCURLY { JSON_STRING "abc\xC0\xAFijk" JSON_COLON : JSON_INTEGER 1 JSON_RCURLY and the parser reports just JSON parse error, invalid UTF-8 sequence in string Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-25-armbru@redhat.com>
| * | json: Report first rather than last parse errorMarkus Armbruster2018-08-241-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Quiz time! When a parser reports multiple errors, but the user gets to see just one, which one is (on average) the least useful one? Yes, you're right, it's the last one! You're clearly familiar with compilers. Which one does QEMU report? Right again, the last one! You're clearly familiar with QEMU. Reproducer: feeding {"abc\xC2ijk": 1}\n to QMP produces {"error": {"class": "GenericError", "desc": "JSON parse error, key is not a string in object"}} Report the first error instead. The reproducer now produces {"error": {"class": "GenericError", "desc": "JSON parse error, invalid UTF-8 sequence in string"}} Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-24-armbru@redhat.com>
| * | json: Reject invalid UTF-8 sequencesMarkus Armbruster2018-08-244-105/+122
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We reject bytes that can't occur in valid UTF-8 (\xC0..\xC1, \xF5..\xFF in the lexer. That's insufficient; there's plenty of invalid UTF-8 not containing these bytes, as demonstrated by check-qjson: * Malformed sequences - Unexpected continuation bytes - Missing continuation bytes after start bytes other than \xC0..\xC1, \xF5..\xFD. * Overlong sequences with start bytes other than \xC0..\xC1, \xF5..\xFD. * Invalid code points Fixing this in the lexer would be bothersome. Fixing it in the parser is straightforward, so do that. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-23-armbru@redhat.com>
| * | check-qjson: Document we expect invalid UTF-8 to be rejectedMarkus Armbruster2018-08-241-80/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The JSON parser rejects some invalid sequences, but accepts others without correcting the problem. We should either reject all invalid sequences, or minimize overlong sequences and replace all other invalid sequences by a suitable replacement character. A common choice for replacement is U+FFFD. I'm going to implement the former. Update the comments in utf8_string() to expect this. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-22-armbru@redhat.com>
| * | json: Tighten and simplify qstring_from_escaped_str()'s loopMarkus Armbruster2018-08-241-23/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | Simplify loop control, and assert that the string ends with the appropriate quote (the lexer ensures it does). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-21-armbru@redhat.com>