SPARC target fixes by gco · Pull Request #4 · artyom-tarasenko/qemu

gco · 2017-05-12T01:42:12Z

qemu's SPARC target doesn't run on big-endian hosts because it did not use the fprs register consistently
qemu was treating every warm reset as an eXternally-Initiated Reset, reverted
single-stepping was completely broken and then a bit broken in that single-stepping over certain instructions would skip over two instructions instead

Some of the code was treating it as 32-bit, some as 64-bit. When qemu is run on a little-endian host that sort of works, the relevant 32-bits just happens to be in the right place. On a big-endian host that places the relevant bits in the wrong place, the register contains zeros.

The debugger would not be entered for the instruction following the wrhpr %pstate, but rather the instruction after _that_.

artyom-tarasenko · 2017-05-12T16:30:06Z

Nice work! While you are absolutely right about XIR and warm reset, I wonder how did you hit it? Have you built the firmware/hypervisor from the source? I'm asking because it would be nice to have the firmware binaries which can be distributed with qemu.

Here, like in your previous request, the master branch is the upstream master of the qemu project (updated irregularly, my bad). I can not merge the pull request into the master branch, but may create a staging branch and merge it there.

The normal workflow for the sun4v patches is:

the patches are submitted to qemu-devel mailing list
after the review they got into my staging branch
from the staging branch they are merged into the upstream qemu

This may look like a bit of an overhead, but we have really a lot of subsystems in qemu, and some of them are shared over multiple targets and the steps above help organizing the development process a lot.

Most likely your previous pull request will be routed not via my sun4v tree, but via the build-related trees of other maintainers.

If you plan to submit your patches to qemu-devel@ mailing list, please add a 'Signed-off-by' line, and use the scripts/checkpatch.pl to check that the patches are correctly formatted (I see, no problems with the formatting, but I haven't run the script).

Ah, one more thing for the other pull request (not relevant for this one): please use scripts/get_maintainer.pl to add the appropriate maintainer to CC:. Feel free to cc me on any Solaris or SPARC- related patch even if I'm not listed by scripts/get_maintainer.pl.

gco · 2017-05-12T18:48:50Z

The XIR problem showed up with qemu's CLI system_reset command. XIR in the "dumb" reset (reset.bin) ends up causing a "guest watchdog reset" instead of restarting the hypervisor. However, the multi-strand synchronization code in reset.bin has a bug since it never expected to be re-entered after the first reset. It requires a small change in order to handle a second reset properly.

The source code for q.bin/reset.bin/openboot.bin/etc was distributed as part of the OpenSPARC sources. The q.bin in the S10 directory, however, does not appear to match any of the variants built in the hypervisor source directory.

Running postcopy-test with ASAN produces the following error: QTEST_QEMU_BINARY=ppc64-softmmu/qemu-system-ppc64 tests/postcopy-test ... ================================================================= ==23641==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7f1556600000 at pc 0x55b8e9d28208 bp 0x7f1555f4d3c0 sp 0x7f1555f4d3b0 READ of size 8 at 0x7f1556600000 thread T6 #0 0x55b8e9d28207 in htab_save_first_pass /home/elmarco/src/qq/hw/ppc/spapr.c:1528 #1 0x55b8e9d2939c in htab_save_iterate /home/elmarco/src/qq/hw/ppc/spapr.c:1665 #2 0x55b8e9beae3a in qemu_savevm_state_iterate /home/elmarco/src/qq/migration/savevm.c:1044 #3 0x55b8ea677733 in migration_thread /home/elmarco/src/qq/migration/migration.c:1976 #4 0x7f15845f46c9 in start_thread (/lib64/libpthread.so.0+0x76c9) #5 0x7f157d9d0f7e in clone (/lib64/libc.so.6+0x107f7e) 0x7f1556600000 is located 0 bytes to the right of 2097152-byte region [0x7f1556400000,0x7f1556600000) allocated by thread T0 here: #0 0x7f159bb76980 in posix_memalign (/lib64/libasan.so.3+0xc7980) #1 0x55b8eab185b2 in qemu_try_memalign /home/elmarco/src/qq/util/oslib-posix.c:106 #2 0x55b8eab186c8 in qemu_memalign /home/elmarco/src/qq/util/oslib-posix.c:122 #3 0x55b8e9d268a8 in spapr_reallocate_hpt /home/elmarco/src/qq/hw/ppc/spapr.c:1214 #4 0x55b8e9d26e04 in ppc_spapr_reset /home/elmarco/src/qq/hw/ppc/spapr.c:1261 #5 0x55b8ea12e913 in qemu_system_reset /home/elmarco/src/qq/vl.c:1697 #6 0x55b8ea13fa40 in main /home/elmarco/src/qq/vl.c:4679 #7 0x7f157d8e9400 in __libc_start_main (/lib64/libc.so.6+0x20400) Thread T6 created by T0 here: #0 0x7f159bae0488 in __interceptor_pthread_create (/lib64/libasan.so.3+0x31488) #1 0x55b8eab1d9cb in qemu_thread_create /home/elmarco/src/qq/util/qemu-thread-posix.c:465 #2 0x55b8ea67874c in migrate_fd_connect /home/elmarco/src/qq/migration/migration.c:2096 #3 0x55b8ea66cbb0 in migration_channel_connect /home/elmarco/src/qq/migration/migration.c:500 #4 0x55b8ea678f38 in socket_outgoing_migration /home/elmarco/src/qq/migration/socket.c:87 #5 0x55b8eaa5a03a in qio_task_complete /home/elmarco/src/qq/io/task.c:142 #6 0x55b8eaa599cc in gio_task_thread_result /home/elmarco/src/qq/io/task.c:88 #7 0x7f15823e38e6 (/lib64/libglib-2.0.so.0+0x468e6) SUMMARY: AddressSanitizer: heap-buffer-overflow /home/elmarco/src/qq/hw/ppc/spapr.c:1528 in htab_save_first_pass index seems to be wrongly incremented, unless I miss something that would be worth a comment. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

If, once the kernel has booted, we try to remove a memory hotplugged while the kernel was not started, QEMU crashes on an assert: qemu-system-ppc64: hw/virtio/vhost.c:651: vhost_commit: Assertion `r >= 0' failed. ... #4 in vhost_commit #5 in memory_region_transaction_commit #6 in pc_dimm_memory_unplug #7 in spapr_memory_unplug #8 spapr_machine_device_unplug #9 in hotplug_handler_unplug #10 in spapr_lmb_release #11 in detach #12 in set_allocation_state #13 in rtas_set_indicator ... If we take a closer look to the guest kernel log, we can see when we try to unplug the memory: pseries-hotplug-mem: Attempting to hot-add 4 LMB(s) What happens: 1- The kernel has ignored the memory hotplug event because it was not started when it was generated. 2- When we hot-unplug the memory, QEMU starts to remove the memory, generates an hot-unplug event, and signals the kernel of the incoming new event 3- as the kernel is started, on the QEMU signal, it reads the event list, decodes the hotplug event and tries to finish the hotplugging. 4- QEMU receive the the hotplug notification while it is trying to hot-unplug the memory. This moves the memory DRC to an invalid state This patch prevents this by not allowing to set the allocation state to USABLE while the DRC is awaiting release. RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1432382 Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

During block job completion, nothing is preventing block_job_defer_to_main_loop_bh from being called in a nested aio_poll(), which is a trouble, such as in this code path: qmp_block_commit commit_active_start bdrv_reopen bdrv_reopen_multiple bdrv_reopen_prepare bdrv_flush aio_poll aio_bh_poll aio_bh_call block_job_defer_to_main_loop_bh stream_complete bdrv_reopen block_job_defer_to_main_loop_bh is the last step of the stream job, which should have been "paused" by the bdrv_drained_begin/end in bdrv_reopen_multiple, but it is not done because it's in the form of a main loop BH. Similar to why block jobs should be paused between drained_begin and drained_end, BHs they schedule must be excluded as well. To achieve this, this patch forces draining the BH in BDRV_POLL_WHILE. As a side effect this fixes a hang in block_job_detach_aio_context during system_reset when a block job is ready: #0 0x0000555555aa79f3 in bdrv_drain_recurse #1 0x0000555555aa825d in bdrv_drained_begin #2 0x0000555555aa8449 in bdrv_drain #3 0x0000555555a9c356 in blk_drain #4 0x0000555555aa3cfd in mirror_drain #5 0x0000555555a66e11 in block_job_detach_aio_context #6 0x0000555555a62f4d in bdrv_detach_aio_context #7 0x0000555555a63116 in bdrv_set_aio_context #8 0x0000555555a9d326 in blk_set_aio_context #9 0x00005555557e38da in virtio_blk_data_plane_stop #10 0x00005555559f9d5f in virtio_bus_stop_ioeventfd #11 0x00005555559fa49b in virtio_bus_stop_ioeventfd #12 0x00005555559f6a18 in virtio_pci_stop_ioeventfd #13 0x00005555559f6a18 in virtio_pci_reset #14 0x00005555559139a9 in qdev_reset_one #15 0x0000555555916738 in qbus_walk_children #16 0x0000555555913318 in qdev_walk_children #17 0x0000555555916738 in qbus_walk_children #18 0x00005555559168ca in qemu_devices_reset #19 0x000055555581fcbb in pc_machine_reset #20 0x00005555558a4d96 in qemu_system_reset #21 0x000055555577157a in main_loop_should_exit #22 0x000055555577157a in main_loop #23 0x000055555577157a in main The rationale is that the loop in block_job_detach_aio_context cannot make any progress in pausing/completing the job, because bs->in_flight is 0, so bdrv_drain doesn't process the block_job_defer_to_main_loop BH. With this patch, it does. Reported-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170418143044.12187-3-famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Tested-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>

ASAN detects an "unknown-crash" when running pxe-test: /ppc64/pxe/spapr-vlan: ================================================================= ==7143==ERROR: AddressSanitizer: unknown-crash on address 0x7f6dcd298d30 at pc 0x55e22218830d bp 0x7f6dcd2989e0 sp 0x7f6dcd2989d0 READ of size 128 at 0x7f6dcd298d30 thread T2 #0 0x55e22218830c in tftp_session_allocate /home/elmarco/src/qq/slirp/tftp.c:73 #1 0x55e22218a1f8 in tftp_handle_rrq /home/elmarco/src/qq/slirp/tftp.c:289 #2 0x55e22218b54c in tftp_input /home/elmarco/src/qq/slirp/tftp.c:446 #3 0x55e2221833fe in udp6_input /home/elmarco/src/qq/slirp/udp6.c:82 #4 0x55e222137b17 in ip6_input /home/elmarco/src/qq/slirp/ip6_input.c:67 Address 0x7f6dcd298d30 is located in stack of thread T2 at offset 96 in frame #0 0x55e222182420 in udp6_input /home/elmarco/src/qq/slirp/udp6.c:13 This frame has 3 object(s): [32, 48) '<unknown>' [96, 124) 'lhost' <== Memory access at offset 96 partially overflows this variable [160, 200) 'save_ip' <== Memory access at offset 96 partially underflows this variable The sockaddr_storage pointer is the sockaddr_in6 lhost on the stack. Copy only the source addr size. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>

artyom-tarasenko · 2017-05-15T09:53:21Z

Hmm, system_reset is an equivalent of pressing a reset button. Isn't it XIR?

gco · 2017-05-15T10:23:37Z

On Mon, May 15, 2017 at 2:53 AM, Artyom Tarasenko ***@***.***> wrote: Hmm, system_reset is an equivalent of pressing a reset button. Isn't it XIR?

No, XIR is a separate debugging feature.

artyom-tarasenko · 2017-05-15T11:34:33Z

Is there a way to distinguish POR caused by Power-on and POR caused by pressing a reset button?

If trace backend is set to TRACE_NOP, trace_get_vcpu_event_count returns 0, cause bitmap_new call abort. The abort can be triggered as follows: $ ./configure --enable-trace-backend=nop --target-list=x86_64-softmmu $ gdb ./x86_64-softmmu/qemu-system-x86_64 -M q35,accel=kvm -m 1G (gdb) bt #0 0x00007ffff04e25f7 in raise () from /lib64/libc.so.6 #1 0x00007ffff04e3ce8 in abort () from /lib64/libc.so.6 #2 0x00005555559de905 in bitmap_new (nbits=<optimized out>) at /home/root/git/qemu2.git/include/qemu/bitmap.h:96 #3 cpu_common_initfn (obj=0x555556621d30) at qom/cpu.c:399 #4 0x0000555555a11869 in object_init_with_type (obj=0x555556621d30, ti=0x55555656bbb0) at qom/object.c:341 #5 0x0000555555a11869 in object_init_with_type (obj=0x555556621d30, ti=0x55555656bd30) at qom/object.c:341 #6 0x0000555555a11efc in object_initialize_with_type (data=data@entry=0x555556621d30, size=76560, type=type@entry=0x55555656bd30) at qom/object.c:376 #7 0x0000555555a12061 in object_new_with_type (type=0x55555656bd30) at qom/object.c:484 #8 0x0000555555a121c5 in object_new (typename=typename@entry=0x555556550340 "qemu64-x86_64-cpu") at qom/object.c:494 #9 0x00005555557f6e3d in pc_new_cpu (typename=typename@entry=0x555556550340 "qemu64-x86_64-cpu", apic_id=0, errp=errp@entry=0x5555565391b0 <error_fatal>) at /home/root/git/qemu2.git/hw/i386/pc.c:1101 #10 0x00005555557fa33e in pc_cpus_init (pcms=pcms@entry=0x5555565f9690) at /home/root/git/qemu2.git/hw/i386/pc.c:1184 #11 0x00005555557fe0f6 in pc_q35_init (machine=0x5555565f9690) at /home/root/git/qemu2.git/hw/i386/pc_q35.c:121 #12 0x000055555574fbad in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4562 Signed-off-by: Anthony Xu <anthony.xu@intel.com> Message-id: 1494369432-15418-1-git-send-email-anthony.xu@intel.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

gco added 4 commits May 11, 2017 18:40

Warm reset != XIR reset

04558cf

Raise EXCP_DEBUG in gen_goto_tb so single-step works on SPARC

785a8ef

Single-stepping over wrhpr %pstate skips following instruction

92f32d2

The debugger would not be entered for the instruction following the wrhpr %pstate, but rather the instruction after _that_.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SPARC target fixes#4

SPARC target fixes#4
gco wants to merge 4 commits into
artyom-tarasenko:masterfrom
gco:sparcfixes

gco commented May 12, 2017

Uh oh!

artyom-tarasenko commented May 12, 2017

Uh oh!

gco commented May 12, 2017 •

edited

Loading

Uh oh!

artyom-tarasenko commented May 15, 2017

Uh oh!

gco commented May 15, 2017 via email

Uh oh!

artyom-tarasenko commented May 15, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

gco commented May 12, 2017

Uh oh!

artyom-tarasenko commented May 12, 2017

Uh oh!

gco commented May 12, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

artyom-tarasenko commented May 15, 2017

Uh oh!

gco commented May 15, 2017 via email

Uh oh!

artyom-tarasenko commented May 15, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

gco commented May 12, 2017 •

edited

Loading