aboutsummaryrefslogtreecommitdiff
path: root/virt/kvm
AgeCommit message (Collapse)Author
2016-05-20KVM: arm/arm64: vgic-new: Add GICv3 MMIO handling frameworkAndre Przywara
Create a new file called vgic-mmio-v3.c and describe the GICv3 distributor and redistributor registers there. This adds a special macro to deal with the split of SGI/PPI in the redistributor and SPIs in the distributor, which allows us to reuse the existing GICv2 handlers for those registers which are compatible. Also we provide a function to deal with the registration of the two separate redistributor frames per VCPU. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic-new: Add SGIPENDR register handlersAndre Przywara
As this register is v2 specific, its implementation lives entirely in vgic-mmio-v2.c. This register allows setting the source mask of an IPI. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic-new: Add SGIR register handlerAndre Przywara
Triggering an IPI via this register is v2 specific, so the implementation lives entirely in vgic-mmio-v2.c. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic-new: Add TARGET registers handlersAndre Przywara
The target register handlers are v2 emulation specific, so their implementation lives entirely in vgic-mmio-v2.c. We copy the old VGIC behaviour of assigning an IRQ to the first VCPU set in the target mask instead of making it possibly pending on multiple VCPUs. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic-new: Add CONFIG registers handlersAndre Przywara
The config register handlers are shared between the v2 and v3 emulation, so their implementation goes into vgic-mmio.c, to be easily referenced from the v3 emulation as well later. Signed-off-by: Andre Przywara <andre.przywara@arm.com>
2016-05-20KVM: arm/arm64: vgic-new: Add PRIORITY registers handlersAndre Przywara
The priority register handlers are shared between the v2 and v3 emulation, so their implementation goes into vgic-mmio.c, to be easily referenced from the v3 emulation as well later. There is a corner case when we change the priority of a pending interrupt which we don't handle at the moment. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic-new: Add ACTIVE registers handlersAndre Przywara
The active register handlers are shared between the v2 and v3 emulation, so their implementation goes into vgic-mmio.c, to be easily referenced from the v3 emulation as well later. Since activation/deactivation of an interrupt may happen entirely in the guest without it ever exiting, we need some extra logic to properly track the active state. For clearing the active state, we basically have to halt the guest to make sure this is properly propagated into the respective VCPUs. Signed-off-by: Andre Przywara <andre.przywara@arm.com>
2016-05-20KVM: arm/arm64: vgic-new: Add PENDING registers handlersAndre Przywara
The pending register handlers are shared between the v2 and v3 emulation, so their implementation goes into vgic-mmio.c, to be easily referenced from the v3 emulation as well later. For level triggered interrupts the real line level is unaffected by this write, so we keep this state separate and combine it with the device's level to get the actual pending state. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic-new: Add ENABLE registers handlersAndre Przywara
As the enable register handlers are shared between the v2 and v3 emulation, their implementation goes into vgic-mmio.c, to be easily referenced from the v3 emulation as well later. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic-new: Add CTLR, TYPER and IIDR handlersMarc Zyngier
Those three registers are v2 emulation specific, so their implementation lives entirely in vgic-mmio-v2.c. Also they are handled in one function, as their implementation is pretty simple. When the guest enables the distributor, we kick all VCPUs to get potentially pending interrupts serviced. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic-new: Add GICv2 MMIO handling frameworkAndre Przywara
Create vgic-mmio-v2.c to describe GICv2 emulation specific handlers using the initializer macros provided by the VGIC MMIO framework. Provide a function to register the GICv2 distributor registers to the kvm_io_bus framework. The actual handler functions are still stubs in this patch. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic-new: Add MMIO handling frameworkMarc Zyngier
Add an MMIO handling framework to the VGIC emulation: Each register is described by its offset, size (or number of bits per IRQ, if applicable) and the read/write handler functions. We provide initialization macros to describe each GIC register later easily. Separate dispatch functions for read and write accesses are connected to the kvm_io_bus framework and binary-search for the responsible register handler based on the offset address within the region. We convert the incoming data (referenced by a pointer) to the host's endianess and use pass-by-value to hand the data over to the actual handler functions. The register handler prototype and the endianess conversion are courtesy of Christoffer Dall. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic-new: Implement kvm_vgic_vcpu_pending_irqEric Auger
Tell KVM whether a particular VCPU has an IRQ that needs handling in the guest. This is used to decide whether a VCPU is runnable. Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
2016-05-20KVM: arm/arm64: vgic-new: Add GICv3 world switch backendMarc Zyngier
As the GICv3 virtual interface registers differ from their GICv2 siblings, we need different handlers for processing maintenance interrupts and reading/writing to the LRs. Implement the respective handler functions and connect them to existing code to be called if the host is using a GICv3. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic-new: Add GICv2 world switch backendMarc Zyngier
Processing maintenance interrupts and accessing the list registers are dependent on the host's GIC version. Introduce vgic-v2.c to contain GICv2 specific functions. Implement the GICv2 specific code for syncing the emulation state into the VGIC registers. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic-new: Add IRQ sync/flush frameworkMarc Zyngier
Implement the framework for syncing IRQs between our emulation and the list registers, which represent the guest's view of IRQs. This is done in kvm_vgic_flush_hwstate and kvm_vgic_sync_hwstate, which gets called on guest entry and exit. The code talking to the actual GICv2/v3 hardware is added in the following patches. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic-new: Add IRQ sortingChristoffer Dall
Adds the sorting function to cover the case where you have more IRQs to consider than you have LRs. We now consider priorities. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
2016-05-20KVM: arm/arm64: vgic-new: Implement virtual IRQ injectionChristoffer Dall
Provide a vgic_queue_irq_unlock() function which decides whether a given IRQ needs to be queued to a VCPU's ap_list. This should be called whenever an IRQ becomes pending or enabled, either as a result of userspace injection, from in-kernel emulated devices like the architected timer or from MMIO accesses to the distributor emulation. Also provides the necessary functions to allow userland to inject an IRQ to a guest. Since this is the first code that starts using our locking mechanism, we add some (hopefully) clear documentation of our locking strategy and requirements along with this patch. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
2016-05-20KVM: arm/arm64: vgic-new: Add acccessor to new struct vgic_irq instanceChristoffer Dall
The new VGIC implementation centers around a struct vgic_irq instance per virtual IRQ. Provide a function to retrieve the right instance for a given IRQ number and (in case of private interrupts) the right VCPU. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
2016-05-20KVM: arm/arm64: move GICv2 emulation defines into arm-gic-v3.hAndre Przywara
As (some) GICv3 hosts can emulate a GICv2, some GICv2 specific masks for the list register definition also apply to GICv3 LRs. At the moment we have those definitions in the KVM VGICv3 implementation, so let's move them into the GICv3 header file to have them automatically defined. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
2016-05-20KVM: arm/arm64: pmu: abstract access to number of SPIsAndre Przywara
Currently the PMU uses a member of the struct vgic_dist directly, which not only breaks abstraction, but will fail with the new VGIC. Abstract this access in the VGIC header file and refactor the validity check in the PMU code. Signed-off-by: Andre Przywara <andre.przywara@arm.com>
2016-05-20KVM: arm/arm64: Fix MMIO emulation data handlingChristoffer Dall
When the kernel was handling a guest MMIO read access internally, we need to copy the emulation result into the run->mmio structure in order for the kvm_handle_mmio_return() function to pick it up and inject the result back into the guest. Currently the only user of kvm_io_bus for ARM is the VGIC, which did this copying itself, so this was not causing issues so far. But with the upcoming new vgic implementation we need this done properly. Update the kvm_handle_mmio_return description and cleanup the code to only perform a single copying when needed. Code and commit message inspired by Andre Przywara. Reported-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com>
2016-05-20KVM: arm/arm64: Get rid of vgic_cpu->nr_lrChristoffer Dall
The number of list registers is a property of the underlying system, not of emulated VGIC CPU interface. As we are about to move this variable to global state in the new vgic for clarity, move it from the legacy implementation as well to make the merge of the new code easier. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com>
2016-05-20KVM: arm/arm64: Move timer IRQ map to latest possible timeChristoffer Dall
We are about to modify the VGIC to allocate all data structures dynamically and store mapped IRQ information on a per-IRQ struct, which is indeed allocated dynamically at init time. Therefore, we cannot record the mapped IRQ info from the timer at timer reset time like it's done now, because VCPU reset happens before timer init. A possible later time to do this is on the first run of a per VCPU, it just requires us to move the enable state to be a per-VCPU state and do the lookup of the physical IRQ number when we are about to run the VCPU. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
2016-05-20KVM: arm/arm64: vgic: Remove irq_phys_map from interfaceAndre Przywara
Now that the virtual arch timer does not care about the irq_phys_map anymore, let's rework kvm_vgic_map_phys_irq() to return an error value instead. Any reference to that mapping can later be done by passing the correct combination of VCPU and virtual IRQ number. This makes the irq_phys_map handling completely private to the VGIC code. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: arch_timer: Remove irq_phys_mapAndre Przywara
Now that the interface between the arch timer and the VGIC does not require passing the irq_phys_map entry pointer anymore, let's remove it from the virtual arch timer and use the virtual IRQ number instead directly. The remaining pointer returned by kvm_vgic_map_phys_irq() will be removed in the following patch. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: Remove the IRQ field from struct irq_phys_mapChristoffer Dall
The communication of a Linux IRQ number from outside the VGIC to the vgic was a leftover from the day when the vgic code cared about how a particular device injects virtual interrupts mapped to a physical interrupt. We can safely remove this notion, leaving all physical IRQ handling to be done in the device driver (the arch timer in this case), which makes room for a saner API for the new VGIC. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org>
2016-05-20KVM: arm/arm64: vgic: avoid map in kvm_vgic_unmap_phys_irq()Andre Przywara
kvm_vgic_unmap_phys_irq() only needs the virtual IRQ number, so let's just pass that between the arch timer and the VGIC to get rid of the irq_phys_map pointer. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic: avoid map in kvm_vgic_map_is_active()Andre Przywara
For getting the active state of a mapped IRQ, we actually only need the virtual IRQ number, not the pointer to the mapping entry. Pass the virtual IRQ number from the arch timer to the VGIC directly. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic: avoid map in kvm_vgic_inject_mapped_irq()Andre Przywara
When we want to inject a hardware mapped IRQ into a guest, we actually only need the virtual IRQ number from the irq_phys_map. So let's pass this number directly from the arch timer to the VGIC to avoid using the map as a parameter. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic: streamline vgic_update_irq_pending() interfaceAndre Przywara
We actually don't use the irq_phys_map parameter in vgic_update_irq_pending(), so let's just remove it. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-03KVM: arm/arm64: vgic: Rely on the GIC driver to parse the firmware tablesJulien Grall
Currently, the firmware tables are parsed 2 times: once in the GIC drivers, the other time when initializing the vGIC. It means code duplication and make more tedious to add the support for another firmware table (like ACPI). Use the recently introduced helper gic_get_kvm_info() to get information about the virtual GIC. With this change, the virtual GIC becomes agnostic to the firmware table and KVM will be able to initialize the vGIC on ACPI. Signed-off-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-03KVM: arm/arm64: arch_timer: Rely on the arch timer to parse the firmware tablesJulien Grall
The firmware table is currently parsed by the virtual timer code in order to retrieve the virtual timer interrupt. However, this is already done by the arch timer driver. To avoid code duplication, use the newly function arch_timer_get_kvm_info() which return all the information required by the virtual timer code. Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-04-06KVM: arm/arm64: Handle forward time correction gracefullyMarc Zyngier
On a host that runs NTP, corrections can have a direct impact on the background timer that we program on the behalf of a vcpu. In particular, NTP performing a forward correction will result in a timer expiring sooner than expected from a guest point of view. Not a big deal, we kick the vcpu anyway. But on wake-up, the vcpu thread is going to perform a check to find out whether or not it should block. And at that point, the timer check is going to say "timer has not expired yet, go back to sleep". This results in the timer event being lost forever. There are multiple ways to handle this. One would be record that the timer has expired and let kvm_cpu_has_pending_timer return true in that case, but that would be fairly invasive. Another is to check for the "short sleep" condition in the hrtimer callback, and restart the timer for the remaining time when the condition is detected. This patch implements the latter, with a bit of refactoring in order to avoid too much code duplication. Cc: <stable@vger.kernel.org> Reported-by: Alexander Graf <agraf@suse.de> Reviewed-by: Alexander Graf <agraf@suse.de> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-04-01arm64: KVM: Add braces to multi-line if statement in virtual PMU codeWill Deacon
The kernel is written in C, not python, so we need braces around multi-line if statements. GCC 6 actually warns about this, thanks to the fantastic new "-Wmisleading-indentation" flag: | virt/kvm/arm/pmu.c: In function ‘kvm_pmu_overflow_status’: | virt/kvm/arm/pmu.c:198:3: warning: statement is indented as if it were guarded by... [-Wmisleading-indentation] | reg &= vcpu_sys_reg(vcpu, PMCNTENSET_EL0); | ^~~ | arch/arm64/kvm/../../../virt/kvm/arm/pmu.c:196:2: note: ...this ‘if’ clause, but it is not | if ((vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_E)) | ^~ As it turns out, this particular case is harmless (we just do some &= operations with 0), but worth fixing nonetheless. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-03-22KVM: Replace smp_mb() with smp_load_acquire() in the kvm_flush_remote_tlbs()Lan Tianyu
smp_load_acquire() is enough here and it's cheaper than smp_mb(). Adding a comment about reusing memory barrier of kvm_make_all_cpus_request() here to keep order between modifications to the page tables and reading mode. Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22KVM: Replace smp_mb() with smp_mb_after_atomic() in the ↵Lan Tianyu
kvm_make_all_cpus_request() Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22KVM: fix spin_lock_init order on x86Paolo Bonzini
Moving the initialization earlier is needed in 4.6 because kvm_arch_init_vm is now using mmu_lock, causing lockdep to complain: [ 284.440294] INFO: trying to register non-static key. [ 284.445259] the code is fine but needs lockdep annotation. [ 284.450736] turning off the locking correctness validator. ... [ 284.528318] [<ffffffff810aecc3>] lock_acquire+0xd3/0x240 [ 284.533733] [<ffffffffa0305aa0>] ? kvm_page_track_register_notifier+0x20/0x60 [kvm] [ 284.541467] [<ffffffff81715581>] _raw_spin_lock+0x41/0x80 [ 284.546960] [<ffffffffa0305aa0>] ? kvm_page_track_register_notifier+0x20/0x60 [kvm] [ 284.554707] [<ffffffffa0305aa0>] kvm_page_track_register_notifier+0x20/0x60 [kvm] [ 284.562281] [<ffffffffa02ece70>] kvm_mmu_init_vm+0x20/0x30 [kvm] [ 284.568381] [<ffffffffa02dbf7a>] kvm_arch_init_vm+0x1ea/0x200 [kvm] [ 284.574740] [<ffffffffa02bff3f>] kvm_dev_ioctl+0xbf/0x4d0 [kvm] However, it also helps fixing a preexisting problem, which is why this patch is also good for stable kernels: kvm_create_vm was incrementing current->mm->mm_count but not decrementing it at the out_err label (in case kvm_init_mmu_notifier failed). The new initialization order makes it possible to add the required mmdrop without adding a new error label. Cc: stable@vger.kernel.org Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-20Merge branch 'mm-pkeys-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 protection key support from Ingo Molnar: "This tree adds support for a new memory protection hardware feature that is available in upcoming Intel CPUs: 'protection keys' (pkeys). There's a background article at LWN.net: https://lwn.net/Articles/643797/ The gist is that protection keys allow the encoding of user-controllable permission masks in the pte. So instead of having a fixed protection mask in the pte (which needs a system call to change and works on a per page basis), the user can map a (handful of) protection mask variants and can change the masks runtime relatively cheaply, without having to change every single page in the affected virtual memory range. This allows the dynamic switching of the protection bits of large amounts of virtual memory, via user-space instructions. It also allows more precise control of MMU permission bits: for example the executable bit is separate from the read bit (see more about that below). This tree adds the MM infrastructure and low level x86 glue needed for that, plus it adds a high level API to make use of protection keys - if a user-space application calls: mmap(..., PROT_EXEC); or mprotect(ptr, sz, PROT_EXEC); (note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice this special case, and will set a special protection key on this memory range. It also sets the appropriate bits in the Protection Keys User Rights (PKRU) register so that the memory becomes unreadable and unwritable. So using protection keys the kernel is able to implement 'true' PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies PROT_READ as well. Unreadable executable mappings have security advantages: they cannot be read via information leaks to figure out ASLR details, nor can they be scanned for ROP gadgets - and they cannot be used by exploits for data purposes either. We know about no user-space code that relies on pure PROT_EXEC mappings today, but binary loaders could start making use of this new feature to map binaries and libraries in a more secure fashion. There is other pending pkeys work that offers more high level system call APIs to manage protection keys - but those are not part of this pull request. Right now there's a Kconfig that controls this feature (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled (like most x86 CPU feature enablement code that has no runtime overhead), but it's not user-configurable at the moment. If there's any serious problem with this then we can make it configurable and/or flip the default" * 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits) x86/mm/pkeys: Fix mismerge of protection keys CPUID bits mm/pkeys: Fix siginfo ABI breakage caused by new u64 field x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA mm/core, x86/mm/pkeys: Add execute-only protection keys support x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags x86/mm/pkeys: Allow kernel to modify user pkey rights register x86/fpu: Allow setting of XSAVE state x86/mm: Factor out LDT init from context init mm/core, x86/mm/pkeys: Add arch_validate_pkey() mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits() x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU x86/mm/pkeys: Add Kconfig prompt to existing config option x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps x86/mm/pkeys: Dump PKRU with other kernel registers mm/core, x86/mm/pkeys: Differentiate instruction fetches x86/mm/pkeys: Optimize fault handling in access_error() mm/core: Do not enforce PKEY permissions on remote mm access um, pkeys: Add UML arch_*_access_permitted() methods mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys x86/mm/gup: Simplify get_user_pages() PTE bit handling ...
2016-03-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "One of the largest releases for KVM... Hardly any generic changes, but lots of architecture-specific updates. ARM: - VHE support so that we can run the kernel at EL2 on ARMv8.1 systems - PMU support for guests - 32bit world switch rewritten in C - various optimizations to the vgic save/restore code. PPC: - enabled KVM-VFIO integration ("VFIO device") - optimizations to speed up IPIs between vcpus - in-kernel handling of IOMMU hypercalls - support for dynamic DMA windows (DDW). s390: - provide the floating point registers via sync regs; - separated instruction vs. data accesses - dirty log improvements for huge guests - bugfixes and documentation improvements. x86: - Hyper-V VMBus hypercall userspace exit - alternative implementation of lowest-priority interrupts using vector hashing (for better VT-d posted interrupt support) - fixed guest debugging with nested virtualizations - improved interrupt tracking in the in-kernel IOAPIC - generic infrastructure for tracking writes to guest memory - currently its only use is to speedup the legacy shadow paging (pre-EPT) case, but in the future it will be used for virtual GPUs as well - much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (217 commits) KVM: x86: remove eager_fpu field of struct kvm_vcpu_arch KVM: x86: disable MPX if host did not enable MPX XSAVE features arm64: KVM: vgic-v3: Only wipe LRs on vcpu exit arm64: KVM: vgic-v3: Reset LRs at boot time arm64: KVM: vgic-v3: Do not save an LR known to be empty arm64: KVM: vgic-v3: Save maintenance interrupt state only if required arm64: KVM: vgic-v3: Avoid accessing ICH registers KVM: arm/arm64: vgic-v2: Make GICD_SGIR quicker to hit KVM: arm/arm64: vgic-v2: Only wipe LRs on vcpu exit KVM: arm/arm64: vgic-v2: Reset LRs at boot time KVM: arm/arm64: vgic-v2: Do not save an LR known to be empty KVM: arm/arm64: vgic-v2: Move GICH_ELRSR saving to its own function KVM: arm/arm64: vgic-v2: Save maintenance interrupt state only if required KVM: arm/arm64: vgic-v2: Avoid accessing GICH registers KVM: s390: allocate only one DMA page per VM KVM: s390: enable STFLE interpretation only if enabled for the guest KVM: s390: wake up when the VCPU cpu timer expires KVM: s390: step the VCPU timer while in enabled wait KVM: s390: protect VCPU cpu timer with a seqcount KVM: s390: step VCPU cpu timer during kvm_run ioctl ...
2016-03-14Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this cycle are: - Make schedstats a runtime tunable (disabled by default) and optimize it via static keys. As most distributions enable CONFIG_SCHEDSTATS=y due to its instrumentation value, this is a nice performance enhancement. (Mel Gorman) - Implement 'simple waitqueues' (swait): these are just pure waitqueues without any of the more complex features of full-blown waitqueues (callbacks, wake flags, wake keys, etc.). Simple waitqueues have less memory overhead and are faster. Use simple waitqueues in the RCU code (in 4 different places) and for handling KVM vCPU wakeups. (Peter Zijlstra, Daniel Wagner, Thomas Gleixner, Paul Gortmaker, Marcelo Tosatti) - sched/numa enhancements (Rik van Riel) - NOHZ performance enhancements (Rik van Riel) - Various sched/deadline enhancements (Steven Rostedt) - Various fixes (Peter Zijlstra) - ... and a number of other fixes, cleanups and smaller enhancements" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits) sched/cputime: Fix steal_account_process_tick() to always return jiffies sched/deadline: Remove dl_new from struct sched_dl_entity Revert "kbuild: Add option to turn incompatible pointer check into error" sched/deadline: Remove superfluous call to switched_to_dl() sched/debug: Fix preempt_disable_ip recording for preempt_disable() sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity time, acct: Drop irq save & restore from __acct_update_integrals() acct, time: Change indentation in __acct_update_integrals() sched, time: Remove non-power-of-two divides from __acct_update_integrals() sched/rt: Kick RT bandwidth timer immediately on start up sched/debug: Add deadline scheduler bandwidth ratio to /proc/sched_debug sched/debug: Move sched_domain_sysctl to debug.c sched/debug: Move the /sys/kernel/debug/sched_features file setup into debug.c sched/rt: Fix PI handling vs. sched_setscheduler() sched/core: Remove duplicated sched_group_set_shares() prototype sched/fair: Consolidate nohz CPU load update code sched/fair: Avoid using decay_load_missed() with a negative value sched/deadline: Always calculate end of period on sched_yield() sched/cgroup: Fix cgroup entity load tracking tear-down rcu: Use simple wait queues where possible in rcutree ...
2016-03-09kvm: cap halt polling at exactly halt_poll_nsDavid Matlack
When growing halt-polling, there is no check that the poll time exceeds the limit. It's possible for vcpu->halt_poll_ns grow once past halt_poll_ns, and stay there until a halt which takes longer than vcpu->halt_poll_ns. For example, booting a Linux guest with halt_poll_ns=11000: ... kvm:kvm_halt_poll_ns: vcpu 0: halt_poll_ns 0 (shrink 10000) ... kvm:kvm_halt_poll_ns: vcpu 0: halt_poll_ns 10000 (grow 0) ... kvm:kvm_halt_poll_ns: vcpu 0: halt_poll_ns 20000 (grow 10000) Signed-off-by: David Matlack <dmatlack@google.com> Fixes: aca6ff29c4063a8d467cdee241e6b3bf7dc4a171 Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-09Merge tag 'kvm-arm-for-4.6' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM updates for 4.6 - VHE support so that we can run the kernel at EL2 on ARMv8.1 systems - PMU support for guests - 32bit world switch rewritten in C - Various optimizations to the vgic save/restore code Conflicts: include/uapi/linux/kvm.h
2016-03-09arm64: KVM: vgic-v3: Reset LRs at boot timeMarc Zyngier
In order to let the GICv3 code be more lazy in the way it accesses the LRs, it is necessary to start with a clean slate. Let's reset the LRs on each CPU when the vgic is probed (which includes a round trip to EL2...). Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-03-09arm64: KVM: vgic-v3: Avoid accessing ICH registersMarc Zyngier
Just like on GICv2, we're a bit hammer-happy with GICv3, and access them more often than we should. Adopt a policy similar to what we do for GICv2, only save/restoring the minimal set of registers. As we don't access the registers linearly anymore (we may skip some), the convoluted accessors become slightly simpler, and we can drop the ugly indexing macro that tended to confuse the reviewers. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-03-09KVM: arm/arm64: vgic-v2: Make GICD_SGIR quicker to hitMarc Zyngier
The GICD_SGIR register lives a long way from the beginning of the handler array, which is searched linearly. As this is hit pretty often, let's move it up. This saves us some precious cycles when the guest is generating IPIs. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-03-09KVM: arm/arm64: vgic-v2: Only wipe LRs on vcpu exitMarc Zyngier
So far, we're always writing all possible LRs, setting the empty ones with a zero value. This is obvious doing a lot of work for nothing, and we're better off clearing those we've actually dirtied on the exit path (it is very rare to inject more than one interrupt at a time anyway). Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-03-09KVM: arm/arm64: vgic-v2: Reset LRs at boot timeMarc Zyngier
In order to let make the GICv2 code more lazy in the way it accesses the LRs, it is necessary to start with a clean slate. Let's reset the LRs on each CPU when the vgic is probed. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-03-09KVM: arm/arm64: vgic-v2: Do not save an LR known to be emptyMarc Zyngier
On exit, any empty LR will be signaled in GICH_ELRSR*. Which means that we do not have to save it, and we can just clear its state in the in-memory copy. Take this opportunity to move the LR saving code into its own function. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-03-09KVM: arm/arm64: vgic-v2: Move GICH_ELRSR saving to its own functionMarc Zyngier
In order to make the saving path slightly more readable and prepare for some more optimizations, let's move the GICH_ELRSR saving to its own function. No functional change. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>